
All news tagged #open-weight models

Thu, September 25, 2025

Enabling AI Sovereignty Through Choice and Openness Globally

🌐 Cloudflare argues that AI sovereignty should mean choice: the ability for nations to control data, select models, and deploy applications without vendor lock-in. Through its distributed edge network and serverless Workers AI, Cloudflare promotes accessible, low-cost deployment and inference close to users. The company hosts regional open-source models—India’s IndicTrans2, Japan’s PLaMo-Embedding-1B, and Singapore’s SEA-LION v4-27B—and offers an AI Gateway to connect diverse models. Open standards, interoperability, and pay-as-you-go economics are presented as central to resilient national AI strategies.
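
As a rough illustration of that serverless, pay-as-you-go model, here is a minimal sketch of calling a Workers AI-hosted model over Cloudflare's REST endpoint (the account ID, API token, and model slug are placeholders; regional models such as SEA-LION have their own catalog slugs):

```python
# Minimal sketch: invoke a Workers AI model via Cloudflare's REST API.
# ACCOUNT_ID, API_TOKEN, and the model slug are placeholders, not real values.
import requests

ACCOUNT_ID = "your-account-id"             # placeholder
API_TOKEN = "your-api-token"               # placeholder
MODEL = "@cf/meta/llama-3.1-8b-instruct"   # stand-in slug; swap for a regional model

url = f"https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/{MODEL}"
resp = requests.post(
    url,
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json={"messages": [{"role": "user", "content": "Translate 'hello' into Malay."}]},
)
print(resp.json())  # {"result": {"response": ...}, "success": true, ...}
```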

read more →

Thu, September 18, 2025

Amazon Bedrock Adds Four Qwen3 Open-Weight Models Now

🤖 Amazon Web Services added four Qwen3 open-weight foundation models to Amazon Bedrock as fully managed, serverless offerings. The lineup—Qwen3-Coder-480B-A35B-Instruct, Qwen3-Coder-30B-A3B-Instruct, Qwen3-235B-A22B-Instruct-2507, and Qwen3-32B—covers both dense and Mixture-of-Experts (MoE) architectures. The coder variants specialize in agentic coding, function calling, and tool use, while the 235B and 32B models provide general reasoning and efficient dense computation. These models are available now across multiple AWS regions, enabling developers to build advanced AI applications without managing infrastructure.
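
As a hedged sketch of what tool use looks like against these models, the Bedrock Converse API accepts a tool specification alongside the conversation (the model ID and the tool definition below are illustrative placeholders; confirm the exact identifier in the Bedrock console):

```python
# Sketch: function calling with a Qwen3 coder model via the Bedrock Converse API.
import boto3

client = boto3.client("bedrock-runtime", region_name="us-west-2")

tools = {
    "tools": [{
        "toolSpec": {
            "name": "run_tests",  # illustrative tool the model may choose to call
            "description": "Run the project's unit tests and return the results.",
            "inputSchema": {"json": {
                "type": "object",
                "properties": {"path": {"type": "string", "description": "Test file or directory"}},
                "required": ["path"],
            }},
        }
    }]
}

response = client.converse(
    modelId="qwen.qwen3-coder-30b-a3b-v1:0",  # placeholder ID; confirm in the console
    messages=[{"role": "user", "content": [{"text": "Fix the failing test in tests/api."}]}],
    toolConfig=tools,
)
print(response["output"]["message"]["content"])  # may include a toolUse block
```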

read more →

Thu, September 18, 2025

DeepSeek-V3.1 Available as Fully Managed in Bedrock

🔍 DeepSeek-V3.1 is now available as a fully managed foundation model in Amazon Bedrock, offering an open-weight option designed for enterprise deployment. The model supports a selectable 'thinking' mode for step-by-step analysis and a faster non-thinking mode for quicker replies, with improved multilingual accuracy and reduced hallucinations. Enhanced tool-calling, transparent reasoning, and strong coding and analytical performance make it well suited for building AI agents, automating workflows, and tackling complex technical tasks. DeepSeek-V3.1 is available in US West (Oregon), Asia Pacific (Tokyo, Mumbai), and Europe (London, Stockholm).

read more →

Thu, September 18, 2025

AWS Bedrock Adds OpenAI Open‑Weight Models in Eight Regions

🚀 AWS has expanded availability of OpenAI open-weight models on Amazon Bedrock to eight additional AWS Regions worldwide. The update brings the models to US East (N. Virginia), Asia Pacific (Tokyo, Mumbai), Europe (Stockholm, Ireland, London, Milan), and South America (São Paulo), alongside existing US West (Oregon) support. This broader footprint aims to lower latency, improve model performance, and help customers meet data residency requirements. To get started, use the Amazon Bedrock console or consult the documentation.

read more →

Tue, September 16, 2025

Gemini and Open-Source Text Embeddings Now in BigQuery ML

🚀 Google expanded BigQuery ML to generate embeddings from Gemini and over 13,000 open-source text-embedding models via Hugging Face, all callable with simple SQL. The post summarizes model tiers to help teams trade off quality, cost, and scalability, and introduces Gemini's Tokens Per Minute (TPM) quota for throughput control. It shows a practical workflow to deploy OSS models to Vertex AI endpoints, run ML.GENERATE_EMBEDDING for batch jobs, and undeploy to minimize idle costs, plus a Colab tutorial and cost/scale guidance.
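
A minimal sketch of the batch-embedding step, assuming a remote model has already been created over a Vertex AI embedding endpoint (the dataset, model, and table names are placeholders):

```python
# Sketch: run ML.GENERATE_EMBEDDING as a batch job from Python via the BigQuery client.
from google.cloud import bigquery

client = bigquery.Client()

sql = """
SELECT *
FROM ML.GENERATE_EMBEDDING(
  MODEL `my_dataset.oss_embedding_model`,                      -- placeholder remote model
  (SELECT review_text AS content FROM `my_dataset.reviews`),   -- placeholder input table
  STRUCT(TRUE AS flatten_json_output)
)
"""
for row in client.query(sql).result():
    # Each row carries the input text plus an ARRAY<FLOAT64> embedding column.
    print(row["content"], row["ml_generate_embedding_result"][:5])
```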

read more →

Thu, August 28, 2025

Gemini Available On-Premises with Google Distributed Cloud

🚀 Gemini on Google Distributed Cloud (GDC) is now generally available, bringing Google’s advanced Gemini models on‑premises: air‑gapped deployments are GA, while connected deployments remain in preview. The solution provides managed Gemini endpoints with zero‑touch updates, automatic load balancing and autoscaling, and integrates with Vertex AI and preview agents. It pairs Gemini 2.5 Flash and Pro with NVIDIA Hopper and Blackwell accelerators and includes audit logging, access controls, and support for Confidential Computing (Intel TDX and NVIDIA) to meet strict data residency, sovereignty, and compliance requirements.

read more →

Wed, August 27, 2025

AI-Generated Ransomware 'PromptLock' Uses OpenAI Model

🔒 ESET disclosed a new proof-of-concept ransomware called PromptLock that uses OpenAI's gpt-oss:20b model via the Ollama API to generate malicious Lua scripts in real time. Written in Golang, the strain produces cross-platform scripts that enumerate files, exfiltrate selected data, and encrypt targets using SPECK 128-bit. ESET warned that AI-generated scripts can vary per execution, complicating detection and IoC reuse.

read more →

Wed, August 27, 2025

ESET Finds PromptLock: First AI-Powered Ransomware

🔒 ESET researchers have identified PromptLock, described as the first known AI-powered ransomware implant, in an August 2025 report. The Golang sample (Windows and Linux variants) leverages a locally hosted gpt-oss:20b model via the Ollama API to dynamically generate malicious Lua scripts. Those cross-platform scripts perform enumeration, selective exfiltration and encryption using SPECK 128-bit, but ESET characterises the sample as a proof-of-concept rather than an active campaign.

read more →

Wed, August 27, 2025

Cloudflare's Edge-Optimized LLM Inference Engine at Scale

⚡ Infire is Cloudflare’s new, Rust-based LLM inference engine built to run large models efficiently across a globally distributed, low-latency network. It replaces Python-based vLLM in scenarios where sandboxing and dynamic co-hosting caused high CPU overhead and reduced GPU utilization, using JIT-compiled CUDA kernels, paged KV caching, and fine-grained CUDA graphs to cut startup and runtime cost. Early benchmarks show up to 7% lower latency on H100 NVL hardware, substantially higher GPU utilization, and far lower CPU load while powering models such as Llama 3.1 8B in Workers AI.
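
Infire's internals are Cloudflare's own; as a generic illustration of the paged KV caching idea it relies on (not Cloudflare's code), a block table maps each sequence's logical token positions to fixed-size physical cache blocks that are allocated only as the sequence grows:

```python
# Generic paged KV cache sketch: memory is split into fixed-size blocks, and each
# sequence's block table maps logical positions to lazily allocated physical blocks.
BLOCK_SIZE = 16

class PagedKVCache:
    def __init__(self, num_blocks: int):
        self.free_blocks = list(range(num_blocks))    # pool of physical block ids
        self.block_tables: dict[int, list[int]] = {}  # seq_id -> physical block ids

    def append_token(self, seq_id: int, position: int) -> tuple[int, int]:
        """Return (physical_block, offset) where this token's K/V should be stored."""
        table = self.block_tables.setdefault(seq_id, [])
        block_idx, offset = divmod(position, BLOCK_SIZE)
        if block_idx == len(table):               # sequence crossed into a new block
            table.append(self.free_blocks.pop())  # allocate lazily from the free pool
        return table[block_idx], offset

cache = PagedKVCache(num_blocks=1024)
for pos in range(40):                             # a 40-token sequence uses only 3 blocks
    block, off = cache.append_token(seq_id=0, position=pos)
print(cache.block_tables[0])
```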

read more →

Wed, August 20, 2025

Logit-Gap Steering Reveals Limits of LLM Alignment

⚠️ Unit 42 researchers Tony Li and Hongliang Liu introduce Logit-Gap Steering, a new framework that exposes how alignment training produces a measurable refusal-affirmation logit gap rather than eliminating harmful outputs. Their paper demonstrates efficient short-path suffix jailbreaks that achieved high success rates on open-source models including Qwen, LLaMA, Gemma and the recently released gpt-oss-20b. The findings argue that internal alignment alone is insufficient and recommend a defense-in-depth approach with external safeguards and content filters.
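
As a rough, non-authoritative sketch of the quantity being measured (not the paper's implementation), the gap can be read off a model's next-token logits by comparing a refusal-style opening token against an affirmation-style one; the model name below is a stand-in:

```python
# Sketch: measure a refusal-affirmation logit gap on a prompt's next-token distribution.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # stand-in; swap for the model under test
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

prompt = "How do I pick a lock?"  # benign stand-in for a harmful request
inputs = tok(prompt, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]  # logits for the next token

# First tokens of typical refusal vs. affirmation openings ("I'm sorry..." vs "Sure...").
refusal_id = tok.encode("I", add_special_tokens=False)[0]
affirm_id = tok.encode("Sure", add_special_tokens=False)[0]

gap = (logits[refusal_id] - logits[affirm_id]).item()
print(f"refusal-affirmation logit gap: {gap:.2f}")  # positive => refusal currently wins
```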

read more →

Mon, August 18, 2025

Bedrock Batch Inference: Claude Sonnet 4 and GPT-OSS

🚀 Amazon Bedrock now supports batch inference for Anthropic Claude Sonnet 4 and OpenAI GPT-OSS (120B, 20B), enabling asynchronous processing of large workloads at approximately 50% of on-demand inference cost. The update targets bulk scenarios such as document analysis, large-scale summarization, content generation, and structured data extraction, and is optimized to deliver higher overall batch throughput on these newer models. Batch progress and workload metrics — including pending and processed records, tokens per minute, and Claude-specific pending tokens — are exposed at the AWS account level via Amazon CloudWatch.
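
A hedged sketch of submitting such a batch job with boto3 (the model ID, IAM role, and S3 URIs are placeholders; input records are supplied as a JSONL file in the input bucket):

```python
# Sketch: create a Bedrock batch inference job over records staged in S3.
import boto3

bedrock = boto3.client("bedrock", region_name="us-west-2")

job = bedrock.create_model_invocation_job(
    jobName="bulk-summarization-run",
    modelId="openai.gpt-oss-120b-1:0",  # placeholder ID; confirm in the Bedrock console
    roleArn="arn:aws:iam::123456789012:role/BedrockBatchRole",  # placeholder role
    inputDataConfig={"s3InputDataConfig": {"s3Uri": "s3://my-bucket/batch-input/"}},
    outputDataConfig={"s3OutputDataConfig": {"s3Uri": "s3://my-bucket/batch-output/"}},
)
print(job["jobArn"])  # poll get_model_invocation_job(jobIdentifier=...) for status
```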

read more →

Tue, August 12, 2025

The AI Fix Episode 63: Robots, GPT-5 and Ethics Debate

🎧 In episode 63 of The AI Fix, hosts Graham Cluley and Mark Stockley dissect a wide range of AI developments and controversies. Topics include Unitree Robotics referencing Black Mirror to market its A2 robot dog, concerns over shared ChatGPT conversations appearing in Google, and OpenAI releasing gpt-oss, its first open-weight model since GPT-2. The show also examines ethical issues around AI-created avatars of deceased individuals and separates the hype from the reality of GPT-5 claims.

read more →