All news tagged #open-weight models
Wed, August 27, 2025
ESET Finds PromptLock: First AI-Powered Ransomware
🔒 ESET researchers have identified PromptLock, described in an August 2025 report as the first known AI-powered ransomware. The Golang sample (with Windows and Linux variants) leverages a locally hosted gpt-oss:20b model via the Ollama API to dynamically generate malicious Lua scripts. Those cross-platform scripts perform filesystem enumeration, selective exfiltration, and encryption with the 128-bit SPECK cipher, but ESET characterizes the sample as a proof of concept rather than an active campaign.
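For context, the mechanism the report describes rests on Ollama's documented local HTTP API. The sketch below shows that general request pattern against a locally hosted gpt-oss:20b with a deliberately benign prompt; it is illustrative only and is not code from the PromptLock sample.

```python
# Minimal sketch of Ollama's documented /api/generate call pattern that any
# local client would use to get text from a locally hosted gpt-oss:20b model.
# The prompt here is purely illustrative and benign.
import json
import urllib.request

payload = {
    "model": "gpt-oss:20b",  # locally pulled open-weight model
    "prompt": "Write a Lua snippet that lists the files in a directory.",
    "stream": False,         # return a single JSON object instead of a stream
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",  # default local Ollama endpoint
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])  # the generated text
```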
Wed, August 27, 2025
Cloudflare's Edge-Optimized LLM Inference Engine at Scale
⚡ Infire is Cloudflare's new Rust-based LLM inference engine built to run large models efficiently across a globally distributed, low-latency network. It replaces Python-based vLLM in scenarios where sandboxing and dynamic co-hosting caused high CPU overhead and reduced GPU utilization, using JIT-compiled CUDA kernels, paged KV caching, and fine-grained CUDA graphs to cut startup and runtime cost. Early benchmarks show up to 7% lower latency on H100 NVL hardware, substantially higher GPU utilization, and far lower CPU load while powering models such as Llama 3.1 8B in Workers AI.
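Of those techniques, paged KV caching is the easiest to show in miniature. The sketch below illustrates the general idea in Python and is not Cloudflare's Rust implementation: the cache is carved into fixed-size blocks, and each request keeps a block table that maps logical token positions to physical blocks allocated only when needed.

```python
# Conceptual sketch of paged KV caching: instead of reserving one large
# contiguous cache per request, blocks are allocated on demand from a shared
# pool and returned when the request finishes, which keeps GPU memory dense.
BLOCK_SIZE = 16  # tokens per physical cache block

class PagedKVCache:
    def __init__(self, num_blocks: int):
        self.free_blocks = list(range(num_blocks))      # pool of physical block ids
        self.block_tables: dict[str, list[int]] = {}    # request id -> its block ids

    def append_token(self, req_id: str, position: int) -> tuple[int, int]:
        """Return (physical_block, offset) where this token's K/V entries go."""
        table = self.block_tables.setdefault(req_id, [])
        if position // BLOCK_SIZE >= len(table):        # current blocks are full
            table.append(self.free_blocks.pop())        # allocate lazily from the pool
        return table[position // BLOCK_SIZE], position % BLOCK_SIZE

    def release(self, req_id: str) -> None:
        """Return a finished request's blocks to the pool for reuse."""
        self.free_blocks.extend(self.block_tables.pop(req_id, []))

cache = PagedKVCache(num_blocks=1024)
print(cache.append_token("req-1", 0))    # first block allocated for this request
print(cache.append_token("req-1", 17))   # second block allocated only once needed
cache.release("req-1")                   # blocks immediately reusable by other requests
```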
Wed, August 20, 2025
Logit-Gap Steering Reveals Limits of LLM Alignment
⚠️ Unit 42 researchers Tony Li and Hongliang Liu introduce Logit-Gap Steering, a new framework that exposes how alignment training produces a measurable refusal-affirmation logit gap rather than eliminating harmful outputs. Their paper demonstrates efficient short-suffix jailbreaks that achieved high success rates on open-weight models including Qwen, Llama, Gemma, and the recently released gpt-oss-20b. The findings argue that internal alignment alone is insufficient and recommend a defense-in-depth approach with external safeguards and content filters.
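The core quantity behind the framework can be illustrated with a short diagnostic. The sketch below is not Unit 42's code, and the model ID and opener tokens are assumptions chosen for illustration: it compares the next-token logits a chat model assigns to a refusal-style opener versus an affirmative one, and that difference is the gap the paper studies.

```python
# Illustrative diagnostic: estimate a model's refusal-affirmation logit gap
# for a prompt by comparing the logits of a refusal-style first token and an
# affirmative first token. Any open-weight chat model can stand in here.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-7B-Instruct"  # assumed example model, not from the paper
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

chat = [{"role": "user", "content": "<prompt under evaluation>"}]
text = tok.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
inputs = tok(text, return_tensors="pt")

with torch.no_grad():
    next_logits = model(**inputs).logits[0, -1]  # logits for the first response token

refuse_id = tok.encode("I", add_special_tokens=False)[0]     # "I can't ..." style opener
affirm_id = tok.encode("Sure", add_special_tokens=False)[0]  # "Sure, ..." style opener
gap = (next_logits[refuse_id] - next_logits[affirm_id]).item()
print(f"refusal-affirmation gap: {gap:+.2f}")  # positive = model leans toward refusing
```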
Mon, August 18, 2025
Bedrock Batch Inference: Claude Sonnet 4 and GPT-OSS
🚀 Amazon Bedrock now supports batch inference for Anthropic Claude Sonnet 4 and OpenAI GPT-OSS (120B, 20B), enabling asynchronous processing of large workloads at approximately 50% of on-demand inference cost. The update targets bulk scenarios such as document analysis, large-scale summarization, content generation, and structured data extraction, and is optimized to deliver higher overall batch throughput on these newer models. Batch progress and workload metrics, including pending and processed records, tokens per minute, and Claude-specific pending tokens, are exposed at the AWS account level via Amazon CloudWatch.
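Submitting such a workload goes through Bedrock's existing batch inference API. The boto3 sketch below shows the general call shape; the bucket paths, role ARN, and the GPT-OSS model ID are placeholders rather than values from the announcement.

```python
# Hedged sketch of starting a Bedrock batch (asynchronous) inference job with
# boto3. Input is a JSONL file of {"recordId": ..., "modelInput": {...}}
# records in S3; results land in the output S3 prefix when the job completes.
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

response = bedrock.create_model_invocation_job(
    jobName="bulk-summarization-batch",
    modelId="openai.gpt-oss-120b-1:0",  # placeholder ID; check the Bedrock model catalog
    roleArn="arn:aws:iam::111122223333:role/BedrockBatchRole",  # placeholder role
    inputDataConfig={
        "s3InputDataConfig": {"s3Uri": "s3://my-bucket/batch-input/"}
    },
    outputDataConfig={
        "s3OutputDataConfig": {"s3Uri": "s3://my-bucket/batch-output/"}
    },
)
print(response["jobArn"])  # poll get_model_invocation_job or watch CloudWatch metrics
```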
Tue, August 12, 2025
The AI Fix Episode 63: Robots, GPT-5 and Ethics Debate
🎧 In episode 63 of The AI Fix, hosts Graham Cluley and Mark Stockley dissect a wide range of AI developments and controversies. Topics include Unitree Robotics referencing Black Mirror to market its A2 robot dog, concerns over shared ChatGPT conversations appearing in Google, and OpenAI releasing gpt-oss, its first open-weight model since GPT-2. The show also examines ethical issues around AI-created avatars of deceased individuals and separates the hype from the reality of GPT-5 claims.