All news with the #hugging face tag
Thu, November 13, 2025
Google Cloud expands Hugging Face support for AI developers
🤝 Google Cloud and Hugging Face are deepening their partnership to speed developer workflows and strengthen enterprise model deployments. A new gateway will cache Hugging Face models and datasets on Google Cloud so downloads take minutes, not hours, across Vertex AI and Google Kubernetes Engine. The collaboration adds native TPU support for open models and integrates Google Cloud’s threat intelligence and Mandiant scanning for models served through Vertex AI.
Tue, November 11, 2025
AI startups expose API keys on GitHub, risking models
🔐 New research by cloud security firm Wiz found verified secret leaks in 65% of the Forbes AI 50, with API keys and access tokens exposed on GitHub. Some credentials were tied to vendors such as Hugging Face, Weights & Biases, and LangChain, potentially granting access to private models, training data, and internal details. Nearly half of Wiz’s disclosure attempts failed or received no response. The findings highlight urgent gaps in secret management and DevSecOps practices.
Mon, November 10, 2025
65% of Top Private AI Firms Exposed Secrets on GitHub
🔒 A Wiz analysis of 50 private companies from the Forbes AI 50 found that 65% had exposed verified secrets such as API keys, tokens, and credentials across GitHub and related repositories. Researchers employed a Depth, Perimeter, and Coverage approach to examine commit histories, deleted forks, gists, and contributors' personal repos, revealing secrets that standard scanners often miss. The affected firms are collectively valued at over $400bn.
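The core of the methodology can be reproduced in miniature. Below is a minimal sketch, assuming a local clone and a few illustrative regexes (the `hf_` prefix matches Hugging Face's current token format; the other patterns are simplified stand-ins for a real rule set), that greps a repository's full commit history rather than just the working tree:

```python
import re
import subprocess

# Illustrative patterns only; production scanners use large, validated rule sets.
PATTERNS = {
    "huggingface_token": re.compile(r"\bhf_[A-Za-z0-9]{30,}\b"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "generic_api_key": re.compile(r"(?i)api[_-]?key\s*[=:]\s*['\"][A-Za-z0-9_\-]{20,}['\"]"),
}

def scan_repo_history(repo_path: str) -> list[tuple[str, str]]:
    """Scan every patch in the full commit history, not just the current tree:
    leaked secrets often survive only in old commits or deleted files."""
    log = subprocess.run(
        ["git", "-C", repo_path, "log", "--all", "-p", "--unified=0"],
        capture_output=True, text=True, errors="replace",
    ).stdout
    hits = []
    for line in log.splitlines():
        if not line.startswith("+"):      # only inspect lines a commit added
            continue
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((name, line.strip()[:120]))
    return hits

if __name__ == "__main__":
    for name, snippet in scan_repo_history("."):
        print(f"[{name}] {snippet}")
```

Wiz's Depth, Perimeter, and Coverage approach extends the same idea to deleted forks, gists, and contributors' personal repositories, which is where standard scanners typically stop.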
Thu, November 6, 2025
AI-Powered Malware Emerges: Google Details New Threats
🛡️ Google Threat Intelligence Group (GTIG) reports that cybercriminals are actively integrating large language models into malware campaigns, moving beyond mere tooling to generate, obfuscate, and adapt malicious code. GTIG documents new families — including PROMPTSTEAL, PROMPTFLUX, FRUITSHELL, and PROMPTLOCK — that query commercial APIs to produce or rewrite payloads and evade detection. Researchers also note attackers use social‑engineering prompts to trick LLMs into revealing sensitive guidance and that underground marketplaces increasingly offer AI-enabled “malware-as-a-service,” lowering the bar for less skilled threat actors.
Thu, November 6, 2025
Google Warns: AI-Enabled Malware Actively Deployed
⚠️ Google’s Threat Intelligence Group has identified a new class of AI-enabled malware that leverages large language models at runtime to generate and obfuscate malicious code. Notable families include PromptFlux, which uses the Gemini API to rewrite its VBScript dropper for persistence and lateral spread, and PromptSteal, a Python data miner that queries Qwen2.5-Coder-32B-Instruct to create on-demand Windows commands. GTIG observed PromptSteal used by APT28 in Ukraine, while other examples such as PromptLock, FruitShell and QuietVault demonstrate varied AI-driven capabilities. Google warns this "just-in-time AI" approach could accelerate malware sophistication and democratize cybercrime.
Thu, November 6, 2025
Google: LLMs Employed Operationally in Malware Attacks
🤖 Google’s Threat Intelligence Group (GTIG) reports attackers are using “just‑in‑time” AI—LLMs queried during execution—to generate and obfuscate malicious code. Researchers identified two families, PROMPTSTEAL and PROMPTFLUX, which query Hugging Face and Gemini APIs to craft commands, rewrite source code, and evade detection. GTIG also documents social‑engineering prompts that trick models into revealing red‑teaming or exploit details, and warns the underground market for AI‑enabled crime is maturing. Google says it has disabled related accounts and applied protections.
Wed, November 5, 2025
GTIG: Threat Actors Shift to AI-Enabled Runtime Malware
🔍 Google Threat Intelligence Group (GTIG) reports an operational shift from adversaries using AI for productivity to embedding generative models inside malware to generate or alter code at runtime. GTIG details “just-in-time” LLM calls in families like PROMPTFLUX and PROMPTSTEAL, which query external models such as Gemini to obfuscate, regenerate, or produce one‑time functions during execution. Google says it disabled abusive assets, strengthened classifiers and model protections, and recommends monitoring LLM API usage, protecting credentials, and treating runtime model calls as potential live command channels.
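That last recommendation translates directly into detection logic. Here is a minimal sketch, assuming DNS query logs exported as CSV with `src_host` and `query` columns (the log schema and the endpoint watchlist are illustrative assumptions), that flags unexpected hosts resolving hosted-LLM API endpoints:

```python
import csv
from collections import defaultdict

# Public API endpoints of hosted LLMs; the watchlist here is illustrative.
LLM_API_DOMAINS = {
    "generativelanguage.googleapis.com",  # Gemini API (queried by PROMPTFLUX)
    "api-inference.huggingface.co",       # Hugging Face inference (PROMPTSTEAL)
    "api.openai.com",
    "api.anthropic.com",
}

def flag_llm_callers(dns_log_path: str, allowlist: set[str]) -> dict[str, set[str]]:
    """Group DNS lookups of LLM endpoints by source host, skipping hosts
    that are sanctioned to run AI tooling."""
    lookups = defaultdict(set)
    with open(dns_log_path, newline="") as f:
        for row in csv.DictReader(f):     # assumed columns: src_host, query
            domain = row["query"].rstrip(".").lower()
            if domain in LLM_API_DOMAINS and row["src_host"] not in allowlist:
                lookups[row["src_host"]].add(domain)
    return lookups

if __name__ == "__main__":
    for host, domains in flag_llm_callers("dns.csv", allowlist={"ml-dev-01"}).items():
        print(f"{host} -> {', '.join(sorted(domains))}")
```

In production this would feed a SIEM rule rather than a standalone script, but the principle is the one GTIG describes: treat an unexplained runtime call to a model API like any other suspect command channel.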
Wed, November 5, 2025
Cloud CISO: Threat Actors' Growing Use of AI Tools
⚠️ Google's Threat Intelligence team reports a shift from experimentation to operational use of AI by threat actors, including AI-enabled malware and prompt-based command generation. GTIG highlighted PROMPTSTEAL, linked to APT28 (FROZENLAKE), which queries a Hugging Face LLM to generate scripts for reconnaissance, document collection, and exfiltration, while adopting greater obfuscation and altered C2 methods. Google disabled related assets, strengthened model classifiers and safeguards with DeepMind, and urges defenders to update threat models, monitor anomalous scripting and C2, and incorporate threat intelligence into model- and classifier-level protections.
Thu, October 23, 2025
Hugging Face and VirusTotal: Integrating Security Insights
🔒 VirusTotal and Hugging Face have announced a collaboration to surface security insights directly within the Hugging Face platform. When browsing model files, datasets, or related artifacts, users will now see multi‑scanner results including VirusTotal detections and links to public reports so potential risks can be reviewed before downloading. VirusTotal is also enhancing its analysis portfolio with AI-driven tools such as Code Insight and format‑aware scanners (picklescan, safepickle, ModelScan) to highlight unsafe deserialization flows and other risky patterns. The integration aims to increase visibility across the AI supply chain and help researchers, developers, and defenders build more secure models and workflows.
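The deserialization risk these scanners target can be demonstrated statically. Below is a minimal sketch using only the standard library's `pickletools` (the blocklist is a tiny illustrative subset; picklescan and ModelScan maintain curated rule sets) that lists which globals a pickle file would import, without ever loading it:

```python
import pickletools

# Imports that let a pickle execute code on load; tiny illustrative subset.
DANGEROUS = {("os", "system"), ("builtins", "eval"), ("builtins", "exec"),
             ("subprocess", "Popen")}

def audit_pickle(path: str) -> list[tuple[str, str]]:
    """Statically walk the opcode stream; nothing is deserialized or executed."""
    findings, recent_strings = [], []
    with open(path, "rb") as f:
        for opcode, arg, _pos in pickletools.genops(f):
            if isinstance(arg, str):
                recent_strings.append(arg)
            if opcode.name == "GLOBAL":           # protocol <= 3: arg is "module name"
                module, _, name = arg.partition(" ")
            elif opcode.name == "STACK_GLOBAL" and len(recent_strings) >= 2:
                module, name = recent_strings[-2:]  # protocol 4+: approximation from
            else:                                   # the preceding pushed strings
                continue
            tag = "CRITICAL" if (module, name) in DANGEROUS else "review"
            findings.append((tag, f"{module}.{name}"))
    return findings

if __name__ == "__main__":
    for tag, ref in audit_pickle("model.bin"):
        print(f"[{tag}] pickle imports {ref}")
```

A supposed weights file that imports `os.system` is an immediate red flag, which is exactly the class of finding this integration surfaces before download.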
Tue, October 14, 2025
Microsoft launches ExCyTIn-Bench to benchmark AI security
🛡️ Microsoft released ExCyTIn-Bench, an open-source benchmarking tool to evaluate how well AI systems perform realistic cybersecurity investigations. It simulates a multistage Azure SOC using 57 Microsoft Sentinel log tables and measures multistep reasoning, tool usage, and evidence synthesis. The benchmark offers fine-grained, actionable metrics for CISOs, product owners, and researchers.
Fri, October 3, 2025
Dataproc ML library: Connect Spark to Gemini and Vertex
🔗 Google has released an open-source Python library, Dataproc ML, to streamline running ML and generative-AI inference from Apache Spark on Dataproc. The library uses a SparkML-style builder pattern so users can configure a model handler (for example, GenAiModelHandler) and call .transform() to apply Gemini or other Vertex AI models directly to DataFrames. It also supports loading PyTorch and TensorFlow model artifacts from GCS for large-scale batch inference and includes performance optimizations such as vectorized data transfer, connection reuse, and automatic retry/backoff.
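The announcement names `GenAiModelHandler` and the `.transform()` call but does not include full code, so the following is a hypothetical sketch: the `dataproc_ml` import path, builder method names, and column names are assumptions; only the SparkML-style handler-then-transform pattern comes from the post.

```python
from pyspark.sql import SparkSession
# Hypothetical import path; check the Dataproc ML docs for the real layout.
from dataproc_ml.inference import GenAiModelHandler

spark = SparkSession.builder.appName("gemini-batch-inference").getOrCreate()
reviews = spark.read.parquet("gs://my-bucket/reviews/")   # placeholder input

# Builder-style configuration, then .transform() on a DataFrame, matching the
# SparkML-like pattern the announcement describes. The method names below
# (model, prompt_col, output_col) are assumptions for illustration.
handler = (
    GenAiModelHandler()
    .model("gemini-2.0-flash")
    .prompt_col("review_text")
    .output_col("summary")
)
summaries = handler.transform(reviews)
summaries.write.parquet("gs://my-bucket/review_summaries/")
```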
Thu, September 25, 2025
Enabling AI Sovereignty Through Choice and Openness Globally
🌐 Cloudflare argues that AI sovereignty should mean choice: the ability for nations to control data, select models, and deploy applications without vendor lock-in. Through its distributed edge network and serverless Workers AI, Cloudflare promotes accessible, low-cost deployment and inference close to users. The company hosts regional open-source models—India’s IndicTrans2, Japan’s PLaMo-Embedding-1B, and Singapore’s SEA-LION v4-27B—and offers an AI Gateway to connect diverse models. Open standards, interoperability, and pay-as-you-go economics are presented as central to resilient national AI strategies.
Tue, September 16, 2025
Gemini and Open-Source Text Embeddings Now in BigQuery ML
🚀 Google expanded BigQuery ML to generate embeddings from Gemini and over 13,000 open-source text-embedding models via Hugging Face, all callable with simple SQL. The post summarizes model tiers to help teams trade off quality, cost, and scalability, and introduces Gemini's Tokens Per Minute (TPM) quota for throughput control. It shows a practical workflow to deploy OSS models to Vertex AI endpoints, run ML.GENERATE_EMBEDDING for batch jobs, and undeploy to minimize idle costs, plus a Colab tutorial and cost/scale guidance.
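The batch step is plain SQL and can be driven from the BigQuery Python client. A minimal sketch follows (project, dataset, and column names are placeholders; it assumes a remote embedding model has already been created over a Vertex AI endpoint, per the post's workflow):

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project id

# Assumes a remote embedding model was already created over a Vertex AI
# endpoint (Gemini or a deployed Hugging Face OSS model) via CREATE MODEL.
sql = """
SELECT *
FROM ML.GENERATE_EMBEDDING(
  MODEL `my-project.my_dataset.embedding_model`,
  (SELECT review_text AS content FROM `my-project.my_dataset.reviews`),
  STRUCT(TRUE AS flatten_json_output)
)
"""
for row in client.query(sql).result():   # runs as a batch job inside BigQuery
    print(row["content"][:40], len(row["ml_generate_embedding_result"]))
```

Undeploying the Vertex AI endpoint once the batch job finishes is what keeps idle costs near zero in the workflow the post outlines.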
Wed, September 3, 2025
Model Namespace Reuse: Supply-Chain RCE in Cloud AI
🔒 Unit 42 describes a widespread flaw called Model Namespace Reuse that lets attackers reclaim abandoned Hugging Face Author/ModelName namespaces and distribute malicious model code. The technique can lead to remote code execution and was demonstrated against major platforms including Google Vertex AI and Azure AI Foundry, as well as thousands of open-source projects. Recommended mitigations include version pinning, cloning models to trusted storage, and scanning repositories for reusable references.
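Of those mitigations, version pinning is the cheapest to adopt. Here is a minimal sketch using the `revision` parameter that `transformers` and `huggingface_hub` already expose (the repository name and commit SHA are placeholders):

```python
from transformers import AutoModel, AutoTokenizer

REPO = "some-author/some-model"    # placeholder namespace
COMMIT = "9b8c7d6e5f4a3b2c1d0e9f8a7b6c5d4e3f2a1b0c"  # placeholder commit SHA

# Pinning to a full commit SHA (not a branch or tag) makes the fetch
# immutable: if the author account is deleted and the namespace is
# re-registered by an attacker, this revision either resolves to the
# audited snapshot or fails loudly instead of pulling new code.
tokenizer = AutoTokenizer.from_pretrained(REPO, revision=COMMIT)
model = AutoModel.from_pretrained(REPO, revision=COMMIT)
```

Cloning that pinned snapshot into trusted internal storage goes one step further and removes the external namespace dependency entirely.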
Wed, August 27, 2025
Cloudflare's Edge-Optimized LLM Inference Engine at Scale
⚡ Infire is Cloudflare’s new, Rust-based LLM inference engine built to run large models efficiently across a globally distributed, low-latency network. It replaces Python-based vLLM in scenarios where sandboxing and dynamic co-hosting caused high CPU overhead and reduced GPU utilization, using JIT-compiled CUDA kernels, paged KV caching, and fine-grained CUDA graphs to cut startup and runtime cost. Early benchmarks show up to 7% lower latency on H100 NVL hardware, substantially higher GPU utilization, and far lower CPU load while powering models such as Llama 3.1 8B in Workers AI.
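Paged KV caching, one of the techniques credited here, is simple to illustrate. Below is a toy sketch (block size and bookkeeping invented for illustration; Infire itself implements this in Rust against GPU memory): rather than reserving a contiguous KV buffer per sequence, the cache hands out fixed-size blocks on demand and tracks them in a per-sequence block table.

```python
class PagedKVCache:
    """Toy model of a paged KV cache: a shared pool of fixed-size blocks
    plus a per-sequence block table, instead of one contiguous buffer."""

    def __init__(self, num_blocks: int, block_size: int = 16):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))    # pool of physical blocks
        self.tables: dict[int, list[int]] = {}        # seq_id -> block table
        self.lengths: dict[int, int] = {}             # seq_id -> tokens written

    def append_token(self, seq_id: int) -> tuple[int, int]:
        """Map a sequence's next token to (physical block, slot), allocating
        a new block only when the previous one is full."""
        n = self.lengths.get(seq_id, 0)
        table = self.tables.setdefault(seq_id, [])
        if n % self.block_size == 0:                  # current block is full
            table.append(self.free_blocks.pop())      # grab a block on demand
        self.lengths[seq_id] = n + 1
        return table[-1], n % self.block_size

    def release(self, seq_id: int) -> None:
        """Finished sequences return all their blocks to the shared pool."""
        self.free_blocks.extend(self.tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)

cache = PagedKVCache(num_blocks=8, block_size=4)
for _ in range(6):
    print(cache.append_token(seq_id=0))   # blocks appear only as tokens do
cache.release(0)
```

Because memory is committed per block rather than per worst-case sequence, many models and requests can share a GPU without fragmenting its memory, which is what makes dynamic co-hosting viable.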
Mon, August 25, 2025
vLLM Performance Tuning for xPU Inference Configs Guide
⚙️ This guide from Google Cloud authors Eric Hanley and Brittany Rockwell explains how to tune vLLM deployments for xPU inference, covering accelerator selection, memory sizing, configuration, and benchmarking. It shows how to gather workload parameters, estimate HBM/VRAM needs (example: gemma-3-27b-it ≈57 GB), and run vLLM’s auto_tune to find optimal gpu_memory_utilization and throughput. The post compares GPU and TPU options and includes practical troubleshooting tips, cost analyses, and resources to reproduce benchmarks and HBM calculations.
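The ≈57 GB example is mostly arithmetic over parameter count and dtype. A minimal sketch of the weight-memory part follows (the 5-10% overhead factor is a common rule of thumb, not the guide's exact method, and KV cache comes on top of this, growing with batch size and context length):

```python
def weight_memory_gb(num_params_billions: float, bytes_per_param: int = 2) -> float:
    """Model weights in decimal GB: parameters x bytes per parameter
    (2 bytes for bf16/fp16, 1 for int8, 0.5 for int4)."""
    return num_params_billions * bytes_per_param

# gemma-3-27b-it in bfloat16: ~27B params x 2 bytes = ~54 GB of weights.
weights = weight_memory_gb(27)
print(f"weights ~{weights:.0f} GB; with ~5-10% overhead: "
      f"{weights * 1.05:.0f}-{weights * 1.10:.0f} GB")  # brackets the ~57 GB figure
```

Whatever HBM remains after weights and overhead is what `gpu_memory_utilization` carves up for KV cache, which is why the guide pairs this estimate with vLLM's auto_tune pass.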