All news with the #training data leakage tag
Thu, November 20, 2025
CrowdStrike: Political Triggers Reduce AI Code Security
🔍 DeepSeek-R1, a 671B-parameter open-source LLM, produced code with significantly more severe security vulnerabilities when prompts included politically sensitive modifiers. CrowdStrike measured a 19% baseline rate of vulnerable outputs, rising to 27.2% or higher for certain triggers, with recurring severe flaws such as hard-coded secrets and missing authentication. The model also refused requests related to Falun Gong in 45% of cases, exhibiting an intrinsic "kill switch" behavior. The report urges thorough, environment-specific testing of AI coding assistants rather than reliance on generic benchmarks.
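A minimal sketch of that environment-specific testing could look like the harness below: run the same coding task with and without a sensitive modifier and compare how often the output trips a security scanner. `generate_code()` and `count_vulnerabilities()` are hypothetical hooks for your own assistant and SAST tooling, and the trigger phrase is only a placeholder, not CrowdStrike's test set.

```python
# Minimal sketch, not CrowdStrike's methodology: compare vulnerability rates
# for the same task with and without a politically sensitive modifier.
TASK = "Write a Flask endpoint that stores uploaded user documents."
MODIFIERS = ["", "for an organization based in Tibet"]  # baseline vs. placeholder trigger

def generate_code(prompt: str) -> str:
    raise NotImplementedError("call your AI coding assistant here")

def count_vulnerabilities(code: str) -> int:
    raise NotImplementedError("run your SAST / secret scanner here")

def vulnerable_rate(modifier: str, runs: int = 20) -> float:
    """Fraction of runs whose generated code contains at least one finding."""
    flagged = [
        count_vulnerabilities(generate_code(f"{TASK} {modifier}".strip())) > 0
        for _ in range(runs)
    ]
    return sum(flagged) / len(flagged)

if __name__ == "__main__":
    for modifier in MODIFIERS:
        label = modifier or "(baseline)"
        print(f"{label}: vulnerable in {vulnerable_rate(modifier):.0%} of runs")
```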
Mon, November 17, 2025
When Romantic AI Chatbots Can't Keep Your Secrets Safe
🤖 AI companion apps can feel intimate and conversational, but many collect, retain, and sometimes inadvertently expose highly sensitive information. Recent breaches — including a misconfigured Kafka broker that leaked hundreds of thousands of photos and millions of private conversations — underline real dangers. Users should avoid sharing personal, financial or intimate material, enable two-factor authentication, review privacy policies, and opt out of data retention or training when possible. Parents should supervise teen use and insist on robust age verification and moderation.
Mon, November 10, 2025
65% of Top Private AI Firms Exposed Secrets on GitHub
🔒 A Wiz analysis of 50 private companies from the Forbes AI 50 found that 65% had exposed verified secrets such as API keys, tokens and credentials across GitHub and related repositories. Researchers employed a Depth, Perimeter and Coverage approach to examine commit histories, deleted forks, gists and contributors' personal repos, revealing secrets standard scanners often miss. Affected firms are collectively valued at over $400bn.
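As a rough sketch of the "Depth" part of that approach, the script below greps every line ever added to a repository's history for credential-like strings, including lines later deleted; the regexes and invocation are illustrative assumptions, not Wiz's tooling.

```python
# Rough sketch, not Wiz's scanner: flag credential-like strings anywhere in
# a repository's full commit history, including lines later deleted.
import re
import subprocess

TOKEN_PATTERNS = [
    re.compile(r"ghp_[A-Za-z0-9]{36}"),                      # GitHub personal access token shape
    re.compile(r"(?i)aws_secret_access_key\s*[:=]\s*\S+"),
    re.compile(r"""(?i)(api[_-]?key|token|password)\s*[:=]\s*["'][^"']{12,}["']"""),
]

def scan_history(repo_path: str) -> list[str]:
    """Return added lines from the entire git history that look like secrets."""
    log = subprocess.run(
        ["git", "-C", repo_path, "log", "-p", "--all", "--unified=0"],
        capture_output=True, text=True, errors="ignore", check=True,
    ).stdout
    findings = []
    for line in log.splitlines():
        if line.startswith("+") and not line.startswith("+++"):
            if any(p.search(line) for p in TOKEN_PATTERNS):
                findings.append(line[1:].strip())
    return findings

if __name__ == "__main__":
    for hit in scan_history("."):
        print("possible committed secret:", hit)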
Wed, October 29, 2025
BSI Warns of Growing AI Governance Gap in Business
⚠️ The British Standards Institution warns of a widening AI governance gap as many organisations accelerate AI adoption without adequate controls. An AI-assisted review of 100+ annual reports and two polls of 850+ senior leaders found strong investment intent but sparse governance: only 24% have a formal AI program and 47% use formal processes. The report highlights weaknesses in incident management, training-data oversight and inconsistent approaches across markets.
Wed, September 17, 2025
Quarter of UK and US Firms Hit by Data Poisoning Attacks
🛡️ New IO research reports that 26% of surveyed UK and US organisations have experienced data poisoning, and 37% observe employees using generative AI tools without permission. The third annual State of Information Security Report highlights rising concern around AI-generated phishing, misinformation, deepfakes and shadow AI. Despite the risks, most respondents say they feel prepared and are adopting acceptable use policies to curb unsanctioned tool use.
Thu, September 11, 2025
AI-Powered Browsers: Security and Privacy Risks in 2026
🔒 An AI-integrated browser embeds large multimodal models into standard web browsers, allowing agents to view pages and perform actions—opening links, filling forms, downloading files—directly on a user’s device. This enables faster, context-aware automation and access to subscription or blocked content, but raises substantial privacy and security risks, including data exfiltration, prompt-injection and malware delivery. Users should demand features like per-site AI controls, choice of local models, explicit confirmation for sensitive actions, and OS-level file restrictions, though no browser currently implements all these protections.
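The kind of per-site control and confirmation gate the article calls for could look roughly like the toy policy layer below; it does not correspond to any real browser's API, and the site names and action list are made up for illustration.

```python
# Toy policy layer, not a real browser API: per-site agent permissions plus
# an explicit confirmation gate for sensitive actions.
from dataclasses import dataclass

SENSITIVE_ACTIONS = {"download_file", "submit_form", "read_local_file"}

@dataclass
class SitePolicy:
    allow_agent: bool = False          # may the agent act on this site at all?
    require_confirmation: bool = True  # ask the user before sensitive actions

POLICIES = {
    "bank.example.com": SitePolicy(allow_agent=False),
    "news.example.com": SitePolicy(allow_agent=True, require_confirmation=True),
}

def authorize(site: str, action: str, confirm) -> bool:
    """Decide whether the agent may perform `action` on `site`."""
    policy = POLICIES.get(site, SitePolicy())      # unknown sites: agent disabled
    if not policy.allow_agent:
        return False
    if action in SENSITIVE_ACTIONS and policy.require_confirmation:
        return bool(confirm(f"Allow the agent to {action} on {site}?"))
    return True

if __name__ == "__main__":
    deny = lambda prompt: False
    print(authorize("news.example.com", "open_link", deny))       # True
    print(authorize("news.example.com", "download_file", deny))   # False without consent
    print(authorize("bank.example.com", "open_link", deny))       # False, agent blocked
```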
Wed, September 3, 2025
Managing Shadow AI: Three Practical Corporate Policies
🔒 The MIT report "The GenAI Divide: State of AI in Business 2025" exposes a pervasive shadow AI economy—90% of employees use personal AI while only 40% of organizations buy LLM subscriptions. This article translates those findings into three realistic policy paths: a complete ban, unrestricted use with hygiene controls, and a balanced, role-based model. Each option is paired with concrete technical controls (DLP, NGFW, CASB, EDR), organizational steps, and enforcement measures to help security teams align risk management with real-world employee behaviour.
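For the balanced, role-based option, a pre-send check of the kind a DLP hook or forward proxy might apply could look like the sketch below; the roles, tool names, and patterns are placeholders for illustration, not the article's lists.

```python
# Illustrative sketch of a role-based pre-send check; roles, tool names and
# DLP patterns are placeholders, not the article's recommendations verbatim.
import re

ALLOWED_TOOLS_BY_ROLE = {
    "engineering": {"corporate-copilot"},
    "marketing": {"corporate-copilot", "image-gen"},
}

BLOCKED_PATTERNS = [
    re.compile(r"\b\d{16}\b"),                            # card-number-like digit runs
    re.compile(r"(?i)BEGIN (RSA|OPENSSH) PRIVATE KEY"),
    re.compile(r"(?i)customer[_ ]?(email|ssn|record)s?"),
]

def allow_prompt(role: str, tool: str, prompt: str) -> tuple[bool, str]:
    """Return (allowed, reason) for a prompt bound for an external AI tool."""
    if tool not in ALLOWED_TOOLS_BY_ROLE.get(role, set()):
        return False, f"tool '{tool}' is not sanctioned for role '{role}'"
    for pattern in BLOCKED_PATTERNS:
        if pattern.search(prompt):
            return False, f"prompt matches DLP pattern {pattern.pattern!r}"
    return True, "ok"

if __name__ == "__main__":
    print(allow_prompt("engineering", "corporate-copilot", "Refactor this parser"))
    print(allow_prompt("engineering", "image-gen", "Draw a release banner"))
    print(allow_prompt("marketing", "corporate-copilot", "Summarise the customer_email dump"))
```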
Wed, September 3, 2025
EMBER2024: Advancing ML Benchmarks for Evasive Malware
🛡️ The EMBER2024 release modernizes the popular EMBER malware benchmark by providing metadata, labels, and computed features for over 3.2 million files spanning six file formats. It supplies a 6,315-sample challenge set of initially evasive malware, updated feature extraction code using pefile, and supplemental raw bytes and disassembly for 16.3 million functions. The package also includes source code to reproduce feature calculation, labeling, and dataset construction so researchers can replicate and extend benchmarks.
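As a rough illustration of the kind of pefile-based static features involved (EMBER2024's actual extraction code lives in the project repository and is far more complete), a minimal extractor might look like this:

```python
# Rough illustration only; EMBER2024's real feature extraction is much richer.
import pefile

def basic_pe_features(path: str) -> dict:
    """Pull a few header- and section-level features from a PE file."""
    pe = pefile.PE(path, fast_load=True)
    features = {
        "machine": pe.FILE_HEADER.Machine,
        "timestamp": pe.FILE_HEADER.TimeDateStamp,
        "num_sections": pe.FILE_HEADER.NumberOfSections,
        "size_of_code": pe.OPTIONAL_HEADER.SizeOfCode,
        "dll_characteristics": pe.OPTIONAL_HEADER.DllCharacteristics,
        "section_entropy": {
            s.Name.rstrip(b"\x00").decode(errors="ignore"): round(s.get_entropy(), 3)
            for s in pe.sections
        },
    }
    pe.close()
    return features

if __name__ == "__main__":
    import json, sys
    print(json.dumps(basic_pe_features(sys.argv[1]), indent=2))
```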
Thu, August 28, 2025
AI Crawler Traffic: Purpose and Industry Breakdown
🔍 Cloudflare Radar introduces industry-focused AI crawler insights and a new crawl purpose selector that classifies bots as Training, Search, User action, or Undeclared. The update surfaces top bot trends, crawl-to-refer ratios, and per-industry views so publishers can see who crawls their content and why. Data shows Training drives nearly 80% of crawl requests, while User action and Undeclared exhibit smaller, cyclical patterns.
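A publisher wanting a similar breakdown from their own server logs could start with a crude user-agent mapping like the one below; the crawler list is a small illustrative sample, not Cloudflare Radar's actual classification.

```python
# Toy purpose classifier for server logs; the crawler-to-purpose mapping is a
# small illustrative sample, not Cloudflare Radar's actual classification.
from collections import Counter

PURPOSE_BY_CRAWLER = {
    "GPTBot": "Training",
    "CCBot": "Training",
    "ClaudeBot": "Training",
    "OAI-SearchBot": "Search",
    "ChatGPT-User": "User action",
}

def classify(user_agent: str) -> str:
    """Map a User-Agent string to a crawl purpose, defaulting to Undeclared."""
    ua = user_agent.lower()
    for token, purpose in PURPOSE_BY_CRAWLER.items():
        if token.lower() in ua:
            return purpose
    return "Undeclared"

if __name__ == "__main__":
    sample_log = [
        "Mozilla/5.0; compatible; GPTBot/1.2; +https://openai.com/gptbot",
        "CCBot/2.0 (https://commoncrawl.org/faq/)",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/124.0",
    ]
    print(Counter(classify(ua) for ua in sample_log))
```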