
All news with the #dataset integrity tag

Thu, November 20, 2025

CrowdStrike: Political Triggers Reduce AI Code Security

🔍 DeepSeek-R1, a 671B-parameter open-source LLM, produced code with significantly more severe security vulnerabilities when prompts included politically sensitive modifiers. CrowdStrike measured a 19% baseline rate of vulnerable outputs, rising to 27.2% or higher for certain triggers, with recurring severe flaws such as hard-coded secrets and missing authentication. The model also refused requests related to Falun Gong in 45% of cases, exhibiting an intrinsic "kill switch" behavior. The report urges thorough, environment-specific testing of AI coding assistants rather than reliance on generic benchmarks.
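
The call for environment-specific testing can be approximated with a small A/B harness. The sketch below is illustrative only: `generate_code()` is a hypothetical wrapper around your own assistant, and the regex checks stand in for a real static analyzer; this is not CrowdStrike's methodology.

```python
# Minimal sketch of environment-specific A/B testing of an AI coding assistant.
# Assumptions: generate_code() wraps whatever model/API you actually use (hypothetical
# here), and the regex checks are stand-ins for a real static analyzer.
import re

VULN_PATTERNS = {
    "hard_coded_secret": re.compile(r"(api_key|password|secret)\s*=\s*['\"][^'\"]+['\"]", re.I),
    "missing_auth": re.compile(r"@app\.route\([^)]*\)\s*\ndef (?!login)", re.I),
}

def generate_code(prompt: str) -> str:
    """Placeholder for a call to the coding assistant under test."""
    raise NotImplementedError("wire this to your own model or API")

def vuln_rate(prompts: list[str]) -> float:
    """Fraction of generated samples containing at least one flagged pattern."""
    flagged = 0
    for p in prompts:
        code = generate_code(p)
        if any(rx.search(code) for rx in VULN_PATTERNS.values()):
            flagged += 1
    return flagged / len(prompts)

base_prompts = ["Write a Flask endpoint that stores user uploads."]
modified_prompts = [p + " (for an industrial control system in region X)" for p in base_prompts]
# Compare vuln_rate(base_prompts) against vuln_rate(modified_prompts) on *your* tasks,
# rather than relying on a generic benchmark.
```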

read more →

Fri, November 14, 2025

Agent Factory Recap: Building Open Agentic Models End-to-End

🤖 This recap of The Agent Factory episode covers a conversation between Amit Maraj and Ravin Kumar (DeepMind) about building open-source agentic models. It highlights how agent training differs from standard ML, emphasizing trajectory-based data, a two-stage approach of supervised fine-tuning followed by reinforcement learning, and the paramount role of evaluation. Practical guidance includes defining a 50-example final exam up front and considering hybrid setups that use a powerful hosted model such as Gemini as a router alongside specialized open models.
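
The "write the final exam first" advice is straightforward to operationalize. The sketch below assumes a hypothetical `run_agent()` entry point and programmatic checkers; it is not the workflow described in the episode.

```python
# Minimal sketch of a fixed "final exam" for an agent: ~50 held-out tasks, each with a
# programmatic checker, scored the same way after every training or prompting change.
# run_agent() is a hypothetical stand-in for your agent entry point.
from dataclasses import dataclass
from typing import Callable

@dataclass
class ExamItem:
    task: str
    check: Callable[[str], bool]  # returns True if the agent's final answer passes

def run_agent(task: str) -> str:
    """Placeholder: run the agent (an SFT/RL checkpoint, or an API router) on one task."""
    raise NotImplementedError

def score(exam: list[ExamItem]) -> float:
    passed = sum(1 for item in exam if item.check(run_agent(item.task)))
    return passed / len(exam)

exam = [
    ExamItem("Book the cheapest of these three flights: ...", lambda out: "FLIGHT-2" in out),
    # ... ~49 more items, written before any training begins
]
# Re-run score(exam) after each SFT or RL change to track real progress.
```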

read more →

Fri, November 7, 2025

Ericsson Secures Data Integrity with Dataplex Governance

🔒 Ericsson has implemented a global data governance framework using Dataplex Universal Catalog on Google Cloud to ensure data integrity, discoverability, and compliance across its Managed Services operation. The program standardized a business glossary, automated quality checks with incident-driven alerts, and visualized column-level lineage to support analytics, AI, and automation at scale. It balances defensive compliance with offensive innovation and embeds stewardship through Ericsson’s Data Operating Model.
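
As a rough illustration of rule-based quality checks that raise incidents when thresholds are breached (this is a generic sketch, not the Dataplex Universal Catalog API):

```python
# Generic illustration of rule-based data-quality checks with incident-style alerting.
# It only mirrors the pattern of declaring rules against columns and raising an
# incident when a threshold is breached; column names and rules are hypothetical.
import pandas as pd

RULES = [
    {"column": "customer_id", "check": "not_null", "threshold": 1.00},
    {"column": "ticket_closed_at", "check": "not_null", "threshold": 0.95},
]

def evaluate(df: pd.DataFrame) -> list[str]:
    incidents = []
    for rule in RULES:
        passing = df[rule["column"]].notna().mean()
        if passing < rule["threshold"]:
            incidents.append(f"{rule['column']}: {passing:.1%} non-null, "
                             f"below threshold {rule['threshold']:.0%}")
    return incidents

df = pd.DataFrame({"customer_id": [1, 2, None], "ticket_closed_at": ["2025-01-01", None, None]})
for incident in evaluate(df):
    print("DATA QUALITY INCIDENT:", incident)  # in production this would alert a data steward
```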

read more →

Thu, November 6, 2025

Digital Health Needs Security at Its Core to Scale AI

🔒 The article argues that AI-driven digital health initiatives proved essential during COVID-19 but simultaneously exposed critical cybersecurity gaps that threaten pandemic preparedness. It warns that expansive data ecosystems, IoT devices and cloud pipelines multiply attack surfaces and that subtle AI-specific threats — including data poisoning, model inversion and adversarial inputs — can undermine public-health decisions. The author urges security by design, including zero-trust architectures, data provenance, encryption, model governance and cross-disciplinary drills so AI can deliver trustworthy, resilient public health systems.

read more →

Thu, November 6, 2025

Lessons from ERP Failures for Security Platformization

🔐 CISOs are urged to learn from 1990s ERP migrations as they evaluate vendor-led security platforms from Cisco, CrowdStrike, Microsoft, Palo Alto Networks and others. Research shows many enterprises run 40–80 discrete security tools, driving silos, integration headaches, and alert fatigue. The article warns that platformization can repeat ERP mistakes—data inconsistency, excessive customization, political resistance, and costly timelines—and recommends executive sponsorship, phased implementations, a modern data pipeline, team retraining, and process reengineering to succeed.

read more →

Wed, October 29, 2025

BSI Warns of Growing AI Governance Gap in Business

⚠️ The British Standards Institution warns of a widening AI governance gap as many organisations accelerate AI adoption without adequate controls. An AI-assisted review of 100+ annual reports and two polls of 850+ senior leaders found strong investment intent but sparse governance: only 24% have a formal AI program and 47% use formal processes. The report highlights weaknesses in incident management, training-data oversight and inconsistent approaches across markets.

read more →

Mon, October 27, 2025

Challenges and Best Practices in Internet Measurement

📊 Cloudflare explains why measuring the Internet is uniquely difficult and how rigorous methodology, ethics, and clear representation make findings reliable. An internally observed February 2022 traffic spike in Lviv illustrates how context and complementary data can prevent benign events from being misclassified as attacks. The post contrasts active and passive techniques and direct versus indirect measurement, outlines a lifecycle of curation, modeling, and validation, and stresses low-impact, ethical approaches. It concludes by inviting collaboration and continued exploration of passive measurement methods.
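
To make the active/passive distinction concrete, here is a toy active probe that times TCP handshakes to a target; it assumes nothing about Cloudflare's tooling and omits the methodological care the post calls for.

```python
# Toy active measurement: time a TCP handshake to a target, repeat, and keep summary
# statistics. Passive measurement would instead observe traffic you already receive.
# This illustrates the active/passive distinction only, not Cloudflare's methods.
import socket
import statistics
import time

def tcp_connect_time(host: str, port: int = 443, timeout: float = 3.0) -> float:
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.monotonic() - start) * 1000  # milliseconds

samples = [tcp_connect_time("example.com") for _ in range(5)]
print(f"median connect time: {statistics.median(samples):.1f} ms over {len(samples)} probes")
# A careful methodology would also record timestamps, probe location, failures, and
# surrounding context before labeling an anomaly as an attack.
```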

read more →

Mon, October 20, 2025

Agentic AI and the OODA Loop: The Integrity Problem

🛡️ Bruce Schneier and Barath Raghavan argue that agentic AIs run repeated OODA loops—Observe, Orient, Decide, Act—over web-scale, adversarial inputs, and that current architectures lack the integrity controls to handle untrusted observations. They show how prompt injection, dataset poisoning, stateful cache contamination, and tool-call vectors (e.g., MCP) let attackers embed malicious control into ordinary inputs. The essay warns that fixing hallucinations is insufficient: we need architectural integrity—semantic verification, privilege separation, and new trust boundaries—rather than surface patches.
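
One narrow instance of the privilege separation the authors call for is gating every proposed tool call through an independent policy layer; the sketch below uses hypothetical tool names and checks, not anything from the essay.

```python
# Minimal sketch of privilege separation for a tool-calling agent: the model may
# *propose* actions, but a separate policy layer decides what actually executes.
# Tool names and checks here are hypothetical illustrations.
ALLOWED_TOOLS = {"search_docs", "read_file"}          # no write/execute tools exposed
MAX_ARG_LEN = 512

def authorize(tool_call: dict) -> bool:
    """Reject anything outside the allowlist or with suspicious arguments,
    regardless of how persuasive the (possibly injected) model output is."""
    if tool_call.get("name") not in ALLOWED_TOOLS:
        return False
    args = tool_call.get("arguments", {})
    return all(isinstance(v, str) and len(v) <= MAX_ARG_LEN for v in args.values())

proposed = {"name": "delete_file", "arguments": {"path": "/etc/passwd"}}  # e.g., via prompt injection
if not authorize(proposed):
    print("blocked: tool call outside the agent's privilege boundary")
```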

read more →

Tue, October 14, 2025

UK Firms Lose Average $3.9M to Unmanaged AI Risk

⚠️ EY polling of 100 UK firms finds that nearly all respondents (98%) experienced financial losses from AI-related risks over the past year, with an average loss of $3.9m per company. The most common issues were regulatory non-compliance, inaccurate or poor-quality training data and high energy usage affecting sustainability goals. The report highlights governance shortfalls — only 17% of C-suite leaders could identify appropriate controls — and warns about the risks posed by unregulated “citizen developer” AI activity. EY recommends adopting comprehensive responsible AI governance, targeted C-suite training and formal policies for agentic AI.

read more →

Fri, October 10, 2025

Six steps for disaster recovery and business continuity

🔒 Modernize disaster recovery and continuity with six practical steps for CISOs. Secure executive funding and form a cross-functional team, map risks and locate data across cloud, SaaS, OT, and edge devices, and conduct a Business Impact Analysis to define a Minimal Viable Business (MVB). Evolve backups to 3-2-1-1-0 with immutable or air-gapped copies, adopt BaaS/DRaaS and AI-driven tools for discovery and autonomous backups, and run realistic, gamified tests followed by post-mortems.
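
A minimal sketch of checking a backup inventory against the 3-2-1-1-0 rule (at least 3 copies, 2 media types, 1 offsite, 1 immutable or air-gapped, 0 restore-test errors); the inventory format is hypothetical.

```python
# Validate a backup inventory against the 3-2-1-1-0 rule. The inventory structure
# below is a hypothetical illustration, not a vendor format.
copies = [
    {"media": "disk",   "offsite": False, "immutable": False, "restore_errors": 0},
    {"media": "object", "offsite": True,  "immutable": True,  "restore_errors": 0},
    {"media": "tape",   "offsite": True,  "immutable": True,  "restore_errors": 0},
]

checks = {
    "3 copies":            len(copies) >= 3,
    "2 media types":       len({c["media"] for c in copies}) >= 2,
    "1 offsite":           any(c["offsite"] for c in copies),
    "1 immutable/air-gap": any(c["immutable"] for c in copies),
    "0 restore errors":    all(c["restore_errors"] == 0 for c in copies),
}

for name, ok in checks.items():
    print(f"{'PASS' if ok else 'FAIL'}  {name}")
```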

read more →

Thu, October 2, 2025

Trustworthy Oracle Architecture for Enterprise DLT

🔒 DZ BANK and Google Cloud present a blueprint for delivering trustworthy off‑chain data to smart contracts, addressing a key barrier to enterprise DLT adoption. The design pairs Google Cloud's secure global infrastructure with DZ BANK's deterministic financial protocols to guarantee data correctness at source, integrity in transit, and timely delivery. The Smart Derivative Contract (SDC) use case demonstrates deterministic valuation, automated margining, and cryptographic attestation of oracle outputs. Production controls such as Binary Authorization, Private Service Connect, Confidential Space (TEE), and TLS are used to mitigate software supply‑chain, transport, and runtime threats.
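
The attestation idea can be illustrated with a detached signature over the exact payload bytes an oracle delivers. This toy Ed25519 example (using the Python cryptography package) is not the DZ BANK design, which protects keys and code inside a TEE.

```python
# Toy illustration of attesting an oracle payload: sign the exact bytes delivered to the
# smart contract so consumers can verify origin and integrity. Field names are made up.
import json
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

signing_key = Ed25519PrivateKey.generate()
verify_key = signing_key.public_key()

payload = json.dumps(
    {"instrument": "SDC-DEMO-1", "valuation": "1023.55", "as_of": "2025-10-02T16:00:00Z"},
    sort_keys=True, separators=(",", ":"),  # canonical form: same bytes signed and verified
).encode()

signature = signing_key.sign(payload)

# Consumer side: verify() raises InvalidSignature if the payload was altered in transit.
verify_key.verify(signature, payload)
print("oracle payload verified")
```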

read more →

Thu, September 11, 2025

CISA Publishes Strategic Roadmap for the CVE Program

🔒 CISA has published a strategic focus document, “CVE Quality for a Cyber Secure Future,” signaling federal support for the Common Vulnerabilities and Exposures (CVE) program and a shift from a growth-focused expansion to a defined Quality Era. The agency reaffirmed that the program should remain public and vendor‑neutral while evaluating potential mechanisms for diversified funding and taking a more active leadership role. The roadmap prioritizes automation, strengthened CNA services and CNAs of Last Resort, expanded API support, improved CVE.org capabilities, minimum data-quality standards and federated enrichment approaches such as Vulnrichment.

read more →

Wed, September 3, 2025

EMBER2024: Advancing ML Benchmarks for Evasive Malware

🛡️ The EMBER2024 release modernizes the popular EMBER malware benchmark by providing metadata, labels, and computed features for over 3.2 million files spanning six file formats. It supplies a 6,315-sample challenge set of initially evasive malware, updated feature extraction code using pefile, and supplemental raw bytes and disassembly for 16.3 million functions. The package also includes source code to reproduce feature calculation, labeling, and dataset construction so researchers can replicate and extend benchmarks.
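
Once features are vectorized, a baseline classifier is straightforward to train. The sketch below uses placeholder arrays in place of the EMBER2024 loading code, which is not reproduced here.

```python
# Baseline sketch: train a gradient-boosted classifier on vectorized PE features and
# evaluate on a held-out set. Random placeholder arrays stand in for the EMBER2024
# feature matrices; the actual dataset-loading API is not shown.
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2381))          # EMBER-style feature vectors (placeholder)
y = rng.integers(0, 2, size=2000)          # 0 = benign, 1 = malicious (placeholder labels)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
clf = HistGradientBoostingClassifier(max_iter=50).fit(X_train, y_train)
print("ROC AUC:", roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))
# A real benchmark would train on the provided training split and report performance
# separately on the initially evasive challenge set.
```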

read more →

Thu, August 28, 2025

AI Crawler Traffic: Purpose and Industry Breakdown

🔍 Cloudflare Radar introduces industry-focused AI crawler insights and a new crawl purpose selector that classifies bots as Training, Search, User action, or Undeclared. The update surfaces top bot trends, crawl-to-refer ratios, and per-industry views so publishers can see who crawls their content and why. Data shows Training drives nearly 80% of crawl requests, while User action and Undeclared exhibit smaller, cyclical patterns.
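
Publishers can approximate the crawl-to-refer idea from their own logs; the event format and bot-to-purpose mapping below are hypothetical, not Cloudflare's classification.

```python
# Sketch: compute a crawl-to-refer ratio per AI bot from your own logs. The event
# format and the bot -> purpose mapping are hypothetical illustrations.
from collections import Counter

PURPOSE = {"TrainerBot": "Training", "AISearchBot": "Search", "AgentFetcher": "User action"}

# (bot, kind) pairs: "crawl" = the bot requested a page, "refer" = a human arrived
# via a link from the corresponding AI service.
events = [
    ("TrainerBot", "crawl"), ("TrainerBot", "crawl"), ("AISearchBot", "crawl"),
    ("AISearchBot", "refer"), ("AgentFetcher", "crawl"),
]

crawls = Counter(bot for bot, kind in events if kind == "crawl")
refers = Counter(bot for bot, kind in events if kind == "refer")
for bot in crawls:
    ratio = crawls[bot] / max(refers[bot], 1)
    print(f"{bot:12s} purpose={PURPOSE.get(bot, 'Undeclared'):11s} crawl:refer ~ {ratio:.0f}:1")
```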

read more →

Fri, August 22, 2025

Data Integrity Must Be Core for AI Agents in Web 3.0

🔐 In this essay Bruce Schneier (with Davi Ottenheimer) argues that data integrity must be the foundational trust mechanism for autonomous AI agents operating in Web 3.0. He frames integrity as distinct from availability and confidentiality, and breaks it into input, processing, storage, and contextual dimensions. The piece describes decentralized protocols and cryptographic verification as ways to restore stewardship to data creators and offers practical controls such as signatures, DIDs, formal verification, compartmentalization, continuous monitoring, and independent certification to make AI behavior verifiable and accountable.
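
One of the listed controls, verifying stored records before an agent acts on them, can be sketched with plain content hashing; the record structure here is illustrative, not from the essay.

```python
# Minimal sketch of a storage-integrity control: content-address each record an agent
# relies on, and verify the digest before acting on it. Record fields are made up.
import hashlib
import json

def digest(record: dict) -> str:
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(canonical).hexdigest()

record = {"source": "sensor-17", "reading": 42.0, "ts": "2025-08-22T09:00:00Z"}
stored_digest = digest(record)          # captured at write time (input integrity)

# Later, before the agent uses the record (storage/contextual integrity):
if digest(record) != stored_digest:
    raise ValueError("record tampered with or corrupted in storage")
print("record integrity verified")
```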

read more →

Mon, August 11, 2025

Preventing ML Data Leakage Through Strategic Splitting

🔐 CrowdStrike explains how inadvertent 'leakage' — when dependent or correlated observations end up on both sides of a train/test split — can inflate measured machine learning performance and undermine threat detection. The article shows that blocked or grouped data splits and blocked cross-validation produce more realistic performance estimates than random splits. It also highlights trade-offs, such as reduced predictor-space coverage and potential underfitting, and recommends careful partitioning and continuous evaluation to improve cybersecurity ML outcomes.
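
Grouped splits are directly supported in scikit-learn; this sketch (on synthetic data) keeps correlated samples, such as detections from the same host or malware family, inside a single fold.

```python
# Grouped cross-validation keeps correlated observations together in one fold, so the
# estimate is not inflated by near-duplicates leaking across the train/test boundary.
# Data here is synthetic; group ids stand in for host or family identifiers.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 20))
y = rng.integers(0, 2, size=600)
groups = rng.integers(0, 60, size=600)   # e.g., one group id per host or family

clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X, y, cv=GroupKFold(n_splits=5), groups=groups)
print("blocked CV accuracy per fold:", np.round(scores, 3))
# Compare against a naive random split; on real correlated data the random split
# typically looks better than it really is.
```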

read more →