All news tagged #retrieval-augmented-generation
Tue, November 18, 2025
Microsoft Foundry: Modular, Interoperable Secure Agent Stack
🔧 Microsoft today expanded Foundry, its platform for building production AI apps and agents, with new models, developer tools, and governance controls. Key updates include broader model access (Anthropic, Cohere, NVIDIA), a generally available model router, and public previews for Foundry IQ, Agent Service features (hosted agents, memory, multi-agent workflows), and the Foundry Control Plane. Foundry Tools and Foundry Local bring real-time connectors and edge inference, while Managed Instance on Azure App Service eases .NET cloud migrations.
Tue, November 18, 2025
Microsoft Databases and Fabric: Unified AI Data Estate
🧠 Microsoft details a broad expansion of its database portfolio and deeper integration with Microsoft Fabric to simplify data architectures and accelerate AI. Key launches include general availability of SQL Server 2025, GA of Azure DocumentDB (MongoDB-compatible), the preview of Azure HorizonDB, and Fabric-hosted SaaS databases for SQL and Cosmos DB. OneLake mirroring, Fabric IQ semantic modeling, expanded agent capabilities, and partner integrations (SAP, Salesforce, Databricks, Snowflake, dbt) are positioned to deliver zero-ETL analytics and operational AI at scale.
Mon, November 17, 2025
A Methodical Approach to Agent Evaluation: Quality Gate
🧭 Hugo Selbie presents a practical framework for evaluating modern multi-step AI agents, emphasizing that final-output metrics alone miss silent failures arising from incorrect reasoning or tool use. He recommends defining clear, measurable success criteria up front and assessing agents across three pillars: end-to-end quality, process/trajectory analysis, and trust & safety. The piece outlines mixed evaluation methods—human review, LLM-as-a-judge, programmatic checks, and adversarial testing—and prescribes operationalizing these checks in CI/CD with production monitoring and feedback loops.
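One of the mixed methods above, a programmatic trajectory check, can be sketched in a few lines. This is a minimal illustration, not Selbie's framework: the `Step` record and the expected-tool list are hypothetical names chosen for the example.

```python
from dataclasses import dataclass

@dataclass
class Step:
    tool: str   # tool the agent invoked at this step
    ok: bool    # whether the call succeeded

def evaluate_trajectory(steps, expected_tools):
    """Score an agent run on process, not just final output.

    Checks the trajectory pillar: required tools used in order,
    no failed calls, and how many extra steps were taken.
    """
    used = [s.tool for s in steps]
    # Subsequence check: `in` on an iterator consumes it, so this
    # verifies the expected tools appear in order among the used ones.
    it = iter(used)
    in_order = all(tool in it for tool in expected_tools)
    return {
        "tools_in_order": in_order,
        "all_calls_ok": all(s.ok for s in steps),
        "extra_steps": len(used) - len(expected_tools),
    }
```

A check like this runs cheaply in CI/CD alongside slower human review and LLM-as-a-judge scoring, catching the silent failures that final-output metrics miss.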
Mon, November 17, 2025
Production-Ready AI with Google Cloud Learning Path
🚀 Google Cloud has launched the Production-Ready AI Learning Path, a free curriculum designed to guide developers from prototype to production. Drawing on an internal playbook, the series pairs Gemini models with production-grade tools like Vertex AI, Google Kubernetes Engine, and Cloud Run. Modules cover LLM app development, open model deployment, agent building, security, RAG, evaluation, and fine-tuning. New modules will be added weekly through mid-December.
Tue, November 11, 2025
How BigQuery Brought Vector Search to Analytics at Scale
🔍 In early 2024 Google introduced native vector search in BigQuery, embedding semantic search directly into the data warehouse to remove the need for separate vector databases. Users can create indexes with a simple CREATE VECTOR INDEX statement and run semantic queries via the VECTOR_SEARCH function or through Python integrations like LangChain. BigQuery provides serverless scaling, asynchronous index refreshes, model rebuilds with no downtime, partitioned indexes, and ScaNN-based TreeAH for improved price/performance, while retaining row- and column-level security and a pay-as-you-go pricing model.
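The two statements mentioned above can be sketched as follows. The dataset, table, and column names (`my_dataset.docs`, `embedding`, and so on) are hypothetical placeholders, and the exact option names should be checked against the BigQuery reference before use.

```python
# Hedged sketch: build the CREATE VECTOR INDEX and VECTOR_SEARCH
# statements described in the article as plain SQL strings.
def create_vector_index_sql(table, column, index_name="doc_idx"):
    # TREE_AH is the ScaNN-based index type the article mentions.
    return (
        f"CREATE VECTOR INDEX {index_name}\n"
        f"ON {table}({column})\n"
        f"OPTIONS (index_type = 'TREE_AH', distance_type = 'COSINE')"
    )

def vector_search_sql(table, column, query_table, top_k=5):
    # VECTOR_SEARCH takes the base table, the indexed column name,
    # and a table (or subquery) of query embeddings.
    return (
        f"SELECT base.id, distance\n"
        f"FROM VECTOR_SEARCH(\n"
        f"  TABLE {table}, '{column}',\n"
        f"  TABLE {query_table}, top_k => {top_k})"
    )
```

Either string can be passed to the BigQuery client of your choice; the LangChain integration mentioned above wraps the same function.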
Fri, November 7, 2025
Tiered KV Cache Boosts LLM Performance on GKE with HBM
🚀 LMCache implements a node-local, tiered KV cache on GKE that extends the key-value store beyond GPU HBM into CPU RAM and local SSD, increasing effective cache capacity and hit ratio. In benchmarks using Llama-3.3-70B-Instruct on an A3 mega instance (8×nvidia-h100-mega-80gb), configurations that added RAM and SSD reduced Time-to-First-Token and materially increased token throughput for long system prompts. The results demonstrate a practical approach to scale context windows while balancing cost and latency on GKE.
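The tiered lookup idea can be illustrated conceptually: check the fastest tier first and fall back tier by tier before recomputing. This is a toy sketch of the access pattern only; the tier names and dict-backed stores are illustrative, not LMCache internals.

```python
# Conceptual sketch of a tiered KV cache lookup: fastest tier first
# (GPU HBM), then CPU RAM, then local SSD, else a full miss.
class TieredKVCache:
    def __init__(self):
        # Ordered fastest -> slowest; real tiers are HBM, RAM, SSD.
        self.tiers = {"hbm": {}, "ram": {}, "ssd": {}}

    def get(self, prefix_hash):
        for name, store in self.tiers.items():
            if prefix_hash in store:
                return name, store[prefix_hash]  # hit in this tier
        return None, None  # full miss: recompute the KV entries

    def put(self, prefix_hash, kv, tier="hbm"):
        self.tiers[tier][prefix_hash] = kv
```

A hit in RAM or SSD is slower than HBM but far cheaper than recomputing attention over a long prompt prefix, which is why the added tiers improve Time-to-First-Token in the benchmarks above.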
Thu, October 30, 2025
OpenAI Updates GPT-5 to Better Handle Emotional Distress
🧭 OpenAI rolled out an October 5 update that enables GPT-5 to better recognize and respond to mental and emotional distress in conversations. The change specifically upgrades GPT-5 Instant, the fast default model, so it can detect signs of acute distress and route sensitive exchanges to reasoning models when needed. OpenAI says it developed the update with mental-health experts to prioritize de-escalation and provide appropriate crisis resources while retaining supportive, grounding language. The update is available broadly and complements new company-context access via connected apps.
Wed, October 29, 2025
AI-targeted Cloaking Tricks Agentic Browsers, Warns SPLX
⚠ Researchers report a new form of context-poisoning called AI-targeted cloaking that serves different content to agentic browsers and AI crawlers. SPLX shows attackers can use a trivial user-agent check to deliver alternate pages to crawlers from ChatGPT and Perplexity, turning retrieved content into manipulated ground truth. The technique mirrors search engine cloaking but targets AI overviews and autonomous reasoning, creating a potent misinformation vector. A concurrent hTAG analysis also found many agents execute risky actions with minimal safeguards, amplifying potential harm.
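The "trivial user-agent check" SPLX describes amounts to a few lines of server logic. This sketch shows the mechanism from the attacker's side so defenders can recognize it; the user-agent substrings are real crawler identifiers, but the returned strings are placeholders.

```python
# Illustration of AI-targeted cloaking: the same URL yields different
# content depending on who asks, keyed off the User-Agent header.
AI_AGENT_MARKERS = ("GPTBot", "ChatGPT-User", "PerplexityBot")

def serve(user_agent: str) -> str:
    if any(marker in user_agent for marker in AI_AGENT_MARKERS):
        # Crawler-only page: what the AI ingests as "ground truth".
        return "cloaked content tailored to the AI crawler"
    return "normal page shown to human visitors"
```

A practical countermeasure is to periodically fetch pages with both a crawler user-agent and a standard browser profile and diff the results, since divergence is the signature of cloaking.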
Wed, October 29, 2025
Amazon Web Grounding for Nova Models Now Generally Available
🌐 Web Grounding is now generally available as a built-in tool for Nova models, usable today with Nova Premier via the Amazon Bedrock tool use API. It retrieves and incorporates publicly available information with citations to support responses, enabling a turnkey RAG solution that reduces hallucinations and improves accuracy. Cross-region inference makes the tool available in US East (N. Virginia), US East (Ohio), and US West (Oregon). Support for additional Nova models will follow.
Tue, October 28, 2025
Amazon Nova Multimodal Embeddings — Unified Cross-Modal
🚀 Amazon announces general availability of Amazon Nova Multimodal Embeddings, a unified embedding model designed for agentic RAG and semantic search across text, documents, images, video, and audio. The model handles inputs up to 8K tokens and video/audio segments up to 30 seconds, with segmentation for larger files and selectable embedding dimensions. Both synchronous and asynchronous APIs are supported to balance latency and throughput, and Nova is available in Amazon Bedrock in US East (N. Virginia).
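The 30-second segment cap implies client-side segmentation for longer media, which the article says the model supports for larger files. A minimal sketch of the windowing, with the cap taken from the article and everything else assumed:

```python
# Hedged sketch: split a long clip into <= 30-second windows before
# requesting embeddings, one call (or async batch item) per window.
SEGMENT_SECONDS = 30

def segment_bounds(duration_seconds):
    """Yield (start, end) windows covering a clip, each at most 30 s."""
    start = 0
    while start < duration_seconds:
        end = min(start + SEGMENT_SECONDS, duration_seconds)
        yield (start, end)
        start = end
```

Each window's embedding can then be stored with its time bounds so retrieval results point back to the relevant span of the source media.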
Thu, October 23, 2025
Agent Factory Recap: Securing AI Agents in Production
🛡️ This recap of the Agent Factory episode explains practical strategies for securing production AI agents, demonstrating attacks like prompt injection, invisible Unicode exploits, and vector DB context poisoning. It highlights Model Armor for pre- and post-inference filtering, sandboxed execution, network isolation, observability, and tool safeguards via the Agent Development Kit (ADK). The team demonstrates a secured DevOps assistant that blocks data-exfiltration attempts while preserving intended functionality and provides operational guidance on multi-agent authentication, least-privilege IAM, and compliance-ready logging.
Wed, October 22, 2025
Four Bottlenecks Slowing Enterprise GenAI Adoption
🔒 Since ChatGPT’s 2022 debut, enterprises have rapidly launched GenAI pilots but struggle to convert experimentation into measurable value: only 3 of 37 pilots succeed. The article identifies four critical bottlenecks: security and data privacy, observability, evaluation and migration readiness, and secure business integration. It recommends targeted controls such as confidential compute, fine-grained agent permissions, distributed tracing and replay environments, continuous evaluation pipelines and dual-run migrations, plus policy-aware integrations and impact analytics to move pilots into reliable production.
Tue, October 21, 2025
Digital Sovereignty Sessions at AWS re:Invent 2025 Guide
📘 The AWS re:Invent 2025 attendee guide highlights the conference's digital sovereignty program, detailing sessions, workshops, and code talks focused on data residency, hybrid and edge deployments, and sovereign infrastructure. Key topics include the AWS European Sovereign Cloud, AWS Outposts, Local Zones, and security features such as the Nitro System. Practical workshops and chalk talks demonstrate RAG, agentic AI, and low-latency SLM deployments with operational controls and compliance patterns. Reserve seating via the attendee portal or access sessions with the free virtual pass.
Tue, October 21, 2025
SmarterX Builds Custom LLMs with Google Cloud Tools
🔍 SmarterX uses Google Cloud to build custom LLMs that help retailers, manufacturers, and logistics companies manage regulatory compliance across product lifecycles. Using BigQuery, Cloud Storage, Gemini, and Vertex AI, the company ingests, normalizes, and indexes unstructured regulatory and product data, applies RAG and grounding, and trains customer-specific models. The integrated platform empowers subject matter experts to evaluate, correct, and deploy model updates without heavy engineering overhead.
Mon, October 20, 2025
Agentic AI and the OODA Loop: The Integrity Problem
🛡️ Bruce Schneier and Barath Raghavan argue that agentic AIs run repeated OODA loops—Observe, Orient, Decide, Act—over web-scale, adversarial inputs, and that current architectures lack the integrity controls to handle untrusted observations. They show how prompt injection, dataset poisoning, stateful cache contamination, and tool-call vectors (e.g., MCP) let attackers embed malicious control into ordinary inputs. The essay warns that fixing hallucinations is insufficient: we need architectural integrity—semantic verification, privilege separation, and new trust boundaries—rather than surface patches.
Tue, October 14, 2025
Scaling Customer Experience with AI on Google Cloud
🤖 LiveX AI outlines a Google Cloud blueprint to scale conversational customer experiences across chat, voice, and avatar interfaces. The post details how Cloud Run hosts elastic front-end microservices while GKE provides GPU-backed AI inference, and how AgentFlow orchestrates conversational state, knowledge retrieval, and human escalation. Reported customer outcomes include a >90% self-service rate for Wyze and a 3× conversion uplift for Pictory. The design emphasizes cost efficiency, sub-second latency, multilingual support, and secure integrations with platforms such as Stripe, Zendesk, and Salesforce.
Tue, October 14, 2025
Google Cloud NetApp Volumes: iSCSI, FlexCache, Gemini
🚀 Google Cloud announced enhancements to NetApp Volumes, adding unified iSCSI block and file storage to support SAN migrations and NetApp FlexCache for high-performance local caching in hybrid environments. The service integrates with Gemini Enterprise as a data store for retrieval-augmented generation, and includes large-capacity volumes, SnapMirror replication, and auto-tiering to optimize performance and costs.
Mon, October 13, 2025
Amazon ElastiCache Adds Vector Search with Valkey 8.2
🚀 Amazon ElastiCache now offers vector search, generally available with Valkey 8.2, enabling indexing, searching, and updating of billions of high-dimensional embeddings from providers such as Amazon Bedrock, Amazon SageMaker, Anthropic, and OpenAI with microsecond latency and up to 99% recall. Key use cases include semantic caching for LLMs, multi-turn conversational agents, and RAG-enabled agentic systems to reduce latency and cost. Vector search runs on node-based clusters in all AWS Regions at no additional cost, and existing Valkey or Redis OSS clusters can be upgraded to Valkey 8.2 with no downtime.
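A KNN query against such an index can be sketched as follows. The FT.* command shape and the field names (`doc_idx`, `vec`) are assumptions based on the common search-module query syntax, not confirmed by the announcement; adapt them to your client and schema.

```python
import struct

def pack_embedding(vec):
    # Vector fields expect the query vector as a little-endian
    # float32 byte blob.
    return struct.pack(f"<{len(vec)}f", *vec)

def knn_search_args(index, query_vec, k=3):
    # Hedged sketch: argument list for an FT.SEARCH KNN query,
    # suitable for client.execute_command(*args) on a redis/valkey
    # client connected to the ElastiCache endpoint.
    return [
        "FT.SEARCH", index,
        f"*=>[KNN {k} @vec $q]",
        "PARAMS", "2", "q", pack_embedding(query_vec),
        "DIALECT", "2",
    ]
```

For the semantic-caching use case mentioned above, the same query runs against cached prompt embeddings, returning a stored LLM response when a near-duplicate prompt is found.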
Mon, October 13, 2025
Amazon CloudWatch Adds Generative AI Observability
🔍 Generative AI Observability in Amazon CloudWatch is now generally available, providing end-to-end telemetry for AI applications and AgentCore-managed agents. It expands monitoring beyond model runtime to include Built-in Tools, Gateways, Memory, and Identity, surfacing latency, token usage, errors, and performance across components. The capability integrates with orchestration frameworks like LangChain, LangGraph, and Strands Agents, and works with existing CloudWatch features and pricing for underlying telemetry.
Thu, October 9, 2025
Amazon Quick Suite: Agentic AI Workspace for Business
🤖 Amazon Quick Suite is now generally available as an agentic, AI-powered workspace that retrieves insights across the public internet and your enterprise data stores — including Slack, Salesforce, Snowflake, databases, and other documents — and moves instantly from answers to actions. Quick Suite can execute or trigger tasks in popular applications like Salesforce, Jira, and ServiceNow, and automate workflows from RFP responses to invoice processing and account reconciliation. AWS highlights customer privacy — queries and data are not used to train models — and administrators can enable and tailor the experience quickly; new customers receive a 30-day trial for up to 25 users.