All news with the #retrieval-augmented generation tag
Wed, December 10, 2025
Microsoft Ignite 2025: Building with Agentic AI and Azure
🚀 Microsoft Ignite 2025 showcased a suite of Azure and AI updates aimed at accelerating production use of agentic systems. Anthropic's Claude models are now available in Microsoft Foundry alongside OpenAI GPTs, and Azure HorizonDB adds PostgreSQL compatibility with built-in vector indexing for RAG. New Azure Copilot agents automate migration, operations, and optimization, while refreshed hardware (Blackwell Ultra GPUs, Cobalt CPUs, Azure Boost DPU) targets scalable training and secure inference.
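Since HorizonDB is billed as PostgreSQL-compatible with built-in vector indexing, a RAG retrieval query against it should look much like standard pgvector SQL. A minimal sketch under that assumption; the connection string, table, index type, and embedding dimension below are all hypothetical:

```python
# Minimal sketch: nearest-neighbor retrieval for RAG over a PostgreSQL-compatible
# store. Assumes HorizonDB accepts pgvector-style columns and operators; the
# connection details, table, and 1536-dim embeddings are placeholders.
import psycopg  # psycopg 3

conn = psycopg.connect("postgresql://user:pass@horizondb-host:5432/ragdb")
with conn, conn.cursor() as cur:
    # One-time setup: a documents table with an embedding column and an ANN index.
    cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
    cur.execute("""
        CREATE TABLE IF NOT EXISTS docs (
            id bigserial PRIMARY KEY,
            body text,
            embedding vector(1536)
        )
    """)
    cur.execute(
        "CREATE INDEX IF NOT EXISTS docs_ann ON docs "
        "USING hnsw (embedding vector_cosine_ops)"
    )

    # Retrieval: top-5 chunks closest (cosine distance) to a precomputed embedding.
    query_embedding = [0.01] * 1536  # stand-in for a real query embedding
    vector_literal = "[" + ",".join(map(str, query_embedding)) + "]"
    cur.execute(
        "SELECT body FROM docs ORDER BY embedding <=> %s::vector LIMIT 5",
        (vector_literal,),
    )
    context = [row[0] for row in cur.fetchall()]
```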
Wed, December 10, 2025
Google Patches Zero-Click Gemini Enterprise Vulnerability
🔒 Google has patched a zero-click vulnerability in Gemini Enterprise and Vertex AI Search that could have allowed attackers to exfiltrate corporate data via hidden instructions embedded in shared Workspace content. Discovered by Noma Security in June 2025 and dubbed "GeminiJack," the flaw exploited the products' Retrieval-Augmented Generation (RAG) pipeline to execute indirect prompt injection without any user interaction. Google updated how the systems interact, separated Vertex AI Search from Gemini Enterprise, and changed retrieval and indexing workflows to mitigate the issue.
Thu, December 4, 2025
PubMed Data in BigQuery to Accelerate Medical Research
🔬 Google Cloud has made PubMed content available as a BigQuery public dataset with integrated vector search via Vertex AI, enabling semantic search across more than 35 million biomedical articles. Both BigQuery and Vertex AI Vector Search are FedRAMP High authorized, allowing organizations to run embedding models and VECTOR_SEARCH queries inside BigQuery. Early adopters like The Princess Máxima Center report literature reviews reduced from hours to minutes, and example SQL plus a demo repo are provided to help teams get started.
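The announcement's own example SQL is not reproduced here, but a query in the same spirit is easy to sketch from Python. The embedding model, dataset, table, and column names below are placeholders, not the published schema; check Google's example SQL and demo repo for the real ones:

```python
# Minimal sketch: semantic search over a PubMed-style BigQuery table by
# embedding the query text inline and passing it to VECTOR_SEARCH.
from google.cloud import bigquery

client = bigquery.Client()
sql = """
SELECT base.pmid, base.title, distance
FROM VECTOR_SEARCH(
  TABLE `bigquery-public-data.pubmed.articles`,            -- placeholder table
  'embedding',                                             -- placeholder column
  (SELECT ml_generate_embedding_result AS embedding
   FROM ML.GENERATE_EMBEDDING(
     MODEL `my_project.my_dataset.text_embedding_model`,   -- your remote model
     (SELECT 'CAR T-cell therapy for pediatric leukemia' AS content))),
  top_k => 10, distance_type => 'COSINE')
"""
for row in client.query(sql).result():
    print(row.pmid, row.distance, row.title)
```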
Tue, December 2, 2025
Mistral Large 3 Now Available in Microsoft Foundry
🚀 Microsoft has added Mistral Large 3 to Foundry on Azure, offering a high-capability, Apache 2.0–licensed open-weight model optimized for production workloads. The model focuses on reliable instruction following, extended-context comprehension, strong multimodal reasoning, and reduced hallucination for enterprise scenarios. Foundry packages unified governance, observability, and agent-ready tooling, and allows weight export for hybrid or on-prem deployment.
Tue, December 2, 2025
Amazon S3 Vectors GA: Scalable, Cost-Optimized Vector Store
🚀 Amazon S3 Vectors is now generally available, delivering native, purpose-built vector storage and query capabilities in cloud object storage. It supports up to two billion vectors per index, 10,000 indexes per vector bucket, and offers up to 90% lower costs to upload, store, and query vectors. S3 Vectors integrates with Amazon Bedrock, SageMaker Unified Studio, and OpenSearch Service, supports SSE-S3 and optional SSE-KMS encryption with per-index keys, and provides tagging for ABAC and cost allocation.
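A minimal sketch of writing and querying vectors through the boto3 `s3vectors` client. The bucket and index names are hypothetical, and the parameter shapes follow the API as documented at launch, so verify them against the current SDK reference before relying on this:

```python
# Minimal sketch: store and query embeddings in an S3 vector index via boto3.
# Bucket/index names are placeholders; parameter shapes should be checked
# against the current s3vectors API reference.
import boto3

s3v = boto3.client("s3vectors", region_name="us-east-1")

# Write a batch of vectors with filterable metadata.
s3v.put_vectors(
    vectorBucketName="my-vector-bucket",
    indexName="docs-index",
    vectors=[{
        "key": "doc-0001",
        "data": {"float32": [0.12, -0.05, 0.33]},  # truncated for brevity
        "metadata": {"source": "wiki"},
    }],
)

# Query the index for the nearest neighbors of a query embedding.
resp = s3v.query_vectors(
    vectorBucketName="my-vector-bucket",
    indexName="docs-index",
    queryVector={"float32": [0.10, -0.02, 0.31]},
    topK=5,
    returnMetadata=True,
)
for match in resp["vectors"]:
    print(match["key"], match.get("metadata"))
```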
Sun, November 30, 2025
Amazon Connect adds Bedrock knowledge base integration
📘 Amazon Connect now supports connecting existing Amazon Bedrock Knowledge Bases directly to AI agents and allows multiple knowledge bases per agent. You can attach Bedrock KBs in a few clicks with no additional setup or data duplication, and leverage Bedrock connectors such as Adobe Experience Manager, Confluence, SharePoint, and OneDrive. With multiple KBs per agent, AI agents can query several sources in parallel for more comprehensive responses. This capability is available in all AWS Regions where both services are offered.
Sun, November 30, 2025
AWS Marketplace adds Agent Mode and AI-Enhanced Search
🔎 AWS Marketplace introduced Agent mode and AI-enhanced search to speed solution discovery across 30,000+ listings. Agent mode provides a conversational procurement assistant that ingests use cases and uploaded requirements to deliver tailored recommendations and dynamic side-by-side comparisons. Users can refine results through dialogue, generate downloadable purchasing proposals, and initiate purchases directly. AI-enhanced search supplies contextual results with AI-generated summaries, adaptive categories, and AWS Specializations badges to spotlight validated partners.
Sun, November 30, 2025
AWS Bedrock Knowledge Bases Adds Multimodal Retrieval
🔍 AWS has announced general availability of multimodal retrieval in Amazon Bedrock Knowledge Bases, enabling unified search across text, images, audio, and video. The managed Retrieval Augmented Generation (RAG) workflow provides developers full control over ingestion, parsing, chunking, embedding (including Amazon Nova multimodal), and vector storage. Users can submit text or image queries and receive relevant text, image, audio, and video segments back, which can be combined with the LLM of their choice to generate richer, lower-latency responses. Region availability varies by feature set and is documented by AWS.
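On the retrieval side, the existing bedrock-agent-runtime Retrieve API is the entry point; multimodal knowledge bases return image, audio, and video segment references alongside text in the same response list. A minimal sketch with a hypothetical knowledge base ID:

```python
# Minimal sketch: query a Bedrock knowledge base and inspect returned chunks.
# The knowledge base ID is hypothetical; for multimodal KBs, retrievalResults
# can carry non-text content references in addition to text.
import boto3

runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

resp = runtime.retrieve(
    knowledgeBaseId="KB1234567890",  # hypothetical ID
    retrievalQuery={"text": "What does the Q3 architecture diagram show?"},
    retrievalConfiguration={
        "vectorSearchConfiguration": {"numberOfResults": 5}
    },
)
for result in resp["retrievalResults"]:
    print(result.get("score"), result["content"], result.get("location"))
```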
Wed, November 26, 2025
AWS Knowledge MCP Server Adds Topic-Based Search for Domains
🔎 The AWS Knowledge MCP Server now supports topic-based search across specialized documentation domains, enabling more precise queries against areas such as Troubleshooting, AWS Amplify, AWS CDK, CDK Constructs, and AWS CloudFormation. This enhancement lets MCP clients and agentic frameworks target domain-specific resources to reduce noise and improve relevance. The capability complements existing API reference and general documentation search features and is available immediately at no additional cost, subject to standard rate limits.
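From a client's perspective this is an ordinary MCP tool call with a topic-scoped argument. A minimal sketch using the MCP Python SDK over streamable HTTP; the endpoint is the server's published URL, but the tool name and argument shape here are assumptions, so list the server's tools first:

```python
# Minimal sketch: connect to the AWS Knowledge MCP Server and run a scoped
# documentation search. The tool name and arguments are assumptions; call
# list_tools() to discover the server's actual surface before invoking.
import asyncio
from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

AWS_KNOWLEDGE_MCP = "https://knowledge-mcp.global.api.aws"

async def main() -> None:
    async with streamablehttp_client(AWS_KNOWLEDGE_MCP) as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([t.name for t in tools.tools])  # discover real tool names
            # Hypothetical call: restrict a search to one documentation domain.
            result = await session.call_tool(
                "search_documentation",
                {"search_phrase": "stack drift", "topic": "AWS CloudFormation"},
            )
            print(result.content)

asyncio.run(main())
```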
Mon, November 24, 2025
Amazon Quick Suite Embedded Chat Now Generally Available
💬 AWS announced general availability of Amazon Quick Suite Embedded Chat, a ready-made conversational AI you can embed into applications via one-click embedding or API-based iframes. The agent unifies structured data and unstructured knowledge in a single conversation so users can reference KPIs, pull file details, check customer feedback, and trigger actions without leaving the app. Connectors include SharePoint, websites, Slack, and Jira, and enterprises retain control over data access and action scopes. Embedded Chat is available in select Regions with no additional charge beyond existing Quick Suite pricing.
Fri, November 21, 2025
Google: Leader in 2025 Gartner Magic Quadrant for CDBMS
📈 Google announced it was named a Leader in the 2025 Gartner Magic Quadrant for Cloud Database Management Systems for the sixth consecutive year, positioned furthest in completeness of vision. The post presents the company's AI-native Data Cloud—a unified stack integrating BigQuery, Spanner, AlloyDB, Looker, and Dataplex—to support agentic AI. Google highlights embedded specialized agents, developer tooling (Data Agents API, ADK, Gemini CLI), and Agent Analytics in BigQuery to accelerate AI-driven applications while asserting cost and governance benefits on a single, open platform.
Fri, November 21, 2025
Agentic AI Framework for Life Sciences R&D on Google Cloud
🔬 Google Cloud outlines an agentic AI framework to accelerate life sciences R&D by orchestrating specialized, fine-tunable models into modular workflows. It describes four agents—MedGemma for deep literature and data synthesis, TxGemma for in-silico preclinical prediction, Gemini 2.5 Pro as the cognitive orchestrator, and AlphaFold-2 plus docking tools for molecular design. The architecture maps data flows, tooling, and cloud services (Vertex AI, HPC, search) to move from target discovery through iterative Design→Dock→Predict→Refine cycles toward lab-ready lead nomination while preserving version control and compliance.
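At its core, the Design→Dock→Predict→Refine cycle is an orchestration loop over specialized tools. A purely illustrative control-flow sketch, in which every function and score is a hypothetical stand-in for the agents described above rather than a real API:

```python
# Illustrative sketch of the Design->Dock->Predict->Refine loop. All functions
# are hypothetical stand-ins for the described agents (generative design,
# AlphaFold + docking, TxGemma-style prediction, Gemini as orchestrator).
from dataclasses import dataclass

@dataclass
class Candidate:
    smiles: str
    docking_score: float = 0.0
    predicted_tox: float = 1.0

def design(context: str) -> list[Candidate]:
    """Propose candidate molecules (stand-in for a generative design agent)."""
    return [Candidate(smiles=s) for s in ("CCO", "CCN", "CCC")]

def dock(c: Candidate) -> Candidate:
    """Score binding against the folded target (stand-in for docking tools)."""
    c.docking_score = -len(c.smiles) * 2.1  # placeholder score
    return c

def predict(c: Candidate) -> Candidate:
    """Predict toxicity/ADMET (stand-in for an in-silico preclinical model)."""
    c.predicted_tox = 0.1 * len(c.smiles)  # placeholder prediction
    return c

def refine(pool: list[Candidate]) -> list[Candidate]:
    """Orchestrator step: keep the most promising candidates per iteration."""
    return sorted(pool, key=lambda c: (c.docking_score, c.predicted_tox))[:2]

pool = design("literature synthesis for target X")
for _ in range(3):  # iterate Design -> Dock -> Predict -> Refine
    pool = refine([predict(dock(c)) for c in pool])
print([c.smiles for c in pool])
```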
Fri, November 21, 2025
BigQuery AI: Unified ML, Generative AI, and Agents
🤖 BigQuery AI consolidates BigQuery’s built-in ML, generative AI functions, vector search, and agent tools into a unified platform. It enables users to apply generative models and embeddings directly via SQL, perform semantic vector search, and run end-to-end ML workflows without moving data. Role-specific data agents and assistive features like a data canvas and code completion accelerate work for engineers, data scientists, and business users.
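As a flavor of the SQL-native generative functions, here is a hedged example calling a remote model through ML.GENERATE_TEXT; the model and table names are placeholders for objects you would create in your own project:

```python
# Minimal sketch: run a generative AI function over warehouse rows in SQL.
# `my_dataset.gemini_model` is a placeholder remote model (created beforehand
# with CREATE MODEL ... REMOTE); the reviews table is hypothetical.
from google.cloud import bigquery

client = bigquery.Client()
sql = """
SELECT ml_generate_text_llm_result AS summary
FROM ML.GENERATE_TEXT(
  MODEL `my_project.my_dataset.gemini_model`,
  (SELECT CONCAT('Summarize this review in one line: ', review_text) AS prompt
   FROM `my_project.my_dataset.reviews`
   LIMIT 10),
  STRUCT(0.2 AS temperature, TRUE AS flatten_json_output))
"""
for row in client.query(sql).result():
    print(row.summary)
```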
Fri, November 21, 2025
AWS preview: Fully managed MCP servers for EKS and ECS
🔔 Amazon EKS and ECS now offer fully managed MCP servers in preview, providing a cloud-hosted Model Context Protocol endpoint to enrich AI-powered development and operations. These servers remove local installation and maintenance, and deliver enterprise features such as automatic updates and patching, centralized security via AWS IAM, and audit logging through AWS CloudTrail. Developers can connect AI coding assistants like Kiro CLI, Cursor, or Cline for context-aware code generation and debugging, while operators gain access to a knowledge base of best practices and troubleshooting guidance.
Tue, November 18, 2025
Microsoft Databases and Fabric: Unified AI Data Estate
🧠 Microsoft details a broad expansion of its database portfolio and deeper integration with Microsoft Fabric to simplify data architectures and accelerate AI. Key launches include general availability of SQL Server 2025, GA of Azure DocumentDB (MongoDB-compatible), the preview of Azure HorizonDB, and Fabric-hosted SaaS databases for SQL and Cosmos DB. OneLake mirroring, Fabric IQ semantic modeling, expanded agent capabilities, and partner integrations (SAP, Salesforce, Databricks, Snowflake, dbt) are positioned to deliver zero-ETL analytics and operational AI at scale.
Tue, November 18, 2025
Microsoft Foundry: Modular, Interoperable Secure Agent Stack
🔧 Microsoft today expanded Foundry, its platform for building production AI apps and agents, with new models, developer tools, and governance controls. Key updates include broader model access (Anthropic, Cohere, NVIDIA), a generally available model router, and public previews for Foundry IQ, Agent Service features (hosted agents, memory, multi-agent workflows), and the Foundry Control Plane. Foundry Tools and Foundry Local bring real-time connectors and edge inference, while Managed Instance on Azure App Service eases .NET cloud migrations.
Mon, November 17, 2025
A Methodical Approach to Agent Evaluation: Quality Gate
🧭 Hugo Selbie presents a practical framework for evaluating modern multi-step AI agents, emphasizing that final-output metrics alone miss silent failures arising from incorrect reasoning or tool use. He recommends defining clear, measurable success criteria up front and assessing agents across three pillars: end-to-end quality, process/trajectory analysis, and trust & safety. The piece outlines mixed evaluation methods—human review, LLM-as-a-judge, programmatic checks, and adversarial testing—and prescribes operationalizing these checks in CI/CD with production monitoring and feedback loops.
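To make the process/trajectory pillar concrete: programmatic checks can assert that an agent's recorded tool-call sequence obeys expected ordering constraints, and run as plain unit tests in CI/CD. The trace format below is a hypothetical simplification, not a format from the article:

```python
# Minimal sketch of a programmatic trajectory check: given a recorded agent
# trace (hypothetical, simplified format), assert the tool-call sequence
# satisfies ordering constraints. Suitable for wiring into CI as a test.
trace = [
    {"step": "tool_call", "tool": "search_kb", "args": {"q": "refund policy"}},
    {"step": "tool_call", "tool": "fetch_order", "args": {"id": "A-17"}},
    {"step": "final_answer", "text": "Your order qualifies for a refund."},
]

def tool_sequence(trace: list[dict]) -> list[str]:
    return [e["tool"] for e in trace if e["step"] == "tool_call"]

def test_retrieval_precedes_action() -> None:
    seq = tool_sequence(trace)
    # Silent-failure guard: the agent must ground itself (search_kb) before
    # touching order data, and must end with a final answer.
    assert seq.index("search_kb") < seq.index("fetch_order")
    assert trace[-1]["step"] == "final_answer"

test_retrieval_precedes_action()
```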
Mon, November 17, 2025
Production-Ready AI with Google Cloud Learning Path
🚀 Google Cloud has launched the Production-Ready AI Learning Path, a free curriculum designed to guide developers from prototype to production. Drawing on an internal playbook, the series pairs Gemini models with production-grade tools like Vertex AI, Google Kubernetes Engine, and Cloud Run. Modules cover LLM app development, open model deployment, agent building, security, RAG, evaluation, and fine-tuning. New modules will be added weekly through mid-December.
Tue, November 11, 2025
How BigQuery Brought Vector Search to Analytics at Scale
🔍 In early 2024, Google introduced native vector search in BigQuery, embedding semantic search directly into the data warehouse to remove the need for a separate vector database. Users can create indexes with a simple CREATE VECTOR INDEX statement and run semantic queries via the VECTOR_SEARCH function or through Python integrations like LangChain. BigQuery provides serverless scaling, asynchronous index refreshes, model rebuilds with no downtime, partitioned indexes, and ScaNN-based TreeAH for improved price/performance, while retaining row- and column-level security and a pay-as-you-go pricing model.
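The two statements the post centers on look roughly like this; the project, table, and column names are placeholders, while the statements themselves mirror the documented syntax:

```python
# Minimal sketch: build a TreeAH vector index and query it from Python.
# Table and column names are placeholders.
from google.cloud import bigquery

client = bigquery.Client()

# One-time: create the ANN index (BigQuery refreshes it asynchronously).
client.query("""
CREATE VECTOR INDEX IF NOT EXISTS docs_idx
ON `my_project.my_dataset.docs`(embedding)
OPTIONS (index_type = 'TREE_AH', distance_type = 'COSINE')
""").result()

# Query time: semantic top-k search against the indexed column.
rows = client.query("""
SELECT base.id, base.chunk, distance
FROM VECTOR_SEARCH(
  TABLE `my_project.my_dataset.docs`, 'embedding',
  (SELECT embedding FROM `my_project.my_dataset.queries` LIMIT 1),
  top_k => 5)
""").result()
for r in rows:
    print(r.id, r.distance)
```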
Fri, November 7, 2025
Tiered KV Cache Boosts LLM Performance on GKE with HBM
🚀 LMCache implements a node-local, tiered KV cache on GKE that extends the GPU HBM-backed key-value store into CPU RAM and local SSD, increasing effective cache capacity and hit ratio. In benchmarks using Llama-3.3-70B-Instruct on an A3 mega instance (8×nvidia-h100-mega-80gb), configurations that added RAM and SSD tiers reduced time-to-first-token and materially increased token throughput for long system prompts. The results demonstrate a practical way to scale context windows while balancing cost and latency on GKE.
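A sketch of what wiring the tiers together can look like with vLLM's LMCache connector. The environment variable names follow LMCache's config conventions and the sizes are arbitrary, so treat this as an assumption-laden outline rather than the benchmarked setup:

```python
# Sketch: enable a tiered KV cache (HBM -> CPU RAM -> local SSD) for vLLM via
# LMCache. Env var names follow LMCache's config conventions; sizes and paths
# are arbitrary placeholders, not the exact benchmarked configuration.
import os

# Tier 2: spill KV blocks to CPU RAM (size in GB). Set before importing vLLM
# so LMCache picks the settings up at initialization.
os.environ["LMCACHE_LOCAL_CPU"] = "True"
os.environ["LMCACHE_MAX_LOCAL_CPU_SIZE"] = "80"
# Tier 3: spill further to local SSD.
os.environ["LMCACHE_LOCAL_DISK"] = "file:///mnt/ssd/lmcache/"
os.environ["LMCACHE_MAX_LOCAL_DISK_SIZE"] = "400"

from vllm import LLM
from vllm.config import KVTransferConfig

llm = LLM(
    model="meta-llama/Llama-3.3-70B-Instruct",
    tensor_parallel_size=8,  # e.g., 8 GPUs on an A3 mega node
    kv_transfer_config=KVTransferConfig(
        kv_connector="LMCacheConnectorV1",  # route KV blocks through LMCache
        kv_role="kv_both",
    ),
)
```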