
All news with the #retrieval-augmented-generation tag

Wed, December 10, 2025

Microsoft Ignite 2025: Building with Agentic AI and Azure

🚀 Microsoft Ignite 2025 showcased a suite of Azure and AI updates aimed at accelerating production use of agentic systems. Anthropic's Claude models are now available in Microsoft Foundry alongside OpenAI GPTs, and Azure HorizonDB adds PostgreSQL compatibility with built-in vector indexing for RAG. New Azure Copilot agents automate migration, operations, and optimization, while refreshed hardware (Blackwell Ultra GPUs, Cobalt CPUs, Azure Boost DPU) targets scalable training and secure inference.

read more →

Wed, December 10, 2025

Google Patches Zero-Click Gemini Enterprise Vulnerability

🔒 Google has patched a zero-click vulnerability in Gemini Enterprise and Vertex AI Search that could have allowed attackers to exfiltrate corporate data via hidden instructions embedded in shared Workspace content. Discovered by Noma Security in June 2025 and dubbed "GeminiJack," the flaw abused the Retrieval-Augmented Generation (RAG) retrieval pipeline to execute indirect prompt injection without any user interaction. Google updated how the systems interact, separated Vertex AI Search from Gemini Enterprise, and changed retrieval and indexing workflows to mitigate the issue.

read more →

Thu, December 4, 2025

PubMed Data in BigQuery to Accelerate Medical Research

🔬 Google Cloud has made PubMed content available as a BigQuery public dataset with integrated vector search via Vertex AI, enabling semantic search across more than 35 million biomedical articles. Both BigQuery and Vertex AI Vector Search are FedRAMP High authorized, allowing organizations to run embedding models and VECTOR_SEARCH queries inside BigQuery. Early adopters like The Princess Máxima Center report literature reviews reduced from hours to minutes, and example SQL plus a demo repo are provided to help teams get started.
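
At its core, a `VECTOR_SEARCH` query finds the stored embeddings nearest to a query embedding. As a hedged illustration of those semantics (not the BigQuery implementation, which runs server-side over real embedding models), here is a brute-force nearest-neighbor sketch over a made-up toy corpus with hypothetical 4-dimensional vectors:

```python
import math

# Toy corpus of pre-computed embeddings (hypothetical 4-dim vectors; in
# BigQuery these would come from an embedding model over PubMed abstracts).
corpus = {
    "article_a": [0.9, 0.1, 0.0, 0.0],
    "article_b": [0.0, 1.0, 0.1, 0.0],
    "article_c": [0.1, 0.0, 0.9, 0.2],
}

def cosine_distance(u, v):
    """Cosine distance = 1 - cosine similarity."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

def top_k(query, k=2):
    """Brute-force nearest neighbors -- the work a vector index accelerates
    when the corpus grows to millions of articles."""
    ranked = sorted(corpus.items(), key=lambda kv: cosine_distance(query, kv[1]))
    return [name for name, _ in ranked[:k]]

print(top_k([1.0, 0.0, 0.0, 0.0]))  # article_a is closest
```

The payoff of running this inside the warehouse is that the embeddings never leave BigQuery and the scan parallelizes with the rest of the query.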

read more →

Tue, December 2, 2025

Mistral Large 3 Now Available in Microsoft Foundry

🚀 Microsoft has added Mistral Large 3 to Foundry on Azure, offering a high-capability, Apache 2.0–licensed open-weight model optimized for production workloads. The model focuses on reliable instruction following, extended-context comprehension, strong multimodal reasoning, and reduced hallucination for enterprise scenarios. Foundry packages unified governance, observability, and agent-ready tooling, and allows weight export for hybrid or on-prem deployment.

read more →

Tue, December 2, 2025

Amazon S3 Vectors GA: Scalable, Cost‑Optimized Vector Store

🚀 Amazon S3 Vectors is now generally available, delivering native, purpose-built vector storage and query capabilities in cloud object storage. It supports up to two billion vectors per index and 10,000 indexes per vector bucket, and cuts the cost of uploading, storing, and querying vectors by up to 90%. S3 Vectors integrates with Amazon Bedrock, SageMaker Unified Studio, and OpenSearch Service, supports SSE-S3 and optional SSE-KMS encryption with per-index keys, and provides tagging for ABAC and cost allocation.

read more →

Sun, November 30, 2025

Amazon Connect adds Bedrock knowledge base integration

📘 Amazon Connect now supports connecting existing Amazon Bedrock Knowledge Bases directly to AI agents and allows multiple knowledge bases per agent. You can attach Bedrock KBs in a few clicks with no additional setup or data duplication, and leverage Bedrock connectors such as Adobe Experience Manager, Confluence, SharePoint, and OneDrive. With multiple KBs per agent, AI agents can query several sources in parallel for more comprehensive responses. This capability is available in all AWS Regions where both services are offered.
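
The "query several sources in parallel" behavior is a fan-out pattern. As a rough sketch (the knowledge-base functions below are stubs with made-up names; a real agent would call each attached Bedrock knowledge base's retrieval API), the concurrency looks like this:

```python
from concurrent.futures import ThreadPoolExecutor

# Stubs standing in for attached knowledge bases (hypothetical names; real
# ones would hit Bedrock-connected sources such as Confluence or SharePoint).
def search_confluence(query):
    return [f"confluence: doc about {query}"]

def search_sharepoint(query):
    return [f"sharepoint: page about {query}"]

def search_onedrive(query):
    return [f"onedrive: file about {query}"]

KNOWLEDGE_BASES = [search_confluence, search_sharepoint, search_onedrive]

def fan_out(query):
    """Query every attached knowledge base concurrently and merge results,
    mirroring how an agent with multiple KBs searches them in parallel."""
    with ThreadPoolExecutor(max_workers=len(KNOWLEDGE_BASES)) as pool:
        futures = [pool.submit(kb, query) for kb in KNOWLEDGE_BASES]
        results = []
        for f in futures:
            results.extend(f.result())
    return results

print(fan_out("refund policy"))
```

Because the queries run concurrently, total latency tracks the slowest source rather than the sum of all of them.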

read more →

Sun, November 30, 2025

AWS Marketplace adds Agent Mode and AI-Enhanced Search

🔎 AWS Marketplace introduced Agent mode and AI-enhanced search to speed solution discovery across 30,000+ listings. Agent mode provides a conversational procurement assistant that ingests use cases and uploaded requirements to deliver tailored recommendations and dynamic side-by-side comparisons. Users can refine results through dialogue, generate downloadable purchasing proposals, and initiate purchases directly. AI-enhanced search supplies contextual results with AI-generated summaries, adaptive categories, and AWS Specializations badges to spotlight validated partners.

read more →

Sun, November 30, 2025

AWS Bedrock Knowledge Bases Adds Multimodal Retrieval

🔍 AWS has announced general availability of multimodal retrieval in Amazon Bedrock Knowledge Bases, enabling unified search across text, images, audio, and video. The managed Retrieval Augmented Generation (RAG) workflow provides developers full control over ingestion, parsing, chunking, embedding (including Amazon Nova multimodal), and vector storage. Users can submit text or image queries and receive relevant text, image, audio, and video segments back, which can be combined with the LLM of their choice to generate richer, lower-latency responses. Region availability varies by feature set and is documented by AWS.
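
Chunking is one of the ingestion stages the announcement says developers now control. As an illustrative sketch only (parameters and strategy are assumptions, not Bedrock defaults), fixed-size chunking with overlap looks like:

```python
def chunk_text(text, chunk_size=100, overlap=20):
    """Fixed-size chunking with overlap -- one common strategy a RAG
    ingestion pipeline can apply before embedding.  Overlap keeps context
    that straddles a chunk boundary retrievable from either side."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

doc = "word " * 60  # 300-character toy document
pieces = chunk_text(doc, chunk_size=100, overlap=20)
print(len(pieces), len(pieces[0]))
```

Each chunk is then embedded (with a multimodal model like Amazon Nova for non-text content) and written to the vector store for retrieval.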

read more →

Wed, November 26, 2025

AWS Knowledge MCP Server Adds Topic-Based Search for Domains

🔎 The AWS Knowledge MCP Server now supports topic-based search across specialized documentation domains, enabling more precise queries against areas such as Troubleshooting, AWS Amplify, AWS CDK, CDK Constructs, and AWS CloudFormation. This enhancement lets MCP clients and agentic frameworks target domain-specific resources to reduce noise and improve relevance. The capability complements existing API reference and general documentation search features and is available immediately at no additional cost, subject to standard rate limits.
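
The noise-reduction benefit of domain scoping is easy to see in miniature. A hedged sketch (the corpus entries are invented; only the domain names come from the announcement):

```python
# Toy documentation corpus tagged by domain (domains mirror those named in
# the announcement; the documents themselves are made up).
DOCS = [
    {"domain": "AWS CDK", "title": "Deploying a stack with cdk deploy"},
    {"domain": "AWS CloudFormation", "title": "Stack update behaviors"},
    {"domain": "Troubleshooting", "title": "Debugging stack rollback failures"},
]

def search(query, domain=None):
    """Keyword search, optionally scoped to one documentation domain --
    the scoping is what cuts cross-domain noise for an MCP client."""
    hits = [d for d in DOCS if query.lower() in d["title"].lower()]
    if domain is not None:
        hits = [d for d in hits if d["domain"] == domain]
    return [d["title"] for d in hits]

print(search("stack"))                            # matches across all domains
print(search("stack", domain="Troubleshooting"))  # scoped to one domain
```

An agentic framework that knows it is debugging can pass the Troubleshooting domain up front instead of sifting mixed results.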

read more →

Mon, November 24, 2025

Amazon Quick Suite Embedded Chat Now Generally Available

💬 AWS announced general availability of Amazon Quick Suite Embedded Chat, a ready-made conversational AI you can embed into applications via one-click embedding or API-based iframes. The agent unifies structured data and unstructured knowledge in a single conversation so users can reference KPIs, pull file details, check customer feedback, and trigger actions without leaving the app. Connectors include SharePoint, websites, Slack, and Jira, and enterprises retain control over data access and action scopes. Embedded Chat is available in select Regions with no additional charge beyond existing Quick Suite pricing.

read more →

Fri, November 21, 2025

Google: Leader in 2025 Gartner Magic Quadrant for CDBMS

📈 Google announced it was named a Leader in the 2025 Gartner Magic Quadrant for Cloud Database Management Systems for the sixth consecutive year, positioned furthest in vision. The post presents the company's AI-native Data Cloud—a unified stack integrating BigQuery, Spanner, AlloyDB, Looker, and Dataplex—to support agentic AI. Google highlights embedded specialized agents, developer tooling (Data Agents API, ADK, Gemini CLI) and Agent Analytics in BigQuery to accelerate AI-driven applications while asserting cost and governance benefits on a single, open platform.

read more →

Fri, November 21, 2025

Agentic AI Framework for Life Sciences R&D on Google Cloud

🔬 Google Cloud outlines an agentic AI framework to accelerate life sciences R&D by orchestrating specialized, fine-tunable models into modular workflows. It describes four agents—MedGemma for deep literature and data synthesis, TxGemma for in-silico preclinical prediction, Gemini 2.5 Pro as the cognitive orchestrator, and AlphaFold-2 plus docking tools for molecular design. The architecture maps data flows, tooling, and cloud services (Vertex AI, HPC, search) to move from target discovery through iterative Design→Dock→Predict→Refine cycles toward lab-ready lead nomination while preserving version control and compliance.

read more →

Fri, November 21, 2025

BigQuery AI: Unified ML, Generative AI, and Agents

🤖 BigQuery AI consolidates BigQuery’s built-in ML, generative AI functions, vector search, and agent tools into a unified platform. It enables users to apply generative models and embeddings directly via SQL, perform semantic vector search, and run end-to-end ML workflows without moving data. Role-specific data agents and assistive features like a data canvas and code completion accelerate work for engineers, data scientists, and business users.

read more →

Fri, November 21, 2025

AWS preview: Fully managed MCP servers for EKS and ECS

🔔 Amazon EKS and ECS now offer fully managed MCP servers in preview, providing a cloud-hosted Model Context Protocol endpoint to enrich AI-powered development and operations. These servers remove local installation and maintenance, and deliver enterprise features such as automatic updates and patching, centralized security via AWS IAM, and audit logging through AWS CloudTrail. Developers can connect AI coding assistants like Kiro CLI, Cursor, or Cline for context-aware code generation and debugging, while operators gain access to a knowledge base of best practices and troubleshooting guidance.

read more →

Tue, November 18, 2025

Microsoft Databases and Fabric: Unified AI Data Estate

🧠 Microsoft details a broad expansion of its database portfolio and deeper integration with Microsoft Fabric to simplify data architectures and accelerate AI. Key launches include general availability of SQL Server 2025 and Azure DocumentDB (MongoDB-compatible), a preview of Azure HorizonDB, and Fabric-hosted SaaS databases for SQL and Cosmos DB. OneLake mirroring, Fabric IQ semantic modeling, expanded agent capabilities, and partner integrations (SAP, Salesforce, Databricks, Snowflake, dbt) are positioned to deliver zero-ETL analytics and operational AI at scale.

read more →

Tue, November 18, 2025

Microsoft Foundry: Modular, Interoperable Secure Agent Stack

🔧 Microsoft today expanded Foundry, its platform for building production AI apps and agents, with new models, developer tools, and governance controls. Key updates include broader model access (Anthropic, Cohere, NVIDIA), a generally available model router, and public previews for Foundry IQ, Agent Service features (hosted agents, memory, multi-agent workflows), and the Foundry Control Plane. Foundry Tools and Foundry Local bring real-time connectors and edge inference, while Managed Instance on Azure App Service eases .NET cloud migrations.

read more →

Mon, November 17, 2025

A Methodical Approach to Agent Evaluation: Quality Gate

🧭 Hugo Selbie presents a practical framework for evaluating modern multi-step AI agents, emphasizing that final-output metrics alone miss silent failures arising from incorrect reasoning or tool use. He recommends defining clear, measurable success criteria up front and assessing agents across three pillars: end-to-end quality, process/trajectory analysis, and trust & safety. The piece outlines mixed evaluation methods—human review, LLM-as-a-judge, programmatic checks, and adversarial testing—and prescribes operationalizing these checks in CI/CD with production monitoring and feedback loops.
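
The process/trajectory pillar lends itself to simple programmatic checks. As one hedged example (the tool names and logged run are hypothetical, not from the article), a subsequence check over an agent's tool calls can flag a run whose final answer looks fine but whose reasoning path was wrong:

```python
def trajectory_matches(actual_calls, expected_calls, in_order=True):
    """Programmatic trajectory check: did the agent invoke the expected
    tools?  A final answer can look right even when this fails -- the
    kind of silent failure final-output metrics alone would miss."""
    if in_order:
        it = iter(actual_calls)
        # `tool in it` consumes the iterator, so this is a subsequence check.
        return all(tool in it for tool in expected_calls)
    return set(expected_calls).issubset(actual_calls)

# A hypothetical logged run: the agent searched, fetched, then answered.
run = ["search_kb", "fetch_document", "generate_answer"]

print(trajectory_matches(run, ["search_kb", "generate_answer"]))  # True
print(trajectory_matches(run, ["generate_answer", "search_kb"]))  # False: wrong order
```

Checks like this slot naturally into the CI/CD gating the piece prescribes, alongside human review and LLM-as-a-judge scoring.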

read more →

Mon, November 17, 2025

Production-Ready AI with Google Cloud Learning Path

🚀 Google Cloud has launched the Production-Ready AI Learning Path, a free curriculum designed to guide developers from prototype to production. Drawing on an internal playbook, the series pairs Gemini models with production-grade tools like Vertex AI, Google Kubernetes Engine, and Cloud Run. Modules cover LLM app development, open model deployment, agent building, security, RAG, evaluation, and fine-tuning. New modules will be added weekly through mid-December.

read more →

Tue, November 11, 2025

How BigQuery Brought Vector Search to Analytics at Scale

🔍 In early 2024 Google introduced native vector search in BigQuery, embedding semantic search directly into the data warehouse to remove the need for separate vector databases. Users can create indexes with a simple CREATE VECTOR INDEX statement and run semantic queries via the VECTOR_SEARCH function or through Python integrations like LangChain. BigQuery provides serverless scaling, asynchronous index refreshes, model rebuilds with no downtime, partitioned indexes, and ScaNN-based TreeAH for improved price/performance, while retaining row- and column-level security and a pay-as-you-go pricing model.

read more →

Fri, November 7, 2025

Tiered KV Cache Boosts LLM Performance on GKE with HBM

🚀 LMCache implements a node-local, tiered KV Cache on GKE to extend the GPU HBM-backed Key-Value store into CPU RAM and local SSD, increasing effective cache capacity and hit ratio. In benchmarks using Llama-3.3-70B-Instruct on an A3 mega instance (8×nvidia-h100-mega-80gb), configurations that added RAM and SSD reduced Time-to-First-Token and materially increased token throughput for long system prompts. The results demonstrate a practical approach to scale context windows while balancing cost and latency on GKE.
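
The tiering idea can be sketched in a few lines. This is a toy two-tier LRU model of the pattern, not LMCache's implementation (capacities and promotion policy are illustrative assumptions): a small hot tier stands in for GPU HBM and spills evictions into a larger cold tier standing in for CPU RAM or SSD.

```python
from collections import OrderedDict

class TieredKVCache:
    """Toy two-tier cache: a small hot tier (GPU HBM stand-in) evicts into a
    larger cold tier (CPU RAM / SSD stand-in).  Cold hits are promoted back
    to hot, as a tiered KV cache does with reused prompt prefixes."""

    def __init__(self, hot_capacity=2, cold_capacity=4):
        self.hot = OrderedDict()
        self.cold = OrderedDict()
        self.hot_capacity = hot_capacity
        self.cold_capacity = cold_capacity

    def put(self, key, value):
        self.hot[key] = value
        self.hot.move_to_end(key)
        while len(self.hot) > self.hot_capacity:
            k, v = self.hot.popitem(last=False)   # evict LRU from hot...
            self.cold[k] = v                      # ...into the cold tier
            while len(self.cold) > self.cold_capacity:
                self.cold.popitem(last=False)     # cold eviction = true miss

    def get(self, key):
        if key in self.hot:
            self.hot.move_to_end(key)
            return self.hot[key]
        if key in self.cold:
            value = self.cold.pop(key)
            self.put(key, value)                  # promote on cold hit
            return value
        return None                               # miss: recompute the KV blocks

cache = TieredKVCache()
for i in range(5):
    cache.put(f"prefix{i}", f"kv{i}")
print(cache.get("prefix0"))  # served from the cold tier and promoted
```

The benchmark's gains follow directly from this shape: entries that would have been full misses (recomputing prefill) become cheaper cold-tier hits, raising throughput and cutting Time-to-First-Token for long, reused system prompts.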

read more →