All news with #vertex ai tag

97 articles · page 4 of 5

November 13, 2025

Four Steps for Startups to Build Multi-Agent Systems

🤖 This post outlines a concise four-step framework for startups to design and deploy multi-agent systems, illustrated through a Sales Intelligence Agent example. It recommends choosing between pre-built, partner, or custom agents and describes using Google's Agent Development Kit (ADK) for code-first control. The guide covers hybrid architectures, tool-based state isolation, secure data access, and a three-step deployment blueprint to run agents on Vertex AI Agent Engine and Cloud Run.

Google Cloud Agentic AI Vertex AI

November 7, 2025

AlloyDB AI: Auto Vector Embeddings and Indexing Capabilities

🔍 AlloyDB AI launches two preview features—Auto Vector Embeddings and Auto Vector Index—that let teams convert operational databases into AI-native stores using simple SQL. Auto Vector Embeddings generates and incrementally refreshes vectors in-database, batching calls to Vertex AI and running as a background process. The Auto Vector Index (ScaNN) self-configures, self-tunes, and maintains vector indexes to accelerate filtered semantic search and reduce ETL and tuning overhead for production workloads.

Google Cloud Vertex AI AI Security Product Update

November 5, 2025

Vertex AI Agent Builder: Build, Scale, Govern Agents

🚀 Vertex AI Agent Builder is Google Cloud's integrated platform to build, scale, and govern production AI agents. The update expands the Agent Development Kit (ADK) and Agent Engine with configurable context layers to reduce token usage, an adaptable plugins framework, and new language SDK support including Go. Production features include observability, evaluation tools, simplified deployment via the ADK CLI, and strengthened governance with native agent identities and Model Armor protections.

Google Cloud Vertex AI Agentic AI Model Governance

October 31, 2025

Choosing Google Cloud Managed Lustre for External KV Cache

🚀 This post explains how an external KV Cache backed by Google Cloud Managed Lustre can accelerate transformer inference and lower costs by replacing expensive prefill compute with I/O. In experiments with a 50K token context and a ~75% cache-hit rate, Managed Lustre increased inference throughput by 75% and cut mean time-to-first-token by 44%. The analysis projects a 35% TCO reduction and up to ~43% fewer GPUs for the same workload. The article also summarizes practical steps: provision Managed Lustre in the same zone, deploy an inference server that supports external caching (for example vLLM), enable O_DIRECT, and tune I/O parallelism.

Google Cloud Vertex AI Infrastructure Security
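
The throughput and GPU figures in the Managed Lustre summary above are consistent with simple arithmetic. The sketch below assumes the 75% throughput gain applies per GPU and that capacity scales linearly with GPU count. Both are assumptions made here for illustration, not details taken from the article, whose own methodology may differ. The function name `gpu_reduction` is hypothetical.

```python
def gpu_reduction(throughput_gain: float) -> float:
    """Fraction of GPUs saved for a fixed workload.

    Assumes (for illustration only) that the gain is per GPU and that
    total capacity scales linearly with the number of GPUs.
    """
    if throughput_gain <= -1.0:
        raise ValueError("throughput_gain must be greater than -1.0")
    # New fleet size relative to the old one is 1 / (1 + gain).
    return 1.0 - 1.0 / (1.0 + throughput_gain)


reduction = gpu_reduction(0.75)
print(f"{reduction:.1%}")  # prints 42.9%
```

Under these assumptions, a 75% throughput gain needs 1/1.75 of the original fleet, or roughly 43% fewer GPUs, which matches the figure quoted in the summary.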

October 28, 2025

Giles AI on Google Cloud: Transforming Medical Research

🚀 Giles AI migrated its healthcare-focused platform to Google Cloud to reduce latency, improve scalability, and accelerate developer velocity. Using Google Kubernetes Engine, Cloud Run, and Compute Engine, the company orchestrates complex clinical data flows and routes prompts through Vertex AI and Model Garden to remain model-agnostic. Data storage and extraction are handled with Cloud SQL, Cloud Storage, and Document AI, while Cloud Armor and Security Command Center bolster security and compliance. Early customer results include dramatic reductions in research time and improvements in response accuracy.

Google Cloud Google Kubernetes Engine Cloud Run Vertex AI

October 28, 2025

A4X Max, GKE Networking, and Vertex AI Training Now Shipping

🚀 Google Cloud is expanding its NVIDIA collaboration with the new A4X Max instances powered by NVIDIA GB300 NVL72, delivering 72 GPUs with high‑bandwidth NVLink and shared memory for demanding multimodal reasoning. GKE now supports DRANET for topology‑aware RDMA scheduling and integrates NVIDIA NeMo Guardrails into GKE Inference Gateway, while Vertex AI Model Garden will host NVIDIA Nemotron models. Vertex AI Training adds NeMo and NeMo‑RL recipes and a managed Slurm environment to accelerate large‑scale training and deployment.

Google Cloud Google Kubernetes Engine Vertex AI Nvidia

October 27, 2025

Vertex AI Training Expands Large-Scale Training Capabilities

🚀 Vertex AI Training introduces managed features designed for large-scale model development, simplifying cluster provisioning, job orchestration, and resiliency across hundreds to thousands of accelerators. The offering integrates Cluster Director, Dynamic Workload Scheduler, optimized checkpointing, and curated training recipes, including NVIDIA NeMo support. These capabilities reduce operational overhead and accelerate transitions from pretraining to fine-tuning while improving cost and uptime efficiency.

Google Cloud Vertex AI Model Governance Product Update

October 21, 2025

SmarterX Builds Custom LLMs with Google Cloud Tools

🔍 SmarterX uses Google Cloud to build custom LLMs that help retailers, manufacturers, and logistics companies manage regulatory compliance across product lifecycles. Using BigQuery, Cloud Storage, Gemini, and Vertex AI, the company ingests, normalizes, and indexes unstructured regulatory and product data, applies RAG and grounding, and trains customer-specific models. The integrated platform empowers subject matter experts to evaluate, correct, and deploy model updates without heavy engineering overhead.

Google Cloud BigQuery Vertex AI Gemini

October 20, 2025

AI Hypercomputer Update: vLLM on TPUs and Tooling Advances

🔧 Google Cloud’s Q3 AI Hypercomputer update highlights inference improvements and expanded tooling to accelerate model serving and diagnostics. The release integrates vLLM with Cloud TPUs via the new tpu-inference plugin, unifying JAX and PyTorch runtimes and boosting TPU inference for models such as Gemma, Llama, and Qwen. Additional launches include improved XProf profiling and Cloud Diagnostics XProf, an AI inference recipe for NVIDIA Dynamo, NVIDIA NeMo RL recipes, and GA of the GKE Inference Gateway and Quickstart to help optimize latency and cost.

Google Cloud Vertex AI Product Update

October 20, 2025

Google Named Leader in 2025 IDC MarketScape for GenAI

🏆 Google Cloud announced it was named a Leader in the 2025 IDC MarketScape for Worldwide GenAI Life-Cycle Foundation Model Software, spotlighting the Gemini model family and the Vertex AI platform. The post highlights Gemini 2.5’s expanded “thinking” capabilities and new cost controls such as thinking budgets and thought summaries for improved auditability. It also underscores native multimodality, creative variants like Nano Banana, developer tooling including the Gemini CLI, and enterprise features for customization, grounding, security, and governance.

Google Cloud Gemini Vertex AI Product Update

October 20, 2025

Agent Factory Recap: Evaluating Agents, Tooling, and MAS

📡 This recap of the Agent Factory podcast episode, hosted by Annie Wang with guest Ivan Nardini, explains how to evaluate autonomous agents using a practical, full-stack approach. It outlines what to measure — final outcomes, chain-of-thought, tool use, and memory — and contrasts measurement techniques: ground truth, LLM-as-a-judge, and human review. The post demonstrates a 5-step debugging loop using the Agent Development Kit (ADK) and describes how to scale evaluation to production with Vertex AI.

Agentic AI Agent Security Vertex AI How-To

October 17, 2025

Moloco and Google Cloud Power AI Vector Search in Retail

🔎 Moloco’s AI-native retail media platform, integrated with Vertex AI Vector Search on Google Cloud, delivers semantic, real-time ad retrieval and personalized recommendations. The joint architecture uses TPUs and GPUs for model training and scoring while vector search runs efficiently on CPUs, enabling outcomes-based bidding at scale. Internal benchmarks report ~10x capacity, up to ~25% lower p95 latency, and a ~4% revenue uplift. The managed service reduces operational overhead and accelerates time-to-value for retailers.

Google Cloud Vertex AI AI Security

October 16, 2025

Vertex AI SDK Adds Prompt Management for Enterprises

🛠️ Google Cloud announced General Availability of Prompt Management in the Vertex AI SDK, enabling teams to programmatically create, version, and manage prompts as first-class assets. The capability bridges Vertex AI Studio’s visual prompt design with SDK-driven automation to improve collaboration, reproducibility, and lifecycle control. Enterprise security and compliance are supported via CMEK and VPC Service Controls (VPC-SC), and the SDK exposes simple Python methods to create, list, update, and delete prompt resources tied to models such as gemini-2.5-flash. The documented code examples show how to centralize prompt governance and scale generative AI workflows.

Google Vertex AI Product Update

October 15, 2025

Ultimate Prompting Guide for Veo 3.1 on Vertex AI Preview

🎬 This guide introduces Veo 3.1, Google Cloud's improved generative video model available in preview on Vertex AI, and explains how to move beyond "prompt and pray" toward deliberate creative control. It highlights core capabilities—high-fidelity 720p/1080p output, variable clip lengths, synchronized dialogue and sound effects, and stronger image-to-video fidelity. The article presents a five-part prompting formula and detailed techniques for cinematography, soundstage direction, negative prompting, and timestamped scenes. It also describes advanced multi-step workflows that combine Gemini 2.5 Flash Image to produce consistent characters and controlled transitions, and notes SynthID watermarking and certain current limitations.

Google Cloud Vertex AI Gemini How-To

October 15, 2025

Vertex AI Context Caching: Reduce Cost and Latency

⚡ Vertex AI context caching saves and reuses precomputed input tokens so developers avoid repeatedly sending and recomputing long contextual content, reducing latency and cost for large-context AI applications. It provides two modes: implicit caching (automatic, on by default, backed by short-lived KV caches that are deleted within 24 hours, and integrated with Provisioned Throughput) and explicit CachedContent objects that are paid for once and then reused at a deep discount, with optional CMEK protection. Caches support multimodal inputs and very large context windows.

Google Vertex AI AI Security

October 13, 2025

Google Introduces LLM-Evalkit for Prompt Engineering

🧭 LLM-Evalkit is an open-source, lightweight application from Google that centralizes and streamlines prompt engineering using Vertex AI SDKs. It provides a no-code interface for creating, versioning, testing, and benchmarking prompts while tracking objective performance metrics. The tool promotes a dataset-driven evaluation workflow—define the task, assemble representative test cases, and score outputs against clear metrics—to replace ad-hoc iteration and subjective comparisons. Documentation and a guided console tutorial are available to help teams adopt the framework and reproduce experiments.

Google Vertex AI LLM Security Tool Abuse
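
The dataset-driven workflow described in the LLM-Evalkit summary above (define the task, assemble test cases, score outputs against a metric) can be sketched in plain Python. This is a minimal illustration and not LLM-Evalkit's code or interface. The names `EvalCase`, `exact_match`, `evaluate`, and `stub_model` are hypothetical, and the stub stands in for a real model call.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class EvalCase:
    """One representative test case: an input and its expected label."""
    prompt_input: str
    expected: str


def exact_match(output: str, expected: str) -> float:
    """Score 1.0 when the normalized output equals the expected label."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def evaluate(
    generate: Callable[[str], str],
    cases: Sequence[EvalCase],
    metric: Callable[[str, str], float] = exact_match,
) -> float:
    """Run every case through the model and return the mean metric score."""
    if not cases:
        raise ValueError("at least one test case is required")
    scores = [metric(generate(c.prompt_input), c.expected) for c in cases]
    return sum(scores) / len(scores)


def stub_model(text: str) -> str:
    """Hypothetical stand-in for a prompt plus model call (keyword rule)."""
    return "positive" if "great" in text.lower() else "negative"


CASES = [
    EvalCase("The battery life is great.", "positive"),
    EvalCase("It broke after one day.", "negative"),
    EvalCase("Not great, honestly.", "negative"),  # the stub gets this wrong
]

score = evaluate(stub_model, CASES)
print(f"accuracy: {score:.2f}")  # prints accuracy: 0.67
```

Keeping the cases and the metric fixed while swapping the `generate` function lets two prompt versions be compared on the same numbers rather than by impression.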

October 9, 2025

Partners Powering the Gemini Enterprise Agent Ecosystem

🚀 Gemini Enterprise launches a curated ecosystem of partner-built AI agents that integrate with Google Cloud to deliver validated, secure solutions for enterprise workflows. The platform supports Agent2Agent (A2A) communication and includes a Gemini-powered AI agent finder for natural language discovery and filtering by industry, use case, and validation status. A broad set of technology and consulting partners — from Box and Salesforce to ServiceNow, Workday, and Accenture — are bringing agents and services to the Google Cloud Marketplace to accelerate deployment and adoption.

Google Vertex AI

October 7, 2025

150 AI Use Cases from Startups Leveraging Google Cloud

🤖 At the AI Builders Forum, Google Cloud highlighted 150 startups using its generative AI stack—Vertex AI, Gemini, GKE, and Cloud Storage—to build agentic systems, healthcare models, developer tools, and media pipelines. The post catalogs companies across sectors (healthcare, finance, retail, security, creative) and describes technical integrations such as fine-tuning with Gemini, inference on GKE, and scalable analytics with BigQuery. It encourages startups to join Google for Startups Cloud and references a new Startup Technical Guide: AI Agents for building and scaling agentic applications.

Google Cloud Vertex AI Agentic AI

October 3, 2025

Dataproc ML library: Connect Spark to Gemini and Vertex

🔗 Google has released an open-source Python library, Dataproc ML, to streamline running ML and generative-AI inference from Apache Spark on Dataproc. The library uses a SparkML-style builder pattern so users can configure a model handler (for example, GenAiModelHandler) and call .transform() to apply Gemini or other Vertex AI models directly to DataFrames. It also supports loading PyTorch and TensorFlow model artifacts from GCS for large-scale batch inference and includes performance optimizations such as vectorized data transfer, connection reuse, and automatic retry/backoff.

Google Vertex AI Gemini
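
The builder pattern mentioned in the Dataproc ML summary above can be illustrated with a toy class. This sketch does not use the Dataproc ML library or Spark: `ToyModelHandler` and its methods are hypothetical names invented here, and plain lists of dicts stand in for DataFrames. It only shows the shape of "configure a handler by chaining calls, then call transform()".

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, str]


class ToyModelHandler:
    """Hypothetical stand-in for a model handler; NOT the Dataproc ML API."""

    def __init__(self) -> None:
        self._model: Optional[Callable[[str], str]] = None
        self._input_col = "text"
        self._output_col = "prediction"

    # Each setter returns self so that calls can be chained (builder style).
    def model(self, fn: Callable[[str], str]) -> "ToyModelHandler":
        self._model = fn
        return self

    def input_col(self, name: str) -> "ToyModelHandler":
        self._input_col = name
        return self

    def output_col(self, name: str) -> "ToyModelHandler":
        self._output_col = name
        return self

    def transform(self, rows: List[Row]) -> List[Row]:
        """Apply the configured model to every row, adding an output column."""
        if self._model is None:
            raise ValueError("configure a model before calling transform()")
        return [
            {**row, self._output_col: self._model(row[self._input_col])}
            for row in rows
        ]


# str.upper is a placeholder "model"; a real handler would call an endpoint.
handler = (
    ToyModelHandler()
    .model(str.upper)
    .input_col("review")
    .output_col("shout")
)
rows = handler.transform([{"review": "fast shipping"}])
print(rows)  # prints [{'review': 'fast shipping', 'shout': 'FAST SHIPPING'}]
```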

October 2, 2025

Google Cloud Releases Generative Media Models on Vertex AI

🎨 Google Cloud announced General Availability and feature updates for its generative media models on Vertex AI, including Gemini 2.5 Flash Image, Veo 3, Imagen 4, and Gemini 2.5 TTS. The release emphasizes production readiness and enterprise security while adding multi‑aspect ratio image generation, batch image processing, vertical 9:16 video formats with precise duration controls, and studio‑quality multi‑speaker text‑to‑speech across 70+ languages. These enhancements target teams seeking faster, controlled, and scalable cross‑format media workflows for sight, sound, and motion.

Google Cloud Vertex AI Gemini