All news with #cloud run tag

15 articles

Deploy a Multi-Agent System on Cloud Run with Terraform

📣 This article describes how the Dev Signal team transitioned a multi-agent prototype into production on Google Cloud by combining a FastAPI service, a Vertex AI memory bank, and the Agent Development Kit (ADK). It highlights production-ready concerns including OpenTelemetry traces exported to Cloud Trace for visibility into agent reasoning, and secure secret handling via Secret Manager so credentials never appear in environment variables. The guide also demonstrates reproducible infrastructure using Terraform to provision Artifact Registry, service accounts, Cloud Run, and related APIs, and outlines containerization and Cloud Build steps to deploy new revisions.

Cloud Run Worker Pools at Estée Lauder Companies: Use Cases

Google Cloud's Cloud Run worker pools provide an always-on, pull-based execution model that Estée Lauder Companies used to scale LLM-powered services. The company's Rostrum platform migrated from a request-driven service to a producer-consumer architecture: a FastAPI web tier publishes user messages to Pub/Sub, and worker pools consume them for LLM inference. This decoupling improved message durability and UI latency SLAs and reduced operational overhead, while enabling GPU-backed distributed workloads and cost improvements for long-running background tasks.
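The producer-consumer split described in this summary can be sketched in a few lines. This is a minimal illustration only: an in-process `queue.Queue` stands in for the Pub/Sub topic, threads stand in for worker-pool instances, and `fake_llm_inference` is a hypothetical placeholder for the model call. None of these names come from the article, and the real system would use the Pub/Sub client library instead.

```python
import queue
import threading


def run_pipeline(messages, num_workers=2):
    """Enqueue messages and let pull-based workers process them."""
    topic = queue.Queue()  # assumption: in-process stand-in for a Pub/Sub topic
    results = []
    lock = threading.Lock()

    def fake_llm_inference(text):
        # Hypothetical placeholder for the real LLM call.
        return text.upper()

    def worker():
        while True:
            msg = topic.get()  # workers pull work; nothing is pushed to them
            if msg is None:  # sentinel value: shut this worker down
                topic.task_done()
                return
            reply = fake_llm_inference(msg)
            with lock:
                results.append((msg, reply))
            topic.task_done()  # acknowledge only after processing finished

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for m in messages:  # the web tier only enqueues, then returns to the user
        topic.put(m)
    for _ in threads:  # one sentinel per worker
        topic.put(None)
    topic.join()  # wait until every queued item has been acknowledged
    for t in threads:
        t.join()
    return sorted(results)
```

Because the producer returns as soon as a message is enqueued, slow inference no longer blocks the request path, which is the latency benefit the summary attributes to the migration.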

Orchestrator Pattern for Distributed AI Agents at Scale

🤖 The post proposes the orchestrator pattern to turn monolithic AI scripts into a team of specialized, distributed microservices that integrate directly with existing frontends. It demonstrates using Google's Agent Development Kit (ADK), the Agent-to-Agent (A2A) protocol, and Cloud Run to host separate researcher, judge, and orchestrator services. The design enables independent scaling, strict JSON contracts for reliable decision-making, and language-agnostic implementations. The authors emphasize production hardening: secure agent endpoints, mitigate latency across hops, and implement robust retries and error handling.
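To make the "strict JSON contract" idea concrete, here is a minimal sketch of an orchestrator loop that refuses any judge reply not matching an agreed schema. The field names (`verdict`, `reason`), the allowed values, and the `researcher` and `judge` callables are all assumptions for illustration. In the article these are separate Cloud Run services reached over A2A, not local functions.

```python
import json

# Assumed contract: the judge must return exactly these typed fields.
REQUIRED_FIELDS = {"verdict": str, "reason": str}
ALLOWED_VERDICTS = {"pass", "fail"}


def parse_verdict(raw):
    """Validate the judge's reply against the contract; raise ValueError if it is off."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field, kind in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), kind):
            raise ValueError(f"missing or mistyped field: {field}")
    if data["verdict"] not in ALLOWED_VERDICTS:
        raise ValueError(f"unknown verdict: {data['verdict']}")
    return data


def orchestrate(topic, researcher, judge, max_rounds=3):
    """Alternate researcher and judge calls until the judge passes the findings."""
    feedback = ""
    for round_no in range(1, max_rounds + 1):
        findings = researcher(topic, feedback)
        verdict = parse_verdict(judge(findings))
        if verdict["verdict"] == "pass":
            return {"rounds": round_no, "findings": findings}
        feedback = verdict["reason"]  # feed the judge's objection back in
    raise RuntimeError("judge never approved the findings")
```

Validating at the boundary means a malformed reply fails loudly at the hop where it occurred, rather than steering the orchestrator's decision with bad data.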

Updated Spend-Based Committed Use Discounts Guide Overview

💡 Google Cloud updated its spend-based Committed Use Discounts (CUDs), moving from a credit-based model to a direct discounted price model that makes net costs and savings visible at a glance. The rollout began in July 2025 and is now generally available, expanding SKU coverage to include Cloud Run and H3/M-series VMs and correcting reporting gaps for mixed Flex CUD environments. The unified CUD Analysis provides hourly granularity (up to 30 days), CSV exports, and a metadata export for programmatic joins with Billing BigQuery Export datasets. Enhanced recommendation and scenario modeling let FinOps teams size commitments, tune coverage thresholds, and validate pre/post migration savings.

Cloud Run Adds NVIDIA RTX PRO 6000 Blackwell GPUs for AI

🚀 Cloud Run now supports NVIDIA RTX PRO 6000 Blackwell GPUs in preview, enabling serverless deployment of large inference models such as Gemma 3 27B and Llama 3.1 70B. The GPUs provide 96 GB of vGPU memory, 1.6 TB/s of bandwidth, and support for FP4 and FP6 precision. Cloud Run pre-installs drivers, offers rapid GPU startup and autoscaling to zero, and integrates with Cloud Storage and IAP for production use.

Full-Stack Dart Architecture: Flutter on Cloud Run

🚀 This article demonstrates a full-stack architecture that uses Flutter for the web frontend and Dart for the backend, enabling shared models and business logic across client and server. It walks through a To-Do example that places the domain model in a shared package, uses Shelf to serve both API routes and static web files, and compiles the server to a native executable for fast startup. Deployment options include Cloud Run's OS-only runtime for mounting precompiled artifacts or a Dockerfile-based multi-stage build for portable containers, and the article includes CI guidance using GitHub Actions to automate analysis, tests, and web builds.

Deploy Gemini 3 Apps Quickly with Google Cloud Run

🚀 This guide demonstrates how to create and deploy a public web app using Gemini 3 Flash Preview via Google AI Studio and Google Cloud Run. In Build mode you describe the application in natural language and let the model "vibe code" a complete app, which appears instantly in the Preview panel for testing. When satisfied, a single Deploy App action pushes the app to Cloud Run, exports your API key as an environment variable, and provides a shareable URL. Note that deployment requires a Google Cloud project with billing enabled.

Hands-on with Gemma 3: Deploying Open Models on GCP

🚀 Google Cloud introduces hands-on labs for Gemma 3, a family of lightweight open models offering multimodal (text and image) capabilities and efficient performance on smaller hardware footprints. The labs present two deployment paths: a serverless approach using Cloud Run with GPU support, and a platform approach using GKE for scalable production environments. Choose Cloud Run for simplicity and cost-efficiency or GKE Autopilot for control and robust orchestration to move models from local testing to production.

Deploy n8n on Cloud Run for Serverless AI Workflows

🚀 Deploy the official n8n Docker image to Cloud Run in minutes to run scalable, serverless AI workflows. Cloud Run scales from zero and persists data in Cloud SQL, and you pay only for active usage. The post shows how to call Gemini as the agent LLM and optionally connect workflows to Google Workspace via OAuth for Gmail, Calendar, and Drive. For production, follow the n8n docs to add Secret Manager, Cloud SQL, and Terraform-based deployment.

Giles AI on Google Cloud: Transforming Medical Research

🚀 Giles AI migrated its healthcare-focused platform to Google Cloud to reduce latency, improve scalability, and accelerate developer velocity. Using Google Kubernetes Engine, Cloud Run, and Compute Engine, the company orchestrates complex clinical data flows and routes prompts through Vertex AI and Model Garden to remain model-agnostic. Data storage and extraction are handled with Cloud SQL, Cloud Storage, and Document AI, while Cloud Armor and Security Command Center bolster security and compliance. Early customer results include dramatic reductions in research time and improvements in response accuracy.

Scaling Customer Experience with AI on Google Cloud

🤖 LiveX AI outlines a Google Cloud blueprint to scale conversational customer experiences across chat, voice, and avatar interfaces. The post details how Cloud Run hosts elastic front-end microservices while GKE provides GPU-backed AI inference, and how AgentFlow orchestrates conversational state, knowledge retrieval, and human escalation. Reported customer outcomes include a >90% self-service rate for Wyze and a 3× conversion uplift for Pictory. The design emphasizes cost efficiency, sub-second latency, multilingual support, and secure integrations with platforms such as Stripe, Zendesk, and Salesforce.

Google Cloud launches advanced AI training suite for roles

🚀 Google Cloud announced a new suite of AI training courses for intermediate and advanced learners across technical and non-technical roles. The curriculum covers designing and managing AI infrastructure using GCE and GKE, fine-tuning models like Gemini, serverless inference with Cloud Run, and securing generative AI deployments. Hands-on labs teach building AI agents that securely connect to enterprise databases and rapid prototyping in Google AI Studio. Courses are available on Google Cloud Skills Boost to help learners future-proof their AI skills.

Gemini CLI Extensions: Security and Cloud Run Tools

🚀 Google is previewing two Gemini CLI extensions that bring security analysis and Cloud Run deployment directly into your terminal. The security extension introduces /security:analyze to scan local git diffs for issues such as hardcoded secrets, injection flaws, broken access control, and insecure data handling, and returns clear remediation guidance or optional fixes. The Cloud Run extension adds /deploy, a one-command flow to build, containerize, push, and configure services on Cloud Run, returning a public URL and supporting terminal, VS Code agent mode, and Cloud Shell.
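As a rough illustration of what "scanning a git diff for hardcoded secrets" involves, the sketch below checks only the added lines of a unified diff against two regular expressions. It is not how the Gemini CLI extension works (that analysis is model-driven and covers far more issue types). The patterns and the `scan_diff` function are assumptions chosen for the example.

```python
import re

# Assumed, deliberately small pattern set; real scanners use many more rules.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),  # shape of an AWS access key ID
    re.compile(
        r"(password|secret|api[_-]?key)\s*[:=]\s*['\"][^'\"]{8,}['\"]",
        re.IGNORECASE,
    ),  # name = "long literal" style assignments
]


def scan_diff(diff_text):
    """Return (file, line) pairs for added diff lines that look like secrets."""
    findings = []
    current_file = None
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            # File header of the new version, e.g. "+++ b/app.py".
            path = line[4:]
            current_file = path[2:] if path.startswith("b/") else path
        elif line.startswith("+"):
            added = line[1:]  # only newly added content is inspected
            if any(p.search(added) for p in SECRET_PATTERNS):
                findings.append((current_file, added.strip()))
    return findings
```

Restricting the scan to added lines keeps the check focused on what the pending change introduces, which matches the diff-scoped behavior the summary describes.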

Google Cloud Expands Coverage for Compute Flex CUDs

🔔 Google Cloud has expanded its Compute Flexible Committed Use Discounts (Flex CUDs) to cover additional VM families and serverless offerings, delivering broader savings and greater deployment flexibility. The update adds enhanced discounts for memory-optimized M1–M4 instances and HPC-optimized H3 and H4D families, and extends coverage to Cloud Run request-based billing and Cloud Functions. A new spend-based billing model applies discounts directly to eligible usage rather than issuing credits, and introduces changes to the Billing UI, Cloud Billing export to BigQuery schema, and Cloud Commerce Consumer Procurement APIs. Customers can opt in immediately; those who do not will be auto-transitioned to the new model on January 21, 2026, while new Billing Accounts created on or after July 15, 2025 will default to the updated model.

High-Availability Multi-Regional Services on Cloud Run

🚀 This Cloud Next 2025 talk explains how to build fault-tolerant, multi-region services using Cloud Run, highlighting autoscaling, decoupled control/data planes, and N+1 zonal redundancy. The post previews an upcoming Service Health feature that automates cross-region failover by relying on container readiness probes and minimum-instance settings. It also outlines deployment patterns (global external ALB with Serverless NEGs) and shows a live demo of automated traffic failover.
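The failover idea can be reduced to a small routing decision: send traffic to the most preferred region that still has at least one instance passing its readiness probe. The sketch below is a toy model of that decision under stated assumptions, not a description of how Service Health or the load balancer is implemented. The `pick_region` function, the probe-result format, and the rule that one ready instance makes a region healthy are all hypothetical.

```python
def pick_region(preference, probes):
    """Return the first preferred region with at least one ready instance.

    preference: region names in priority order.
    probes: maps region name -> list of per-instance readiness results (bools).
    """
    for region in preference:
        # Assumption: a region counts as healthy if any instance is ready.
        if any(probes.get(region, [])):
            return region
    raise RuntimeError("no healthy region available")


# Example: the primary region fails every probe, so traffic moves to the secondary.
probes = {"us-central1": [False, False], "europe-west1": [True, False]}
chosen = pick_region(["us-central1", "europe-west1"], probes)
```

Keeping a minimum number of instances warm in each region matters in this model because a region with no running instances has no probe results and can never be selected.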