Deploying a Multi-Agent System with Terraform and Cloud Run

By Codcompass Team·2026-05-09·6 min read

Current Situation Analysis

Moving a multi-agent system from a local prototype to a production service raises architectural and operational problems that traditional deployment patterns do not address. Local environments lack persistent state, so user preferences and cross-session memory are lost between runs. Manual cloud provisioning leads to configuration drift and inconsistent IAM policies, and hardcoding API credentials or passing them as plain environment variables creates serious security vulnerabilities. Traditional microservice deployments also treat agents as stateless HTTP endpoints, ignoring the reasoning paths, tool invocations, and memory retrieval cycles inherent to LLM-based architectures. Without structured telemetry, it is nearly impossible to tell a reasoning failure from a system timeout, and without infrastructure-as-code, environments are not reproducible and cannot scale securely.

WOW Moment: Key Findings

By adopting the Agent Starter Pack patterns combined with Terraform provisioning and Cloud Run deployment, teams achieve a standardized, secure, and observable production backbone. The integration of Vertex AI Memory Bank with ADK telemetry transforms opaque agent behavior into actionable, visualized reasoning paths.

Approach	Deployment Time	Secret Management	Observability Coverage	State Persistence	Security Posture
Local/Manual Script	45-60 mins	Hardcoded/Env Vars	None/Basic Logs	In-Memory Only	Low (Broad IAM)
Cloud Run + Terraform + ADK	<10 mins	Secret Manager Injection	Full Agent Traces	Vertex AI Memory Bank	High (Least-Privilege IAM)

Key Findings:

Terraform reduces infrastructure provisioning time by ~80% while enforcing reproducible, version-controlled state.
ADK's otel_to_cloud=True flag automatically exports structured "Agent Traces" to Cloud Trace, enabling visual waterfall analysis of LLM invocations and MCP tool calls.
Runtime secret injection via Secret Manager eliminates credential leakage risks and supports dynamic rotation without container rebuilds.
Vertex AI Memory Bank provides persistent, cross-session state management, critical for personalized multi-agent interactions.

Core Solution

The production deployment relies on three interconnected layers: a FastAPI application server for request routing and memory binding, OpenTelemetry-based telemetry for reasoning visibility, and Terraform-driven infrastructure provisioning for secure, scalable cloud resources.

The Application Server

The fast_api_app.py file transforms core agent logic into a production-ready FastAPI server. It establishes the critical connection to the Vertex AI memory bank via MEMORY_URI, enabling the ADK framework to persist and retrieve user preferences across production sessions. The server also initializes production-grade telemetry and securely unpacks runtime secrets without polluting the environment namespace.

cd ..

Paste the following code in dev_signal_agent/fast_api_app.py:

import os
from fastapi import FastAPI
from google.adk.cli.fast_api import get_fast_api_app
from google.cloud import logging as cloud_logging
from vertexai import agent_engines
from dev_signal_agent.app_utils.env import init_environment

# --- Initialization & Secure Secret Retrieval ---
# We now unpack the SECRETS dictionary returned by our updated env.py
PROJECT_ID, MODEL_LOC, SERVICE_LOC, SECRETS = init_environment()
logger = cloud_logging.Client().logger(__name__)

# Access sensitive credentials from the SECRETS dictionary
# These keys stay in memory and are NOT injected into os.environ
REDDIT_CLIENT_ID = SECRETS.get("REDDIT_CLIENT_ID")
REDDIT_CLIENT_SECRET = SECRETS.get("REDDIT_CLIENT_SECRET")
REDDIT_USER_AGENT = SECRETS.get("REDDIT_USER_AGENT")
DK_API_KEY = SECRETS.get("DK_API_KEY")

# --- Configuration & Sessions ---
AGENT_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
# Non-sensitive configuration uses environment variables
BUCKET = os.environ.get("AI_ASSETS_BUCKET")
USE_IN_MEMORY = os.environ.get("USE_IN_MEMORY_SESSION", "").lower() in ("true", "1")

# --- MEMORY BANK CONNECTION ---
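def _get_memory_bank_uri():
    """Return (session_uri, memory_uri); both are None when in-memory sessions are enabled."""
    if USE_IN_MEMORY:
        return None, None
    # We use 'dev_signal_agent' as the default display name for the Vertex AI memory bank
    name = os.environ.get("AGENT_ENGINE_MEMORY_BANK_NAME", "dev_signal_agent")
    # Reuse the Agent Engine with this display name if it exists; otherwise create it
    existing = list(agent_engines.list(filter=f"display_name={name}"))
    ae = existing[0] if existing else agent_engines.create(display_name=name)
    uri = f"agentengine://{ae.resource_name}"
    print(f"DEBUG: Connecting to Memory Bank: {uri} (display_name={name})")
    # The same Agent Engine backs both the session service and the memory service
    return uri, uri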

SESSION_URI, MEMORY_URI = _get_memory_bank_uri()

# --- Initialize FastAPI with ADK ---
app: FastAPI = get_fast_api_app(
    agents_dir=AGENT_DIR,
    web=True,
    artifact_service_uri=f"gs://{BUCKET}" if BUCKET else None,
    allow_origins=os.getenv("ALLOW_ORIGINS", "").split(",") if os.getenv("ALLOW_ORIGINS") else None,
    session_service_uri=SESSION_URI,
    memory_service_uri=MEMORY_URI, # <--- Connects the Memory Bank
    otel_to_cloud=True, # <--- Enables production telemetry
)

if __name__ == "__main__":
    import uvicorn
    # Standard Cloud Run port is 8080
    uvicorn.run(app, host="0.0.0.0", port=8080)
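
The server unpacks four values from init_environment(), whose source is not shown in this section. As a rough illustration only, the sketch below shows one way such a helper could return secrets in a dictionary without writing them to os.environ. The function name init_environment_sketch, the fetch_secret callable, and the MODEL_LOCATION and SERVICE_LOCATION variable names are all hypothetical; the real env.py may look quite different.

```python
import os
from typing import Callable, Dict, Iterable, Mapping, Optional, Tuple

# Secret names as they are read in fast_api_app.py.
SECRET_NAMES = (
    "REDDIT_CLIENT_ID",
    "REDDIT_CLIENT_SECRET",
    "REDDIT_USER_AGENT",
    "DK_API_KEY",
)


def init_environment_sketch(
    fetch_secret: Callable[[str], Optional[str]],
    env: Optional[Mapping[str, str]] = None,
    secret_names: Iterable[str] = SECRET_NAMES,
) -> Tuple[str, str, str, Dict[str, str]]:
    """Return (project_id, model_location, service_location, secrets).

    Hypothetical stand-in for init_environment(). `fetch_secret` is any
    callable that maps a secret name to its value (or None if missing),
    for example a thin wrapper around a Secret Manager client.
    """
    env = os.environ if env is None else env

    # Non-sensitive settings come from environment variables.
    # MODEL_LOCATION and SERVICE_LOCATION are assumed names.
    project_id = env.get("GOOGLE_CLOUD_PROJECT", "")
    model_location = env.get("MODEL_LOCATION", "us-central1")
    service_location = env.get("SERVICE_LOCATION", "us-central1")

    # Secrets are collected into a plain dict and never copied into os.environ.
    secrets: Dict[str, str] = {}
    for name in secret_names:
        value = fetch_secret(name)
        if value is not None:
            secrets[name] = value

    return project_id, model_location, service_location, secrets
```

Passing the fetcher in as a parameter keeps the sketch testable with a plain dictionary lookup (for example `init_environment_sketch({"DK_API_KEY": "k"}.get)`) and leaves the choice of secret backend to the caller.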

Implementing Telemetry

Production visibility requires structured tracing of agent reasoning. Setting otel_to_cloud=True in the ADK initialization automatically instruments the application, exporting "Agent Traces" to Google Cloud Console. These traces render a visual waterfall of cognitive operations, LLM invocations, and MCP tool calls, enabling precise differentiation between reasoning failures and infrastructure bottlenecks.

Monitoring vs. Targeted Evaluation: Cloud Run applies trace sampling to balance performance and cost. System traces monitor aggregate behavior (latency, timeouts), while reasoning traces require targeted evaluation calls to capture full request details for quality assessment.
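
To get a feel for what sampling means in practice, the following back-of-the-envelope sketch estimates an upper bound on how many requests would be traced automatically in a time window. The default cap of 0.1 sampled requests per second per instance is an assumption based on Cloud Run's published tracing behavior and should be checked against current documentation; spans exported by ADK itself may follow different sampling rules.

```python
import math


def max_sampled_requests(
    duration_s: float,
    instances: int,
    total_requests: int,
    cap_rps_per_instance: float = 0.1,  # assumed cap; verify for your platform
) -> int:
    """Upper bound on automatically sampled requests in a time window."""
    if duration_s < 0 or instances < 0 or total_requests < 0:
        raise ValueError("inputs must be non-negative")
    # Round first to absorb floating-point noise, then take the floor.
    ceiling = math.floor(round(cap_rps_per_instance * duration_s * instances, 6))
    # You cannot sample more requests than actually arrived.
    return min(total_requests, ceiling)


# Ten minutes of traffic across 3 instances serving 50,000 requests.
traced = max_sampled_requests(duration_s=600, instances=3, total_requests=50_000)
coverage = traced / 50_000
```

Under these assumptions only a small fraction of requests would carry a trace, which is why a single failing request may have no trace at all unless it is captured deliberately.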

Viewing the Trace: Navigate to Trace Explorer in Google Cloud Console, filter by service name (e.g., dev-signal), and open specific Trace IDs to view Gantt-style breakdowns. This reveals cognitive decision paths versus physical system constraints.
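
The Gantt-style view can be approximated offline to build intuition for reading it. The sketch below renders a text waterfall from a list of span records. The Span type, its fields, and the span names used in the usage example are illustrative assumptions, not the exact names ADK or Cloud Trace emit.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Span:
    name: str
    start_ms: int
    end_ms: int
    depth: int = 0  # nesting level, used only for label indentation


def render_waterfall(spans: List[Span], width: int = 40) -> List[str]:
    """Render spans as text rows with bars scaled to the trace duration."""
    if not spans:
        return []
    t0 = min(s.start_ms for s in spans)
    t1 = max(s.end_ms for s in spans)
    total = max(t1 - t0, 1)  # avoid division by zero for instant traces

    rows = []
    for s in sorted(spans, key=lambda s: (s.start_ms, s.depth)):
        duration = s.end_ms - s.start_ms
        offset = min((s.start_ms - t0) * width // total, width - 1)
        length = max(duration * width // total, 1)  # always show at least one cell
        length = min(length, width - offset)
        bar = " " * offset + "#" * length
        label = "  " * s.depth + s.name
        rows.append(f"{label:<28}|{bar:<{width}}| {duration} ms")
    return rows


# Hypothetical trace: one agent invocation with two LLM calls around a tool call.
example = [
    Span("invoke_agent", 0, 1000, depth=0),
    Span("call_llm", 0, 400, depth=1),
    Span("execute_tool", 400, 900, depth=1),
    Span("call_llm", 900, 1000, depth=1),
]
for row in render_waterfall(example, width=20):
    print(row)
```

In a trace shaped like this one, the tool call occupies half of the total duration, which points at a slow dependency rather than a reasoning problem.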

Infrastructure as Code: Provisioning Secure Cloud Resources

Terraform automates the creation of a security-first platform, enforcing least-privilege IAM, automated secret injection, and reproducible resource provisioning. The infrastructure is modularized into logical blocks:

Resources & Variables: Project, region, and secret mappings
Core Infrastructure: API enablement and private Artifact Registry
IAM: Specialized service accounts with scoped permissions
Secret Management: Secure ingestion into Google Secret Manager
Cloud Run Configuration: Container environment, resource limits, and runtime secret binding
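
Least-privilege IAM is easier to keep honest with a small automated check. The sketch below flags basic roles (owner, editor, viewer) in a mapping of service accounts to roles. The input shape, the function name find_broad_bindings, and the example account names are assumptions for illustration; in practice the bindings might be extracted from a Terraform plan or an IAM policy export.

```python
from typing import Dict, Iterable, List

# Basic roles grant project-wide access and are usually too broad
# for a single-purpose service account.
BASIC_ROLES = frozenset({"roles/owner", "roles/editor", "roles/viewer"})


def find_broad_bindings(bindings: Dict[str, Iterable[str]]) -> List[str]:
    """Return 'member: role' strings for every basic role that is granted."""
    findings = []
    for member, roles in sorted(bindings.items()):
        for role in sorted(roles):
            if role in BASIC_ROLES:
                findings.append(f"{member}: {role}")
    return findings


# Hypothetical bindings: one scoped account and one over-provisioned account.
example_bindings = {
    "dev-signal-sa@example-project.iam.gserviceaccount.com": [
        "roles/secretmanager.secretAccessor",
        "roles/aiplatform.user",
    ],
    "legacy-sa@example-project.iam.gserviceaccount.com": ["roles/editor"],
}
violations = find_broad_bindings(example_bindings)
```

A check like this could run in CI before the Terraform configuration is applied, failing the pipeline whenever the list of violations is non-empty.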

To begin provisioning, return to the root folder and create the deployment structure:

cd ..
mkdir deployment
cd deployment
mkdir terraform
cd terraform

Terraform Resources and Variables

The variables.tf file defines configurable deployment parameters, enabling environment customization without logic modification. It includes project/region settings, service naming, and a secrets map for secure runtime credential injection.

variable "project_id" {
  description = "The Google Cloud Project ID"
  type        = string
}
variable "region" {
  description = "The Google Cloud region to deploy to"
  type        = string
  default     = "us-central1"
}
variable "service_name" {
  description = "The name of the Cloud Run service"
  type        = string
  default     = "dev-signal"
}
variable "secrets" {
  description = "A map of secret names and
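
Values for these variables are typically supplied through a variable definitions file rather than by editing the configuration. As a hedged sketch, the helper below builds the contents of a terraform.tfvars.json file in Python. The build_tfvars name is hypothetical, and because the secrets variable definition is cut off above, its shape here (a map from secret name to secret value) is an assumption.

```python
import json
from typing import Dict, Optional


def build_tfvars(
    project_id: str,
    region: str = "us-central1",      # mirrors the default in variables.tf
    service_name: str = "dev-signal",  # mirrors the default in variables.tf
    secrets: Optional[Dict[str, str]] = None,
) -> str:
    """Return JSON text suitable for a terraform.tfvars.json file."""
    if not project_id:
        # project_id has no default in variables.tf, so it must be provided.
        raise ValueError("project_id is required")
    payload: Dict[str, object] = {
        "project_id": project_id,
        "region": region,
        "service_name": service_name,
    }
    if secrets is not None:
        # Assumed shape: secret name -> secret value.
        payload["secrets"] = dict(secrets)
    return json.dumps(payload, indent=2, sort_keys=True)


tfvars_text = build_tfvars("example-project")
```

If secret values are written this way, the resulting file holds plaintext credentials and should stay out of version control.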

Pitfall Guide

Hardcoding Secrets in Environment Variables: Injecting API keys directly into os.environ or Dockerfiles exposes credentials in logs and container metadata. Always use Secret Manager with runtime injection via Terraform, keeping secrets isolated in memory.
Ignoring Trace Sampling Limits: Cloud Run samples traces by default. Assuming every request is captured leads to false negatives during debugging. Use targeted evaluation calls for full reasoning trace capture, and rely on system traces for aggregate monitoring.
Over-Provisioning IAM Permissions: Granting broad roles (e.g., roles/editor) to Cloud Run service accounts violates zero-trust principles. Use specialized, least-privilege service accounts scoped to specific APIs (Secret Manager, Vertex AI, Artifact Registry).
Skipping Local Validation Before Cloud Deployment: Deploying untested agents to Cloud Run amplifies debugging complexity. Always run the dedicated test runner (from local verification phases) to synchronize research, content creation, and memory retrieval before cloud provisioning.
Incorrect Memory Bank URI Construction: The agentengine:// URI format requires exact resource name matching. Misconfigured display_name filters or missing agent_engines initialization will cause silent memory failures. Validate URI construction with debug prints before production rollout.
Overlooking Cloud Run Resource Limits: LLM agents and MCP tool calls are CPU/memory intensive. Default Cloud Run limits may cause OOM kills or timeout errors during multi-step reasoning. Explicitly configure cpu, memory, and max-instances in Terraform based on load testing.
Confusing System Traces with Reasoning Traces: System traces highlight infrastructure bottlenecks (network latency, container cold starts), while reasoning traces expose cognitive failures (hallucinations, tool misrouting). Filter traces by span type to avoid misdiagnosing agent behavior.
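
The memory bank URI pitfall can be caught early with a structural check instead of relying on debug prints alone. The sketch below validates that a URI has the form the server constructs, that is, agentengine:// followed by a full resource name. The parse_memory_bank_uri name is hypothetical, and the check assumes the full projects/.../locations/.../reasoningEngines/... form, so it would reject any shorter form the framework might also accept.

```python
import re
from typing import Dict

SCHEME = "agentengine://"

# Full Agent Engine resource name, as produced by ae.resource_name.
_RESOURCE_RE = re.compile(
    r"^projects/(?P<project>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/reasoningEngines/(?P<engine_id>[^/]+)$"
)


def parse_memory_bank_uri(uri: str) -> Dict[str, str]:
    """Split a memory bank URI into its parts, or raise ValueError."""
    if not uri.startswith(SCHEME):
        raise ValueError(f"expected URI to start with {SCHEME!r}: {uri!r}")
    match = _RESOURCE_RE.match(uri[len(SCHEME):])
    if match is None:
        raise ValueError(f"unexpected resource name in URI: {uri!r}")
    return match.groupdict()


parts = parse_memory_bank_uri(
    "agentengine://projects/123/locations/us-central1/reasoningEngines/456"
)
```

Calling a check like this right after the URI is built turns a silent memory failure into an immediate startup error.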

Deliverables

Deployment Blueprint: Architecture diagram detailing the flow from Cloud Run ingress → FastAPI server → ADK agent routing → Vertex AI Memory Bank → Secret Manager → Terraform-managed infrastructure. Includes data lineage for telemetry and state persistence.
Production Readiness Checklist: Pre-deployment validation steps covering local test runner execution, IAM role verification, Secret Manager secret versioning, Cloud Run resource limit configuration, and telemetry endpoint validation.
Configuration Templates: Ready-to-use Terraform modules (variables.tf, main.tf, cloud_run.tf), FastAPI server scaffold with ADK integration, and environment variable mapping guide for secure secret injection and memory bank URI resolution.
