sources.
The Application Server
The fast_api_app.py file wraps the core agent logic in a production-ready FastAPI server. It connects to the Vertex AI Memory Bank through MEMORY_URI (the same Agent Engine instance also backs sessions through SESSION_URI), enabling the ADK framework to persist and retrieve user preferences across production sessions. The server also initializes production-grade telemetry and unpacks runtime secrets into an in-memory dictionary instead of writing them to the process environment.
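The idea of keeping credentials out of os.environ can be illustrated without any cloud dependency. The following is a minimal sketch, not the project's env.py (which is not shown in this section): load_secrets and the fetch callable are hypothetical stand-ins for whatever Secret Manager lookup init_environment() actually performs.

```python
import os
from typing import Callable, Dict, Iterable


def load_secrets(names: Iterable[str], fetch: Callable[[str], str]) -> Dict[str, str]:
    """Resolve each secret name and return the values in a plain dict.

    `fetch` is a hypothetical callable that maps one secret name to its value
    (for example, a thin wrapper around a Secret Manager client). Nothing is
    written to os.environ, so the values never reach child processes or
    environment dumps.
    """
    secrets: Dict[str, str] = {}
    for name in names:
        secrets[name] = fetch(name)
    return secrets


if __name__ == "__main__":
    # A fake in-memory store so the sketch runs without cloud credentials.
    fake_store = {"REDDIT_CLIENT_ID": "id-123", "DK_API_KEY": "key-456"}
    secrets = load_secrets(["REDDIT_CLIENT_ID", "DK_API_KEY"], fake_store.__getitem__)
    print(sorted(secrets))
```

Injecting the fetcher keeps the function testable offline; in production it would be swapped for a real Secret Manager call.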
```shell
cd ..
```
Paste the following code into dev_signal_agent/fast_api_app.py:
```python
import os

from fastapi import FastAPI
from google.adk.cli.fast_api import get_fast_api_app
from google.cloud import logging as cloud_logging
from vertexai import agent_engines

from dev_signal_agent.app_utils.env import init_environment

# --- Initialization & Secure Secret Retrieval ---
# We now unpack the SECRETS dictionary returned by our updated env.py
PROJECT_ID, MODEL_LOC, SERVICE_LOC, SECRETS = init_environment()
logger = cloud_logging.Client().logger(__name__)

# Access sensitive credentials from the SECRETS dictionary.
# These keys stay in memory and are NOT injected into os.environ.
REDDIT_CLIENT_ID = SECRETS.get("REDDIT_CLIENT_ID")
REDDIT_CLIENT_SECRET = SECRETS.get("REDDIT_CLIENT_SECRET")
REDDIT_USER_AGENT = SECRETS.get("REDDIT_USER_AGENT")
DK_API_KEY = SECRETS.get("DK_API_KEY")

# --- Configuration & Sessions ---
AGENT_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
# Non-sensitive configuration uses environment variables
BUCKET = os.environ.get("AI_ASSETS_BUCKET")
USE_IN_MEMORY = os.environ.get("USE_IN_MEMORY_SESSION", "").lower() in ("true", "1")


# --- MEMORY BANK CONNECTION ---
def _get_memory_bank_uri():
    """Return (session_uri, memory_uri), or (None, None) in in-memory mode."""
    if USE_IN_MEMORY:
        # No URIs: ADK falls back to its in-memory services for local runs.
        return None, None
    # We use 'dev_signal_agent' as the display name for the Vertex AI memory bank
    name = os.environ.get("AGENT_ENGINE_MEMORY_BANK_NAME", "dev_signal_agent")
    # Reuse an Agent Engine with this display name if one exists, else create it.
    # The filter value is quoted, as in the Vertex AI list() examples.
    existing = list(agent_engines.list(filter=f'display_name="{name}"'))
    ae = existing[0] if existing else agent_engines.create(display_name=name)
    uri = f"agentengine://{ae.resource_name}"
    print(f"DEBUG: Connecting to Memory Bank: {uri} (display_name={name})")
    # The same Agent Engine backs both sessions and long-term memory.
    return uri, uri


SESSION_URI, MEMORY_URI = _get_memory_bank_uri()

# --- Initialize FastAPI with ADK ---
app: FastAPI = get_fast_api_app(
    agents_dir=AGENT_DIR,
    web=True,
    artifact_service_uri=f"gs://{BUCKET}" if BUCKET else None,
    allow_origins=os.getenv("ALLOW_ORIGINS", "").split(",") if os.getenv("ALLOW_ORIGINS") else None,
    session_service_uri=SESSION_URI,
    memory_service_uri=MEMORY_URI,  # <--- Connects the Memory Bank
    otel_to_cloud=True,  # <--- Enables production telemetry
)

if __name__ == "__main__":
    import uvicorn

    # Cloud Run passes the listening port in PORT (8080 by default)
    uvicorn.run(app, host="0.0.0.0", port=int(os.environ.get("PORT", "8080")))
```
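The environment-variable handling in this file (USE_IN_MEMORY_SESSION and ALLOW_ORIGINS) can be pulled into small pure functions so it is easy to unit-test. The helper names below are hypothetical. One deliberate difference from the inline expression: this version strips whitespace and drops empty entries, so "https://a.com, https://b.com" yields two clean origins instead of one with a leading space.

```python
from typing import List, Mapping, Optional


def parse_bool_flag(env: Mapping[str, str], key: str) -> bool:
    """Mirror the USE_IN_MEMORY check: only "true" or "1" (any case) enable it."""
    return env.get(key, "").lower() in ("true", "1")


def parse_allow_origins(env: Mapping[str, str], key: str = "ALLOW_ORIGINS") -> Optional[List[str]]:
    """Split a comma-separated origin list; return None when unset or empty."""
    raw = env.get(key, "")
    origins = [origin.strip() for origin in raw.split(",") if origin.strip()]
    return origins or None


if __name__ == "__main__":
    print(parse_bool_flag({"USE_IN_MEMORY_SESSION": "True"}, "USE_IN_MEMORY_SESSION"))  # True
    print(parse_allow_origins({"ALLOW_ORIGINS": "https://a.com, https://b.com"}))  # ['https://a.com', 'https://b.com']
```

In the server these would be called with os.environ, for example USE_IN_MEMORY = parse_bool_flag(os.environ, "USE_IN_MEMORY_SESSION").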
Implementing Telemetry
Production visibility requires structured tracing of the agent's reasoning. Setting otel_to_cloud=True in the ADK initialization instruments the application automatically and exports agent traces to Cloud Trace, where they can be viewed in the Google Cloud Console. Each trace renders as a waterfall of agent steps, LLM invocations, and MCP tool calls, which makes it easier to tell reasoning failures apart from infrastructure bottlenecks.
Monitoring vs. Targeted Evaluation:
Cloud Run applies trace sampling to balance performance and cost. System traces monitor aggregate behavior (latency, timeouts), while reasoning traces require targeted evaluation calls to capture full request details for quality assessment.
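To see why sampled tracing can miss individual requests, here is a minimal sketch of trace-ID ratio sampling, the idea behind OpenTelemetry's TraceIdRatioBased sampler. It is an illustration only: it does not reproduce Cloud Run's actual sampling policy, and the 10% rate is an arbitrary example.

```python
import random

# Only the lower 64 bits of the 128-bit trace ID are compared, as in
# OpenTelemetry's ratio-based sampler.
TRACE_ID_LIMIT = (1 << 64) - 1


def should_sample(trace_id: int, rate: float) -> bool:
    """Deterministically keep a trace whose low 64 bits fall below rate * 2**64."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    bound = round(rate * (TRACE_ID_LIMIT + 1))
    return (trace_id & TRACE_ID_LIMIT) < bound


if __name__ == "__main__":
    rng = random.Random(42)
    ids = [rng.getrandbits(128) for _ in range(10_000)]
    kept = sum(should_sample(t, 0.1) for t in ids)
    # Roughly 10% of requests produce a trace; the rest are never recorded.
    print(f"sampled {kept} of {len(ids)} requests")
```

Because the decision depends only on the trace ID, a request that was not sampled leaves no trace to inspect later, which is why full reasoning capture needs deliberate evaluation calls rather than hoping a specific request was kept.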
Viewing the Trace:
Navigate to Trace Explorer in the Google Cloud Console, filter by service name (e.g., dev-signal), and open a specific trace ID to view its Gantt-style breakdown. The breakdown separates the agent's decision path (LLM calls and tool invocations) from system constraints such as network latency or container cold starts.
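Once spans are exported, separating the agent's reasoning path from system overhead amounts to bucketing spans by name. The sketch below works on plain dictionaries rather than a real trace API. The span-name prefixes (invoke_agent, call_llm, execute_tool) are assumptions modeled on typical ADK span naming, and the sample span names are invented; check both against what actually appears in Trace Explorer.

```python
from typing import Dict, Iterable, List, Mapping

# Assumed name prefixes for agent-level spans; verify against your own traces.
REASONING_PREFIXES = ("invoke_agent", "call_llm", "execute_tool")


def split_spans(spans: Iterable[Mapping[str, object]]) -> Dict[str, List[Mapping[str, object]]]:
    """Group spans into 'reasoning' (agent, LLM, tool) and 'system' (everything else)."""
    groups: Dict[str, List[Mapping[str, object]]] = {"reasoning": [], "system": []}
    for span in spans:
        name = str(span.get("name", ""))
        kind = "reasoning" if name.startswith(REASONING_PREFIXES) else "system"
        groups[kind].append(span)
    return groups


if __name__ == "__main__":
    # Hypothetical spans from one request; names are illustrative only.
    trace = [
        {"name": "POST /run"},
        {"name": "invoke_agent dev_signal"},
        {"name": "call_llm"},
        {"name": "execute_tool search_reddit"},
    ]
    for kind, items in split_spans(trace).items():
        print(kind, [s["name"] for s in items])
```

Keeping the prefixes in one tuple makes it easy to adjust the classification once the real span names are known.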
Infrastructure as Code: Provisioning Secure Cloud Resources
Terraform automates the creation of a security-first platform, enforcing least-privilege IAM, automated secret injection, and reproducible resource provisioning. The infrastructure is modularized into logical blocks:
- Resources & Variables: Project, region, and secret mappings
- Core Infrastructure: API enablement and private Artifact Registry
- IAM: Specialized service accounts with scoped permissions
- Secret Management: Secure ingestion into Google Secret Manager
- Cloud Run Configuration: Container environment, resource limits, and runtime secret binding
To begin provisioning, return to the root folder and create the deployment structure:
```shell
cd ..
mkdir deployment
cd deployment
mkdir terraform
cd terraform
```
The variables.tf file defines configurable deployment parameters, so the deployment can be customized per environment without modifying the resource logic. It includes project and region settings, the service name, and a secrets map for injecting credentials securely at runtime.
```hcl
variable "project_id" {
  description = "The Google Cloud Project ID"
  type        = string
}

variable "region" {
  description = "The Google Cloud region to deploy to"
  type        = string
  default     = "us-central1"
}

variable "service_name" {
  description = "The name of the Cloud Run service"
  type        = string
  default     = "dev-signal"
}

variable "secrets" {
  description = "A map of secret names and
```
Pitfall Guide
- Hardcoding Secrets in Environment Variables: Injecting API keys directly into os.environ or Dockerfiles exposes credentials in logs and container metadata. Always use Secret Manager with runtime injection via Terraform, keeping secrets isolated in memory.
- Ignoring Trace Sampling Limits: Cloud Run samples traces by default. Assuming every request is captured leads to false negatives during debugging. Use targeted evaluation calls for full reasoning trace capture, and rely on system traces for aggregate monitoring.
- Over-Provisioning IAM Permissions: Granting broad roles (e.g., roles/editor) to Cloud Run service accounts violates zero-trust principles. Use specialized, least-privilege service accounts scoped to specific APIs (Secret Manager, Vertex AI, Artifact Registry).
- Skipping Local Validation Before Cloud Deployment: Deploying untested agents to Cloud Run makes debugging much harder. Always run the dedicated test runner from the local verification phase to confirm that research, content creation, and memory retrieval work together before provisioning cloud resources.
- Incorrect Memory Bank URI Construction: The agentengine:// URI format requires exact resource name matching. Misconfigured display_name filters or missing agent_engines initialization will cause silent memory failures. Validate URI construction with debug prints before production rollout.
- Overlooking Cloud Run Resource Limits: LLM agents and MCP tool calls are CPU- and memory-intensive. Default Cloud Run limits may cause out-of-memory (OOM) kills or timeout errors during multi-step reasoning. Explicitly configure CPU and memory limits and the maximum instance count in Terraform based on load testing.
- Confusing System Traces with Reasoning Traces: System traces highlight infrastructure bottlenecks (network latency, container cold starts), while reasoning traces expose cognitive failures (hallucinations, tool misrouting). Filter traces by span type to avoid misdiagnosing agent behavior.
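Beyond debug prints, the Memory Bank URI can be checked programmatically before rollout. A minimal sketch, assuming the URI wraps a full Agent Engine resource name of the form projects/PROJECT/locations/LOCATION/reasoningEngines/ID (the shape ae.resource_name takes in the server code); parse_agentengine_uri is a hypothetical helper, not an ADK function.

```python
import re
from typing import Dict

# Assumed shape: agentengine://projects/<project>/locations/<location>/reasoningEngines/<id>
_URI_RE = re.compile(
    r"^agentengine://projects/(?P<project>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/reasoningEngines/(?P<engine_id>[^/]+)$"
)


def parse_agentengine_uri(uri: str) -> Dict[str, str]:
    """Split a session/memory URI into its parts, or raise ValueError if malformed."""
    match = _URI_RE.match(uri)
    if match is None:
        raise ValueError(f"Unexpected Memory Bank URI: {uri!r}")
    return match.groupdict()


if __name__ == "__main__":
    print(parse_agentengine_uri(
        "agentengine://projects/my-project/locations/us-central1/reasoningEngines/1234567890"
    ))
```

Calling this on MEMORY_URI at startup turns a silent memory failure into an explicit error at boot time.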
Deliverables
- Deployment Blueprint: Architecture diagram detailing the flow from Cloud Run ingress → FastAPI server → ADK agent routing → Vertex AI Memory Bank → Secret Manager → Terraform-managed infrastructure. Includes data lineage for telemetry and state persistence.
- Production Readiness Checklist: Pre-deployment validation steps covering local test runner execution, IAM role verification, Secret Manager secret versioning, Cloud Run resource limit configuration, and telemetry endpoint validation.
- Configuration Templates: Ready-to-use Terraform configuration files (variables.tf, main.tf, cloud_run.tf), a FastAPI server scaffold with ADK integration, and an environment variable mapping guide for secure secret injection and memory bank URI resolution.
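Part of the Production Readiness Checklist can be automated as a pre-flight script. The sketch below is hypothetical: it checks that the secret keys fast_api_app.py reads are present in the dictionary returned by init_environment(), and that AI_ASSETS_BUCKET is set. The server itself treats the bucket as optional, so requiring it here is an assumed deployment policy you may want to drop.

```python
from typing import List, Mapping

# Keys read by fast_api_app.py; adjust if your env.py exposes different names.
REQUIRED_SECRETS = ("REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET", "REDDIT_USER_AGENT", "DK_API_KEY")
# Assumed policy: require the artifact bucket in production.
REQUIRED_ENV = ("AI_ASSETS_BUCKET",)


def preflight_problems(secrets: Mapping[str, str], env: Mapping[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = [f"missing secret: {key}" for key in REQUIRED_SECRETS if not secrets.get(key)]
    problems += [f"missing env var: {key}" for key in REQUIRED_ENV if not env.get(key)]
    return problems


if __name__ == "__main__":
    # Fake inputs; in practice pass SECRETS and os.environ.
    for problem in preflight_problems({"REDDIT_CLIENT_ID": "x"}, {}):
        print(problem)
```

Returning a list instead of raising on the first failure lets the script report every gap in one run before deployment.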