enting zero-trust requires shifting from network-based trust to identity-based verification. The architecture separates policy decision from policy enforcement, ensuring that trust is never assumed and always evaluated.
Step 1: Establish Workload Identity
Replace IP-based authentication with cryptographic workload identity. Use SPIFFE (Secure Production Identity Framework For Everyone) and SPIRE to issue short-lived X.509 SVIDs (SPIFFE Verifiable Identity Documents). Each service receives a unique, automatically rotated identity bound to attested workload attributes (for example, Kubernetes namespace and service account) rather than to a host or IP address.
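To make the identity format concrete, here is a minimal sketch of parsing and validating a SPIFFE ID in TypeScript. The function name parseSpiffeId and the returned shape are illustrative assumptions, not part of any SPIFFE library, and the checks cover only the basics of the ID format (scheme, trust domain, path segments), not the full specification.

```typescript
// Hypothetical helper: a simplified SPIFFE ID parser (not a complete spec validator).
interface SpiffeId {
  trustDomain: string;
  path: string; // e.g. "/ns/production/sa/api-service"
}

function parseSpiffeId(raw: string): SpiffeId | null {
  const prefix = "spiffe://";
  if (!raw.startsWith(prefix)) return null;

  const rest = raw.slice(prefix.length);
  const slash = rest.indexOf("/");
  const trustDomain = slash === -1 ? rest : rest.slice(0, slash);
  const path = slash === -1 ? "" : rest.slice(slash);

  // Trust domains use lowercase letters, digits, dots, dashes, and underscores.
  if (!/^[a-z0-9._-]+$/.test(trustDomain)) return null;
  // Reject query strings, fragments, and empty or dot path segments.
  if (/[?#]/.test(path)) return null;
  const segments = path === "" ? [] : path.slice(1).split("/");
  if (segments.some((s) => s === "" || s === "." || s === "..")) return null;

  return { trustDomain, path };
}
```

A PEP could run a check like this before forwarding the identity to the PDP, so malformed identities are rejected early.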
Step 2: Deploy Policy Decision & Enforcement Points
Decouple authorization logic from application code. The Policy Decision Point (PDP) evaluates requests against centralized policies. The Policy Enforcement Point (PEP) intercepts traffic, forwards metadata to the PDP, and enforces the verdict. Open Policy Agent (OPA) serves as a production-hardened PDP. Envoy or custom middleware acts as the PEP.
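The PDP/PEP split can be sketched with two small TypeScript types. Everything here (AccessRequest, PolicyDecisionPoint, AllowListPdp, enforce) is a hypothetical illustration rather than an OPA or Envoy API; the in-memory allow list merely stands in for a real policy engine.

```typescript
// Illustrative types; the names are assumptions, not an OPA or Envoy API.
interface AccessRequest {
  identity: string;
  action: string;
  resource: string;
}

interface PolicyDecisionPoint {
  decide(request: AccessRequest): boolean;
}

// Stand-in PDP backed by an in-memory allow list. In production this role
// would be played by a policy engine such as OPA.
class AllowListPdp implements PolicyDecisionPoint {
  private readonly rules: AccessRequest[];

  constructor(rules: AccessRequest[]) {
    this.rules = rules;
  }

  decide(request: AccessRequest): boolean {
    return this.rules.some(
      (r) =>
        r.identity === request.identity &&
        r.action === request.action &&
        r.resource === request.resource
    );
  }
}

// The PEP holds no policy logic: it asks the PDP and maps the verdict to a status code.
function enforce(pdp: PolicyDecisionPoint, request: AccessRequest): 200 | 403 {
  return pdp.decide(request) ? 200 : 403;
}
```

Because enforce depends only on the PolicyDecisionPoint interface, the decision engine can be swapped without touching enforcement code.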
Step 3: Enforce mTLS & Micro-Segmentation
All service-to-service communication must use mutual TLS. Certificates are automatically rotated via SPIRE. Network policies enforce micro-segmentation at the workload level, not the subnet level. This eliminates implicit trust between services sharing the same VLAN or VPC.
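As a rough sketch of what taking identity from the mTLS layer can look like in application code, the helper below extracts a SPIFFE ID from the subjectAltName string that Node.js exposes on a verified peer certificate. The function name is an assumption, and the comma-based parsing is a simplification that does not handle quoted SAN entries.

```typescript
// Hypothetical helper: pull the SPIFFE ID out of a subjectAltName string such as
// "DNS:myapp.local, URI:spiffe://example.org/ns/dev/sa/myapp".
function spiffeIdFromSubjectAltName(subjectAltName: string | undefined): string | null {
  if (!subjectAltName) return null;

  const uris = subjectAltName
    .split(",")
    .map((entry) => entry.trim())
    .filter((entry) => entry.startsWith("URI:"))
    .map((entry) => entry.slice("URI:".length));

  // An X.509 SVID carries exactly one URI SAN; anything else is rejected.
  const [only] = uris;
  if (uris.length !== 1 || only === undefined || !only.startsWith("spiffe://")) {
    return null;
  }
  return only;
}
```

In a Node.js TLS server created with requestCert: true and rejectUnauthorized: true, the input would come from socket.getPeerCertificate().subjectaltname, so the identity is derived from a certificate the TLS layer has already verified rather than from a request header.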
Step 4: Implement Continuous Verification & Least Privilege
Trust is never static. Every request triggers policy evaluation. Roles, scopes, and environment attributes are validated in real time. Access is granted for the minimum duration and privilege required.
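One way to picture "minimum duration and privilege" is a grant that carries explicit scopes and an expiry, both re-checked on every request. The Grant shape and isPermitted function below are assumptions made for illustration only.

```typescript
// Illustrative grant shape; field names are assumptions for this sketch.
interface Grant {
  identity: string;
  scopes: string[];
  expiresAt: number; // Unix epoch milliseconds
}

// Re-checked on every request: the grant must be unexpired and must
// contain the exact scope the request needs.
function isPermitted(grant: Grant, requiredScope: string, nowMs: number): boolean {
  if (nowMs >= grant.expiresAt) return false;
  return grant.scopes.includes(requiredScope);
}
```

Passing the current time as a parameter keeps the check deterministic in tests; a caller would normally supply Date.now().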
TypeScript Policy Enforcement Point (PEP) Example
import { Request, Response, NextFunction } from 'express';
import axios from 'axios';

const OPA_URL = process.env.OPA_URL || 'http://opa:8181/v1/data/http/authz';

// Extend Express's Request type so that req.auth type-checks.
declare global {
  namespace Express {
    interface Request {
      auth?: { identity: string; namespace: string };
    }
  }
}

interface AuthContext {
  identity: string;
  action: string;
  resource: string;
  metadata: Record<string, string>;
}

export async function zeroTrustPep(
  req: Request,
  res: Response,
  next: NextFunction
): Promise<void> {
  // x-svid must be set by a trusted sidecar (such as the Envoy listener in the
  // Configuration Template); never accept it directly from untrusted clients.
  const svid = req.headers['x-svid'];
  if (typeof svid !== 'string' || svid === '') {
    res.status(401).json({ error: 'Missing workload identity' });
    return;
  }

  const context: AuthContext = {
    identity: svid,
    action: req.method.toLowerCase(),
    resource: req.path,
    metadata: {
      namespace: (req.headers['x-namespace'] as string) || 'default',
      environment: process.env.NODE_ENV || 'production',
    },
  };

  try {
    // The 1-second timeout is an assumed value; tune it to your latency budget.
    const { data } = await axios.post(OPA_URL, { input: context }, { timeout: 1000 });
    if (data?.result?.allow !== true) {
      res.status(403).json({ error: 'Policy denied' });
      return;
    }
    // Attach verified identity to request for downstream use
    req.auth = { identity: svid, namespace: context.metadata.namespace };
    next();
  } catch (err) {
    // Fail-closed: deny if the policy engine is unreachable or times out
    res.status(503).json({ error: 'Policy engine unavailable' });
  }
}
Architecture Rationale
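- Control Plane vs Data Plane: SPIRE and OPA policy management operate in the control plane. Envoy/Express middleware operates in the data plane. Because the PEP in this example queries the PDP on every request, run OPA close to the PEP (for example, as a sidecar) so that policy evaluation does not become a bottleneck for high-throughput traffic.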
- Fail-Closed Design: If the PDP is unreachable, the PEP denies traffic. This prevents silent policy bypass during outages.
- SVID Rotation: Short-lived certificates (1–24 hours) limit credential exposure. SPIRE handles rotation transparently.
- Policy-as-Code: OPA Rego policies are version-controlled, tested in CI, and deployed independently of application code.
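The fail-closed rule above can be sketched as a small wrapper that turns any error or timeout from the policy check into a denial. The name failClosed, the Verdict type, and the timeout parameter are assumptions for illustration; the check callback stands in for whatever call reaches your PDP.

```typescript
type Verdict = "allow" | "deny";

// Hypothetical wrapper: any error or timeout from the policy check becomes "deny".
async function failClosed(
  check: () => Promise<boolean>,
  timeoutMs: number
): Promise<Verdict> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<boolean>((_, reject) => {
    timer = setTimeout(() => reject(new Error("policy check timed out")), timeoutMs);
  });

  try {
    const allowed = await Promise.race([check(), timeout]);
    // Only an explicit true is an allow; anything else is treated as deny.
    return allowed === true ? "allow" : "deny";
  } catch {
    return "deny";
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

The design choice worth noting is that there is no code path from an exception to "allow": an outage degrades availability, not security.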
Pitfall Guide
- Treating Zero-Trust as a Firewall Upgrade: Zero-trust is not network segmentation. Segmentation restricts traffic; zero-trust verifies identity and intent before allowing it. Replacing ACLs with SASE or SD-WAN without implementing identity-centric verification leaves lateral movement pathways intact.
- Ignoring Identity Lifecycle Management: Workload identities must be provisioned, rotated, and revoked automatically. Manual certificate management or long-lived service accounts defeat zero-trust. Use SPIRE or equivalent workload identity platforms with automated rotation and revocation.
- Overcomplicating Policy Rules: Writing monolithic Rego policies that evaluate dozens of attributes per request increases latency and debugging complexity. Decompose policies into reusable modules. Test policies with OPA's test command and CI pipelines before deployment.
- Neglecting Observability & Audit Trails: Zero-trust generates high-volume policy evaluation logs. Without structured logging, tracing, and alerting, you cannot detect policy drift or false positives. Implement distributed tracing (OpenTelemetry) and centralize PDP/PEP audit logs.
- Assuming Encryption Replaces Verification: mTLS encrypts traffic but does not authorize it. A compromised service can still present a valid certificate. Always pair encryption with policy evaluation. Certificates prove identity; policies prove authorization.
- Big-Bang Deployment: Migrating all services to zero-trust simultaneously causes cascading failures. Start with non-critical workloads, validate policy evaluation latency, and gradually expand. Use canary deployments for PEP updates.
- Misaligning PDP/PEP Trust Boundaries: If the PEP trusts the application layer for identity claims, attackers can inject forged headers. Always extract identity from the transport layer (mTLS SVID, SPIFFE header) or a trusted sidecar. Never trust application-supplied identity without cryptographic verification.
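For the observability pitfall, a structured audit entry per decision is a reasonable starting point. The sketch below emits one JSON object per decision; the AuditRecord fields (including policyVersion) are assumptions, not a standard schema.

```typescript
// Illustrative audit record; the field names are assumptions, not a standard schema.
interface AuditRecord {
  timestamp: string;
  identity: string;
  action: string;
  resource: string;
  decision: "allow" | "deny";
  policyVersion: string;
}

function auditLine(
  input: { identity: string; action: string; resource: string },
  allowed: boolean,
  policyVersion: string,
  now: Date
): string {
  const record: AuditRecord = {
    timestamp: now.toISOString(),
    identity: input.identity,
    action: input.action,
    resource: input.resource,
    decision: allowed ? "allow" : "deny",
    policyVersion,
  };
  // One JSON object per line so log shippers can parse entries without multi-line handling.
  return JSON.stringify(record);
}
```

Recording the policy version alongside each decision makes it possible to tell, after the fact, whether an unexpected allow or deny came from a policy change or from a change in the request.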
Best Practices from Production:
- Run policy evaluation in-process when latency budgets are tight (<5ms).
- Cache negative policy decisions to reduce PDP load during attacks.
- Implement policy versioning and rollback mechanisms.
- Simulate breach scenarios using automated red-team exercises against your PDP.
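The negative-decision cache mentioned above might look like the following sketch. DenyCache and its methods are hypothetical names; the clock is injected so expiry can be tested, and the TTL should stay short so a policy change that grants access takes effect quickly.

```typescript
// Hypothetical TTL cache for deny decisions; the clock is injected so expiry is testable.
class DenyCache {
  private readonly expiries = new Map<string, number>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Key on everything the policy sees, so a deny for one action never masks another.
  static key(identity: string, action: string, resource: string): string {
    return JSON.stringify([identity, action, resource]);
  }

  recordDeny(key: string): void {
    this.expiries.set(key, this.now() + this.ttlMs);
  }

  isDenied(key: string): boolean {
    const expiry = this.expiries.get(key);
    if (expiry === undefined) return false;
    if (this.now() >= expiry) {
      this.expiries.delete(key); // expired: force a fresh PDP evaluation
      return false;
    }
    return true;
  }
}
```

A PEP would consult isDenied before calling the PDP and call recordDeny after a 403. Only denials are cached here, so a stale entry can delay access but never grant it.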
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Legacy monolith on-prem | Sidecar PEP + SPIRE + OPA | Minimal code changes; policy enforcement decoupled from app | Low upfront, moderate infra cost |
| Cloud-native microservices | In-process PEP + service mesh | Sub-5ms latency; native Kubernetes integration | Higher engineering cost, lower operational overhead |
| Hybrid SaaS integration | Gateway PDP + mTLS termination | Centralized policy for external partners; avoids app modifications | Medium cost; simplifies compliance audits |
| High-compliance workload (PCI/HIPAA) | Strict OPA policies + audit logging + fail-closed | Meets regulatory requirements; provides verifiable access trails | High initial setup, reduces breach liability |
Configuration Template
OPA Policy (policy.rego)
package http.authz

# rego.v1 syntax (the "if" keyword) is required by OPA 1.x, which the
# openpolicyagent/opa:latest image now ships.
import rego.v1

default allow := false

allow if {
    input.identity == "spiffe://example.org/ns/production/sa/api-service"
    input.action == "get"
    input.resource == "/api/v1/data"
    input.metadata.environment == "production"
}

allow if {
    input.identity == "spiffe://example.org/ns/production/sa/admin-service"
    input.action == "post"
    input.resource == "/api/v1/config"
}
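On the client side, a query against this package through OPA's Data API (/v1/data/http/authz) wraps the attributes in an input object and returns the package's values under result. The TypeScript sketch below shows one conservative way to build the request and read the response; AuthzInput, buildOpaRequest, and isAllowed are illustrative names, not part of any OPA SDK.

```typescript
// Attributes the sample policy reads from "input".
interface AuthzInput {
  identity: string;
  action: string;
  resource: string;
  metadata: { environment?: string; namespace?: string };
}

// The Data API expects the attributes wrapped in an "input" object.
function buildOpaRequest(input: AuthzInput): { input: AuthzInput } {
  return { input };
}

// Querying the package path returns a body like { "result": { "allow": true } }.
// If the package is not loaded, "result" is absent, which is treated as deny.
function isAllowed(responseBody: unknown): boolean {
  if (typeof responseBody !== "object" || responseBody === null) return false;
  const result = (responseBody as { result?: unknown }).result;
  if (typeof result !== "object" || result === null) return false;
  // Require a strict boolean true; truthy strings or numbers do not count.
  return (result as { allow?: unknown }).allow === true;
}
```

Treating a missing result as a denial matters in practice: a policy that failed to load would otherwise be indistinguishable from a malformed response.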
SPIRE Agent Config (agent.conf)
agent {
    data_dir = "/opt/spire/data"
    log_level = "INFO"
    server_address = "spire-server"
    server_port = "50000"
    socket_path = "/tmp/spire-registration.sock"
    trust_domain = "example.org"
    join_token = "your-join-token"
}
Envoy Listener Snippet (mTLS + SPIFFE header injection)
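filter_chains:
  - transport_socket:
      name: envoy.transport_sockets.tls
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
        require_client_certificate: true  # enforce mutual TLS
        common_tls_context:
          tls_certificate_sds_secret_configs:
            - name: "spiffe_cert"
          validation_context_sds_secret_config:
            name: "spiffe_ca"
    filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress_http
          forward_client_cert_details: SANITIZE_SET  # discard client-supplied XFCC; set it from the verified cert
          set_current_client_cert_details:
            uri: true
          # route_config and the terminal router filter are omitted from this snippet
          http_filters:
            - name: envoy.filters.http.lua
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.http.lua.v3.Lua
                inline_code: |
                  function envoy_on_request(request_handle)
                    local headers = request_handle:headers()
                    headers:remove("x-svid")  -- never trust a client-supplied value
                    local xfcc = headers:get("x-forwarded-client-cert")
                    -- XFCC looks like "By=...;Hash=...;URI=spiffe://..."; keep only the URI
                    local svid = xfcc and xfcc:match("URI=([^;,]+)")
                    if svid then
                      headers:add("x-svid", svid)
                    end
                  end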
Quick Start Guide
- Install SPIRE: Run the server:
    docker run -d --name spire-server -p 50000:50000 ghcr.io/spiffe/spire-server:latest
  Then register your workload:
    spire-server entry create -spiffeID spiffe://example.org/ns/dev/sa/myapp -parentID spiffe://example.org/ns/dev -selector k8s:ns:dev -selector k8s:sa:myapp
- Deploy OPA: Run the server:
    docker run -d --name opa -p 8181:8181 openpolicyagent/opa:latest run --server --log-level info
  Then load your policy, using --data-binary so curl preserves the newlines in the Rego file:
    curl -X PUT http://localhost:8181/v1/policies/myapp --data-binary @policy.rego
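- Launch PEP Middleware: Use the provided TypeScript PEP in your Express app. Set OPA_URL=http://localhost:8181/v1/data/http/authz (the middleware posts to this URL as-is, so it must include the policy path) and NODE_ENV=development. Note that the sample policy's api-service rule requires environment == "production", so with NODE_ENV=development only the admin-service rule can match.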
- Verify Enforcement: Send a request without the x-svid header → expect 401. Send a request with a valid SVID but an unauthorized action → expect 403. Send a request matching the policy → expect 200.
Zero-trust is not a product you install. It is a control-plane discipline you enforce. Start with identity, automate policy evaluation, and measure blast radius reduction. The architecture scales when trust is cryptographic, continuous, and explicitly denied until verified.