ns runtime environments with committed state.
Rationale: Decoupling pipeline definition from execution prevents runner lock-in. TypeScript provides type safety for configuration schemas, enabling compile-time validation before manifests reach CI runners. GitOps ensures drift detection and rollback capability without manual intervention.
Step 2: Implement Declarative Pipeline Orchestration
Replace inline runner scripts with a TypeScript configuration schema that generates workflow manifests. This enforces consistency across services and eliminates copy-paste pipeline drift.
// pipeline.config.ts
import { z } from 'zod';

export const PipelineSchema = z.object({
  service: z.string().min(1),
  triggers: z.object({
    branches: z.array(z.string()),
    paths: z.array(z.string()).optional(),
    schedules: z.array(z.string()).optional()
  }),
  stages: z.array(z.object({
    name: z.string(),
    runsOn: z.enum(['ubuntu-latest', 'self-hosted', 'macos-latest']),
    timeoutMinutes: z.number().min(5).max(120),
    steps: z.array(
      z.object({
        id: z.string(),
        uses: z.string().optional(),
        run: z.string().optional(),
        env: z.record(z.string(), z.string()).optional(),
        with: z.record(z.string(), z.unknown()).optional()
      })
      // A step is either an action reference (uses) or a shell command (run), never both or neither.
      .refine((step) => (step.uses === undefined) !== (step.run === undefined), {
        message: 'A step must define exactly one of "uses" or "run"'
      })
    )
  })).min(2)
});

export type PipelineConfig = z.infer<typeof PipelineSchema>;

export const defaultPipeline: PipelineConfig = {
  service: 'api-gateway',
  triggers: {
    branches: ['main', 'release/*'],
    // Glob patterns, matching the path filters in the workflow template.
    paths: ['src/**', 'Dockerfile', 'tsconfig.json']
  },
  stages: [
    {
      name: 'validate',
      runsOn: 'ubuntu-latest',
      timeoutMinutes: 10,
      steps: [
        { id: 'checkout', uses: 'actions/checkout@v4' },
        { id: 'setup-node', uses: 'actions/setup-node@v4', with: { 'node-version': '20' } },
        { id: 'lint', run: 'npm ci && npm run lint' }
      ]
    },
    {
      name: 'build-and-scan',
      runsOn: 'ubuntu-latest',
      timeoutMinutes: 20,
      steps: [
        { id: 'checkout', uses: 'actions/checkout@v4' },
        // Each stage starts on a fresh runner, so dependencies are installed again before building.
        { id: 'build', run: 'npm ci && npm run build' },
        { id: 'sast', run: 'npx @safe/cli scan --format sarif --output results.sarif' },
        // The upload-sarif action names its input sarif_file (underscore, not hyphen).
        { id: 'upload-sarif', uses: 'github/codeql-action/upload-sarif@v3', with: { sarif_file: 'results.sarif' } }
      ]
    }
  ]
};
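Rationale: The schema enforces stage isolation, timeout boundaries, and a standardized step structure. The PipelineConfig type lets the TypeScript compiler reject structural mistakes such as a missing field or an unknown runner label. Value constraints such as the 5 to 120 minute timeout range are only checked when PipelineSchema.parse runs, so call it in CI before any manifest is generated. Runtime environments remain immutable; only configuration changes trigger pipeline updates.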
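The step from a validated configuration to a workflow manifest can be sketched as follows. This is a minimal illustration, not the toolchain's actual generator: the toWorkflow helper and the local types are hypothetical, the types are redeclared by hand so the block has no dependencies, and the sketch assumes stages run strictly in declaration order.

```typescript
// Local types mirroring PipelineConfig, redeclared so this sketch has no dependencies.
type Step = {
  id: string;
  uses?: string;
  run?: string;
  env?: Record<string, string>;
  with?: Record<string, unknown>;
};
type Stage = { name: string; runsOn: string; timeoutMinutes: number; steps: Step[] };
type Pipeline = {
  service: string;
  triggers: { branches: string[]; paths?: string[] };
  stages: Stage[];
};

type WorkflowJob = {
  'runs-on': string;
  'timeout-minutes': number;
  needs?: string[];
  steps: Step[];
};
type Workflow = {
  name: string;
  on: { push: { branches: string[]; paths?: string[] } };
  jobs: Record<string, WorkflowJob>;
};

// Hypothetical helper. Assumption: stages run in declaration order,
// so every job after the first "needs" the job before it.
function toWorkflow(config: Pipeline): Workflow {
  const jobs: Record<string, WorkflowJob> = {};
  let previous: string | undefined;
  for (const stage of config.stages) {
    const job: WorkflowJob = {
      'runs-on': stage.runsOn,
      'timeout-minutes': stage.timeoutMinutes,
      steps: stage.steps,
    };
    if (previous !== undefined) job.needs = [previous];
    jobs[stage.name] = job;
    previous = stage.name;
  }
  const push: Workflow['on']['push'] = { branches: config.triggers.branches };
  if (config.triggers.paths) push.paths = config.triggers.paths;
  return { name: `${config.service} pipeline`, on: { push }, jobs };
}
```

The returned object can then be serialized with whichever YAML library the project already uses and committed as the workflow file.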
Step 3: Integrate Shift-Left Security & Policy Gates
Security must execute as a pipeline stage, not a post-merge approval. Implement policy-as-code to evaluate compliance before artifact promotion.
// policy.engine.ts
// NOTE: the OPA client API below is illustrative. Adapt the import and the
// method names to the OPA SDK your platform uses.
import { OPA } from 'open-policy-agent';

export class PolicyEngine {
  private opa: OPA;

  constructor(policyPath: string) {
    this.opa = new OPA();
    this.opa.loadPolicy(policyPath);
  }

  // Evaluates the input against the loaded policy and collects any violations.
  async evaluate(input: Record<string, unknown>): Promise<{ allowed: boolean; violations: string[] }> {
    const result = await this.opa.evaluate(input);
    const violations: string[] = result.violations ?? [];
    return {
      allowed: violations.length === 0,
      violations
    };
  }
}

// Usage in pipeline orchestration (inside an async function or an ES module with top-level await)
const engine = new PolicyEngine('./policies/security.rego');
const policyResult = await engine.evaluate({
  // Image digests are referenced with "@sha256:", not ":sha256:" (digest shortened here).
  artifact: { type: 'docker', image: 'registry.internal/api-gateway@sha256:abc123' },
  environment: 'staging',
  requestedBy: 'deploy-bot'
});

if (!policyResult.allowed) {
  throw new Error(`Policy violation: ${policyResult.violations.join(', ')}`);
}
Rationale: Rego policies are evaluated at pipeline runtime, blocking non-compliant deployments before they reach infrastructure. This eliminates approval bottlenecks and provides deterministic compliance evidence.
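To make the allowed/violations contract concrete without an OPA installation, the gate can be imitated in plain TypeScript. This is a stand-in for illustration only, not Rego and not an OPA API: the rule names, the signed field on the artifact, and the deploy-bot requirement for production are all assumptions.

```typescript
type DeployRequest = {
  // `signed` is an assumed field; the original input does not carry it.
  artifact: { type: string; image: string; signed: boolean };
  environment: string;
  requestedBy: string;
};
type PolicyRule = { id: string; check: (request: DeployRequest) => string | null };
type PolicyDecision = { allowed: boolean; violations: string[] };

// Hypothetical rules: each returns a violation message, or null when satisfied.
const deployRules: PolicyRule[] = [
  {
    id: 'signed-artifact',
    check: (request) => (request.artifact.signed ? null : 'artifact is not signed'),
  },
  {
    id: 'production-requester',
    check: (request) =>
      request.environment !== 'production' || request.requestedBy === 'deploy-bot'
        ? null
        : 'production deployments must be requested by deploy-bot',
  },
];

// Runs every rule and reports all violations at once, so a blocked
// deployment shows the full list instead of only the first failure.
function evaluateRules(request: DeployRequest, rules: PolicyRule[]): PolicyDecision {
  const violations: string[] = [];
  for (const rule of rules) {
    const message = rule.check(request);
    if (message !== null) violations.push(`${rule.id}: ${message}`);
  }
  return { allowed: violations.length === 0, violations };
}
```

Collecting every violation rather than stopping at the first keeps the compliance evidence complete for each blocked promotion.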
Step 4: Establish Observability & Feedback Loops
Pipeline telemetry must be treated as first-class metrics. Emit structured events for stage duration, cache hit rates, failure classifications, and resource consumption.
// telemetry.emitter.ts
// Substitute the metrics client your platform provides; this class only
// assumes a putMetric(name, value, dimensions) method.
import { MetricsClient } from '@cloudwatch/metrics';

export class PipelineTelemetry {
  private client: MetricsClient;

  constructor(namespace: string) {
    this.client = new MetricsClient({ namespace });
  }

  // Emits duration, cache-hit, and status counters for a single stage run.
  emitStageMetrics(
    stage: string,
    durationMs: number,
    status: 'success' | 'failure' | 'skipped',
    cacheHit: boolean
  ): void {
    this.client.putMetric('pipeline.stage.duration', durationMs, { stage, status });
    this.client.putMetric('pipeline.cache.hit', cacheHit ? 1 : 0, { stage });
    this.client.putMetric('pipeline.stage.status', 1, { stage, status });
  }
}
Rationale: Without telemetry, toolchain degradation is invisible. Tracking cache efficiency and failure classification enables proactive optimization. Metrics feed into cost allocation and reliability scoring.
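How the emitted events turn into cache efficiency and failure rates can be sketched with a small in-memory aggregation. The summarizeStages function and its types are hypothetical, and excluding skipped runs from the rates is an assumption; a metrics backend would normally compute these aggregates.

```typescript
type StageStatus = 'success' | 'failure' | 'skipped';
type StageEvent = { stage: string; durationMs: number; status: StageStatus; cacheHit: boolean };
type StageSummary = { runs: number; failureRate: number; cacheHitRate: number; meanDurationMs: number };

// Aggregates raw stage events into per-stage rates. Skipped runs are excluded
// (an assumption) so they do not dilute the failure and cache statistics.
function summarizeStages(events: StageEvent[]): Record<string, StageSummary> {
  const totals: Record<string, { runs: number; failures: number; cacheHits: number; durationMs: number }> = {};
  for (const ev of events) {
    if (ev.status === 'skipped') continue;
    const t = totals[ev.stage] ?? { runs: 0, failures: 0, cacheHits: 0, durationMs: 0 };
    t.runs += 1;
    if (ev.status === 'failure') t.failures += 1;
    if (ev.cacheHit) t.cacheHits += 1;
    t.durationMs += ev.durationMs;
    totals[ev.stage] = t;
  }
  const summaries: Record<string, StageSummary> = {};
  for (const stage of Object.keys(totals)) {
    const t = totals[stage];
    summaries[stage] = {
      runs: t.runs,
      failureRate: t.failures / t.runs,
      cacheHitRate: t.cacheHits / t.runs,
      meanDurationMs: t.durationMs / t.runs,
    };
  }
  return summaries;
}
```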
Pitfall Guide
1. Treating Pipelines as Linear Scripts
Linear pipelines assume deterministic execution order and ignore parallelization opportunities. This inflates lead time and creates single points of failure.
Best Practice: Model pipelines as directed acyclic graphs (DAGs). Allow independent stages to run concurrently. Use artifact dependencies to enforce ordering only where necessary.
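A DAG model lets a scheduler group stages into waves that can run concurrently. The sketch below uses Kahn-style topological layering; planWaves and the StageNode shape are hypothetical names introduced for illustration.

```typescript
type StageNode = { name: string; needs: string[] };

// Groups stages into waves: every stage in a wave has all of its
// dependencies satisfied by earlier waves, so a wave can run in parallel.
function planWaves(stages: StageNode[]): string[][] {
  const pending = new Map<string, Set<string>>();
  for (const stage of stages) pending.set(stage.name, new Set(stage.needs));
  for (const stage of stages) {
    for (const dep of stage.needs) {
      if (!pending.has(dep)) throw new Error(`Stage "${stage.name}" needs unknown stage "${dep}"`);
    }
  }
  const waves: string[][] = [];
  while (pending.size > 0) {
    // Stages with no unmet dependencies are ready to run now.
    const ready = Array.from(pending.keys()).filter((name) => pending.get(name)!.size === 0);
    if (ready.length === 0) throw new Error('Dependency cycle detected');
    for (const name of ready) pending.delete(name);
    // Mark the finished wave as satisfied for everything still pending.
    pending.forEach((deps) => ready.forEach((name) => deps.delete(name)));
    waves.push(ready);
  }
  return waves;
}
```

With lint and unit-test independent of each other, both land in the first wave, and only build, scan, and deploy remain strictly ordered.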
2. Hardcoding Environment-Specific Configuration
Embedding environment variables, endpoints, or credentials directly in pipeline definitions breaks portability and violates security baselines.
Best Practice: Externalize configuration using environment-scoped variable stores. Resolve secrets at runtime through OIDC or short-lived tokens. Maintain a single pipeline definition that parametrizes per-environment values.
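One way to keep a single pipeline definition while varying values per environment is to resolve them from an injected variable source at runtime. In this sketch the variable names (STAGING_API_URL, PRODUCTION_REPLICAS, and so on) and the DeployTarget shape are assumptions, not part of any specific variable store.

```typescript
type EnvName = 'staging' | 'production';
type DeployTarget = { apiUrl: string; replicas: number };

// Resolves per-environment values from a variable source instead of
// hardcoding them. Missing or malformed values fail fast with a clear error.
function resolveTarget(env: EnvName, vars: Record<string, string | undefined>): DeployTarget {
  const prefix = env.toUpperCase();
  const apiUrl = vars[`${prefix}_API_URL`];
  if (!apiUrl) throw new Error(`Missing required variable ${prefix}_API_URL`);
  // Replica count defaults to 1 when the variable is not set (an assumption).
  const replicas = Number(vars[`${prefix}_REPLICAS`] ?? '1');
  if (!Number.isInteger(replicas) || replicas < 1) {
    throw new Error(`${prefix}_REPLICAS must be a positive integer`);
  }
  return { apiUrl, replicas };
}
```

In a Node.js runner the source would typically be process.env; passing it in as a parameter keeps the function testable and leaves secrets out of the pipeline definition.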
3. Ignoring Pipeline Flakiness Metrics
Teams track deployment success but rarely measure pipeline reliability. Flaky stages cause false failures, eroding trust and triggering unnecessary rollbacks.
Best Practice: Classify failures as infrastructure, network, code, or configuration. Implement retry logic with exponential backoff for transient errors. Quarantine flaky tests and enforce deterministic test execution environments.
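The classification and retry advice can be combined in one helper that retries only transient failure classes. This is a sketch under assumptions: runWithRetry and its options are hypothetical, the caller supplies the classifier, and treating infrastructure and network failures as the transient ones is a choice made here for illustration.

```typescript
type FailureClass = 'infrastructure' | 'network' | 'code' | 'configuration';

// Assumption: only these classes are worth retrying; code and
// configuration failures will not fix themselves on a second attempt.
const TRANSIENT_CLASSES: FailureClass[] = ['infrastructure', 'network'];

type RetryOptions = {
  maxAttempts: number;
  baseDelayMs: number;
  classify: (error: unknown) => FailureClass;
  sleep?: (ms: number) => Promise<void>; // injectable so tests need not wait
};

async function runWithRetry<T>(task: () => Promise<T>, options: RetryOptions): Promise<T> {
  const sleep =
    options.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  for (let attempt = 1; ; attempt++) {
    try {
      return await task();
    } catch (error) {
      const transient = TRANSIENT_CLASSES.indexOf(options.classify(error)) !== -1;
      // Give up immediately on permanent failures or when attempts run out.
      if (!transient || attempt >= options.maxAttempts) throw error;
      // Exponential backoff: baseDelayMs, then 2x, 4x, ...
      await sleep(options.baseDelayMs * 2 ** (attempt - 1));
    }
  }
}
```

Production use would usually add jitter to the delay so that many retrying jobs do not hit a recovering dependency at the same moment.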
4. Over-Reliance on Third-Party Actions Without Verification
Using unvetted community actions introduces supply chain risk. Actions can be compromised, deprecated, or introduce dependency conflicts.
Best Practice: Pin actions to full-length commit SHAs rather than mutable tags such as v4. Maintain an internal allowlist. Run dependency scanning on action manifests. Prefer self-hosted or verified vendor actions for critical stages.
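A pinning rule is easy to enforce mechanically, because a GitHub Actions reference pinned to a commit ends in a full-length, 40-character hexadecimal SHA. The sketch below flags every uses reference that does not; findUnpinnedActions is a hypothetical helper, and skipping local and docker:// references is an assumption about how those are reviewed.

```typescript
// A full-length commit SHA is 40 hexadecimal characters.
const FULL_COMMIT_SHA = /^[0-9a-f]{40}$/;

// Returns the `uses` references that are not pinned to a full commit SHA.
function findUnpinnedActions(usesRefs: string[]): string[] {
  return usesRefs.filter((ref) => {
    // Local actions and container references are assumed to be vetted elsewhere.
    if (ref.startsWith('./') || ref.startsWith('docker://')) return false;
    const at = ref.lastIndexOf('@');
    const version = at === -1 ? '' : ref.slice(at + 1);
    return !FULL_COMMIT_SHA.test(version);
  });
}
```

A PR bot could run this over the uses values of every generated manifest and fail the check when the returned list is not empty.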
5. Skipping Artifact Immutability & Provenance
Mutable artifacts enable replay attacks and make rollback verification impossible. Without provenance, compliance audits fail.
Best Practice: Sign artifacts at build time using Sigstore/Cosign. Store provenance metadata alongside binaries. Enforce immutable tags in registries. Reject deployments with unsigned or tampered artifacts.
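The rejection rule can be sketched as a gate that accepts only digest-pinned image references whose digest has already passed signature verification. Both function names are hypothetical, and the verifiedDigests list is assumed to come from a separate verification step (for example a Cosign check); this sketch performs no cryptographic verification itself.

```typescript
// True only for references pinned by a complete sha256 digest,
// e.g. registry.internal/api-gateway@sha256:<64 hex characters>.
function isDigestPinned(ref: string): boolean {
  return /@sha256:[0-9a-f]{64}$/.test(ref);
}

// Hypothetical deployment gate. `verifiedDigests` is assumed to list the
// digests whose signatures were verified earlier in the pipeline.
function canDeploy(ref: string, verifiedDigests: string[]): boolean {
  if (!isDigestPinned(ref)) return false; // tags are mutable, so they are rejected
  const digest = ref.slice(ref.lastIndexOf('@') + 1);
  return verifiedDigests.indexOf(digest) !== -1;
}
```

Deploying by digest rather than by tag means the bytes that were scanned and signed are exactly the bytes that reach the cluster.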
6. Failing to Implement Progressive Delivery from Day One
Big-bang deployments increase blast radius and recovery complexity. Toolchains that only support full replacements lack resilience.
Best Practice: Integrate canary analysis and traffic shifting into the promotion stage. Use service mesh or ingress controllers for percentage-based routing. Automate rollback on error rate thresholds.
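An automated rollback rule on error rate thresholds can be as small as a comparison between canary and baseline traffic. The decideCanary function, the sample shapes, and the threshold semantics (an absolute difference in error rate) are assumptions for illustration; real canary analysis tools usually apply statistical tests over several metrics.

```typescript
type CanarySample = { requests: number; errors: number };
type CanaryDecision = 'promote' | 'rollback' | 'wait';

type CanaryOptions = {
  minRequests: number;       // do not judge the canary on too little traffic
  maxErrorRateDelta: number; // allowed absolute increase over the baseline error rate
};

function decideCanary(
  baseline: CanarySample,
  canary: CanarySample,
  options: CanaryOptions,
): CanaryDecision {
  if (canary.requests === 0 || canary.requests < options.minRequests) return 'wait';
  const baselineRate = baseline.requests === 0 ? 0 : baseline.errors / baseline.requests;
  const canaryRate = canary.errors / canary.requests;
  return canaryRate - baselineRate > options.maxErrorRateDelta ? 'rollback' : 'promote';
}
```

For example, with a baseline error rate of 1% and a 2-point tolerance, a canary at 5% is rolled back while a canary at 1.5% is promoted.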
7. Selecting Tools Ad Hoc Without Shared Standards
Ad-hoc tool selection creates knowledge silos and multiplies maintenance overhead. Cross-team onboarding becomes a configuration puzzle.
Best Practice: Publish internal toolchain standards as code. Provide shared templates, linting rules, and validation hooks. Enforce schema compliance through pre-commit checks and PR bots.
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Small team (<10 devs), single service | Managed CI/CD with TypeScript config validation | Reduces operational overhead; fast time-to-value | Low ($50-150/mo runner costs) |
| Multi-service architecture, regulated industry | GitOps + Policy-as-Code + Immutable Artifacts | Enforces compliance; eliminates drift; audit-ready | Medium ($300-800/mo policy engines, scanning, registry) |
| High-frequency deployment (>50/day) | Event-driven orchestration + Progressive delivery | Minimizes blast radius; optimizes runner utilization | High ($800-2000/mo traffic management, canary analysis, observability) |
| Legacy monolith migration | Parallel pipeline execution + Artifact immutability | Enables safe incremental modernization; rollback safety | Medium ($400-900/mo build caching, artifact storage, telemetry) |
Configuration Template
# .github/workflows/pipeline.yml
name: Modular DevOps Pipeline

on:
  workflow_dispatch:
  push:
    branches: [main, 'release/*']
    paths: ['src/**', 'Dockerfile', 'tsconfig.json']

env:
  REGISTRY: registry.internal
  IMAGE_NAME: ${{ github.repository }}

jobs:
  generate:
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.config.outputs.matrix }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: 20 }
      - run: npm ci
      - id: config
        run: |
          MATRIX=$(node scripts/generate-matrix.js)
          echo "matrix=$MATRIX" >> "$GITHUB_OUTPUT"

  build:
    needs: generate
    runs-on: ${{ matrix.runsOn }}
    strategy:
      matrix: ${{ fromJson(needs.generate.outputs.matrix) }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: 20 }
      - run: npm ci
      - run: npm run build
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ secrets.REGISTRY_USER }}
          password: ${{ secrets.REGISTRY_PASS }}
      - uses: docker/build-push-action@v5
        with:
          # Use the checked-out workspace so the npm build output is part of the build context.
          context: .
          push: true
          tags: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

  security:
    needs: build
    runs-on: ubuntu-latest
    permissions:
      contents: read
      id-token: write # required for keyless Cosign signing through OIDC
    steps:
      - uses: actions/checkout@v4
      # The scan and signing steps need registry access on this fresh runner.
      - uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ secrets.REGISTRY_USER }}
          password: ${{ secrets.REGISTRY_PASS }}
      # Trivy is not distributed through npm; use the official action (pin to a release you have vetted).
      - uses: aquasecurity/trivy-action@0.28.0
        with:
          image-ref: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}
          severity: HIGH,CRITICAL
          exit-code: '1'
      - uses: sigstore/cosign-installer@v3
      - run: cosign sign --yes ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}

  deploy:
    needs: security
    runs-on: self-hosted
    steps:
      - uses: actions/checkout@v4
      - run: |
          kubectl set image deployment/api-gateway api-gateway=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}
          kubectl rollout status deployment/api-gateway --timeout=120s
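The generate job expects scripts/generate-matrix.js to print a JSON matrix that fromJson can consume and that exposes a runsOn key. The source does not show that script, so the following is only a guess at its core, written in TypeScript on the assumption that it is compiled to JavaScript; buildMatrix and the ServiceEntry shape are hypothetical.

```typescript
type ServiceEntry = { service: string; runsOn: string };

// Builds the matrix as single-line JSON. The `include` form yields one job
// per entry and makes `matrix.service` and `matrix.runsOn` available to it.
function buildMatrix(services: ServiceEntry[]): string {
  if (services.length === 0) {
    throw new Error('Matrix must contain at least one service');
  }
  // JSON.stringify without indentation keeps the value on one line,
  // which the name=value format of $GITHUB_OUTPUT requires.
  return JSON.stringify({ include: services });
}
```

The script would end by writing the result to standard output, for example console.log(buildMatrix(entries)), so the shell step can capture it in MATRIX.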
Quick Start Guide
- Initialize the control plane: Run npx @codcompass/toolchain init to scaffold the TypeScript schema, policy templates, and telemetry hooks. This generates pipeline.config.ts and the policies/ and scripts/ directories.
- Configure runner authentication: Set up OIDC trust between your CI provider and cloud account. Replace long-lived access keys with aws-actions/configure-aws-credentials@v4 or an equivalent.
- Commit and validate: Commit the scaffolded files. The pre-commit hook runs ts-node scripts/validate-config.ts to check schema compliance. Fix any reported errors, then push.
- Trigger the first pipeline: Merge a change under src/ into main, or start the workflow manually through workflow_dispatch; the template above runs on pushes, not on pull requests. The workflow generates the execution matrix, builds the artifact, runs Trivy and Cosign, and deploys to the staging cluster. Monitor pipeline metrics in your observability dashboard.
- Enforce policy gates: Add policies/deployment.rego to block promotions without signed artifacts. Commit the policy and verify that unsigned deployments are rejected at the security stage.