Step 2: Enforce Tag Immutability and Lifecycle Policies
Mutable tags (latest, dev, nightly) create nondeterministic deployments and complicate rollback strategies. Enable tag immutability at the repository level, then define a retention window for each class of artifact:
- Promoted production releases (indefinite or 12-month retention)
- Staging/candidate builds (30-day retention)
- CI-only artifacts (7-day retention)
- Unreferenced layers (garbage collection after 48 hours)
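The retention tiers above can be expressed as data that a lifecycle job looks up per tag. The following is a minimal sketch, not a registry API: the tag prefixes (prod-, staging-, build-) and the names RetentionTier, TIERS, retentionDaysFor, and isPastRetention are assumptions made for illustration, and production releases are treated as indefinite rather than 12-month.

```typescript
// Hypothetical retention tiers mirroring the list above.
// A retentionDays of null means "retain indefinitely".
interface RetentionTier {
  prefix: string; // assumed tag-naming convention
  retentionDays: number | null;
}

const TIERS: RetentionTier[] = [
  { prefix: "prod-", retentionDays: null },  // promoted production releases
  { prefix: "staging-", retentionDays: 30 }, // staging/candidate builds
  { prefix: "build-", retentionDays: 7 },    // CI-only artifacts
];

// Returns the retention window for a tag, or undefined when no tier matches.
function retentionDaysFor(tag: string): number | null | undefined {
  const tier = TIERS.find(t => tag.startsWith(t.prefix));
  return tier ? tier.retentionDays : undefined;
}

// Decides whether a tag pushed ageDays ago has outlived its tier.
// Tags that match no tier are kept, which is the conservative default.
function isPastRetention(tag: string, ageDays: number): boolean {
  const days = retentionDaysFor(tag);
  if (days === undefined || days === null) return false;
  return ageDays > days;
}
```

Keeping unmatched tags is a deliberate choice in this sketch: a naming mistake then costs storage rather than a deleted release.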
Step 3: Integrate Vulnerability Scanning and SBOM Generation
Scan images at build time, not at deployment. Use Trivy or Grype for CVE detection, and Syft for SBOM generation. Store SBOMs alongside images as OCI artifacts. Configure scanning gates that block promotion when critical/high vulnerabilities exceed thresholds. Cache scan results to avoid redundant analysis.
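A scanning gate of this kind can be sketched as a function over a parsed scan report. The example below assumes the report follows the shape of Trivy's JSON output (trivy image --format json), where findings sit under Results[].Vulnerabilities[] with a Severity field; only those fields are modeled. The names ScanReport, Threshold, countBySeverity, and passesGate, the threshold values, and the CVE identifiers in the sample are all placeholders.

```typescript
// Subset of a Trivy-style JSON report; only the fields used here are modeled.
interface ScanReport {
  Results?: { Vulnerabilities?: { VulnerabilityID: string; Severity: string }[] }[];
}

// Hypothetical threshold: the maximum tolerated findings for one severity.
interface Threshold {
  severity: string;
  max: number;
}

// Tallies findings per severity across all scan targets in the report.
function countBySeverity(report: ScanReport): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const result of report.Results ?? []) {
    for (const vuln of result.Vulnerabilities ?? []) {
      counts[vuln.Severity] = (counts[vuln.Severity] ?? 0) + 1;
    }
  }
  return counts;
}

// Promotion passes only if no severity exceeds its threshold.
function passesGate(report: ScanReport, thresholds: Threshold[]): boolean {
  const counts = countBySeverity(report);
  return thresholds.every(t => (counts[t.severity] ?? 0) <= t.max);
}

// Sample report with made-up identifiers.
const sampleReport: ScanReport = {
  Results: [
    {
      Vulnerabilities: [
        { VulnerabilityID: "CVE-0000-0001", Severity: "HIGH" },
        { VulnerabilityID: "CVE-0000-0002", Severity: "LOW" },
      ],
    },
    {}, // a target with no findings may omit the Vulnerabilities field
  ],
};

const strictGate: Threshold[] = [
  { severity: "CRITICAL", max: 0 },
  { severity: "HIGH", max: 0 },
];
```

With the sample data, passesGate(sampleReport, strictGate) is false because of the single HIGH finding; raising the HIGH threshold to 1 lets the image through.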
Step 4: Implement RBAC and Image Signing
Apply least-privilege access control. Separate roles: registry-admin, repo-pusher, repo-puller, scanner-service. Use short-lived OIDC tokens for CI runners instead of long-lived credentials. Sign images with Cosign or Notary v2. Verify signatures in deployment pipelines using policy engines.
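The role separation above can be illustrated with a small permission check. This is a sketch under stated assumptions rather than any registry's actual permission model: the action names, the PERMISSIONS matrix, the Grant shape, and the isAllowed helper are hypothetical, and the expiry field stands in for the lifetime of a short-lived OIDC-issued token.

```typescript
type Role = "registry-admin" | "repo-pusher" | "repo-puller" | "scanner-service";
type Action = "pull" | "push" | "delete" | "scan" | "manage-policy";

// Hypothetical permission matrix; adapt it to your registry's real model.
const PERMISSIONS: Record<Role, readonly Action[]> = {
  "registry-admin": ["pull", "push", "delete", "scan", "manage-policy"],
  "repo-pusher": ["pull", "push"],
  "repo-puller": ["pull"],
  "scanner-service": ["pull", "scan"],
};

interface Grant {
  role: Role;
  namespace: string; // e.g. "myorg/myapp"
  expiresAt: number; // epoch milliseconds; short-lived by design
}

// Allows an action only if the grant is unexpired, covers the repository's
// namespace, and the role includes the requested action.
function isAllowed(grant: Grant, action: Action, repo: string, now: number): boolean {
  if (now >= grant.expiresAt) return false;
  const inNamespace = repo === grant.namespace || repo.startsWith(grant.namespace + "/");
  if (!inNamespace) return false;
  return PERMISSIONS[grant.role].indexOf(action) !== -1;
}

// Example grant for a CI runner, valid until t = 1000 ms.
const ciGrant: Grant = { role: "repo-pusher", namespace: "myorg/myapp", expiresAt: 1000 };
```

Matching on the namespace plus a trailing slash prevents a grant for myorg/myapp from also covering a sibling repository such as myorg/myapp-internal.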
Step 5: Automate with Policy-as-Code
Encode retention, scanning, and signing rules in OPA Rego or equivalent policy language. Evaluate policies against registry metadata before promotion. Trigger garbage collection via scheduled jobs or event-driven webhooks.
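The evaluation flow can be sketched in TypeScript, with OPA Rego being the declarative equivalent in a real deployment. Everything in the example is illustrative: the ImageMetadata fields, the PolicyRule shape, the three rules, and the evaluate function are assumptions, not the schema of any particular registry or policy engine.

```typescript
// Hypothetical registry metadata for one image; field names are illustrative.
interface ImageMetadata {
  digest: string;
  signed: boolean;
  hasSbom: boolean;
  criticalCount: number;
}

interface PolicyRule {
  id: string;
  description: string;
  check: (image: ImageMetadata) => boolean; // true means the rule is satisfied
}

const PROMOTION_RULES: PolicyRule[] = [
  { id: "signed", description: "image must be signed", check: img => img.signed },
  { id: "sbom", description: "SBOM must be attached", check: img => img.hasSbom },
  {
    id: "no-critical",
    description: "no critical vulnerabilities",
    check: img => img.criticalCount === 0,
  },
];

// Returns the ids of all violated rules; an empty array means promotion may proceed.
function evaluate(image: ImageMetadata, rules: PolicyRule[]): string[] {
  return rules.filter(rule => !rule.check(image)).map(rule => rule.id);
}
```

Returning every violated rule, rather than stopping at the first, gives the pipeline a complete list to report back to the developer in a single run.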
TypeScript Implementation: Policy-Driven Lifecycle Manager
The following TypeScript script demonstrates a lightweight registry lifecycle manager that queries repository tags, evaluates retention policies, and triggers garbage collection. It uses the OCI Distribution Spec API pattern and can be adapted to Harbor, ECR, or self-hosted registries.
interface TagMetadata {
  name: string;
  digest: string;
  created: string; // ISO 8601 timestamp
  size: number;
  labels: Record<string, string>;
}

interface RetentionPolicy {
  maxAgeDays: number;
  keepPromoted: boolean;
  exemptLabels: string[];
}

interface RegistryClient {
  listTags(repo: string): Promise<TagMetadata[]>;
  deleteTag(repo: string, digest: string): Promise<void>;
  triggerGC(): Promise<void>;
}

const MS_PER_DAY = 1000 * 60 * 60 * 24;

class LifecycleManager {
  constructor(
    private client: RegistryClient,
    private policy: RetentionPolicy,
    private repo: string
  ) {}

  // A tag is expired once its age exceeds the policy's maxAgeDays.
  private isExpired(tag: TagMetadata): boolean {
    const ageDays = (Date.now() - new Date(tag.created).getTime()) / MS_PER_DAY;
    return ageDays > this.policy.maxAgeDays;
  }

  // Promoted tags (when keepPromoted is set) and tags carrying any exempt label are never deleted.
  private isExempt(tag: TagMetadata): boolean {
    if (this.policy.keepPromoted && tag.labels['promoted'] === 'true') return true;
    return this.policy.exemptLabels.some(label => tag.labels[label] === 'true');
  }

  // Deletes expired, non-exempt tags and triggers GC only if something was deleted.
  async enforce(): Promise<void> {
    const tags = await this.client.listTags(this.repo);
    const candidates = tags.filter(tag => this.isExpired(tag) && !this.isExempt(tag));

    for (const tag of candidates) {
      console.log(`Marking for deletion: ${tag.name} (${tag.digest})`);
      await this.client.deleteTag(this.repo, tag.digest);
    }

    if (candidates.length > 0) {
      console.log(`Triggering garbage collection for ${this.repo}`);
      await this.client.triggerGC();
    } else {
      console.log(`No expired tags found in ${this.repo}`);
    }
  }
}

// Example usage with mock registry client
const mockClient: RegistryClient = {
  listTags: async () => [
    { name: 'v1.2.0', digest: 'sha256:abc', created: '2024-01-01T00:00:00Z', size: 250, labels: { promoted: 'true' } },
    { name: 'feature-x', digest: 'sha256:def', created: '2024-05-10T00:00:00Z', size: 180, labels: {} },
    { name: 'build-442', digest: 'sha256:ghi', created: '2024-06-20T00:00:00Z', size: 210, labels: {} }
  ],
  deleteTag: async (repo, digest) => console.log(`Deleted ${repo}@${digest}`),
  triggerGC: async () => console.log('GC initiated')
};

const policy: RetentionPolicy = {
  maxAgeDays: 30,
  keepPromoted: true,
  exemptLabels: ['promoted']
};

const manager = new LifecycleManager(mockClient, policy, 'myorg/myapp');
manager.enforce().catch(console.error);
This script demonstrates policy evaluation, tag filtering, and GC triggering. In production, replace the mock client with registry-specific SDKs (@aws-sdk/client-ecr, @azure/container-registry, or Harbor REST API). Run as a cron job or GitHub Action to enforce retention continuously.
Architecture Decisions and Rationale
- Centralized vs. Distributed: Use a single authoritative registry per environment with read-only replicas for edge consumption. This prevents drift and simplifies policy enforcement.
- Immutable Tags: Enforce at the registry level. Mutable tags break reproducibility and complicate vulnerability attribution.
- GC Scheduling: Run garbage collection during low-traffic windows. GC locks repositories and can block pulls if scheduled during peak build/deploy cycles.
- SBOM as First-Class Artifact: Store SBOMs in the same registry namespace. Enables dependency tracking, license compliance, and automated remediation workflows.
- Policy Engine Placement: Evaluate policies at promotion boundaries, not at build time. Build failures from policy violations increase CI costs; promotion gates provide better feedback loops.
Pitfall Guide
1. Treating latest as a Deployment Standard
latest is a mutable pointer that breaks reproducibility. Deployments referencing latest cannot be reliably rolled back or audited. Always pin to digests or immutable semantic tags.
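A deployment pipeline can reject unpinned references before they reach a cluster. The sketch below checks only for sha256 digest pins; it does not try to recognize immutable semantic tags, and the names DIGEST_PIN, isDigestPinned, and findUnpinned are hypothetical.

```typescript
// Matches references that end in an sha256 digest: repo@sha256:<64 hex chars>.
const DIGEST_PIN = /@sha256:[a-f0-9]{64}$/;

function isDigestPinned(imageRef: string): boolean {
  return DIGEST_PIN.test(imageRef);
}

// Returns the references that would need pinning before deployment.
function findUnpinned(imageRefs: string[]): string[] {
  return imageRefs.filter(ref => !isDigestPinned(ref));
}
```

Running findUnpinned over the image fields of a deployment manifest gives a list that a CI step can fail on when it is non-empty.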
2. Skipping Layer Deduplication and GC Tuning
Container registries store layers, not full images. Without proper GC configuration, orphaned layers accumulate. Tune GC to run after tag deletion, and enable layer sharing across repositories.
3. Weak RBAC with Broad Admin Scopes
Granting admin access to CI runners or developers violates least-privilege principles. Use scoped tokens with expiration, and separate push/pull permissions by namespace.
4. No SBOM or Provenance Tracking
Without SBOMs, vulnerability response is reactive and manual. Provenance attestation (Sigstore, in-toto) is increasingly expected under frameworks such as NIST SSDF and the EU CRA.
5. Ignoring Cross-Repo Dependency Graphs
Images often depend on base images or shared libraries stored in other repositories. Retention policies that delete those base images can leave dependent images impossible to rebuild or trace, even when their already-pushed layers remain pullable. Implement dependency-aware retention or pin base image digests.
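Dependency-aware retention can be sketched as a filter that protects every base image still reachable from a retained image. The example assumes each image records its base image digest somewhere queryable (for instance in a label); the ImageRecord shape and the deletable function are hypothetical.

```typescript
interface ImageRecord {
  digest: string;
  baseDigest?: string; // digest of the base image, if recorded
  expired: boolean;    // result of the normal age-based retention check
}

// Returns digests that are safe to delete: expired and not used as a base
// (directly or transitively) by any image that is being kept.
function deletable(images: ImageRecord[]): string[] {
  const byDigest = new Map<string, ImageRecord>();
  for (const img of images) byDigest.set(img.digest, img);

  const protectedDigests = new Set<string>();
  for (const img of images) {
    if (img.expired) continue;
    // Walk up the base-image chain of every retained image.
    let base = img.baseDigest;
    while (base !== undefined && !protectedDigests.has(base)) {
      protectedDigests.add(base);
      base = byDigest.get(base)?.baseDigest;
    }
  }

  return images
    .filter(img => img.expired && !protectedDigests.has(img.digest))
    .map(img => img.digest);
}
```

The check against protectedDigests inside the loop also stops the walk if the recorded base chain ever contains a cycle.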
6. Running GC During Peak Build Windows
Garbage collection acquires repository locks. Scheduling GC during high-throughput CI periods causes pipeline failures. Use event-driven GC or off-peak cron schedules.
7. Embedding Secrets in Image Metadata
Labels and annotations are visible in registry APIs and public manifests. Never embed credentials, API keys, or internal endpoints in image metadata. Use runtime secret injection.
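A build step can flag label keys that look like they carry secrets before an image is pushed. The check below is a heuristic sketch: the SUSPICIOUS_KEY pattern is deliberately incomplete, it inspects keys only (not values), and the function name findSuspiciousLabels is hypothetical.

```typescript
// Heuristic key patterns that often indicate a secret; intentionally incomplete.
const SUSPICIOUS_KEY = /(password|passwd|secret|token|api[_-]?key|credential)/i;

// Returns the label keys that look like they carry secrets.
function findSuspiciousLabels(labels: Record<string, string>): string[] {
  return Object.keys(labels).filter(key => SUSPICIOUS_KEY.test(key));
}
```

A key-name match is only a prompt for review; a dedicated secret scanner is still needed to catch credentials hidden under innocuous names.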
Production Best Practices
- Enforce tag immutability at the repository level
- Use digest pins in deployment manifests
- Run vulnerability scans at build time, block promotion on critical/high findings
- Store SBOMs and signatures as OCI artifacts
- Rotate registry credentials every 90 days or use OIDC federation
- Audit registry access logs and integrate with SIEM
- Review retention policies quarterly to align with compliance requirements
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Startup / Single Region | Managed cloud registry (ECR/ACR/GCR) with built-in scanning | Low operational overhead, native CI/CD integration | Low ($0.10/GB/month) |
| Enterprise / Multi-Cloud | Harbor with cross-region replication and OPA policy engine | Consistent governance, air-gap capable, audit-ready | Medium ($0.15/GB + infra) |
| Compliance-Heavy (FINRA/HIPAA) | Signed images + SBOMs + immutable tags + air-gapped proxy | Meets NIST/EU CRA requirements, enables provenance tracking | High (compliance tooling + storage) |
| High-Frequency CI/CD | Caching proxy + layer deduplication + event-driven GC | Reduces egress costs, prevents pipeline bottlenecks | Low-Medium (cache infra + GC tuning) |
Configuration Template
Harbor Retention & Scanning Policy (YAML)
# Illustrative schema: Harbor applies these settings through its UI or REST API, not from a file in this format.
retention_policy:
  repositories:
    - name: "myorg/*"
      rules:
        - tag_select:
            pattern: "prod-*"
          action: keep
          retention_days: 0 # indefinite
        - tag_select:
            pattern: "staging-*"
          action: keep
          retention_days: 30
        - tag_select:
            pattern: "build-*"
          action: keep
          retention_days: 7
        - tag_select:
            pattern: "*"
          action: delete
          untagged_only: true
          retention_days: 2
garbage_collection:
  schedule: "0 2 * * 0" # Sunday 02:00 UTC
  dry_run: false
scanning_policy:
  engine: "trivy"
  trigger: "push" # or "manual"
  severity_threshold: "HIGH"
  block_promotion: true
  sbom_generation: true
  sbom_format: "spdx"
AWS ECR Lifecycle Policy (JSON)
A rule with action type expire cannot express "keep indefinitely", and countNumber must be a positive integer, so images tagged prod- are retained here by leaving them unmatched by any rule.
{
  "rules": [
    {
      "rulePriority": 1,
      "description": "Expire staging images after 30 days",
      "selection": {
        "tagStatus": "tagged",
        "tagPrefixList": ["staging-"],
        "countType": "sinceImagePushed",
        "countUnit": "days",
        "countNumber": 30
      },
      "action": { "type": "expire" }
    },
    {
      "rulePriority": 2,
      "description": "Delete untagged images after 7 days",
      "selection": {
        "tagStatus": "untagged",
        "countType": "sinceImagePushed",
        "countUnit": "days",
        "countNumber": 7
      },
      "action": { "type": "expire" }
    }
  ]
}
Quick Start Guide
- Provision the Registry: Create a repository in your chosen registry (Harbor, ECR, ACR, or GitHub Packages). Enable tag immutability and namespace isolation.
- Apply Retention Policy: Upload the lifecycle configuration template matching your environment. Set GC schedule to off-peak hours.
- Integrate Scanning & SBOM: Add a build step that runs trivy image <image> and syft <image> -o spdx-json > sbom.json. Push the SBOM as an OCI artifact.
- Configure Promotion Gates: In your CI pipeline, block docker push to production namespaces if the scan output contains CRITICAL or HIGH vulnerabilities. For example, trivy image --exit-code 1 --severity CRITICAL,HIGH <image> exits nonzero when such findings exist.
- Verify Enforcement: Push a test image, confirm retention rules apply, validate SBOM storage, and trigger a manual GC run. Monitor logs for policy compliance.
Registry management is not storage optimization; it is supply chain governance. Implementing these controls transforms the registry from a passive artifact bucket into a verifiable, cost-efficient, and secure component of your deployment architecture.