ions, and ArgoCD Application resources. This repo is the true Source of Truth for the cluster state.
gitops-env-repo/
├── base/
│   └── cluster-config.yaml        # Cluster-wide resources (RBAC, CRDs)
├── clusters/
│   ├── prod/
│   │   ├── kustomization.yaml     # Selects overlays for prod
│   │   └── apps/
│   │       ├── frontend-app.yaml  # ArgoCD Application for frontend
│   │       └── backend-app.yaml   # ArgoCD Application for backend
│   └── staging/
│       ├── kustomization.yaml
│       └── apps/
│           └── frontend-app.yaml  # Different image tag/ref
└── namespaces/
    └── prod.yaml
2. Operator Implementation: ArgoCD Application Resource
The ArgoCD Application resource defines the synchronization logic. It specifies the source repository, the path to manifests, the destination cluster, and the sync policy.
# gitops-env-repo/clusters/prod/apps/frontend-app.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: frontend-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/org/frontend-app.git
    targetRevision: main
    path: k8s/overlays/prod
  destination:
    server: https://kubernetes.default.svc
    namespace: frontend
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
      - PruneLast=true
Rationale: selfHeal: true lets the reconciliation loop revert drift in the live cluster automatically. prune: true deletes resources from the cluster once they have been removed from Git. PruneLast=true defers pruning to the end of the sync, after the remaining resources have been applied and report healthy, which reduces the risk of downtime.
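To make the roles of prune and selfHeal concrete, the sketch below models a single reconciliation pass over in-memory objects. It is an illustration under simplifying assumptions, not ArgoCD's implementation: planSync, SyncAction, Trigger, and the string-keyed Manifests type are hypothetical names, and real resources are compared field by field rather than as strings.

```typescript
// Resource key (e.g. "Deployment/web") mapped to a serialized spec.
type Manifests = Record<string, string>;

interface SyncPolicy {
  prune: boolean;
  selfHeal: boolean;
}

interface SyncAction {
  kind: 'apply' | 'delete';
  key: string;
}

// 'git-change': a new commit altered the desired state.
// 'live-drift': only the cluster changed (e.g. a manual kubectl edit).
type Trigger = 'git-change' | 'live-drift';

function planSync(
  desired: Manifests,
  live: Manifests,
  policy: SyncPolicy,
  trigger: Trigger
): SyncAction[] {
  // Without selfHeal, drift that originates in the cluster is left alone.
  if (trigger === 'live-drift' && !policy.selfHeal) return [];

  const actions: SyncAction[] = [];

  // Apply every resource that is missing from the cluster or differs from Git.
  for (const key of Object.keys(desired)) {
    if (live[key] !== desired[key]) actions.push({ kind: 'apply', key });
  }

  // Deletions are queued after all applies, loosely mirroring PruneLast.
  if (policy.prune) {
    for (const key of Object.keys(live)) {
      if (!(key in desired)) actions.push({ kind: 'delete', key });
    }
  }
  return actions;
}

const exampleDesired: Manifests = { 'Deployment/web': 'replicas=3' };
const exampleLive: Manifests = { 'Deployment/web': 'replicas=5', 'ConfigMap/old': 'v1' };
console.log(planSync(exampleDesired, exampleLive, { prune: true, selfHeal: true }, 'live-drift'));
```

With selfHeal set to false and the same 'live-drift' trigger, the function returns an empty plan, which corresponds to drift being visible but not corrected.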
3. TypeScript Validation Pre-commit Hook
GitOps shifts validation to the repository. Before merging changes to the Env repo, manifests must be validated. A TypeScript pre-commit hook ensures structural integrity and policy compliance.
// tools/validate-gitops.ts
import * as fs from 'fs';
import * as yaml from 'js-yaml';
// NOTE: 'kubernetes-jsonschema' and the validate() signature used below are kept
// as given in this example; substitute the schema validator your project uses.
import { validate } from 'kubernetes-jsonschema';

interface GitOpsChange {
  path: string;
  content: string;
}

export function validateGitOpsManifests(changes: GitOpsChange[]): void {
  const errors: string[] = [];

  changes.forEach(change => {
    try {
      // A single file may hold several YAML documents separated by '---'
      const docs = yaml.loadAll(change.content);
      docs.forEach(doc => {
        if (doc && typeof doc === 'object' && 'kind' in doc) {
          // Validate against K8s schema
          const schemaResult = validate(doc as any, { version: '1.28' });
          if (!schemaResult.valid) {
            errors.push(`Schema error in ${change.path}: ${schemaResult.errors?.join(', ')}`);
          }

          // Custom Policy: No 'latest' tags in prod.
          // Only pod templates at spec.template.spec.containers are inspected.
          if (change.path.includes('/prod/') && 'spec' in doc) {
            const containers = (doc as any).spec?.template?.spec?.containers;
            if (Array.isArray(containers)) {
              containers.forEach((c: any) => {
                const image: string = typeof c?.image === 'string' ? c.image : '';
                const lastSegment = image.slice(image.lastIndexOf('/') + 1);
                // An image with neither tag nor digest implicitly resolves to 'latest'
                const untagged =
                  image !== '' && !lastSegment.includes(':') && !image.includes('@');
                if (image.endsWith(':latest') || untagged) {
                  errors.push(`Policy violation in ${change.path}: 'latest' tag prohibited in prod.`);
                }
              });
            }
          }
        }
      });
    } catch (e) {
      errors.push(`YAML parse error in ${change.path}: ${(e as Error).message}`);
    }
  });

  if (errors.length > 0) {
    console.error('GitOps Validation Failed:');
    errors.forEach(err => console.error(` - ${err}`));
    process.exit(1);
  }
}

// Integration with husky/pre-commit: staged file paths arrive as CLI arguments
const stagedFiles = process.argv.slice(2);
const changes: GitOpsChange[] = stagedFiles
  .filter(f => f.endsWith('.yaml') || f.endsWith('.yml'))
  .map(f => ({ path: f, content: fs.readFileSync(f, 'utf8') }));

if (changes.length > 0) validateGitOpsManifests(changes);
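Image references come in several shapes (a registry host with a port, a digest pin, or no tag at all), so a tag policy is easier to reason about and unit-test as a standalone function. The following is a minimal sketch of such a helper; usesLatestTag is a hypothetical name, and the parsing is deliberately simplified rather than a full implementation of the image reference grammar.

```typescript
// Returns true when an image reference would resolve to the mutable "latest" tag.
function usesLatestTag(image: string): boolean {
  // A digest pin (name@sha256:...) is immutable regardless of any tag.
  if (image.includes('@')) return false;

  // Only the last path segment can carry a tag; a colon earlier in the
  // reference belongs to a registry port such as localhost:5000.
  const lastSegment = image.slice(image.lastIndexOf('/') + 1);
  const colon = lastSegment.indexOf(':');

  // No tag at all: container runtimes fall back to "latest".
  if (colon === -1) return true;

  return lastSegment.slice(colon + 1) === 'latest';
}

for (const ref of ['nginx', 'nginx:1.25', 'localhost:5000/app', 'org/app@sha256:abc123']) {
  console.log(ref, usesLatestTag(ref));
}
```

Treating an untagged reference as a violation is a design choice: it closes the gap where an image with no tag silently resolves to latest.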
4. Secret Management: Sealed Secrets
Storing plaintext secrets in Git is prohibited. The implementation must use a solution that allows encrypted secrets to be stored in the Env repo while the decryption key remains in the cluster.
- Tool: Bitnami Sealed Secrets.
- Workflow: Developers use kubeseal to encrypt a secret locally. The encrypted SealedSecret resource is committed to Git. The Sealed Secrets controller in the cluster decrypts it and creates the standard Kubernetes Secret.
# Encrypted secret committed to Git
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: db-credentials
  namespace: backend
spec:
  encryptedData:
    password: AgBy3i4OJSWK+PiY...
  template:
    metadata:
      name: db-credentials
      namespace: backend
    type: Opaque
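The rule that only encrypted secrets reach Git can also be checked mechanically. The sketch below assumes the manifests have already been parsed into plain objects (for example by the YAML loader used in the validation hook); findPlaintextSecrets and K8sDoc are hypothetical names introduced for illustration.

```typescript
interface K8sDoc {
  apiVersion?: string;
  kind?: string;
  metadata?: { name?: string; namespace?: string };
}

// Returns "namespace/name" for every core v1 Secret found in the parsed documents.
// SealedSecret resources (apiVersion bitnami.com/v1alpha1) do not match and pass through.
function findPlaintextSecrets(docs: K8sDoc[]): string[] {
  return docs
    .filter(d => d.apiVersion === 'v1' && d.kind === 'Secret')
    .map(d => `${d.metadata?.namespace ?? 'default'}/${d.metadata?.name ?? '<unnamed>'}`);
}

const parsedDocs: K8sDoc[] = [
  { apiVersion: 'bitnami.com/v1alpha1', kind: 'SealedSecret', metadata: { name: 'db-credentials', namespace: 'backend' } },
  { apiVersion: 'v1', kind: 'Secret', metadata: { name: 'api-token', namespace: 'backend' } },
];
console.log(findPlaintextSecrets(parsedDocs));
```

A non-empty result could be turned into a validation error in the pre-commit hook, so that a plain Secret never lands in the Env repo.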
Pitfall Guide
1. Violating the Source of Truth with Manual Edits
Mistake: Developers edit resources via kubectl patch to fix urgent issues, bypassing Git.
Impact: With selfHeal enabled, the reconciliation loop detects the drift and reverts the change. The result is a "flapping" state, or an outage if the manual fix was necessary.
Best Practice: Configure ignoreDifferences on the ArgoCD Application only for specific, unavoidable cases (e.g., controller-generated fields). Use Kubernetes RBAC to restrict write access (create, update, patch, delete) on managed resources to the GitOps operator's service account.
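ArgoCD lets an Application list JSON Pointers for fields that should not count as drift. The sketch below illustrates the idea: strip the ignored paths from both documents, then compare what remains. It is a simplified model, not ArgoCD's diff engine; withoutPointers and hasDrift are hypothetical names, and the comparison assumes both documents serialize their keys in the same order.

```typescript
type JsonObject = { [key: string]: unknown };

function isObject(value: unknown): value is JsonObject {
  return typeof value === 'object' && value !== null;
}

// Returns a deep copy of `doc` with the property at each JSON Pointer removed.
// Pointers whose final segment addresses an array element are left untouched.
function withoutPointers(doc: unknown, pointers: string[]): unknown {
  const copy: unknown = JSON.parse(JSON.stringify(doc));
  for (const pointer of pointers) {
    // RFC 6901: segments are separated by "/", with "~1" and "~0" as escapes.
    const segments = pointer
      .split('/')
      .slice(1)
      .map(s => s.replace(/~1/g, '/').replace(/~0/g, '~'));
    if (segments.length === 0) continue;

    // Walk down to the parent of the addressed property.
    let node: unknown = copy;
    for (const segment of segments.slice(0, -1)) {
      node = isObject(node) ? node[segment] : undefined;
    }

    const last = segments[segments.length - 1];
    if (isObject(node) && !Array.isArray(node)) delete node[last];
  }
  return copy;
}

// Naive structural comparison of the two documents after removing ignored fields.
function hasDrift(desired: unknown, live: unknown, ignoredPointers: string[]): boolean {
  return (
    JSON.stringify(withoutPointers(desired, ignoredPointers)) !==
    JSON.stringify(withoutPointers(live, ignoredPointers))
  );
}

const fromGit = { spec: { replicas: 3, image: 'web:1.4.2' } };
const fromCluster = { spec: { replicas: 5, image: 'web:1.4.2' } }; // scaled by an autoscaler
console.log(hasDrift(fromGit, fromCluster, ['/spec/replicas']));
console.log(hasDrift(fromGit, fromCluster, []));
```

Keeping the ignore list short matters: every pointer added is a field that the reconciliation loop no longer protects.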
2. Monolithic Repository Bloat
Mistake: Storing all application manifests and cluster config in a single repo.
Impact: Merge conflicts spike, sync times degrade, and security boundaries blur. A change to a minor app requires reviewing the entire repo.
Best Practice: Adopt the Multi-Repo pattern. Use a "Cluster Config" repo for infrastructure and separate "Environment" repos per cluster or region.
3. Improper RBAC Configuration
Mistake: Granting the GitOps operator cluster-admin privileges without namespace scoping.
Impact: A compromised repository or malicious PR can destroy the entire cluster.
Best Practice: Use ArgoCD RBAC policies to map teams to specific projects and namespaces. Ensure the operator runs with the minimum required permissions via ServiceAccount roles scoped to target namespaces.
4. Ignoring the "Cluster in a Box" for Testing
Mistake: Testing GitOps workflows only against production or staging.
Impact: A misconfigured destructive sync policy (such as prune: true) can wipe data in the very environments being used for testing.
Best Practice: Use tools like kind or k3d to spin up ephemeral clusters for CI validation. Run the sync process in a dry-run mode against these clusters before merging.
5. Managing Infrastructure Outside GitOps
Mistake: Using GitOps only for deployment but managing infrastructure provisioning (Terraform) separately without GitOps integration.
Impact: Fragmented state management. Infrastructure drift occurs in Terraform while apps drift in GitOps.
Best Practice: Apply GitOps principles to infrastructure. Use Terraform Cloud/Enterprise with Git triggers or tools like Crossplane managed via GitOps to ensure infrastructure state is also reconciled.
6. Missing Rollback Procedures
Mistake: Assuming Git history is enough, but lacking a defined process for reverting.
Impact: During incidents, teams hesitate to revert due to fear of cascading failures.
Best Practice: Document the "Git Revert" procedure. Train teams that git revert followed by a merge is the standard rollback mechanism. Automate notifications to Slack/PagerDuty when a revert occurs.
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Small Team, Single Cluster | Flux + Monorepo | Flux is lightweight and Git-native; Monorepo reduces overhead. | Low |
| Multi-Cluster, Multi-Region | ArgoCD + Multi-Repo + App of Apps | ArgoCD UI and multi-cluster management are superior; App of Apps scales well. | Medium |
| High Compliance / Audit Required | ArgoCD + Signed Commits + Sealed Secrets | Immutable Git history with signatures meets audit standards; Sealed Secrets secure credentials. | Medium |
| Complex Microservices | Kustomize Overlays + ArgoCD | Kustomize handles variations efficiently without Helm template complexity. | Low |
| Legacy Apps with Custom Scripts | Helm + ArgoCD | Helm allows packaging legacy logic; ArgoCD manages the lifecycle. | Low |
Configuration Template
ArgoCD Application with CI Integration Snippet:
# argocd-app-prod.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: payment-service-prod
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "10"
spec:
  project: payments
  source:
    repoURL: git@github.com:org/payment-service.git
    targetRevision: refs/heads/main
    path: deploy/overlays/prod
  destination:
    server: https://k8s.prod.internal
    namespace: payments
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    retry:
      limit: 5
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 3m
    syncOptions:
      - CreateNamespace=true
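The retry settings above translate into a concrete wait schedule. The sketch below computes it under the assumption that each delay is the previous one multiplied by factor and capped at maxDuration, which is how the fields are described; the exact timing inside ArgoCD may differ. retryDelays and the Backoff interface are hypothetical names, with durations expressed in seconds.

```typescript
interface Backoff {
  durationSeconds: number;    // initial delay (duration: 5s)
  factor: number;             // multiplier applied after each failed attempt
  maxDurationSeconds: number; // upper bound for any single delay (maxDuration: 3m)
}

// Returns the delay before each retry, in seconds, for up to `limit` retries.
function retryDelays(limit: number, backoff: Backoff): number[] {
  const delays: number[] = [];
  let delay = backoff.durationSeconds;
  for (let attempt = 0; attempt < limit; attempt++) {
    delays.push(Math.min(delay, backoff.maxDurationSeconds));
    delay *= backoff.factor;
  }
  return delays;
}

// Values from the template: limit 5, duration 5s, factor 2, maxDuration 3m.
console.log(retryDelays(5, { durationSeconds: 5, factor: 2, maxDurationSeconds: 180 }));
// → [5, 10, 20, 40, 80]
```

Under these assumptions the five retries wait 155 seconds in total, and the 3-minute cap would only come into play from a sixth retry onward (160 seconds uncapped is still below it, the seventh at 320 seconds is not).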
GitHub Actions CI for Env Repo:
# .github/workflows/validate-env.yml
name: Validate GitOps Env
on:
  pull_request:
    paths:
      - 'clusters/**'
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - run: npm ci
      # globstar makes ** recurse into subdirectories; nullglob drops patterns with no match
      - run: |
          shopt -s globstar nullglob
          npx ts-node tools/validate-gitops.ts clusters/**/*.{yaml,yml}
      # Assumes kustomize and kubeval are available on the runner's PATH
      - run: kustomize build clusters/prod | kubeval --strict
Quick Start Guide
- Initialize Environment:
  kubectl create namespace argocd
  kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
- Access ArgoCD:
  kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d
  kubectl port-forward svc/argocd-server -n argocd 8080:443
  Login via https://localhost:8080 with user admin and the retrieved password.
- Create Application via CLI:
  argocd app create guestbook \
    --repo https://github.com/argoproj/argocd-example-apps.git \
    --path guestbook \
    --dest-server https://kubernetes.default.svc \
    --dest-namespace default
- Sync and Verify:
  argocd app sync guestbook
  argocd app wait guestbook --sync --health
  kubectl get pods -n default
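The application is now managed by the reconciliation loop, and any drift will be detected and reported as OutOfSync. Drift is corrected automatically only when the application has an automated sync policy with self-heal enabled (for example, by passing --sync-policy automated --self-heal to argocd app create); the guestbook app as created above is synced manually.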