axed security. Production requires immutability, resource limits, and hardened networking.
# docker-compose.yml (development)
services:
  api:
    build: .
    volumes:
      - ./src:/app/src
    environment:
      - NODE_ENV=development

# docker-compose.prod.yml (production overrides)
services:
  api:
    build:
      context: .
      dockerfile: Dockerfile.prod
    environment:
      - NODE_ENV=production
    deploy:
      resources:
        limits:
          cpus: '1.0'
          memory: 512M
        reservations:
          cpus: '0.25'
          memory: 128M
    read_only: true
    tmpfs:
      - /tmp
Deploy with: docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d
Step 2: Immutable Image Tagging & Build Context
Production deployments must never rely on latest. Implement semantic versioning or commit-sha tagging baked into the build pipeline.
# Dockerfile.prod
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
# Install all dependencies: the build step usually needs devDependencies
RUN npm ci
COPY . .
RUN npm run build
# Drop devDependencies so only runtime packages reach the final stage
RUN npm prune --omit=dev

FROM node:20-alpine
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/package.json ./
USER node
EXPOSE 3000
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
  CMD wget -qO- http://localhost:3000/health || exit 1
CMD ["node", "dist/index.js"]
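The tagging rule from the start of this step can be enforced in the build script itself. The sketch below is one possible approach, not a prescribed pipeline: the `REGISTRY` and `IMAGE_NAME` variables and the `image_ref` helper are illustrative names, and the commit SHA is assumed to be the version source.

```shell
#!/usr/bin/env bash
# Sketch: derive an immutable image reference from the current git commit.
# REGISTRY, IMAGE_NAME and image_ref are hypothetical names for illustration.
set -eu

REGISTRY="${REGISTRY:-registry.example.com}"
IMAGE_NAME="${IMAGE_NAME:-api}"

# Print "<registry>/<name>:<tag>", refusing empty or mutable tags.
image_ref() {
  local tag="$1"
  if [ -z "$tag" ] || [ "$tag" = "latest" ]; then
    echo "refusing mutable tag: '${tag}'" >&2
    return 1
  fi
  printf '%s/%s:%s\n' "$REGISTRY" "$IMAGE_NAME" "$tag"
}

# Build and push only when asked, so the helper can be sourced on its own.
if [ "${1:-}" = "build" ]; then
  ref="$(image_ref "$(git rev-parse --short=12 HEAD)")"
  docker build -f Dockerfile.prod -t "$ref" .
  docker push "$ref"
fi
```

Running the script with the `build` argument tags the image with a 12-character commit SHA; the same value can then be passed to Compose through a variable such as `API_VERSION`.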
Step 3: Resource Constraints & Kernel Limits
By default, Docker lets a container consume all available host CPU and memory. Production manifests must declare hard limits so that one service cannot starve its neighbors or push the whole host into OOM kills.
services:
  api:
    deploy:
      resources:
        limits:
          cpus: '1.5'
          memory: 1G
        reservations:
          cpus: '0.5'
          memory: 256M
    restart: on-failure:5
    stop_grace_period: 30s
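Declared limits are only useful if they actually reach the running container. A minimal check, assuming the limits surface in the `HostConfig.Memory` and `HostConfig.NanoCpus` fields of `docker inspect` (both report 0 when unset), could look like this; `has_limit` and `check_limits` are hypothetical helper names.

```shell
# Sketch: confirm that limits from the compose file reached the container.
# Assumption: Docker reports 0 in both fields when no limit is configured.

# Succeeds when the numeric value represents a configured limit.
has_limit() {
  [ "${1:-0}" -gt 0 ]
}

# Warns about each missing limit; returns non-zero if any is absent.
check_limits() {
  local cid="$1" mem cpus status=0
  mem="$(docker inspect --format '{{.HostConfig.Memory}}' "$cid")"
  cpus="$(docker inspect --format '{{.HostConfig.NanoCpus}}' "$cid")"
  has_limit "$mem"  || { echo "$cid: no memory limit" >&2; status=1; }
  has_limit "$cpus" || { echo "$cid: no CPU limit" >&2; status=1; }
  return "$status"
}
```

A typical call would loop over the service's containers, for example `for cid in $(docker compose ps -q api); do check_limits "$cid"; done`.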
Step 4: Healthchecks & Dependency Ordering
Implicit startup order is unreliable. Use healthchecks to gate dependent services.
services:
  db:
    image: postgres:16-alpine
    healthcheck:
      # $$ defers expansion to the container's shell instead of Compose
      test: ["CMD-SHELL", "pg_isready -U $${POSTGRES_USER}"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s
  api:
    depends_on:
      db:
        condition: service_healthy
    healthcheck:
      # BusyBox wget ships with Alpine-based images; curl does not
      test: ["CMD-SHELL", "wget -qO- http://localhost:3000/health || exit 1"]
      interval: 15s
      timeout: 5s
      retries: 3
      start_period: 20s
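Deployment scripts often need the same gate that `depends_on` gives services. The following sketch polls the health status that Docker records for a container; `health_status` and `wait_healthy` are hypothetical helpers, and the attempt count and delay are arbitrary defaults.

```shell
# Sketch: block until a container reports "healthy" or attempts run out.

# Prints the recorded health status: starting, healthy, or unhealthy.
health_status() {
  docker inspect --format '{{.State.Health.Status}}' "$1"
}

# Usage: wait_healthy <container> [attempts] [delay_seconds]
wait_healthy() {
  local cid="$1" attempts="${2:-30}" delay="${3:-2}" i=0
  while [ "$i" -lt "$attempts" ]; do
    if [ "$(health_status "$cid")" = "healthy" ]; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "$cid did not become healthy after $attempts checks" >&2
  return 1
}
```

For example, `wait_healthy "$(docker compose ps -q db)" 30 2` waits up to roughly a minute for the database before a migration step runs.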
Step 5: Secrets Management
Environment variables are visible in docker inspect and process listings. Production workloads must use Docker secrets or external vaults.
services:
  api:
    secrets:
      - db_password
      - jwt_secret
    environment:
      - DB_HOST=db
      - DB_USER=app_user
    deploy:
      replicas: 2

secrets:
  db_password:
    file: ./secrets/db_password.txt
  jwt_secret:
    external: true
For external vaults (HashiCorp Vault, AWS Secrets Manager, Doppler), inject secrets at runtime via init containers or entrypoint scripts rather than baking them into images or compose files.
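For applications that can only read configuration from the environment, an entrypoint script can bridge mounted secret files at startup. This is a sketch under assumptions: secrets are mounted as files under `/run/secrets`, file names are valid variable names once upper-cased, and `load_secrets` is a hypothetical helper. Values exported this way stay out of `docker inspect` but are still present in the process environment, so reading the files directly is preferable where the application supports it.

```shell
#!/bin/sh
# Sketch: export each mounted secret file as an upper-cased variable.
# Assumption: file names such as db_password map to DB_PASSWORD.

load_secrets() {
  local dir="${1:-/run/secrets}" file name
  [ -d "$dir" ] || return 0
  for file in "$dir"/*; do
    [ -f "$file" ] || continue
    name="$(basename "$file" | tr '[:lower:]' '[:upper:]')"
    # Command substitution strips the trailing newline from the file.
    export "$name=$(cat "$file")"
  done
}

# As an image ENTRYPOINT: load secrets, then hand off to the real command.
if [ "$#" -gt 0 ]; then
  load_secrets
  exec "$@"
fi
```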
Step 6: Logging & Observability Integration
Default JSON file logging grows unbounded. Configure log drivers with rotation or forward to centralized systems.
services:
  api:
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"
        labels: "production"
    # Optional: forward to Loki/Fluentd
    # logging:
    #   driver: fluentd
    #   options:
    #     fluentd-address: localhost:24224
    #     tag: api.{{.Name}}
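The rotation settings above bound disk usage, and the bound is easy to compute: each container keeps at most max-file files of at most max-size each. A small helper (the name `log_budget_mb` is illustrative) makes the arithmetic explicit for capacity planning.

```shell
# Sketch: worst-case json-file disk usage in megabytes,
# computed as max-size (MB) x max-file x number of containers.
log_budget_mb() {
  local max_size_mb="$1" max_file="$2" containers="${3:-1}"
  echo $(( max_size_mb * max_file * containers ))
}

# 10 MB files, 3 files each, 2 replicas of the api service
log_budget_mb 10 3 2   # prints 60
```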
Step 7: Data Persistence & Backup Hooks
Named volumes are not backups. Implement snapshot hooks or external volume drivers for stateful services.
services:
  db:
    volumes:
      - pgdata:/var/lib/postgresql/data
    deploy:
      placement:
        constraints:
          - node.labels.storage == ssd

volumes:
  pgdata:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /mnt/nvme/pgdata
Pair with a cron job or sidecar container that runs pg_dump or mongodump to immutable storage. Docker Compose does not manage backups; you must externalize them.
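A cron-driven dump along those lines might look like the sketch below. It assumes the database service is named `db`, that `gzip` is available on the host, and that the target directory is shipped to immutable storage by a separate process; `dump_path` and `run_backup` are hypothetical helper names.

```shell
# Sketch: timestamped, compressed pg_dump of the compose-managed database.

# Builds the dump file path from directory, database name, and timestamp.
dump_path() {
  printf '%s/%s-%s.sql.gz\n' "$1" "$2" "$3"
}

# Usage: run_backup <target_dir> <database> <db_user>
run_backup() {
  local dir="$1" db="$2" user="$3" out tmp
  out="$(dump_path "$dir" "$db" "$(date -u +%Y%m%dT%H%M%SZ)")"
  tmp="${out%.gz}.partial"
  # -T disables TTY allocation so the dump can be redirected to a file.
  docker compose exec -T db pg_dump -U "$user" "$db" > "$tmp" \
    || { rm -f "$tmp"; return 1; }
  # Compress only after a successful dump, then remove the partial file.
  gzip -c "$tmp" > "$out" && rm -f "$tmp"
}
```

A crontab entry could then call `run_backup /backups production app_user` from the project directory. Writing to a partial file first means a failed dump never leaves a truncated archive under the final name.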
Pitfall Guide
- Using latest or mutable tags in production
  latest breaks reproducibility. A background push to a public registry can silently upgrade your production stack, introducing breaking changes or supply chain vulnerabilities. Always pin to a digest (sha256:...) or a semantic version. Implement image signing (Cosign/Notary) if compliance requires it.
- Omitting deploy.resources limits
  Without CPU/memory boundaries, a single misbehaving container can starve the host, trigger the kernel OOM killer, or crash sibling services. Docker's default behavior is permissive; production requires explicit ceilings. Always set both limits and reservations to enable proper scheduling and burst handling.
- Storing secrets in environment variables or compose files
  docker inspect exposes all environment variables. Compose files are often committed to version control. Use Docker secrets, mounted files, or external vaults with short-lived tokens. Never bake credentials into images.
- Ignoring healthcheck start_period
  Healthchecks that fire before an application finishes initialization cause premature restarts, creating restart loops that degrade availability. Always configure start_period to match your application's cold start time, especially for databases and JVM-based runtimes.
- Running containers as root
  Default Docker images often run as root. This expands the attack surface for container escape vulnerabilities. Always specify USER in Dockerfiles and user: "1000:1000" in compose manifests. Combine with read_only: true and explicit tmpfs mounts for writable paths.
- Assuming named volumes are backups
  Named volumes persist across container recreation but offer zero protection against host failure, accidental deletion, or data corruption. Implement external backup strategies: cloud provider snapshots, volume plugin replication, or periodic dump/export scripts.
- No log rotation or forwarding configuration
  The default json-file driver writes indefinitely until the disk is exhausted. Production environments must configure max-size/max-file or forward logs to centralized aggregators (Loki, Elasticsearch, Datadog). Unmanaged logs are a silent availability risk.
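The first pitfall in the list above is easy to catch mechanically before a deploy. The sketch below checks image references for a digest or an explicit non-latest tag; `is_pinned` and `check_images` are hypothetical helpers, and the check is deliberately simple (it treats any explicit tag other than latest as pinned).

```shell
# Sketch: flag image references that resolve to a mutable tag.

# Succeeds for digest pins and explicit tags other than "latest".
is_pinned() {
  local ref="$1" last
  case "$ref" in
    *@sha256:*) return 0 ;;
  esac
  # Inspect only the final path component so a registry port
  # (localhost:5000/api) is not mistaken for a tag.
  last="${ref##*/}"
  case "$last" in
    *:latest) return 1 ;;
    *:*)      return 0 ;;
    *)        return 1 ;;   # no tag means implicit latest
  esac
}

# Reads one image reference per line; fails if any is unpinned.
check_images() {
  local img status=0
  while IFS= read -r img; do
    is_pinned "$img" || { echo "unpinned image: $img" >&2; status=1; }
  done
  return "$status"
}
```

In a pipeline this could be fed from the merged manifest: `docker compose -f docker-compose.yml -f docker-compose.prod.yml config --images | check_images`.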
Best practices from production experience:
- Treat compose files as infrastructure-as-code. Lint them with docker compose config and version control them.
- Use --no-deps for targeted service updates during hotfixes.
- Implement blue/green or canary patterns by running parallel compose stacks with reverse proxy routing (Traefik/Nginx).
- Pin Docker Engine version on hosts. Compose v2 behavior varies across minor releases.
- Validate resource limits against actual application profiling data, not guesses.
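The first two practices combine naturally into a small hotfix routine: validate the merged manifest, then recreate a single service without touching its dependencies. This is a sketch only; `COMPOSE_FILES` and `hotfix` are hypothetical names, and the file list mirrors the one used elsewhere in this guide.

```shell
# Sketch: validate, then update one service in place.
COMPOSE_FILES="-f docker-compose.yml -f docker-compose.prod.yml"

# Usage: hotfix <service>
hotfix() {
  local service="$1"
  # COMPOSE_FILES is intentionally unquoted so it splits into flags.
  # config --quiet validates the manifest without printing it.
  docker compose $COMPOSE_FILES config --quiet \
    && docker compose $COMPOSE_FILES up -d --no-deps "$service"
}
```

Calling `hotfix api` leaves db and redis running and only recreates the api containers if the configuration is valid.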
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Monolith or <5 services, single region | Docker Compose | Minimal control plane, fast deployments, low operational overhead | ~$0–$50/mo control plane |
| Multi-region microservices, >10 services | Kubernetes | Native service mesh, auto-scaling, advanced rollout strategies | ~$200–$500/mo control plane + node tax |
| Edge/IoT or constrained hardware | Docker Compose + Swarm | Lightweight clustering, no etcd dependency, predictable resource usage | ~$20–$100/mo |
| Compliance-heavy (PCI/HIPAA) | Kubernetes + External Secrets | Audit trails, RBAC, policy enforcement, secret rotation automation | ~$300–$800/mo + compliance tooling |
Configuration Template
# docker-compose.prod.yml
version: "3.9"
services:
  api:
    image: registry.example.com/api:${API_VERSION:-1.0.0}
    restart: on-failure:5
    read_only: true
    tmpfs:
      - /tmp
      - /app/cache
    user: "1000:1000"
    environment:
      - NODE_ENV=production
      - DB_HOST=db
      - DB_PORT=5432
    secrets:
      - db_password
      - jwt_secret
    healthcheck:
      # BusyBox wget ships with Alpine-based images; curl does not
      test: ["CMD-SHELL", "wget -qO- http://localhost:3000/health || exit 1"]
      interval: 15s
      timeout: 5s
      retries: 3
      start_period: 20s
    deploy:
      resources:
        limits:
          cpus: '1.0'
          memory: 512M
        reservations:
          cpus: '0.25'
          memory: 128M
      replicas: 2
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_healthy

  db:
    image: postgres:16-alpine
    restart: unless-stopped
    environment:
      - POSTGRES_USER=app_user
      - POSTGRES_DB=production
      # Read the password from the mounted secret instead of an env value
      - POSTGRES_PASSWORD_FILE=/run/secrets/db_password
    secrets:
      - db_password
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U app_user"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s
    volumes:
      - pgdata:/var/lib/postgresql/data
    deploy:
      resources:
        limits:
          cpus: '2.0'
          memory: 2G
      placement:
        constraints:
          - node.labels.storage == ssd

  redis:
    image: redis:7-alpine
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 3s
      retries: 3
    volumes:
      - redisdata:/data
    deploy:
      resources:
        limits:
          cpus: '0.5'
          memory: 256M

secrets:
  db_password:
    file: ./secrets/db_password.txt
  jwt_secret:
    external: true

volumes:
  pgdata:
    driver: local
  redisdata:
    driver: local

networks:
  default:
    driver: bridge
    ipam:
      config:
        - subnet: 172.28.0.0/16
Quick Start Guide
- Initialize the manifest structure: Create docker-compose.yml for development and docker-compose.prod.yml for production overrides. Copy the template above and replace registry/image references with your artifacts.
- Generate secrets: Create a ./secrets/ directory. Store sensitive values as plain text files (e.g., db_password.txt) and set file permissions to 600. Mark a secret as external: true only when it is created outside Compose, for example with docker secret create on a Swarm or by tooling that syncs from a vault.
- Validate configuration: Run docker compose -f docker-compose.yml -f docker-compose.prod.yml config to merge and validate the manifest. Fix any syntax or reference errors before deployment.
- Deploy with resource isolation: Execute docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d --pull always. Verify containers are running with docker compose ps and confirm health status with docker compose ps --format json | jq '.[].Health'. Newer Compose releases print one JSON object per line instead of an array; there, use jq '.Health'.
- Hook observability & backups: Configure log forwarding to your monitoring stack. Schedule a cron job or sidecar container to dump database volumes to immutable storage. Test restoration procedures quarterly.