own dependencies, variables, templates, and handlers. This enables cross-environment reuse, independent testing via Molecule, and granular code review boundaries.
Step 2: Idempotent State Management
Ansible's core value is state reconciliation, not command execution. Every task must evaluate current system state before applying changes.
Non-Idempotent (Anti-Pattern):
- name: Install nginx
  command: apt-get install nginx -y
Idempotent (Pattern-Compliant):
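tasks:
  - name: Ensure nginx is installed
    apt:
      name: nginx
      state: present
      update_cache: yes
    notify: Restart nginx
  - name: Configure nginx upstream
    template:
      src: upstream.conf.j2
      dest: /etc/nginx/conf.d/upstream.conf
      owner: root
      group: root
      mode: '0644'
    notify: Validate and reload nginx
handlers:
  - name: Restart nginx
    service:
      name: nginx
      state: restarted
  - name: Validate and reload nginx
    command: nginx -t
    # Must report "changed", otherwise the chained notification never fires
    changed_when: true
    notify: Reload nginx
  - name: Reload nginx
    service:
      name: nginx
      state: reloaded
Architecture Rationale: Handlers execute only when notified by changed tasks, preventing unnecessary service restarts. The validation handler runs nginx -t before any reload and sets changed_when: true, because a handler that reports no change (changed_when: false) would never trigger its own notify and Reload nginx would be silently skipped. If nginx -t fails, the run stops before the reload, so a broken configuration is not loaded. Reserve changed_when: false for read-only diagnostic commands in regular tasks, where it keeps re-runs from reporting spurious changes. This pattern supports safe re-runs and predictable drift correction.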
Step 3: Variable Scoping & Secret Management
Variable precedence in Ansible is deterministic but easily mismanaged. Enforce strict scoping boundaries and integrate Ansible Vault for credential isolation.
Variable Hierarchy Enforcement:
# roles/base_os/defaults/main.yml
base_os_packages:
  - curl
  - wget
  - unzip
  - jq
# inventory/production/group_vars/all.yml
base_os_timezone: UTC
base_os_ssh_port: 22
# inventory/production/host_vars/web-01.yml
base_os_custom_kernel_params: "net.core.somaxconn=1024"
Vault Integration Pattern:
# Encrypt a secret and append the encrypted value to the vars file
ansible-vault encrypt_string 'SuperSecretDBPass' --name 'db_admin_password' >> vault/credentials.yml
# Usage in playbook
vars_files:
  - vault/credentials.yml
tasks:
  - name: Configure application database
    template:
      src: database.yml.j2
      dest: /opt/app/config/database.yml
      mode: '0600'
Architecture Rationale: defaults provide safe fallbacks. group_vars handle environment-wide configuration. host_vars override for node-specific tuning. Vault encrypts secrets at rest, so only ciphertext reaches version control, without requiring an external secret manager initially. This scoping prevents variable collision and enables safe configuration promotion across environments.
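One common way to keep vaulted values discoverable is an indirection layer: a plaintext variable points at a vault_-prefixed variable that lives in the encrypted file. The sketch below assumes group_vars/all is a directory split into vars.yml and vault.yml; that layout and the name vault_db_admin_password are illustrative assumptions, not part of the setup described above.

```yaml
# inventory/production/group_vars/all/vars.yml (plaintext, safe to grep and review)
# Assumed layout: a group_vars/all/ directory holding several files.
db_admin_password: "{{ vault_db_admin_password }}"

# inventory/production/group_vars/all/vault.yml (encrypt with: ansible-vault encrypt <file>)
# Hypothetical variable name; the value is a placeholder shown before encryption.
vault_db_admin_password: "SuperSecretDBPass"
```

With this split, a search for db_admin_password still finds where the value comes from even though the secret itself is unreadable without the vault password.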
Step 4: Testing & Validation Pipeline
Unvalidated automation is technical debt. Implement a multi-layer testing strategy using ansible-lint, yamllint, and molecule.
Molecule Configuration (molecule/default/molecule.yml):
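driver:
  name: docker
platforms:
  - name: ubuntu-2204
    image: ubuntu:22.04
    pre_build_image: true
provisioner:
  name: ansible
  playbooks:
    converge: ${MOLECULE_PROJECT_DIRECTORY}/../../playbooks/role_converge.yml
    # The ansible verifier looks for verify.yml by default; use the playbook in tests/ instead
    verify: tests/test_default.yml
verifier:
  name: ansible
# Linting (yamllint, ansible-lint) runs as a separate step; Molecule 5 and later removed the built-in lint stage.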
Validation Test (molecule/default/tests/test_default.yml):
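- name: Verify service state
  hosts: all
  tasks:
    - name: Check nginx is running
      service:
        name: nginx
        # "running" is not a valid state; "started" is the supported value
        state: started
      # Check mode keeps verification read-only
      check_mode: true
      register: svc_status
    - name: Assert service is active and enabled
      assert:
        that:
          - svc_status.status.ActiveState == "active"
          - svc_status.status.UnitFileState == "enabled"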
Architecture Rationale: Molecule spins up isolated containers per role, runs convergence, and validates state. This catches idempotency breaks, dependency gaps, and template rendering failures before promotion. Integration with CI ensures every PR passes structural and functional validation.
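The converge playbook referenced in molecule.yml is not shown in this guide. A minimal sketch of what playbooks/role_converge.yml might contain follows; the role name nginx_proxy is taken from the configuration template later in this document, and the sketch assumes the container image already provides a Python interpreter and that the roles directory is on the role search path.

```yaml
# playbooks/role_converge.yml (hypothetical contents)
- name: Converge
  hosts: all
  roles:
    # Assumed role under test; swap in the role the scenario belongs to
    - role: nginx_proxy
```

Molecule applies this playbook twice during molecule test: once to converge and once more in its idempotence step, which fails if the second run reports any change.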
Step 5: CI/CD Integration & Artifact Management
Automation patterns require delivery pipelines. Treat infrastructure code as first-class software artifacts.
GitHub Actions Workflow Snippet:
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install dependencies
        run: pip install ansible ansible-lint molecule molecule-plugins[docker] yamllint
      - name: Lint codebase
        run: |
          ansible-lint playbooks/ roles/
          yamllint -d relaxed .
      - name: Run molecule tests
        run: molecule test --scenario-name default
        env:
          ANSIBLE_VAULT_PASSWORD_FILE: .vault_pass
      - name: Archive test results
        uses: actions/upload-artifact@v4
        with:
          name: molecule-report
          path: molecule/*/tests/
Architecture Rationale: Pipeline enforcement removes human inconsistency. Linting catches syntax and style violations. Molecule validates role isolation. Artifact archiving enables audit trails. This transforms Ansible from a local utility into a governed delivery mechanism.
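The workflow points ANSIBLE_VAULT_PASSWORD_FILE at .vault_pass, a file that should not be committed. One way to supply it in CI is a step that writes it from a repository secret before the Molecule step runs. This is a sketch: the secret name ANSIBLE_VAULT_PASSWORD is an assumption and must be defined in the repository's Actions secrets.

```yaml
# Hypothetical step, placed before "Run molecule tests" in the steps list
- name: Write vault password file
  run: |
    printf '%s' "$VAULT_PASSWORD" > .vault_pass
    chmod 600 .vault_pass
  env:
    # Assumed secret name
    VAULT_PASSWORD: ${{ secrets.ANSIBLE_VAULT_PASSWORD }}
```

Passing the secret through an environment variable rather than interpolating it into the script keeps it out of the rendered command line.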
Pitfall Guide
1. Treating Ansible as a Remote Shell
Mistake: Overusing command or shell modules to bypass native resource modules.
Impact: Breaks idempotency, prevents state reconciliation, and creates untestable logic.
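Best Practice: Always prefer native modules (apt, yum, service, template, lineinfile). If command or shell is unavoidable, add the creates or removes argument, or an explicit changed_when condition, so the task is skipped or reports no change once the desired state exists.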
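As an illustration of guarding a command task, the sketch below shows the two usual approaches. The archive path, target directory, and marker file are hypothetical.

```yaml
- name: Extract application archive (skipped once the marker file exists)
  command: tar -xzf /tmp/app.tar.gz -C /opt/app
  args:
    # Hypothetical marker file created by the extraction
    creates: /opt/app/VERSION

- name: Read active kernel parameter (read-only, never reports changed)
  command: sysctl -n net.core.somaxconn
  register: somaxconn_value
  changed_when: false
```

The first task stops running once its result exists; the second always runs but is declared read-only, so repeated playbook runs stay free of spurious changes.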
2. Ignoring Handler Execution Order
Mistake: Chaining handlers without explicit notification dependencies, causing services to restart before configuration files are written.
Impact: Intermittent deployment failures and service downtime.
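Best Practice: Remember that handlers run at the end of the play, in the order they are defined rather than the order they are notified. Use meta: flush_handlers where a later task needs the restart or reload to have happened already, or restructure roles to separate configuration writes from service restarts. Document handler dependencies alongside the role.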
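A minimal sketch of forcing handlers to run mid-play, reusing the upstream template and the Reload nginx handler from Step 2. The health-check URL is an assumption.

```yaml
- name: Deploy nginx upstream configuration
  template:
    src: upstream.conf.j2
    dest: /etc/nginx/conf.d/upstream.conf
  notify: Reload nginx

- name: Run notified handlers now instead of at the end of the play
  meta: flush_handlers

- name: Confirm nginx answers after the reload
  uri:
    # Hypothetical endpoint; point this at a real health check
    url: http://localhost/
    status_code: 200
```

Without the flush, the uri check would run against the old configuration, because the reload would still be queued.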
3. Hardcoding Credentials or Bypassing Vault
Mistake: Embedding passwords, API keys, or certificates directly in playbooks or group variables.
Impact: Credential leakage in version control, failed compliance audits, and manual rotation overhead.
Best Practice: Enforce ansible-vault for all sensitive data. Integrate with external secret managers (HashiCorp Vault, AWS Secrets Manager) via lookup plugins for dynamic credential injection.
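For dynamic injection, a lookup plugin can fetch the secret at run time so it never lands in the repository. The sketch below uses the hashi_vault lookup from the community.hashi_vault collection; the KV v2 path and field name are assumptions, and it presumes the collection and its hvac dependency are installed on the controller with VAULT_ADDR and VAULT_TOKEN set in its environment.

```yaml
- name: Render database configuration with a secret fetched at run time
  template:
    src: database.yml.j2
    dest: /opt/app/config/database.yml
    mode: '0600'
  vars:
    # Assumed secret path and field; adjust to the layout of your Vault mount
    db_admin_password: "{{ lookup('community.hashi_vault.hashi_vault', 'secret/data/app/database:password') }}"
  # Keep the rendered secret out of task output and logs
  no_log: true
```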
4. Monolithic Playbooks Without Role Boundaries
Mistake: Writing a single site.yml containing hundreds of tasks across multiple system layers.
Impact: Unmaintainable code, impossible parallel development, and failed code reviews.
Best Practice: Decompose by system boundary (OS, runtime, application, monitoring). Enforce role dependencies via meta/main.yml. Require pull requests to touch only relevant role directories.
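Decomposed this way, site.yml shrinks to a short list of plays that map host groups to roles. A sketch, in which the group name webservers is hypothetical and base_os and nginx_proxy are the roles used elsewhere in this guide:

```yaml
# playbooks/site.yml (sketch)
- name: Baseline every host
  hosts: all
  roles:
    - role: base_os

- name: Configure reverse proxies
  hosts: webservers   # hypothetical inventory group
  roles:
    - role: nginx_proxy
```

Each play stays reviewable on its own, and a change to one role touches only that role's directory.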
5. Misunderstanding Variable Precedence
Mistake: Defining the same variable across defaults, vars, group_vars, and host_vars without understanding override hierarchy.
Impact: Silent configuration drift and environment-specific failures.
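Best Practice: Document variable sources in README.md. Use ansible-inventory --host <hostname> to inspect the inventory variables that resolve for a host, and a debug task to confirm the final value inside a play; ansible-config dump --only-changed audits ansible.cfg settings, not variable precedence. Prefer vars_files for complex data structures over inline vars.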
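Inside a play, the value that actually won can be printed and guarded with two small tasks. This sketch uses the base_os_ssh_port and base_os_timezone variables from Step 3.

```yaml
- name: Show the value that won precedence for this host
  debug:
    msg: "base_os_ssh_port resolved to {{ base_os_ssh_port }} on {{ inventory_hostname }}"

- name: Fail early if a required variable was never defined
  assert:
    that:
      - base_os_timezone is defined
    fail_msg: "base_os_timezone must be set in group_vars or host_vars"
```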
6. Skipping Linting and Testing in CI
Mistake: Relying on manual ansible-playbook --check runs or skipping validation entirely.
Impact: Syntax errors, deprecated module usage, and idempotency breaks reaching production.
Best Practice: Block merges on ansible-lint and yamllint failures. Run molecule convergence tests on every PR. Treat infrastructure tests with the same rigor as application unit tests.
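To keep lint results identical between laptops and CI, the yamllint rules can live in a committed .yamllint file instead of the -d relaxed flag. The thresholds below are illustrative assumptions, not recommendations from this guide.

```yaml
# .yamllint (sketch)
extends: default
rules:
  line-length:
    max: 160
    level: warning
  truthy:
    # Allow the yes/no style that older playbooks tend to use
    allowed-values: ["true", "false", "yes", "no"]
```

With this file in the repository root, running yamllint . picks it up automatically.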
7. Assuming Idempotency Equals Safety
Mistake: Believing that re-running a playbook will always correct drift without side effects.
Impact: Resource exhaustion, database connection spikes, and race conditions during mass re-convergence.
Best Practice: Implement rate limiting for mass operations. Use throttle and serial directives for rolling updates. Add idempotency guards for external API calls and database migrations.
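A sketch of a rolling update that limits both how many hosts converge at once and how many run a sensitive task concurrently. The group name, migration script, and marker file are hypothetical.

```yaml
- name: Rolling application update
  hosts: webservers            # hypothetical inventory group
  serial: "25%"                # converge a quarter of the hosts per batch
  max_fail_percentage: 10      # abort the rollout if more than 10% of a batch fails
  tasks:
    - name: Run database migration against the shared backend
      command: /opt/app/bin/migrate   # hypothetical script
      args:
        creates: /opt/app/.migrated   # hypothetical marker file acting as the idempotency guard
      throttle: 1              # at most one host runs this task at a time
```

serial bounds the blast radius of a bad change, while throttle protects the shared backend from simultaneous connections within a batch.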
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Startup / Single Environment | Monolithic playbook with Vault + Linting | Speed of delivery outweighs architectural overhead; Vault prevents credential leakage | Low setup cost; moderate long-term maintenance |
| Mid-Market / Multi-Cloud | Role-based architecture + Molecule + CI linting | Cross-environment consistency requires isolation; testing prevents cloud-specific drift | Moderate setup cost; high ROI via reduced MTTR |
| Enterprise / Compliance-Heavy | Full pattern stack + External secret manager + Audit trails | Regulatory requirements demand versioned state, secret rotation, and immutable change records | High initial investment; eliminates compliance audit failures |
| Immutable Infrastructure | Ansible for golden image baking + Terraform for provisioning | Ansible excels at OS/package state; Terraform handles resource lifecycle cleanly | Optimized toolchain; reduces configuration drift to near zero |
Configuration Template
# ansible.cfg
[defaults]
inventory = ./inventory/production/hosts.yml
roles_path = ./roles
vault_password_file = .vault_pass
retry_files_enabled = False
forks = 20
timeout = 30
log_path = ./ansible.log
[privilege_escalation]
become = True
become_method = sudo
become_user = root
become_ask_pass = False
[diff]
always = True
context = 3
# roles/nginx_proxy/meta/main.yml
dependencies:
  - role: base_os
    vars:
      base_os_packages:
        - nginx
        - certbot
galaxy_info:
  author: infrastructure-team
  description: Nginx reverse proxy with TLS termination
  min_ansible_version: "2.14"
  platforms:
    - name: Ubuntu
      versions:
        - focal
        - jammy
Quick Start Guide
- Initialize Project Structure: Run mkdir -p roles playbooks inventory/production/group_vars tests/molecule && touch ansible.cfg .vault_pass .pre-commit-config.yaml
- Install Toolchain: Execute pip install ansible ansible-lint molecule "molecule-plugins[docker]" yamllint pre-commit && pre-commit install
- Create Base Role: Scaffold roles/base_os/ with tasks/main.yml, defaults/main.yml, and meta/main.yml. Add a single idempotent package installation task.
- Validate Locally: Run ansible-lint roles/base_os/ && molecule test --scenario-name default to verify lint compliance and container convergence.
- Deploy to Target: Execute ansible-playbook playbooks/site.yml --check for dry-run validation, then ansible-playbook playbooks/site.yml for state reconciliation.