│   │   └── outputs.tf
│   └── compute/
│       ├── main.tf
│       └── variables.tf
├── environments/
│   ├── dev/
│   │   ├── main.tf
│   │   ├── backend.tf
│   │   └── terraform.tfvars
│   └── prod/
│       ├── main.tf
│       ├── backend.tf
│       └── terraform.tfvars
└── .gitignore
**Rationale:** This structure enforces separation of concerns. Modules define *how* resources are built; environments define *what* resources are built. This allows modules to be versioned and tested independently, reducing duplication and ensuring consistency across environments.
#### 2. State Management with Remote Backend
Local state files are prohibited in production. Use a remote backend with locking capabilities. For AWS, S3 with DynamoDB locking is the standard pattern.
**`environments/prod/backend.tf`:**
```hcl
terraform {
  backend "s3" {
    bucket         = "my-org-terraform-state-prod"
    key            = "networking/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}
```

**Rationale:**
- **S3 Bucket:** Provides durable storage for state files and, with bucket versioning enabled, point-in-time recovery.
- **DynamoDB Table:** Implements state locking to prevent race conditions during concurrent operations.
- **Key Path:** The key includes the component name (`networking`), enabling state sharding. Sharding isolates failures; a lock in the networking state does not block compute deployments.
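To make the sharding point concrete, here is a minimal sketch of how a separately managed compute component might use its own state key while reading values from the networking state. The `compute/terraform.tfstate` key and the `vpc_id` output are assumptions for illustration; the bucket, region, and lock table are taken from the backend above.

```hcl
# Hypothetical backend for a compute root module: same bucket and lock table,
# different key, so it has its own state file and its own lock entry.
terraform {
  backend "s3" {
    bucket         = "my-org-terraform-state-prod"
    key            = "compute/terraform.tfstate" # assumed key name
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}

# Read-only view of the networking state.
data "terraform_remote_state" "networking" {
  backend = "s3"

  config = {
    bucket = "my-org-terraform-state-prod"
    key    = "networking/terraform.tfstate"
    region = "us-east-1"
  }
}

locals {
  # Assumes the networking root module declares an output named "vpc_id".
  vpc_id = data.terraform_remote_state.networking.outputs.vpc_id
}
```

Only root-level outputs are visible through `terraform_remote_state`, so the networking configuration would need to export whatever other components consume.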
#### 3. Module Implementation Example
Modules should be idempotent and accept variables for all configurable parameters.
**`modules/networking/main.tf`:**
```hcl
resource "aws_vpc" "this" {
  cidr_block           = var.vpc_cidr
  enable_dns_support   = true
  enable_dns_hostnames = true

  tags = merge(var.tags, {
    Name = "${var.environment}-vpc"
  })
}

# One public subnet per CIDR; the AZ list must be at least as long as the CIDR list.
resource "aws_subnet" "public" {
  count             = length(var.public_subnet_cidrs)
  vpc_id            = aws_vpc.this.id
  cidr_block        = var.public_subnet_cidrs[count.index]
  availability_zone = var.availability_zones[count.index]

  tags = merge(var.tags, {
    Name = "${var.environment}-public-subnet-${count.index}"
  })
}
```
**`modules/networking/variables.tf`:**
```hcl
variable "vpc_cidr" {
  type        = string
  description = "CIDR block for the VPC"
}

variable "public_subnet_cidrs" {
  type        = list(string)
  description = "List of CIDR blocks for public subnets"
}

variable "availability_zones" {
  type        = list(string)
  description = "List of availability zones"
}

variable "environment" {
  type        = string
  description = "Deployment environment"
}

variable "tags" {
  type        = map(string)
  default     = {}
  description = "Common tags for resources"
}
```
**Rationale:** Explicit variable definitions with types and descriptions improve module usability and validation. Merging tags ensures consistent resource tagging for cost allocation and governance.
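Type constraints catch only the wrong kind of value; `validation` blocks can also reject well-typed but invalid input. The sketch below shows how two of the variables above might be tightened. The specific rules (a parseable CIDR, and an allow-list of `dev`, `staging`, `prod`) are assumptions for illustration, not requirements stated by the module.

```hcl
variable "vpc_cidr" {
  type        = string
  description = "CIDR block for the VPC"

  validation {
    # cidrhost() fails on malformed input; can() turns that failure into false.
    condition     = can(cidrhost(var.vpc_cidr, 0))
    error_message = "vpc_cidr must be a valid CIDR block, for example 10.0.0.0/16."
  }
}

variable "environment" {
  type        = string
  description = "Deployment environment"

  validation {
    # Assumed allow-list; adjust to the environments your organization uses.
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "environment must be one of: dev, staging, prod."
  }
}
```

With rules like these, a bad value fails during `terraform plan` with the given message instead of surfacing later as a provider API error.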
#### 4. Environment Configuration
Environments consume modules and pass specific values.
**`environments/prod/main.tf`:**
```hcl
module "networking" {
  source = "../../modules/networking"

  vpc_cidr            = "10.0.0.0/16"
  public_subnet_cidrs = ["10.0.1.0/24", "10.0.2.0/24"]
  availability_zones  = ["us-east-1a", "us-east-1b"]
  environment         = "prod"

  tags = {
    Team      = "Platform"
    ManagedBy = "Terraform"
  }
}
```
**Rationale:** Environment files act as the single source of truth for configuration values. This separation allows the same module to be deployed across multiple accounts or regions with minimal code changes.
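An environment can pass a module's results to other modules only if the module exports them. The project layout lists an `outputs.tf` for the networking module without showing its contents, so the following is a hypothetical sketch; the output names `vpc_id` and `public_subnet_ids` are assumptions.

```hcl
# modules/networking/outputs.tf (sketch, not the article's actual file)

output "vpc_id" {
  description = "ID of the VPC created by this module"
  value       = aws_vpc.this.id
}

output "public_subnet_ids" {
  description = "IDs of the public subnets, in the order of var.public_subnet_cidrs"
  # The splat expression works here because aws_subnet.public uses count.
  value       = aws_subnet.public[*].id
}
```

An environment could then reference `module.networking.vpc_id` when calling other modules.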
#### 5. CI/CD Integration Strategy
Automate `terraform plan` and `terraform apply` via CI/CD pipelines.

- **Plan Stage:** Run on every pull request. Post the plan output as a comment. Block merge if the plan contains destructive changes (`-` or `-/+`) without approval.
- **Apply Stage:** Trigger only on merge to the main branch. Use environment secrets for backend credentials. Implement approval gates for production.

**Rationale:** Automation reduces human error, enforces review processes, and keeps the state aligned with the code in the repository.
## Pitfall Guide
Production Terraform usage is fraught with anti-patterns. The following pitfalls and best practices are derived from extensive production experience.
1. **Storing State Locally**
   - **Mistake:** Keeping `terraform.tfstate` in the repository or on a local disk.
   - **Impact:** State is lost if the machine fails; without locking, concurrent runs can corrupt state; secrets stored in state are exposed.
   - **Best Practice:** Always use a remote backend with encryption and locking. Add `terraform.tfstate` and any `*.tfvars` files that hold sensitive values to `.gitignore`.

2. **Hardcoding Secrets in HCL**
   - **Mistake:** Defining passwords or API keys directly in variable defaults or resource attributes.
   - **Impact:** Secrets are committed to version control and exposed in state files.
   - **Best Practice:** Inject secrets via CI/CD environment variables or use a secrets manager (e.g., AWS Secrets Manager, HashiCorp Vault) with data sources. Never store secrets in state without encryption at rest.

3. **Monolithic State Files**
   - **Mistake:** Defining all resources in a single state file.
   - **Impact:** Large state files slow down operations; a lock held for one change blocks all other changes; a corruption event affects the entire infrastructure.
   - **Best Practice:** Shard state by component or environment. Use separate backend configurations for networking, compute, databases, etc.

4. **Ignoring `lifecycle` Rules**
   - **Mistake:** Not configuring `create_before_destroy` or `prevent_destroy` for critical resources.
   - **Impact:** Updates to critical resources (e.g., databases, load balancers) cause downtime or accidental deletion.
   - **Best Practice:** Use `lifecycle { create_before_destroy = true }` for stateful resources that require zero-downtime updates. Use `prevent_destroy = true` for production databases and state storage.

5. **Misusing `count` vs. `for_each`**
   - **Mistake:** Using `count` for lists of resources whose items may later be inserted or removed mid-list.
   - **Impact:** Removing an item from the middle of a `count` list shifts the indices of every later item, so Terraform plans to update or recreate all subsequent resources.
   - **Best Practice:** Prefer `for_each` with maps or sets. `for_each` tracks resources by key, so removing an item only destroys that specific resource, leaving others intact.

6. **Over-Complicating Modules (God Modules)**
   - **Mistake:** Creating a single module that provisions a VPC, EC2 instances, RDS, and IAM roles with dozens of optional variables.
   - **Impact:** Modules become difficult to maintain, test, and reuse. High coupling reduces flexibility.
   - **Best Practice:** Keep modules focused on a single domain. Compose small modules in the root configuration. Limit module inputs to essential parameters.

7. **Skipping `terraform plan` Review**
   - **Mistake:** Blindly running `terraform apply` without reviewing the execution plan.
   - **Impact:** Unintended resource deletions or modifications due to subtle configuration changes.
   - **Best Practice:** Always review the plan output. Automate plan reviews in CI/CD. Train teams to understand diff indicators (`+`, `-`, `~`, `-/+`).
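To illustrate the `count` versus `for_each` pitfall, here is a sketch of how the public subnets from the networking module could be keyed by name instead of list position. The `public_subnets` variable and its shape are assumptions introduced for this example; `aws_vpc.this`, `var.tags`, and `var.environment` refer to the module shown earlier.

```hcl
# Hypothetical replacement for the two parallel lists
# (public_subnet_cidrs and availability_zones).
variable "public_subnets" {
  type = map(object({
    cidr_block        = string
    availability_zone = string
  }))
  description = "Public subnets keyed by a stable name, e.g. \"a\" or \"b\""
}

resource "aws_subnet" "public" {
  # Each subnet is addressed as aws_subnet.public["<key>"], not by index.
  for_each = var.public_subnets

  vpc_id            = aws_vpc.this.id
  cidr_block        = each.value.cidr_block
  availability_zone = each.value.availability_zone

  tags = merge(var.tags, {
    Name = "${var.environment}-public-subnet-${each.key}"
  })
}
```

With this shape, deleting one entry from the map removes only that subnet. Note that switching an existing resource from `count` to `for_each` changes its addresses in state, so existing deployments would need `moved` blocks or `terraform state mv` to avoid recreation.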
## Production Bundle
### Action Checklist
### Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Small Team / Single Project | Single Remote State + CI/CD | Simplicity outweighs sharding benefits; reduces operational overhead. | Low |
| Multi-Environment / Compliance | Sharded State + Policy as Code | Isolation prevents cross-env drift; policy ensures compliance at scale. | Medium |
| Large Org / Multi-Account | Terraform Cloud/Enterprise + Workspaces | Centralized governance, audit trails, and cost estimation justify licensing. | High |
| High Churn / Frequent Updates | `for_each` + Immutable Patterns | Reduces resource recreation; improves deployment speed and reliability. | Low |
| Legacy Manual Infra | `terraform import` + State Migration | Brings existing resources under IaC control without recreation. | Low |
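### Configuration Template
**`backend.hcl` (Remote Backend Config):**
```hcl
# Backend configuration accepts literal values only; Terraform does not
# evaluate variables or interpolation here. Substitute the <placeholders>
# (for example, in the CI pipeline) before running `terraform init`.
bucket         = "terraform-state-<aws_account_id>-<environment>"
key            = "infrastructure/<component>/terraform.tfstate"
region         = "us-east-1"
encrypt        = true
dynamodb_table = "terraform-locks-<aws_account_id>"
```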
**`main.tf` (Production Root Structure):**
```hcl
terraform {
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }

  # Left empty on purpose: settings are supplied at init time via
  # `terraform init -backend-config=backend.hcl`.
  backend "s3" {}
}

provider "aws" {
  region = var.region

  # Applied to every taggable resource created through this provider.
  default_tags {
    tags = {
      Environment = var.environment
      ManagedBy   = "Terraform"
      Team        = "Platform"
    }
  }
}

# Modules are pinned to a tagged release via ?ref= so upgrades are deliberate.
module "networking" {
  source      = "git::https://github.com/org/terraform-modules//networking?ref=v1.2.0"
  vpc_cidr    = var.vpc_cidr
  environment = var.environment
}

module "compute" {
  source      = "git::https://github.com/org/terraform-modules//compute?ref=v1.2.0"
  vpc_id      = module.networking.vpc_id
  subnet_ids  = module.networking.private_subnet_ids
  environment = var.environment
}
```
**`variables.tf` (Environment Inputs):**
```hcl
variable "environment" {
  type        = string
  description = "Deployment environment (dev, staging, prod)"
}

variable "region" {
  type        = string
  description = "AWS region"
  default     = "us-east-1"
}

variable "vpc_cidr" {
  type        = string
  description = "CIDR block for the VPC"
}
```
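The project layout places a `terraform.tfvars` file in each environment directory. As a minimal sketch, a production file supplying the inputs declared above might look like this; the values are illustrative, with the CIDR borrowed from the earlier prod example.

```hcl
# environments/prod/terraform.tfvars (illustrative values only)
environment = "prod"
region      = "us-east-1" # optional; matches the variable's default
vpc_cidr    = "10.0.0.0/16"
```

Terraform loads a file named `terraform.tfvars` from the working directory automatically, so no `-var-file` flag is needed for it.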
### Quick Start Guide
1. **Initialize Project:**
   ```bash
   mkdir my-infra && cd my-infra
   terraform init
   ```
   Creates the project directory. `terraform init` creates the `.terraform` directory and downloads providers once configuration files exist; in an empty directory there is nothing to install yet.

2. **Define Resources:**
   Create `main.tf` with your resource definitions or module calls. Ensure variables are defined in `variables.tf`.

3. **Configure Backend:**
   Create `backend.hcl` with your remote state configuration and run:
   ```bash
   terraform init -backend-config=backend.hcl
   ```

4. **Validate and Plan:**
   ```bash
   terraform validate
   terraform plan -out=tfplan
   ```
   Review the plan output carefully for any unexpected changes.

5. **Apply Configuration:**
   ```bash
   terraform apply tfplan
   ```
   Executes the saved plan and updates the remote state file.
Codcompass Technical Note: This article assumes familiarity with cloud provider concepts. For teams new to Terraform, prioritize state management and CI/CD automation over advanced module patterns to establish a stable foundation.