rest, and prevents concurrent writes that corrupt state.
# backend.tf
terraform {
  required_version = ">= 1.5.0"

  backend "s3" {
    bucket         = "my-company-terraform-state"
    key            = "prod/networking/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-state-lock"
  }

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}
2. Modular Design
Monolithic state files create bottlenecks. As the state grows, terraform plan and apply times degrade, and lock contention increases. Decompose infrastructure into logical modules.
Rationale: Modules enforce encapsulation. A vpc module should not care about ec2 instances. This allows teams to version modules independently and reuse patterns across environments.
# modules/vpc/main.tf
resource "aws_vpc" "this" {
  cidr_block           = var.cidr
  enable_dns_support   = true
  enable_dns_hostnames = true

  tags = merge(var.tags, {
    Name = "${var.environment}-vpc"
  })
}

output "vpc_id" {
  value = aws_vpc.this.id
}

# modules/vpc/variables.tf
variable "cidr" {
  type        = string
  description = "CIDR block for the VPC"
}

variable "environment" {
  type        = string
  description = "Deployment environment"
}

variable "tags" {
  type        = map(string)
  default     = {}
  description = "Tags to apply to resources"
}
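The Configuration Template later in this guide reads module.vpc.private_subnet_ids, which the module above does not yet export. A minimal sketch of how the module could provide it follows. The file name, the private_subnets variable, and its shape (availability zone mapped to CIDR) are assumptions for illustration, not part of the original module.

```hcl
# modules/vpc/subnets.tf (hypothetical file name)

# Assumed input: map of availability zone => CIDR block,
# e.g. { "us-east-1a" = "10.0.1.0/24", "us-east-1b" = "10.0.2.0/24" }
variable "private_subnets" {
  type        = map(string)
  default     = {}
  description = "Private subnet CIDR blocks keyed by availability zone"
}

# One private subnet per map entry, all inside the module's VPC.
resource "aws_subnet" "private" {
  for_each = var.private_subnets

  vpc_id            = aws_vpc.this.id
  availability_zone = each.key
  cidr_block        = each.value

  tags = merge(var.tags, {
    Name = "${var.environment}-private-${each.key}"
  })
}

# Exported so that other modules can place resources in these subnets.
output "private_subnet_ids" {
  value = [for subnet in aws_subnet.private : subnet.id]
}
```

Keying the subnets by availability zone with for_each means that adding or removing one zone only touches that subnet, rather than shifting list indexes.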
3. Variable Management and Secrets
Never hardcode secrets. Use variable files for environment-specific configuration and integrate with secret managers for sensitive data.
Implementation: Use .tfvars files excluded from version control for local development. In CI/CD, inject variables via environment variables or secret managers.
# variables.tf
variable "db_password" {
  type        = string
  sensitive   = true
  description = "Database master password"
}

resource "aws_db_instance" "main" {
  allocated_storage   = 20
  engine              = "mysql"
  instance_class      = "db.t3.micro"
  username            = "admin"
  password            = var.db_password
  skip_final_snapshot = true
}
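As an alternative to passing the password in as a variable, the value can be read from a secret manager at plan time. The sketch below is a variant of the resource above and assumes a secret named prod/db/master-password already exists in AWS Secrets Manager and holds the password as a plain string. The secret name and file name are hypothetical.

```hcl
# secrets.tf (hypothetical file name)

# Assumption: this secret was created outside this configuration
# and stores the password as a plain string, not as JSON.
data "aws_secretsmanager_secret_version" "db_password" {
  secret_id = "prod/db/master-password" # hypothetical secret name
}

resource "aws_db_instance" "main" {
  allocated_storage   = 20
  engine              = "mysql"
  instance_class      = "db.t3.micro"
  username            = "admin"
  password            = data.aws_secretsmanager_secret_version.db_password.secret_string
  skip_final_snapshot = true
}
```

With either approach the password still ends up in the state file, so the encrypted, access-controlled backend remains part of the secret-handling story.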
4. CI/CD Integration
Terraform operations must be automated. The pipeline should run terraform init, terraform fmt -check, terraform validate, terraform plan, and terraform apply (on merge to main).
Pipeline Logic:
- Plan Stage: Generates the execution plan. Comment the plan back to the Pull Request for review.
- Apply Stage: Triggered only on merge. Requires approval for production environments.
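- Drift Detection: A scheduled job runs terraform plan -detailed-exitcode, which exits with code 2 when the live infrastructure no longer matches the configuration, to detect manual changes.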
5. State Import Strategy
Existing resources that were created outside Terraform must be imported before Terraform can manage them. Use the terraform import command or, from Terraform 1.5 onward, declarative import blocks to bring existing infrastructure under management without recreating it.
# import.tf
import {
  to = aws_instance.existing_web_server
  id = "i-0123456789abcdef0"
}
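An import block needs a resource block at the target address to import into. A minimal sketch of that counterpart is shown here. The AMI ID and instance type are placeholders that would have to match the real instance, and the ignore_changes list is an assumption about which attributes are commonly left alone on an adopted instance.

```hcl
# web_server.tf (hypothetical file name)

# Target of the import block above. The argument values are placeholders:
# they must be set to match the existing instance, otherwise the first plan
# after the import will propose changes.
resource "aws_instance" "existing_web_server" {
  ami           = "ami-0abcdef1234567890" # placeholder, use the instance's actual AMI
  instance_type = "t3.micro"              # placeholder, use the instance's actual type

  lifecycle {
    # Assumption: avoid replacing the adopted instance when only these drift.
    ignore_changes = [ami, user_data]
  }
}
```

Instead of writing the block by hand, terraform plan -generate-config-out=generated.tf can draft it from the live resource. The generated file should be reviewed before it is committed.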
Pitfall Guide
1. Local State Storage
Mistake: Keeping terraform.tfstate in the repository or on local disks.
Impact: Team members overwrite each other's changes. Secrets in state are exposed. No state locking leads to corruption.
Fix: Configure a remote backend with locking as soon as the project is created, before the first apply, and keep *.tfstate files out of version control.
2. Monolithic State Files
Mistake: Defining all resources (network, compute, database, IAM) in a single state file.
Impact: terraform plan takes minutes because every resource is refreshed on each run. A single state lock blocks parallel deployments. Even a one-tag change forces Terraform to evaluate the entire graph.
Fix: Split state by logical component (e.g., network, app, data) using separate backend configurations or directory structures.
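Once state is split, one component often needs values from another. One way to wire them together is the terraform_remote_state data source, sketched below for an app component reading the VPC ID from the networking state. It reuses the bucket and key from the backend example earlier in this guide and assumes the networking root module exposes a vpc_id output. The file and local names are hypothetical.

```hcl
# app/data.tf (hypothetical file name)

# Read-only view of the networking component's state outputs.
data "terraform_remote_state" "network" {
  backend = "s3"

  config = {
    bucket = "my-company-terraform-state"
    key    = "prod/networking/terraform.tfstate"
    region = "us-east-1"
  }
}

locals {
  # Assumption: the networking root module declares output "vpc_id".
  vpc_id = data.terraform_remote_state.network.outputs.vpc_id
}
```

Only root-level outputs are visible this way, which keeps the contract between components explicit. The reader of the state does need read access to the whole state object, so sensitive components may prefer publishing values through a parameter store instead.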
3. Hardcoding Secrets
Mistake: Embedding API keys or passwords directly in .tf files.
Impact: Secrets are committed to git history. Audit trails are compromised. Rotation requires code changes.
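Fix: Pass secrets in as variables marked sensitive = true, sourced from AWS Secrets Manager, HashiCorp Vault, or GitHub Secrets. Note that sensitive = true only redacts the value from CLI output; it is still stored in plain text in the state file, so the backend must be encrypted and access-controlled. Never commit .tfvars containing secrets.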
4. Ignoring lifecycle Rules
Mistake: Creating resources without prevent_destroy or create_before_destroy where appropriate.
Impact: terraform apply deletes production databases or load balancers during refactoring.
Fix: Apply lifecycle { prevent_destroy = true } to critical stateful resources. Use create_before_destroy for resources that cannot tolerate downtime.
resource "aws_db_instance" "production" {
  # ... config ...

  lifecycle {
    prevent_destroy = true
  }
}
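For the second half of the fix, a sketch of create_before_destroy on a security group follows. It assumes a vpc_id input variable and uses name_prefix rather than a fixed name, so that the replacement can exist alongside the old group during the swap. The resource and variable names are hypothetical.

```hcl
variable "vpc_id" {
  type        = string
  description = "VPC in which to create the security group (assumed input)"
}

resource "aws_security_group" "app" {
  # name_prefix lets Terraform generate a unique name, so the new group
  # does not collide with the old one while both exist.
  name_prefix = "app-"
  vpc_id      = var.vpc_id

  lifecycle {
    # Create the replacement first, repoint dependents, then delete the old group.
    create_before_destroy = true
  }
}
```

Without the unique name, create_before_destroy would fail on replacement because two security groups in the same VPC cannot share a name.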
5. The depends_on Trap
Mistake: Overusing explicit depends_on to force ordering.
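Impact: Every unnecessary edge in the dependency graph serializes operations that could otherwise run in parallel. On modules and data sources, depends_on can also defer reads until apply, which makes plans slower and less precise.
Fix: Rely on implicit dependencies via references. Use depends_on only for hidden dependencies that Terraform cannot infer from references (e.g., software on an instance that needs an IAM policy to be in place before it starts).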
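The difference between the two kinds of dependency can be sketched as follows. The instance references its instance profile directly, so that ordering is implicit. It never references the role policy, yet software on the instance is assumed to need that policy at boot, so this is the one place where depends_on is justified. All names, the AMI ID, and the bucket ARN are hypothetical.

```hcl
resource "aws_iam_role" "app" {
  name = "app-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "ec2.amazonaws.com" }
    }]
  })
}

resource "aws_iam_role_policy" "app_s3" {
  name = "app-s3-read"
  role = aws_iam_role.app.name # implicit dependency on the role

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["s3:GetObject"]
      Resource = "arn:aws:s3:::example-app-config/*"
    }]
  })
}

resource "aws_iam_instance_profile" "app" {
  name = "app-profile"
  role = aws_iam_role.app.name # implicit dependency on the role
}

resource "aws_instance" "app" {
  ami                  = "ami-0abcdef1234567890" # placeholder
  instance_type        = "t3.micro"
  iam_instance_profile = aws_iam_instance_profile.app.name # implicit dependency

  # Hidden dependency: nothing here references the policy, but the
  # application is assumed to read from S3 as soon as the instance boots.
  depends_on = [aws_iam_role_policy.app_s3]
}
```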
6. Manual Changes Outside Terraform
Mistake: Manually tweaking security groups or scaling instances via console.
Impact: Drift occurs. Next apply may revert manual changes or fail due to state mismatch.
Fix: Enforce "IaC Only" policies. Use SCPs (Service Control Policies) to deny write actions on managed resources to every principal except the pipeline's role. Run drift detection regularly.
7. Lack of State Versioning
Mistake: Storing state in a backend that keeps no version history (for example, an S3 bucket without versioning enabled).
Impact: If state is corrupted, there is no rollback path.
Fix: Enable versioning on S3 buckets. Use Terraform Cloud/Enterprise which provides built-in state history and rollback.
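A minimal sketch of the S3 side of this fix, assuming the state bucket and lock table are managed in a separate bootstrap configuration. The bucket and table names are taken from the backend example earlier in this guide, and the resource labels are hypothetical.

```hcl
# bootstrap/state.tf (hypothetical file name)

resource "aws_s3_bucket" "state" {
  bucket = "my-company-terraform-state"

  lifecycle {
    prevent_destroy = true # the state bucket should never be deleted by accident
  }
}

# Keeps every previous version of each state object, which provides the rollback path.
resource "aws_s3_bucket_versioning" "state" {
  bucket = aws_s3_bucket.state.id

  versioning_configuration {
    status = "Enabled"
  }
}

# Lock table for the S3 backend; the backend expects a string hash key named LockID.
resource "aws_dynamodb_table" "lock" {
  name         = "terraform-state-lock"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```

Rolling back then means restoring an earlier object version of the state key in S3, ideally while holding the lock so that no run writes a new state in the meantime.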
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Solo Developer / PoC | Local state, configuration in Git | Simplicity; no backend overhead. | Free |
| Small Team (2-5 devs) | Remote S3 + DynamoDB | Collaboration, locking, versioning at low cost. | Low (~$1-2/mo) |
| Enterprise / Compliance | Terraform Cloud/Enterprise | SSO, audit logs, policy as code, managed state. | High (Subscription) |
| Multi-Region Deployment | Directory Structure per Region | Isolation of state; parallel execution; blast radius containment. | Low |
| High Churn Resources | Ephemeral State / CI/CD Only | State managed only in pipeline; no local state risk. | Medium (Pipeline compute) |
Configuration Template
This template provides a production-ready structure for an AWS environment with remote state, module usage, and variable handling.
# main.tf
terraform {
  required_version = ">= 1.5.0"

  backend "s3" {
    bucket         = "acme-corp-terraform-state"
    key            = "prod/vpc/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.30"
    }
  }
}

provider "aws" {
  region = var.aws_region

  # Applied to every taggable resource this provider creates.
  default_tags {
    tags = {
      ManagedBy   = "Terraform"
      Environment = var.environment
      Team        = "platform-eng"
    }
  }
}

# Network Module
module "vpc" {
  source      = "./modules/vpc"
  environment = var.environment
  cidr        = var.vpc_cidr
  tags        = local.common_tags
}

# Application Module
# Note: modules/app is not shown in this guide, and private_subnet_ids
# requires a matching output in modules/vpc.
module "app" {
  source      = "./modules/app"
  environment = var.environment
  vpc_id      = module.vpc.vpc_id
  subnet_ids  = module.vpc.private_subnet_ids
  db_password = var.db_password
}

# Variables
variable "aws_region" {
  type    = string
  default = "us-east-1"
}

variable "environment" {
  type = string

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "Environment must be dev, staging, or prod."
  }
}

variable "vpc_cidr" {
  type    = string
  default = "10.0.0.0/16"
}

# No default: must be supplied at run time, never committed.
variable "db_password" {
  type      = string
  sensitive = true
}

# Locals
locals {
  common_tags = {
    Project = "AcmePlatform"
  }
}
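The template leaves environment and db_password without defaults, so they have to be supplied on each run. A sketch of a non-secret variable file for the production environment might look like this. The file name is hypothetical, and the values simply repeat the template's defaults.

```hcl
# prod.tfvars (hypothetical file name; safe to commit because it holds no secrets)
aws_region  = "us-east-1"
environment = "prod"
vpc_cidr    = "10.0.0.0/16"
```

It would be passed with terraform plan -var-file=prod.tfvars, while db_password is provided separately through the TF_VAR_db_password environment variable so that it never appears in a file.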
Quick Start Guide
- Install Terraform: Download the binary from HashiCorp or use a package manager (brew install hashicorp/tap/terraform, choco install terraform). Verify with terraform -version.
- Write Configuration: Create a project directory and add main.tf with provider and resource blocks. Use the template above as a baseline.
- Initialize Project: Run terraform init in that directory. This downloads the providers that the configuration declares and configures the backend.
- Plan Execution: Run terraform plan. Review the output carefully and confirm that it contains no unexpected deletions or modifications.
- Apply Changes: Run terraform apply and confirm when prompted. Terraform creates the resources and updates the state file.
- Verify and Destroy: Check resources in the cloud console. When done, run terraform destroy to tear down resources and avoid costs.
Note: For production, never run apply locally. Use the CI/CD pipeline defined in the Core Solution to ensure auditability and state safety.