Terraform & Infrastructure as Code
Declarative infrastructure management — HCL, state, modules, providers & operations
Overview
Terraform is an Infrastructure as Code (IaC) tool created by HashiCorp. It lets you define cloud and on-premise infrastructure in declarative configuration files written in HCL (HashiCorp Configuration Language), then provisions and manages that infrastructure through provider APIs. Terraform is the most widely adopted IaC tool in the industry, with support for hundreds of cloud services, SaaS platforms, and on-premise systems through its provider ecosystem.
Declarative vs imperative
Terraform uses a declarative approach: you describe the desired end state of your infrastructure, and Terraform figures out the sequence of API calls needed to reach that state. This is fundamentally different from imperative scripting (Bash, AWS CLI scripts) where you write step-by-step instructions for what to do. Ansible occupies a middle ground — individual modules are often declarative, but playbooks are executed procedurally in order. The declarative model means Terraform can determine what has changed, what needs to be created, updated, or destroyed, and in what order — all automatically.
The BSL license change
In August 2023, HashiCorp changed Terraform's license from the Mozilla Public License 2.0 (MPL-2.0) to the Business Source License 1.1 (BSL 1.1, also called BUSL). This means Terraform is no longer open source by the OSI definition. The BSL prohibits using Terraform to build competing commercial products. For most end users deploying their own infrastructure, this change has no practical impact. For vendors building Terraform-based SaaS products, it matters significantly. This license change directly led to the creation of OpenTofu, a community fork under the Linux Foundation.
Strengths
- Multi-cloud — single tool for AWS, Azure, GCP, Kubernetes, and hundreds more
- Massive ecosystem — 3,000+ providers in the public registry
- Dependency graph — automatic ordering of resource creation and destruction
- Plan before apply — preview changes before making them
- State tracking — knows what exists and what needs to change
- Module system — reusable, composable infrastructure components
- Mature tooling — IDE support, linters, testing frameworks, CI integrations
Considerations
- State management complexity — state file must be stored securely and locked properly
- BSL license — no longer open source; consider OpenTofu for OSS purity
- No rollback — Terraform does not natively support rolling back to a previous state
- Drift detection is manual — you must run terraform plan to detect drift
- Learning curve — HCL, state, modules, and workspaces take time to master
- Secrets in state — state file contains sensitive values in plaintext
Terraform is the default choice for multi-cloud infrastructure provisioning. If a client is not locked into a single cloud provider and needs to manage infrastructure as code, Terraform (or OpenTofu) is the answer. For AWS-only shops, CloudFormation is a viable alternative that avoids state management overhead. For teams that prefer general-purpose languages over HCL, Pulumi and AWS CDK are worth considering.
How Terraform Works
Terraform follows a simple but powerful lifecycle: write configuration, init the working directory, plan the changes, and apply them. Understanding this lifecycle is essential for using Terraform effectively.
The four core commands
| Command | Purpose | What happens |
|---|---|---|
| terraform init | Initialize | Downloads provider plugins, initializes the backend (where state is stored), downloads modules referenced in configuration. Must be run once per working directory (and again when providers or backend change). |
| terraform plan | Preview | Reads the current state, compares it to the desired configuration, and produces an execution plan showing what will be created, modified, or destroyed. No changes are made. This is the safety net. |
| terraform apply | Execute | Runs the plan and executes the changes by calling provider APIs. Creates, updates, or deletes resources. Updates the state file with the new reality. Prompts for confirmation unless -auto-approve is used. |
| terraform destroy | Tear down | Destroys all resources managed by the current configuration. Equivalent to terraform apply -destroy. Prompts for confirmation. Use with extreme caution in production. |
Provider plugins
Providers are the bridge between Terraform and the APIs of infrastructure platforms. When you run terraform init, Terraform downloads the provider binaries specified in your configuration from the Terraform Registry (or a mirror). Each provider implements CRUD operations for the resources it manages. For example, the aws provider knows how to create an EC2 instance by calling the AWS API, and the kubernetes provider knows how to create a Deployment by talking to the Kubernetes API server.
Dependency graph
Terraform builds a directed acyclic graph (DAG) of all resources and their dependencies. If resource B references an attribute of resource A, Terraform knows it must create A before B. This graph is also used to determine what can be created in parallel. You can visualize the graph with terraform graph | dot -Tpng > graph.png.
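As an illustrative sketch (resource names are hypothetical), graph edges are usually implicit — created whenever one resource references another's attributes — with depends_on available for the rare cases Terraform cannot infer the relationship from any expression:

```hcl
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

# Implicit edge: referencing aws_vpc.main.id tells Terraform
# to create the VPC before the subnet.
resource "aws_subnet" "public" {
  vpc_id     = aws_vpc.main.id
  cidr_block = "10.0.1.0/24"
}

# Explicit edge: depends_on forces ordering when a dependency
# exists in reality but appears in no expression. Shown here
# purely for illustration — the subnet_id reference already
# implies this ordering.
resource "aws_instance" "web" {
  ami           = "ami-0abcdef1234567890" # placeholder AMI
  instance_type = "t3.micro"
  subnet_id     = aws_subnet.public.id
  depends_on    = [aws_vpc.main]
}
```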
Resource addressing
Every resource in Terraform has a unique address: resource_type.resource_name. For example, aws_instance.web or google_compute_network.vpc. When using modules, the address includes the module path: module.networking.aws_vpc.main. When using count or for_each, an index is appended: aws_instance.web[0] or aws_instance.web["us-east-1"]. These addresses are how Terraform tracks resources in state and how you reference them in CLI commands like terraform state mv.
The reconciliation loop
Terraform's core algorithm is a reconciliation loop:
- Read state — load the current state file to understand what resources Terraform believes exist
- Refresh — optionally query the real infrastructure (provider APIs) to update state with any out-of-band changes
- Diff — compare the desired configuration (HCL) against the refreshed state to determine what actions are needed
- Plan — produce an ordered list of create, update, and delete actions
- Apply — execute the actions in dependency order, updating state after each successful operation
Terraform is not a continuous reconciliation controller like Kubernetes. It only checks state when you run plan or apply. Between runs, infrastructure can drift (someone manually changes a security group in the AWS console, for example). Terraform will only detect and correct this drift the next time you run it. This is why scheduled terraform plan runs in CI/CD are important for drift detection.
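A minimal drift-detection sketch for such a scheduled CI job, using plan's -detailed-exitcode flag (exit 0 = no changes, 1 = error, 2 = changes pending); the notification hook is a placeholder:

```shell
#!/bin/sh
# Scheduled drift check: plan with -detailed-exitcode returns 2
# when real infrastructure differs from configuration.
terraform init -input=false
terraform plan -detailed-exitcode -input=false -no-color
status=$?
if [ "$status" -eq 2 ]; then
  echo "Drift detected: infrastructure differs from configuration"
  # ./notify-team.sh   # placeholder notification hook
elif [ "$status" -ne 0 ]; then
  echo "terraform plan failed" >&2
  exit "$status"
fi
```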
State Management
The state file is the single most important concept in Terraform after the configuration itself. It is a JSON file that records the mapping between your Terraform resources and the real-world objects they represent. Without state, Terraform cannot know what it has previously created and would try to create everything from scratch on every run.
Why state matters
- Maps config to reality — aws_instance.web in config maps to i-0abc123def456 in AWS
- Tracks metadata — dependencies, resource ordering, provider information
- Performance — caches resource attributes so Terraform does not need to query every resource on every plan
- Enables collaboration — when stored remotely, multiple team members can work on the same infrastructure
Local vs remote state
Local state
Stored in terraform.tfstate in the working directory. Fine for learning and personal projects. Dangerous for teams because there is no locking, no shared access, and the file can be accidentally deleted or committed to Git (exposing secrets).
Remote state
Stored in a shared backend. Enables team collaboration, state locking (prevents concurrent modifications), encryption at rest, and versioning. This is the only acceptable option for production.
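Remote state can also be read across projects with the terraform_remote_state data source — a sketch assuming the S3 backend settings shown below and a vpc_id output defined by the networking project:

```hcl
data "terraform_remote_state" "networking" {
  backend = "s3"
  config = {
    bucket = "mycompany-terraform-state"
    key    = "prod/networking/terraform.tfstate"
    region = "us-east-1"
  }
}

# Consume an output exposed by the networking project.
# (vpc_id is assumed to be declared as an output there.)
resource "aws_security_group" "app" {
  name   = "app-sg"
  vpc_id = data.terraform_remote_state.networking.outputs.vpc_id
}
```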
State backends
| Backend | Locking | Encryption | Notes |
|---|---|---|---|
| S3 | Yes (native S3 or DynamoDB) | SSE-S3/KMS | The most common backend for AWS shops. Since Terraform 1.10, S3 supports native state locking via use_lockfile = true, eliminating the need for a DynamoDB table. DynamoDB-based locking is deprecated. Enable versioning on the S3 bucket. |
| GCS | Yes (native) | Google-managed | Google Cloud Storage. Built-in locking. Simple to configure for GCP-centric teams. |
| Azure Blob | Yes (blob lease) | Azure-managed | Azure Storage Account with blob leasing for locks. Standard for Azure shops. |
| HCP Terraform | Yes | HashiCorp-managed | Formerly Terraform Cloud (renamed April 2024). Free tier for up to 500 managed resources. Includes remote execution, policy checks, VCS integration. Lock-in to HashiCorp ecosystem. |
| pg (PostgreSQL) | Yes (advisory locks) | Depends on setup | Stores state in a PostgreSQL database. Useful for on-premise environments without cloud object storage. |
| consul | Yes | Depends on setup | HashiCorp Consul KV store. Less common now that HCP Terraform exists. |
Backend configuration example
terraform {
  backend "s3" {
    bucket       = "mycompany-terraform-state"
    key          = "prod/networking/terraform.tfstate"
    region       = "us-east-1"
    use_lockfile = true # Native S3 locking (Terraform 1.10+)
    encrypt      = true
  }
}
State locking
State locking prevents two people (or two CI pipelines) from running terraform apply at the same time on the same state. Without locking, concurrent applies can corrupt the state file or create conflicting infrastructure. Most remote backends support locking natively. If a lock is stuck (e.g., a pipeline crashed), you can force-unlock with terraform force-unlock LOCK_ID — but only after confirming no other operation is running.
terraform state commands
| Command | Purpose |
|---|---|
| terraform state list | List all resources in state |
| terraform state show <addr> | Show attributes of a specific resource |
| terraform state mv <src> <dst> | Move/rename a resource in state. Prefer the declarative moved {} block in HCL (Terraform 1.1+) for versioned, reviewable refactors |
| terraform state rm <addr> | Remove a resource from state without destroying it |
| terraform state pull | Download remote state to stdout |
| terraform state push | Upload a local state file to the remote backend |
| terraform import <addr> <id> | Import an existing resource into state. Prefer the declarative import {} block in HCL (Terraform 1.5+), which can also auto-generate configuration with -generate-config-out |
Never manually edit the state file. It is a JSON file and technically editable, but manual edits are the #1 cause of state corruption. Use terraform state commands instead. If you must edit state (e.g., to recover from corruption), always back up the file first, and understand that one wrong edit can cause Terraform to destroy and recreate resources.
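If you do need to intervene, a safe pattern is to snapshot state first — a sketch using the state commands above (the backup filename and resource address are arbitrary examples):

```shell
# Snapshot the current remote state before any surgery
terraform state pull > "backup-$(date +%Y%m%d-%H%M%S).tfstate"

# Inspect what is tracked
terraform state list
terraform state show aws_instance.web   # example address

# Remove a resource from state without destroying the real object
terraform state rm aws_instance.web
```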
HCL Language
HCL (HashiCorp Configuration Language) is a domain-specific language designed for defining infrastructure. It is intentionally not a general-purpose programming language — it has no loops in the traditional sense, no classes, no exception handling. This is by design: it forces configurations to be declarative and readable.
Resources
Resources are the most important element. Each resource block declares a piece of infrastructure:
resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.micro"
  subnet_id     = aws_subnet.public.id

  tags = {
    Name        = "web-server"
    Environment = "production"
  }
}
Data sources
Data sources let you query existing infrastructure that Terraform does not manage:
data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"] # Canonical

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
  }
}

resource "aws_instance" "web" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.micro"
}
Variables
# Input variable (variables.tf)
variable "environment" {
  description = "Deployment environment"
  type        = string
  default     = "dev"

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "Environment must be dev, staging, or prod."
  }
}

variable "instance_count" {
  description = "Number of instances to create"
  type        = number
  default     = 2
}

# Output value (outputs.tf)
output "instance_ips" {
  description = "Public IPs of all instances"
  value       = aws_instance.web[*].public_ip
}

# Local value
locals {
  common_tags = {
    Environment = var.environment
    ManagedBy   = "terraform"
    Project     = "web-platform"
  }
}
for_each and count
# count - create N identical resources
resource "aws_instance" "web" {
  count         = var.instance_count
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.micro"
  tags          = { Name = "web-${count.index}" }
}

# for_each - create resources from a map or set
variable "subnets" {
  default = {
    "public-a"  = { cidr = "10.0.1.0/24", az = "us-east-1a" }
    "public-b"  = { cidr = "10.0.2.0/24", az = "us-east-1b" }
    "private-a" = { cidr = "10.0.3.0/24", az = "us-east-1a" }
  }
}

resource "aws_subnet" "this" {
  for_each          = var.subnets
  vpc_id            = aws_vpc.main.id
  cidr_block        = each.value.cidr
  availability_zone = each.value.az
  tags              = { Name = each.key }
}
Dynamic blocks
variable "ingress_rules" {
  default = [
    { port = 80, cidr = "0.0.0.0/0" },
    { port = 443, cidr = "0.0.0.0/0" },
    { port = 22, cidr = "10.0.0.0/8" },
  ]
}

resource "aws_security_group" "web" {
  name   = "web-sg"
  vpc_id = aws_vpc.main.id

  dynamic "ingress" {
    for_each = var.ingress_rules
    content {
      from_port   = ingress.value.port
      to_port     = ingress.value.port
      protocol    = "tcp"
      cidr_blocks = [ingress.value.cidr]
    }
  }
}
Key functions
| Function | Example | Purpose |
|---|---|---|
| lookup | lookup(var.amis, var.region) | Map lookup with optional default |
| merge | merge(local.common_tags, { Name = "web" }) | Merge maps |
| join | join(",", var.subnets) | Join list to string |
| format | format("arn:aws:s3:::%s/*", var.bucket) | String formatting |
| cidrsubnet | cidrsubnet("10.0.0.0/16", 8, 1) | Calculate subnet CIDRs |
| templatefile | templatefile("init.sh", { env = var.env }) | Render template file |
| try | try(var.config.setting, "default") | Try expressions with fallback |
| flatten | flatten([var.list_a, var.list_b]) | Flatten nested lists |
Prefer for_each over count for most use cases. With count, removing an item from the middle of a list causes all subsequent resources to be destroyed and recreated (because their index changes). With for_each, resources are keyed by map key or set value, so removing one item only affects that specific resource.
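A sketch of the difference (names hypothetical): converting a list to a set and using for_each keys each resource by its value, so deleting "b" touches only that one instance, whereas with count the same deletion shifts indexes and forces the trailing instance to be destroyed and recreated:

```hcl
variable "server_names" {
  type    = list(string)
  default = ["a", "b", "c"]
}

# With count, addresses are aws_instance.web[0..2]; removing "b"
# renumbers "c" from [2] to [1], forcing destroy/recreate.
# With for_each, addresses are aws_instance.web["a"], ["b"], ["c"];
# removing "b" only destroys aws_instance.web["b"].
resource "aws_instance" "web" {
  for_each      = toset(var.server_names)
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.micro"
  tags          = { Name = "web-${each.key}" }
}
```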
Modules
Modules are the primary mechanism for code reuse in Terraform. A module is simply a directory containing .tf files. Every Terraform configuration is a module — the top-level directory is the root module, and any modules it calls are child modules.
Module structure
modules/
  vpc/
    main.tf       # Resource definitions
    variables.tf  # Input variables
    outputs.tf    # Output values
    versions.tf   # Provider and Terraform version constraints
    README.md     # Documentation
Calling a module
# From a local path
module "vpc" {
  source       = "./modules/vpc"
  vpc_cidr     = "10.0.0.0/16"
  environment  = var.environment
  project_name = var.project_name
}

# From the Terraform Registry
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name            = "my-vpc"
  cidr            = "10.0.0.0/16"
  azs             = ["us-east-1a", "us-east-1b", "us-east-1c"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]

  enable_nat_gateway = true
  single_nat_gateway = true
}

# From a Git repository
module "vpc" {
  source = "git::https://github.com/myorg/terraform-modules.git//vpc?ref=v2.1.0"
}

# Referencing module outputs
resource "aws_instance" "web" {
  subnet_id = module.vpc.public_subnet_ids[0]
}
Module sources
| Source | Syntax | Best for |
|---|---|---|
| Local path | ./modules/vpc | Modules within the same repository |
| Terraform Registry | hashicorp/consul/aws | Community modules with versioning |
| GitHub | github.com/org/repo//subdir | Private organizational modules |
| Git (generic) | git::https://...?ref=v1.0 | Any Git repository |
| S3 bucket | s3::https://s3.amazonaws.com/bucket/module.zip | Air-gapped or private distribution |
Module versioning
Always pin module versions in production. Use semantic versioning constraints:
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "5.2.0"             # Exact version (most conservative)
  # version = "~> 5.2"          # Allows 5.2.x but not 5.3.0
  # version = ">= 5.0, < 6.0"   # Range constraint
}
Module composition pattern
A well-structured Terraform project composes multiple modules together in the root module. Each module handles one concern — networking, compute, database, monitoring — and the root module wires them together through input variables and output references:
module "networking" {
  source      = "./modules/networking"
  environment = var.environment
  vpc_cidr    = var.vpc_cidr
}

module "database" {
  source     = "./modules/database"
  subnet_ids = module.networking.private_subnet_ids
  vpc_id     = module.networking.vpc_id
}

module "application" {
  source        = "./modules/application"
  subnet_ids    = module.networking.public_subnet_ids
  db_endpoint   = module.database.endpoint
  db_secret_arn = module.database.secret_arn
}
Use the public Terraform Registry modules as a starting point, not as a final solution. Registry modules like terraform-aws-modules/vpc/aws are well-tested and cover common patterns, but they expose hundreds of variables you may not need. For organizations with specific standards, fork or wrap registry modules in a thin internal module that enforces your defaults (naming conventions, tagging, encryption settings).
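A thin wrapper might look like this sketch — a hypothetical internal modules/vpc that pins the registry module and bakes in organizational defaults (the "acme-" prefix and tag values are placeholders):

```hcl
# modules/vpc/main.tf — internal wrapper (hypothetical)
variable "name" { type = string }
variable "cidr" { type = string }

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name = "acme-${var.name}"     # enforce naming convention
  cidr = var.cidr

  enable_nat_gateway = true     # organizational default
  single_nat_gateway = true

  tags = {                      # mandatory tags
    ManagedBy = "terraform"
    Owner     = "platform-team"
  }
}

output "vpc_id" { value = module.vpc.vpc_id }
```

Callers then see a two-variable interface instead of the registry module's full surface, and the defaults cannot be forgotten.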
Providers
Providers are plugins that let Terraform interact with specific infrastructure platforms and services. Each provider is a separate binary that implements the Terraform plugin protocol, translating HCL resource definitions into API calls.
Provider configuration
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 6.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 3.0"
    }
    proxmox = {
      source  = "bpg/proxmox"
      version = "~> 0.98"
    }
  }
}

provider "aws" {
  region = "us-east-1"
  # Authentication: uses AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY env vars,
  # shared credentials file, IAM role, or SSO
}

# Multiple provider instances with aliases
provider "aws" {
  alias  = "west"
  region = "us-west-2"
}

resource "aws_instance" "west_server" {
  provider      = aws.west
  ami           = "ami-0abcdef1234567890"
  instance_type = "t3.micro"
}
Key providers
| Provider | Source | Use case |
|---|---|---|
| AWS | hashicorp/aws | EC2, S3, RDS, VPC, IAM, Lambda, EKS — the most feature-rich provider |
| Azure (azurerm) | hashicorp/azurerm | VMs, Storage, AKS, Azure AD, networking — requires subscription_id |
| Google Cloud | hashicorp/google | GCE, GKE, Cloud SQL, VPC, IAM — use google-beta for preview features |
| Kubernetes | hashicorp/kubernetes | Deployments, Services, ConfigMaps — works with any K8s cluster |
| Helm | hashicorp/helm | Deploy Helm charts via Terraform — useful for bootstrapping clusters |
| Proxmox | bpg/proxmox | VMs and containers on Proxmox VE — popular for homelab and on-prem |
| vSphere | hashicorp/vsphere | VMware vSphere VMs, datastores, networks — enterprise on-prem |
| Cloudflare | cloudflare/cloudflare | DNS records, WAF rules, Workers, tunnels |
Authentication patterns
Environment variables
The preferred approach for CI/CD. Set AWS_ACCESS_KEY_ID, GOOGLE_CREDENTIALS, ARM_CLIENT_ID, etc. as environment variables. Keeps secrets out of code.
IAM roles / workload identity
Best for cloud-native execution. EC2 instance profiles, GKE workload identity, or Azure managed identity. No static credentials needed.
Shared credentials file
Uses ~/.aws/credentials or equivalent. Fine for local development. Never use in CI/CD or shared environments.
Hardcoded in config
Never put credentials directly in .tf files. They will end up in version control. Use environment variables or a secrets manager instead.
Running terraform init generates a .terraform.lock.hcl file that records the exact provider versions and checksums used. Commit this file to version control. It ensures everyone on the team and CI/CD uses the same provider versions, preventing "works on my machine" issues. It is the equivalent of a package-lock.json or go.sum.
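If developers run macOS but CI runs Linux, pre-populate the lock file with checksums for every platform the team uses — a sketch using the terraform providers lock command:

```shell
# Record provider checksums for all target platforms, so
# .terraform.lock.hcl verifies on every machine and in CI.
terraform providers lock \
  -platform=linux_amd64 \
  -platform=darwin_amd64 \
  -platform=darwin_arm64
```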
Workspaces & Environments
Managing multiple environments (dev, staging, prod) is one of the most common challenges in Terraform. There are two primary patterns: workspace-based and directory-based.
Terraform workspaces
Terraform workspaces allow you to maintain multiple state files from a single configuration directory. Each workspace has its own state, so resources created in the dev workspace are completely independent from those in prod.
# Create and switch workspaces
terraform workspace new dev
terraform workspace new staging
terraform workspace new prod
terraform workspace select prod
# List workspaces
terraform workspace list
# Use workspace name in configuration
# terraform.workspace returns the current workspace name
locals {
  env_config = {
    dev     = { instance_type = "t3.micro", count = 1 }
    staging = { instance_type = "t3.small", count = 2 }
    prod    = { instance_type = "t3.large", count = 3 }
  }
  config = local.env_config[terraform.workspace]
}

resource "aws_instance" "web" {
  count         = local.config.count
  instance_type = local.config.instance_type
  ami           = data.aws_ami.ubuntu.id
  tags          = { Environment = terraform.workspace }
}
Directory-based pattern
Instead of workspaces, use separate directories for each environment, each with their own backend configuration and variable values:
infrastructure/
  modules/
    vpc/
    compute/
    database/
  environments/
    dev/
      main.tf           # Calls modules with dev settings
      terraform.tfvars  # Dev variable values
      backend.tf        # Dev state backend
    staging/
      main.tf
      terraform.tfvars
      backend.tf
    prod/
      main.tf
      terraform.tfvars
      backend.tf
When to use each
| Aspect | Workspaces | Directory-based |
|---|---|---|
| Config differences | Same config, different variables | Can differ per environment |
| State isolation | Same backend, different state keys | Completely separate backends |
| CI/CD complexity | Simpler — one pipeline, switch workspace | More pipelines but clearer separation |
| Blast radius | Easier to accidentally apply to wrong workspace | Harder to make cross-environment mistakes |
| Best for | Small teams, identical environments | Larger teams, environments with structural differences |
Most experienced Terraform practitioners prefer the directory-based pattern for production. Workspaces have a foot-gun problem: there is nothing stopping you from being in the dev workspace and thinking you are in prod (or vice versa). The directory-based approach makes the environment explicit in the file path and makes it physically harder to run the wrong command against the wrong environment.
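If you do use workspaces, a guard can blunt the foot-gun — a hedged sketch using a lifecycle precondition (Terraform 1.2+) on a hypothetical resource, which fails the plan whenever the selected workspace disagrees with an explicitly passed environment variable:

```hcl
variable "environment" {
  type        = string
  description = "Must match the selected workspace (passed explicitly per run)"
}

resource "aws_instance" "web" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.micro"

  lifecycle {
    precondition {
      condition     = terraform.workspace == var.environment
      error_message = "Selected workspace does not match var.environment — refusing to plan."
    }
  }
}
```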
Terraform vs OpenTofu
OpenTofu is a fork of Terraform created in response to HashiCorp's BSL license change in August 2023. It is maintained by the Linux Foundation and is fully open source under the MPL-2.0 license. OpenTofu aims to be a drop-in replacement for Terraform, maintaining compatibility with existing Terraform configurations, providers, and modules.
Why OpenTofu exists
- License freedom — MPL-2.0 allows unrestricted use, including building commercial products on top of it
- Community governance — decisions made by the community and a steering committee, not a single company
- Linux Foundation backing — provides organizational structure, funding, and legitimacy
- Vendor neutrality — no single company controls the project's direction
Comparison
| Aspect | Terraform | OpenTofu |
|---|---|---|
| License | BSL 1.1 (not open source) | MPL-2.0 (open source) |
| Governance | HashiCorp (single company) | Linux Foundation + steering committee |
| CLI command | terraform | tofu |
| Config language | HCL | HCL (identical syntax) |
| Provider compatibility | Full registry access | Full registry access (same providers) |
| State format | JSON state file | Compatible JSON state file |
| Unique features | HCP Terraform integration | State encryption (native), provider-defined functions, early -parallelism improvements |
| Maturity | 10+ years, battle-tested | Forked late 2023, rapidly maturing |
| Enterprise support | HashiCorp (paid) | Third-party vendors (Spacelift, env0, etc.) |
Migration path
Migrating from Terraform to OpenTofu is straightforward for most projects:
- Install OpenTofu (the tofu binary)
- Replace terraform with tofu in your commands
- Run tofu init to re-initialize (downloads the same providers)
- Run tofu plan to verify no changes are detected
- Update CI/CD pipelines to use tofu instead of terraform
State files are compatible in both directions. No state migration is needed.
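The migration amounts to a few commands — a sketch (the installation method varies by platform; Homebrew is just one example, see opentofu.org for others):

```shell
# Install OpenTofu (example: Homebrew on macOS/Linux)
brew install opentofu

# Re-initialize against the same backend and providers
tofu init

# Verify a no-op: the plan should report no changes
tofu plan
```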
For new projects, recommend OpenTofu if the client values open-source licensing and community governance. For existing Terraform deployments, there is no urgent need to migrate unless the BSL license is a legal concern (e.g., the client is building a competing IaC product). Both tools work identically for day-to-day infrastructure management. The ecosystem (providers, modules, documentation) is shared.
Terraform vs CloudFormation
For AWS-only environments, CloudFormation is the primary alternative to Terraform. The choice between them depends on multi-cloud requirements, team experience, and operational preferences.
Comparison
| Aspect | Terraform | CloudFormation |
|---|---|---|
| Cloud support | Multi-cloud (AWS, Azure, GCP, +3000 providers) | AWS only |
| Language | HCL (purpose-built, readable) | JSON or YAML (verbose) |
| State management | Self-managed (S3 + DynamoDB, etc.) | AWS-managed (no state file to worry about) |
| Drift detection | Manual (terraform plan) | Built-in drift detection in console |
| Rollback | No native rollback | Automatic rollback on stack failure |
| Preview changes | terraform plan | Change sets |
| Speed of new features | AWS provider updates within days/weeks | Same-day support for new AWS services |
| Modularity | Modules (mature, registry) | Nested stacks, StackSets (less flexible) |
| Cost | Free (BSL) / OpenTofu (free OSS) | Free (AWS service) |
| Learning curve | HCL + state + providers | YAML/JSON + AWS concepts |
When to use which
Choose Terraform when
- Multi-cloud or hybrid cloud strategy
- Managing non-AWS resources (Kubernetes, Cloudflare, Datadog, etc.)
- Team already knows HCL
- Need strong module ecosystem for reuse
- Want consistent tooling across all infrastructure
Choose CloudFormation when
- 100% AWS-only and will stay that way
- Want zero state management overhead
- Need automatic rollback on failures
- Using AWS-native features like StackSets for multi-account
- Team is more comfortable with YAML than learning HCL
Other alternatives
Pulumi
Uses general-purpose languages (TypeScript, Python, Go, C#) instead of a DSL. Same multi-cloud, state-managed approach as Terraform but with full programming language capabilities — loops, conditionals, unit tests, IDE autocomplete. Growing adoption, especially among teams that dislike DSLs. State managed via Pulumi Cloud or self-hosted backends.
AWS CDK
AWS Cloud Development Kit — write AWS infrastructure in TypeScript, Python, Java, Go, or C#, and it synthesizes to CloudFormation templates. Best of both worlds: programming language ergonomics with CloudFormation's managed state and rollback. AWS-only. CDK for Terraform (CDKTF) brings the same concept to Terraform providers.
For most consulting engagements, Terraform is the default recommendation because it works everywhere. If a client is AWS-only, small, and wants minimal operational overhead, CloudFormation is perfectly fine — do not over-engineer with Terraform just because it is trendy. If a client's team is strong in TypeScript or Python and resistant to learning a DSL, consider Pulumi as a serious alternative rather than forcing HCL.
CI/CD Integration
Running Terraform in CI/CD pipelines is the standard operating model for teams. The pattern is simple: plan on merge request, apply on merge to main. This ensures changes are reviewed before being applied and that infrastructure changes follow the same review process as application code.
Standard pipeline pattern
GitHub Actions example
name: Terraform
on:
  pull_request:
    paths: ['infrastructure/**']
  push:
    branches: [main]
    paths: ['infrastructure/**']
jobs:
  plan:
    if: github.event_name == 'pull_request'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init
        working-directory: infrastructure/prod
      - id: plan
        run: terraform plan -no-color -out=tfplan
        working-directory: infrastructure/prod
      - uses: actions/github-script@v7
        env:
          # setup-terraform's wrapper exposes the plan's stdout as a step output
          PLAN: ${{ steps.plan.outputs.stdout }}
        with:
          script: |
            const output = `#### Terraform Plan
            \`\`\`
            ${process.env.PLAN}
            \`\`\``;
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: output
            })
  apply:
    if: github.ref == 'refs/heads/main' && github.event_name == 'push'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init
        working-directory: infrastructure/prod
      - run: terraform apply -auto-approve
        working-directory: infrastructure/prod
GitLab CI example
stages:
  - validate
  - plan
  - apply

plan:
  stage: plan
  image: hashicorp/terraform:1.14
  script:
    - cd infrastructure/prod
    - terraform init
    - terraform plan -out=tfplan
  artifacts:
    paths:
      - infrastructure/prod/tfplan
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
    - if: '$CI_COMMIT_BRANCH == "main"'

apply:
  stage: apply
  image: hashicorp/terraform:1.14
  script:
    - cd infrastructure/prod
    - terraform init
    # Applying a saved plan file never prompts for approval
    - terraform apply tfplan
  dependencies:
    - plan
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'
      when: manual
Atlantis
Atlantis is a self-hosted application that automates Terraform via pull request comments. Instead of building custom CI/CD pipelines, you deploy Atlantis and interact with Terraform through PR comments like atlantis plan and atlantis apply. Atlantis handles state locking, plan output, and apply execution. It is popular with teams that want a lightweight, GitOps-style workflow without the complexity of HCP Terraform.
HCP Terraform (formerly Terraform Cloud)
HashiCorp's hosted platform for running Terraform, renamed from Terraform Cloud to HCP Terraform in April 2024. Provides remote execution, state management, policy enforcement (Sentinel/OPA), VCS integration, a private module registry, and cost estimation. The free tier supports up to 500 managed resources. For teams that want a fully managed Terraform experience and are comfortable with HashiCorp vendor lock-in, HCP Terraform eliminates the need to build your own CI/CD pipeline for Terraform.
For most teams, GitHub Actions or GitLab CI with a simple plan/apply pipeline is sufficient. Atlantis is excellent for teams with many repositories and contributors who want self-service infrastructure changes. HCP Terraform is worth evaluating if the client needs policy enforcement, cost estimation, and does not want to manage CI/CD pipelines for infrastructure. Do not over-engineer the pipeline — the goal is reviewed, auditable infrastructure changes.
Best Practices & Security
.gitignore essentials
Every Terraform repository must have a proper .gitignore:
# Terraform .gitignore
*.tfstate
*.tfstate.*
*.tfvars # May contain secrets
.terraform/ # Downloaded providers and modules
crash.log
override.tf
override.tf.json
*_override.tf
*_override.tf.json
# DO commit these:
# .terraform.lock.hcl (provider lock file)
Secrets management
Do
- Use environment variables for provider credentials
- Store secrets in AWS Secrets Manager, Vault, or similar
- Use sensitive = true on variables containing secrets
- Encrypt state at rest (S3 SSE, GCS encryption)
- Restrict access to the state bucket/backend
- Use IAM roles instead of static credentials
Do not
- Commit
.tfstatefiles to Git - Hardcode credentials in
.tffiles - Share state files over Slack, email, or shared drives
- Use the same credentials for dev and prod
- Ignore that state files contain secrets in plaintext
- Commit .tfvars files with sensitive values
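Putting the "do" items together in HCL: mark the variable sensitive so plan output redacts it, or better, read the secret from a manager at apply time. The secret name and resources below are illustrative; note that the resolved value still lands in state, which is why the state backend itself must be protected:

```hcl
# Mark the variable sensitive so Terraform redacts it in plan/apply output
variable "db_password" {
  type      = string
  sensitive = true
}

# Better: fetch the secret at apply time instead of passing it in.
# The secret name "prod/db/password" is illustrative.
data "aws_secretsmanager_secret_version" "db" {
  secret_id = "prod/db/password"
}

resource "aws_db_instance" "main" {
  # ... other arguments ...
  password = data.aws_secretsmanager_secret_version.db.secret_string
}
```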
Least-privilege IAM for Terraform
Terraform typically needs broad permissions to create and manage infrastructure, but the permissions should be scoped per environment and per pipeline:
- Separate IAM roles per environment — the dev pipeline should not have permissions to touch prod resources
- Narrower permissions for plan — the MR pipeline needs read access to cloud APIs plus write access to the state backend (for refresh). Full create/update/delete permissions are only needed for apply
- Scope to specific services — if a Terraform project only manages networking, the IAM role should not have EC2 or RDS permissions
- Use OIDC federation — GitHub Actions and GitLab CI both support OIDC for assuming AWS/GCP/Azure roles without static credentials
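As a sketch of the OIDC point, the trust relationship itself can be managed in Terraform. The organization/repository in the subject condition is a placeholder; depending on your AWS provider version, thumbprint_list may also be required:

```hcl
# GitHub Actions OIDC federation into AWS — no static credentials (sketch)
resource "aws_iam_openid_connect_provider" "github" {
  url            = "https://token.actions.githubusercontent.com"
  client_id_list = ["sts.amazonaws.com"]
  # thumbprint_list may be required on older AWS provider versions
}

resource "aws_iam_role" "terraform_ci" {
  name = "terraform-ci"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Federated = aws_iam_openid_connect_provider.github.arn }
      Action    = "sts:AssumeRoleWithWebIdentity"
      Condition = {
        StringEquals = {
          "token.actions.githubusercontent.com:aud" = "sts.amazonaws.com"
        }
        StringLike = {
          # Only the main branch of this (placeholder) repo may assume the role
          "token.actions.githubusercontent.com:sub" = "repo:example-org/infra:ref:refs/heads/main"
        }
      }
    }]
  })
}
```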
Dependency pinning
terraform {
  required_version = "~> 1.14.0" # Pin Terraform version

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 6.0" # Pin provider version
    }
  }
}
# Always commit .terraform.lock.hcl to lock exact versions
Code review for plans
- Always review the plan output before approving an apply — this is your last line of defense
- Look for unexpected destroy or replace actions — these indicate breaking changes
- Watch for ~ (update in-place) on sensitive resources like databases or load balancers
- Verify that the number of changes matches expectations — a "simple tag change" that shows 47 resources changing is a red flag
- Use terraform plan -target=resource.name to limit scope when debugging specific changes
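The change-count review can be backed by a cheap CI guardrail. A sketch in shell, using a hard-coded summary line in place of real terraform plan -no-color output:

```shell
# Guardrail: fail loudly when a plan destroys anything.
# In CI you would capture real output: terraform plan -no-color | tee plan.txt
# Here a sample summary line stands in for a real plan.
echo 'Plan: 1 to add, 2 to change, 3 to destroy.' > plan.txt

# Extract the destroy count from the summary line
destroys=$(grep -Eo '[0-9]+ to destroy' plan.txt | grep -Eo '[0-9]+')
if [ "${destroys:-0}" -gt 0 ]; then
  echo "WARNING: plan destroys $destroys resource(s) - review before apply"
fi
```

In a real pipeline you might exit non-zero instead of warning, forcing a manual approval step for destructive plans.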
Operational hygiene
- Use terraform fmt in CI to enforce consistent formatting
- Use terraform validate to catch syntax errors before plan
- Use tflint for linting rules beyond what validate catches (deprecated arguments, naming conventions)
- Use checkov or trivy config for security scanning of Terraform configs (tfsec is deprecated and merged into Trivy)
- Use terraform-docs to auto-generate module documentation
- Use infracost to estimate cost impact of changes in PRs
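Most of these tools can be wired up through the widely used pre-commit-terraform hooks rather than bespoke CI steps. A sketch of .pre-commit-config.yaml (the pinned rev is illustrative):

```yaml
# .pre-commit-config.yaml — run hygiene tools via pre-commit (sketch)
repos:
  - repo: https://github.com/antonbabenko/pre-commit-terraform
    rev: v1.96.1   # illustrative pin; use the latest release
    hooks:
      - id: terraform_fmt
      - id: terraform_validate
      - id: terraform_tflint
      - id: terraform_docs
      - id: terraform_checkov
```

Running the same hooks locally and in CI keeps feedback fast while still enforcing the rules on every merge request.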
The Terraform state file contains every secret value Terraform manages — database passwords, API keys, TLS certificates, etc. — in plaintext JSON. Treat the state backend with the same security posture as your secrets manager. Encrypt at rest, restrict access to authorized pipelines and operators only, enable versioning for recovery, and audit access logs.
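For the common S3 case, this hardening maps to a few backend settings. A sketch, assuming an already-versioned bucket with a restrictive bucket policy (bucket name, key, and region are illustrative; use_lockfile requires Terraform 1.10+):

```hcl
# S3 backend with encryption and native state locking (sketch)
terraform {
  backend "s3" {
    bucket       = "example-tf-state"        # illustrative bucket name
    key          = "prod/terraform.tfstate"
    region       = "eu-west-1"
    encrypt      = true   # server-side encryption at rest
    use_lockfile = true   # S3-native locking, no DynamoDB table needed
  }
}
```

Enable bucket versioning separately so a corrupted or accidentally overwritten state can be rolled back.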
Consultant's Checklist
Use this checklist when assessing or setting up Terraform for a client engagement.
Foundation
- Remote state backend configured with encryption and locking
- State bucket/backend access restricted to CI/CD and authorized operators
- Terraform and provider versions pinned in required_version and required_providers
- .terraform.lock.hcl committed to version control
- .gitignore excludes state files, .terraform/, and sensitive .tfvars
- Directory structure established (modules, environments, or workspaces)
CI/CD
- Plan runs on every merge request / pull request
- Plan output posted as MR/PR comment for review
- Apply runs only on merge to main (or manual approval)
- OIDC federation used for cloud provider authentication (no static credentials)
- Separate IAM roles per environment
- State locking prevents concurrent applies
Code quality
- terraform fmt enforced in CI
- terraform validate run before plan
- tflint or equivalent linter configured
- Security scanner (checkov, trivy) in pipeline
- Modules documented with terraform-docs
- Variables have descriptions and type constraints
Security
- No credentials in .tf files or version control
- State encrypted at rest and access-controlled
- Sensitive variables marked with sensitive = true
- Secrets stored in Vault / Secrets Manager, not in Terraform variables
- Least-privilege IAM roles for Terraform execution
- Plan output reviewed before every apply — no blind auto-approve
Decision points
- Terraform vs OpenTofu? — If OSS licensing matters, use OpenTofu. Otherwise, either works identically.
- Terraform vs CloudFormation? — Multi-cloud = Terraform. AWS-only with zero state overhead preference = CloudFormation.
- Workspaces vs directories? — Small team with identical environments = workspaces. Larger teams or differing environments = directories.
- HCP Terraform vs self-hosted CI/CD? — If the client wants managed policy enforcement and cost estimation, HCP Terraform. Otherwise, GitHub Actions / GitLab CI is simpler and cheaper.
- Atlantis vs custom pipeline? — Many repos and contributors wanting self-service = Atlantis. Small team with few repos = custom pipeline.