Terraform & Infrastructure as Code
Declarative infrastructure management — HCL, state, modules, providers & operations
Overview
Terraform is an Infrastructure as Code (IaC) tool created by HashiCorp. It lets you define cloud and on-premise infrastructure in declarative configuration files written in HCL (HashiCorp Configuration Language), then provisions and manages that infrastructure through provider APIs. Terraform is the most widely adopted IaC tool in the industry, with support for hundreds of cloud services, SaaS platforms, and on-premise systems through its provider ecosystem.
Declarative vs imperative
Terraform uses a declarative approach: you describe the desired end state of your infrastructure, and Terraform figures out the sequence of API calls needed to reach that state. This is fundamentally different from imperative scripting (Bash, AWS CLI scripts) where you write step-by-step instructions for what to do. Ansible occupies a middle ground — individual modules are often declarative, but playbooks are executed procedurally in order. The declarative model means Terraform can determine what has changed, what needs to be created, updated, or destroyed, and in what order — all automatically.
The BSL license change
In August 2023, HashiCorp changed Terraform's license from the Mozilla Public License 2.0 (MPL-2.0) to the Business Source License 1.1 (BSL 1.1, also called BUSL). This means Terraform is no longer open source by the OSI definition. The BSL prohibits using Terraform to build competing commercial products. For most end users deploying their own infrastructure, this change has no practical impact. For vendors building Terraform-based SaaS products, it matters significantly. This license change directly led to the creation of OpenTofu, a community fork under the Linux Foundation.
Strengths
- Multi-cloud — single tool for AWS, Azure, GCP, Kubernetes, and hundreds more
- Massive ecosystem — 3,000+ providers in the public registry
- Dependency graph — automatic ordering of resource creation and destruction
- Plan before apply — preview changes before making them
- State tracking — knows what exists and what needs to change
- Module system — reusable, composable infrastructure components
- Mature tooling — IDE support, linters, testing frameworks, CI integrations
Considerations
- State management complexity — state file must be stored securely and locked properly
- BSL license — no longer open source; consider OpenTofu for OSS purity
- No rollback — Terraform does not natively support rolling back to a previous state
- Drift detection is manual — you must run terraform plan to detect drift
- Learning curve — HCL, state, modules, and workspaces take time to master
- Secrets in state — state file contains sensitive values in plaintext
Terraform is the default choice for multi-cloud infrastructure provisioning. If a client is not locked into a single cloud provider and needs to manage infrastructure as code, Terraform (or OpenTofu) is the answer. For AWS-only shops, CloudFormation is a viable alternative that avoids state management overhead. For teams that prefer general-purpose languages over HCL, Pulumi and AWS CDK are worth considering.
How Terraform Works
Terraform follows a simple but powerful lifecycle: write configuration, init the working directory, plan the changes, and apply them. Understanding this lifecycle is essential for using Terraform effectively.
The four core commands
| Command | Purpose | What happens |
|---|---|---|
| terraform init | Initialize | Downloads provider plugins, initializes the backend (where state is stored), downloads modules referenced in configuration. Must be run once per working directory (and again when providers or backend change). |
| terraform plan | Preview | Reads the current state, compares it to the desired configuration, and produces an execution plan showing what will be created, modified, or destroyed. No changes are made. This is the safety net. |
| terraform apply | Execute | Runs the plan and executes the changes by calling provider APIs. Creates, updates, or deletes resources. Updates the state file with the new reality. Prompts for confirmation unless -auto-approve is used. |
| terraform destroy | Tear down | Destroys all resources managed by the current configuration. Equivalent to terraform apply -destroy. Prompts for confirmation. Use with extreme caution in production. |
Provider plugins
Providers are the bridge between Terraform and the APIs of infrastructure platforms. When you run terraform init, Terraform downloads the provider binaries specified in your configuration from the Terraform Registry (or a mirror). Each provider implements CRUD operations for the resources it manages. For example, the aws provider knows how to create an EC2 instance by calling the AWS API, and the kubernetes provider knows how to create a Deployment by talking to the Kubernetes API server.
Dependency graph
Terraform builds a directed acyclic graph (DAG) of all resources and their dependencies. If resource B references an attribute of resource A, Terraform knows it must create A before B. This graph is also used to determine what can be created in parallel. You can visualize the graph with terraform graph | dot -Tpng > graph.png.
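As an illustrative sketch (resource names are hypothetical), graph edges are usually implicit — created whenever one resource references another's attributes — with depends_on available for the rare cases Terraform cannot infer the relationship from any expression:

```hcl
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

# Implicit edge: referencing aws_vpc.main.id tells Terraform
# to create the VPC before the subnet.
resource "aws_subnet" "public" {
  vpc_id     = aws_vpc.main.id
  cidr_block = "10.0.1.0/24"
}

# Explicit edge: depends_on forces ordering when a dependency
# exists in reality but appears in no expression. Shown here
# purely for illustration — the subnet_id reference already
# implies this ordering.
resource "aws_instance" "web" {
  ami           = "ami-0abcdef1234567890" # placeholder AMI
  instance_type = "t3.micro"
  subnet_id     = aws_subnet.public.id
  depends_on    = [aws_vpc.main]
}
```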
Resource addressing
Every resource in Terraform has a unique address: resource_type.resource_name. For example, aws_instance.web or google_compute_network.vpc. When using modules, the address includes the module path: module.networking.aws_vpc.main. When using count or for_each, an index is appended: aws_instance.web[0] or aws_instance.web["us-east-1"]. These addresses are how Terraform tracks resources in state and how you reference them in CLI commands like terraform state mv.
The reconciliation loop
Terraform's core algorithm is a reconciliation loop:
- Read state — load the current state file to understand what resources Terraform believes exist
- Refresh — optionally query the real infrastructure (provider APIs) to update state with any out-of-band changes
- Diff — compare the desired configuration (HCL) against the refreshed state to determine what actions are needed
- Plan — produce an ordered list of create, update, and delete actions
- Apply — execute the actions in dependency order, updating state after each successful operation
Terraform is not a continuous reconciliation controller like Kubernetes. It only checks state when you run plan or apply. Between runs, infrastructure can drift (someone manually changes a security group in the AWS console, for example). Terraform will only detect and correct this drift the next time you run it. This is why scheduled terraform plan runs in CI/CD are important for drift detection.
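A minimal drift-detection sketch for such a scheduled CI job, using plan's -detailed-exitcode flag (exit 0 = no changes, 1 = error, 2 = changes pending); the notification hook is a placeholder:

```shell
#!/bin/sh
# Scheduled drift check: plan with -detailed-exitcode returns 2
# when real infrastructure differs from configuration.
terraform init -input=false
terraform plan -detailed-exitcode -input=false -no-color
status=$?
if [ "$status" -eq 2 ]; then
  echo "Drift detected: infrastructure differs from configuration"
  # ./notify-team.sh   # placeholder notification hook
elif [ "$status" -ne 0 ]; then
  echo "terraform plan failed" >&2
  exit "$status"
fi
```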
State Management
The state file is the single most important concept in Terraform after the configuration itself. It is a JSON file that records the mapping between your Terraform resources and the real-world objects they represent. Without state, Terraform cannot know what it has previously created and would try to create everything from scratch on every run.
Why state matters
- Maps config to reality — aws_instance.web in config maps to i-0abc123def456 in AWS
- Tracks metadata — dependencies, resource ordering, provider information
- Performance — caches resource attributes so Terraform does not need to query every resource on every plan
- Enables collaboration — when stored remotely, multiple team members can work on the same infrastructure
Local vs remote state
Local state
Stored in terraform.tfstate in the working directory. Fine for learning and personal projects. Dangerous for teams because there is no locking, no shared access, and the file can be accidentally deleted or committed to Git (exposing secrets).
Remote state
Stored in a shared backend. Enables team collaboration, state locking (prevents concurrent modifications), encryption at rest, and versioning. This is the only acceptable option for production.
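Remote state can also be read across projects with the terraform_remote_state data source — a sketch assuming the S3 backend settings shown below and a vpc_id output defined by the networking project:

```hcl
data "terraform_remote_state" "networking" {
  backend = "s3"
  config = {
    bucket = "mycompany-terraform-state"
    key    = "prod/networking/terraform.tfstate"
    region = "us-east-1"
  }
}

# Consume an output exposed by the networking project.
# (vpc_id is assumed to be declared as an output there.)
resource "aws_security_group" "app" {
  name   = "app-sg"
  vpc_id = data.terraform_remote_state.networking.outputs.vpc_id
}
```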
State backends
| Backend | Locking | Encryption | Notes |
|---|---|---|---|
| S3 | Yes (native S3 or DynamoDB) | SSE-S3/KMS | The most common backend for AWS shops. Since Terraform 1.10, S3 supports native state locking via use_lockfile = true, eliminating the need for a DynamoDB table. DynamoDB-based locking is deprecated. Enable versioning on the S3 bucket. |
| GCS | Yes (native) | Google-managed | Google Cloud Storage. Built-in locking. Simple to configure for GCP-centric teams. |
| Azure Blob | Yes (blob lease) | Azure-managed | Azure Storage Account with blob leasing for locks. Standard for Azure shops. |
| HCP Terraform | Yes | HashiCorp-managed | Formerly Terraform Cloud (renamed April 2024). Free tier for up to 500 managed resources. Includes remote execution, policy checks, VCS integration. Lock-in to HashiCorp ecosystem. |
| pg (PostgreSQL) | Yes (advisory locks) | Depends on setup | Stores state in a PostgreSQL database. Useful for on-premise environments without cloud object storage. |
| consul | Yes | Depends on setup | HashiCorp Consul KV store. Less common now that HCP Terraform exists. |
Backend configuration example
terraform {
  backend "s3" {
    bucket       = "mycompany-terraform-state"
    key          = "prod/networking/terraform.tfstate"
    region       = "us-east-1"
    use_lockfile = true # Native S3 locking (Terraform 1.10+)
    encrypt      = true
  }
}
State locking
State locking prevents two people (or two CI pipelines) from running terraform apply at the same time on the same state. Without locking, concurrent applies can corrupt the state file or create conflicting infrastructure. Most remote backends support locking natively. If a lock is stuck (e.g., a pipeline crashed), you can force-unlock with terraform force-unlock LOCK_ID — but only after confirming no other operation is running.
terraform state commands
| Command | Purpose |
|---|---|
| terraform state list | List all resources in state |
| terraform state show <addr> | Show attributes of a specific resource |
| terraform state mv <src> <dst> | Move/rename a resource in state. Prefer the declarative moved {} block in HCL (Terraform 1.1+) for versioned, reviewable refactors |
| terraform state rm <addr> | Remove a resource from state without destroying it |
| terraform state pull | Download remote state to stdout |
| terraform state push | Upload a local state file to the remote backend |
| terraform import <addr> <id> | Import an existing resource into state. Prefer the declarative import {} block in HCL (Terraform 1.5+), which can also auto-generate configuration with -generate-config-out |
Never manually edit the state file. It is a JSON file and technically editable, but manual edits are the #1 cause of state corruption. Use terraform state commands instead. If you must edit state (e.g., to recover from corruption), always back up the file first, and understand that one wrong edit can cause Terraform to destroy and recreate resources.
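If you do need to intervene, a safe pattern is to snapshot state first — a sketch using the state commands above (the backup filename and resource address are arbitrary examples):

```shell
# Snapshot the current remote state before any surgery
terraform state pull > "backup-$(date +%Y%m%d-%H%M%S).tfstate"

# Inspect what is tracked
terraform state list
terraform state show aws_instance.web   # example address

# Remove a resource from state without destroying the real object
terraform state rm aws_instance.web
```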
HCL Language
HCL (HashiCorp Configuration Language) is a domain-specific language designed for defining infrastructure. It is intentionally not a general-purpose programming language — it has no loops in the traditional sense, no classes, no exception handling. This is by design: it forces configurations to be declarative and readable.
Resources
Resources are the most important element. Each resource block declares a piece of infrastructure:
resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.micro"
  subnet_id     = aws_subnet.public.id

  tags = {
    Name        = "web-server"
    Environment = "production"
  }
}
Data sources
Data sources let you query existing infrastructure that Terraform does not manage:
data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"] # Canonical

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
  }
}

resource "aws_instance" "web" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.micro"
}
Variables
# Input variable (variables.tf)
variable "environment" {
  description = "Deployment environment"
  type        = string
  default     = "dev"

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "Environment must be dev, staging, or prod."
  }
}

variable "instance_count" {
  description = "Number of instances to create"
  type        = number
  default     = 2
}

# Output value (outputs.tf)
output "instance_ips" {
  description = "Public IPs of all instances"
  value       = aws_instance.web[*].public_ip
}

# Local value
locals {
  common_tags = {
    Environment = var.environment
    ManagedBy   = "terraform"
    Project     = "web-platform"
  }
}
for_each and count
# count - create N identical resources
resource "aws_instance" "web" {
  count         = var.instance_count
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.micro"
  tags          = { Name = "web-${count.index}" }
}

# for_each - create resources from a map or set
variable "subnets" {
  default = {
    "public-a"  = { cidr = "10.0.1.0/24", az = "us-east-1a" }
    "public-b"  = { cidr = "10.0.2.0/24", az = "us-east-1b" }
    "private-a" = { cidr = "10.0.3.0/24", az = "us-east-1a" }
  }
}

resource "aws_subnet" "this" {
  for_each          = var.subnets
  vpc_id            = aws_vpc.main.id
  cidr_block        = each.value.cidr
  availability_zone = each.value.az
  tags              = { Name = each.key }
}
Dynamic blocks
variable "ingress_rules" {
  default = [
    { port = 80, cidr = "0.0.0.0/0" },
    { port = 443, cidr = "0.0.0.0/0" },
    { port = 22, cidr = "10.0.0.0/8" },
  ]
}

resource "aws_security_group" "web" {
  name   = "web-sg"
  vpc_id = aws_vpc.main.id

  dynamic "ingress" {
    for_each = var.ingress_rules
    content {
      from_port   = ingress.value.port
      to_port     = ingress.value.port
      protocol    = "tcp"
      cidr_blocks = [ingress.value.cidr]
    }
  }
}
Key functions
| Function | Example | Purpose |
|---|---|---|
| lookup | lookup(var.amis, var.region) | Map lookup with optional default |
| merge | merge(local.common_tags, { Name = "web" }) | Merge maps |
| join | join(",", var.subnets) | Join list to string |
| format | format("arn:aws:s3:::%s/*", var.bucket) | String formatting |
| cidrsubnet | cidrsubnet("10.0.0.0/16", 8, 1) | Calculate subnet CIDRs |
| templatefile | templatefile("init.sh", { env = var.env }) | Render template file |
| try | try(var.config.setting, "default") | Try expressions with fallback |
| flatten | flatten([var.list_a, var.list_b]) | Flatten nested lists |
Prefer for_each over count for most use cases. With count, removing an item from the middle of a list causes all subsequent resources to be destroyed and recreated (because their index changes). With for_each, resources are keyed by map key or set value, so removing one item only affects that specific resource.
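A sketch of the difference (names hypothetical): converting a list to a set and using for_each keys each resource by its value, so deleting "b" touches only that one instance, whereas with count the same deletion shifts indexes and forces the trailing instance to be destroyed and recreated:

```hcl
variable "server_names" {
  type    = list(string)
  default = ["a", "b", "c"]
}

# With count, addresses are aws_instance.web[0..2]; removing "b"
# renumbers "c" from [2] to [1], forcing destroy/recreate.
# With for_each, addresses are aws_instance.web["a"], ["b"], ["c"];
# removing "b" only destroys aws_instance.web["b"].
resource "aws_instance" "web" {
  for_each      = toset(var.server_names)
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.micro"
  tags          = { Name = "web-${each.key}" }
}
```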
Modules
Modules are the primary mechanism for code reuse in Terraform. A module is simply a directory containing .tf files. Every Terraform configuration is a module — the top-level directory is the root module, and any modules it calls are child modules.
Module structure
modules/
  vpc/
    main.tf       # Resource definitions
    variables.tf  # Input variables
    outputs.tf    # Output values
    versions.tf   # Provider and Terraform version constraints
    README.md     # Documentation
Calling a module
# From a local path
module "vpc" {
  source       = "./modules/vpc"
  vpc_cidr     = "10.0.0.0/16"
  environment  = var.environment
  project_name = var.project_name
}

# From the Terraform Registry
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name            = "my-vpc"
  cidr            = "10.0.0.0/16"
  azs             = ["us-east-1a", "us-east-1b", "us-east-1c"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]

  enable_nat_gateway = true
  single_nat_gateway = true
}

# From a Git repository
module "vpc" {
  source = "git::https://github.com/myorg/terraform-modules.git//vpc?ref=v2.1.0"
}

# Referencing module outputs
resource "aws_instance" "web" {
  subnet_id = module.vpc.public_subnet_ids[0]
}
Module sources
| Source | Syntax | Best for |
|---|---|---|
| Local path | ./modules/vpc | Modules within the same repository |
| Terraform Registry | hashicorp/consul/aws | Community modules with versioning |
| GitHub | github.com/org/repo//subdir | Private organizational modules |
| Git (generic) | git::https://...?ref=v1.0 | Any Git repository |
| S3 bucket | s3::https://s3.amazonaws.com/bucket/module.zip | Air-gapped or private distribution |
Module versioning
Always pin module versions in production. Use semantic versioning constraints:
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "5.2.0"             # Exact version (most conservative)
  # version = "~> 5.2"          # Allows 5.2.x but not 5.3.0
  # version = ">= 5.0, < 6.0"   # Range constraint
}
Module composition pattern
A well-structured Terraform project composes multiple modules together in the root module. Each module handles one concern — networking, compute, database, monitoring — and the root module wires them together through input variables and output references:
module "networking" {
  source      = "./modules/networking"
  environment = var.environment
  vpc_cidr    = var.vpc_cidr
}

module "database" {
  source     = "./modules/database"
  subnet_ids = module.networking.private_subnet_ids
  vpc_id     = module.networking.vpc_id
}

module "application" {
  source        = "./modules/application"
  subnet_ids    = module.networking.public_subnet_ids
  db_endpoint   = module.database.endpoint
  db_secret_arn = module.database.secret_arn
}
Use the public Terraform Registry modules as a starting point, not as a final solution. Registry modules like terraform-aws-modules/vpc/aws are well-tested and cover common patterns, but they expose hundreds of variables you may not need. For organizations with specific standards, fork or wrap registry modules in a thin internal module that enforces your defaults (naming conventions, tagging, encryption settings).
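A thin wrapper might look like this sketch — a hypothetical internal modules/vpc that pins the registry module and bakes in organizational defaults (the "acme-" prefix and tag values are placeholders):

```hcl
# modules/vpc/main.tf — internal wrapper (hypothetical)
variable "name" { type = string }
variable "cidr" { type = string }

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name = "acme-${var.name}"     # enforce naming convention
  cidr = var.cidr

  enable_nat_gateway = true     # organizational default
  single_nat_gateway = true

  tags = {                      # mandatory tags
    ManagedBy = "terraform"
    Owner     = "platform-team"
  }
}

output "vpc_id" { value = module.vpc.vpc_id }
```

Callers then see a two-variable interface instead of the registry module's full surface, and the defaults cannot be forgotten.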
Providers
Providers are plugins that let Terraform interact with specific infrastructure platforms and services. Each provider is a separate binary that implements the Terraform plugin protocol, translating HCL resource definitions into API calls.
Provider configuration
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 6.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 3.0"
    }
    proxmox = {
      source  = "bpg/proxmox"
      version = "~> 0.98"
    }
  }
}

provider "aws" {
  region = "us-east-1"
  # Authentication: uses AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY env vars,
  # shared credentials file, IAM role, or SSO
}

# Multiple provider instances with aliases
provider "aws" {
  alias  = "west"
  region = "us-west-2"
}

resource "aws_instance" "west_server" {
  provider      = aws.west
  ami           = "ami-0abcdef1234567890"
  instance_type = "t3.micro"
}
Key providers
| Provider | Source | Use case |
|---|---|---|
| AWS | hashicorp/aws | EC2, S3, RDS, VPC, IAM, Lambda, EKS — the most feature-rich provider |
| Azure (azurerm) | hashicorp/azurerm | VMs, Storage, AKS, Azure AD, networking — requires subscription_id |
| Google Cloud | hashicorp/google | GCE, GKE, Cloud SQL, VPC, IAM — use google-beta for preview features |
| Kubernetes | hashicorp/kubernetes | Deployments, Services, ConfigMaps — works with any K8s cluster |
| Helm | hashicorp/helm | Deploy Helm charts via Terraform — useful for bootstrapping clusters |
| Proxmox | bpg/proxmox | VMs and containers on Proxmox VE — popular for homelab and on-prem |
| vSphere | hashicorp/vsphere | VMware vSphere VMs, datastores, networks — enterprise on-prem |
| Cloudflare | cloudflare/cloudflare | DNS records, WAF rules, Workers, tunnels |
Authentication patterns
Environment variables
The preferred approach for CI/CD. Set AWS_ACCESS_KEY_ID, GOOGLE_CREDENTIALS, ARM_CLIENT_ID, etc. as environment variables. Keeps secrets out of code.
IAM roles / workload identity
Best for cloud-native execution. EC2 instance profiles, GKE workload identity, or Azure managed identity. No static credentials needed.
Shared credentials file
Uses ~/.aws/credentials or equivalent. Fine for local development. Never use in CI/CD or shared environments.
Hardcoded in config
Never put credentials directly in .tf files. They will end up in version control. Use environment variables or a secrets manager instead.
Running terraform init generates a .terraform.lock.hcl file that records the exact provider versions and checksums used. Commit this file to version control. It ensures everyone on the team and CI/CD uses the same provider versions, preventing "works on my machine" issues. It is the equivalent of a package-lock.json or go.sum.
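If developers run macOS but CI runs Linux, pre-populate the lock file with checksums for every platform the team uses — a sketch using the terraform providers lock command:

```shell
# Record provider checksums for all target platforms, so
# .terraform.lock.hcl verifies on every machine and in CI.
terraform providers lock \
  -platform=linux_amd64 \
  -platform=darwin_amd64 \
  -platform=darwin_arm64
```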
Workspaces & Environments
Managing multiple environments (dev, staging, prod) is one of the most common challenges in Terraform. There are two primary patterns: workspace-based and directory-based.
Terraform workspaces
Terraform workspaces allow you to maintain multiple state files from a single configuration directory. Each workspace has its own state, so resources created in the dev workspace are completely independent from those in prod.
# Create and switch workspaces
terraform workspace new dev
terraform workspace new staging
terraform workspace new prod
terraform workspace select prod
# List workspaces
terraform workspace list
# Use workspace name in configuration
# terraform.workspace returns the current workspace name
locals {
  env_config = {
    dev     = { instance_type = "t3.micro", count = 1 }
    staging = { instance_type = "t3.small", count = 2 }
    prod    = { instance_type = "t3.large", count = 3 }
  }
  config = local.env_config[terraform.workspace]
}

resource "aws_instance" "web" {
  count         = local.config.count
  instance_type = local.config.instance_type
  ami           = data.aws_ami.ubuntu.id
  tags          = { Environment = terraform.workspace }
}
Directory-based pattern
Instead of workspaces, use separate directories for each environment, each with their own backend configuration and variable values:
infrastructure/
  modules/
    vpc/
    compute/
    database/
  environments/
    dev/
      main.tf           # Calls modules with dev settings
      terraform.tfvars  # Dev variable values
      backend.tf        # Dev state backend
    staging/
      main.tf
      terraform.tfvars
      backend.tf
    prod/
      main.tf
      terraform.tfvars
      backend.tf
When to use each
| Aspect | Workspaces | Directory-based |
|---|---|---|
| Config differences | Same config, different variables | Can differ per environment |
| State isolation | Same backend, different state keys | Completely separate backends |
| CI/CD complexity | Simpler — one pipeline, switch workspace | More pipelines but clearer separation |
| Blast radius | Easier to accidentally apply to wrong workspace | Harder to make cross-environment mistakes |
| Best for | Small teams, identical environments | Larger teams, environments with structural differences |
Most experienced Terraform practitioners prefer the directory-based pattern for production. Workspaces have a foot-gun problem: there is nothing stopping you from being in the dev workspace and thinking you are in prod (or vice versa). The directory-based approach makes the environment explicit in the file path and makes it physically harder to run the wrong command against the wrong environment.
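If you do use workspaces, a guard can blunt the foot-gun — a hedged sketch using a lifecycle precondition (Terraform 1.2+) on a hypothetical resource, which fails the plan whenever the selected workspace disagrees with an explicitly passed environment variable:

```hcl
variable "environment" {
  type        = string
  description = "Must match the selected workspace (passed explicitly per run)"
}

resource "aws_instance" "web" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.micro"

  lifecycle {
    precondition {
      condition     = terraform.workspace == var.environment
      error_message = "Selected workspace does not match var.environment — refusing to plan."
    }
  }
}
```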
Terraform vs OpenTofu
OpenTofu is a fork of Terraform created in response to HashiCorp's BSL license change in August 2023. It is maintained by the Linux Foundation and is fully open source under the MPL-2.0 license. OpenTofu aims to be a drop-in replacement for Terraform, maintaining compatibility with existing Terraform configurations, providers, and modules.
Why OpenTofu exists
- License freedom — MPL-2.0 allows unrestricted use, including building commercial products on top of it
- Community governance — decisions made by the community and a steering committee, not a single company
- Linux Foundation backing — provides organizational structure, funding, and legitimacy
- Vendor neutrality — no single company controls the project's direction
Comparison
| Aspect | Terraform | OpenTofu |
|---|---|---|
| License | BSL 1.1 (not open source) | MPL-2.0 (open source) |
| Governance | HashiCorp (single company) | Linux Foundation + steering committee |
| CLI command | terraform | tofu |
| Config language | HCL | HCL (identical syntax) |
| Provider compatibility | Full registry access | Full registry access (same providers) |
| State format | JSON state file | Compatible JSON state file |
| Unique features | HCP Terraform integration | State encryption (native), provider-defined functions, early -parallelism improvements |
| Maturity | 10+ years, battle-tested | Forked late 2023, rapidly maturing |
| Enterprise support | HashiCorp (paid) | Third-party vendors (Spacelift, env0, etc.) |
Migration path
Migrating from Terraform to OpenTofu is straightforward for most projects:
- Install OpenTofu (the tofu binary)
- Replace terraform with tofu in your commands
- Run tofu init to re-initialize (downloads the same providers)
- Run tofu plan to verify no changes are detected
- Update CI/CD pipelines to use tofu instead of terraform
State files are compatible in both directions. No state migration is needed.
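The migration amounts to a few commands — a sketch (the installation method varies by platform; Homebrew is just one example, see opentofu.org for others):

```shell
# Install OpenTofu (example: Homebrew on macOS/Linux)
brew install opentofu

# Re-initialize against the same backend and providers
tofu init

# Verify a no-op: the plan should report no changes
tofu plan
```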
For new projects, recommend OpenTofu if the client values open-source licensing and community governance. For existing Terraform deployments, there is no urgent need to migrate unless the BSL license is a legal concern (e.g., the client is building a competing IaC product). Both tools work identically for day-to-day infrastructure management. The ecosystem (providers, modules, documentation) is shared.
Terraform vs CloudFormation
For AWS-only environments, CloudFormation is the primary alternative to Terraform. The choice between them depends on multi-cloud requirements, team experience, and operational preferences.
Comparison
| Aspect | Terraform | CloudFormation |
|---|---|---|
| Cloud support | Multi-cloud (AWS, Azure, GCP, +3000 providers) | AWS only |
| Language | HCL (purpose-built, readable) | JSON or YAML (verbose) |
| State management | Self-managed (S3 + DynamoDB, etc.) | AWS-managed (no state file to worry about) |
| Drift detection | Manual (terraform plan) | Built-in drift detection in console |
| Rollback | No native rollback | Automatic rollback on stack failure |
| Preview changes | terraform plan | Change sets |
| Speed of new features | AWS provider updates within days/weeks | Same-day support for new AWS services |
| Modularity | Modules (mature, registry) | Nested stacks, StackSets (less flexible) |
| Cost | Free (BSL) / OpenTofu (free OSS) | Free (AWS service) |
| Learning curve | HCL + state + providers | YAML/JSON + AWS concepts |
When to use which
Choose Terraform when
- Multi-cloud or hybrid cloud strategy
- Managing non-AWS resources (Kubernetes, Cloudflare, Datadog, etc.)
- Team already knows HCL
- Need strong module ecosystem for reuse
- Want consistent tooling across all infrastructure
Choose CloudFormation when
- 100% AWS-only and will stay that way
- Want zero state management overhead
- Need automatic rollback on failures
- Using AWS-native features like StackSets for multi-account
- Team is more comfortable with YAML than learning HCL
Other alternatives
Pulumi
Uses general-purpose languages (TypeScript, Python, Go, C#) instead of a DSL. Same multi-cloud, state-managed approach as Terraform but with full programming language capabilities — loops, conditionals, unit tests, IDE autocomplete. Growing adoption, especially among teams that dislike DSLs. State managed via Pulumi Cloud or self-hosted backends.
AWS CDK
AWS Cloud Development Kit — write AWS infrastructure in TypeScript, Python, Java, Go, or C#, and it synthesizes to CloudFormation templates. Best of both worlds: programming language ergonomics with CloudFormation's managed state and rollback. AWS-only. CDK for Terraform (CDKTF) brings the same concept to Terraform providers.
For most consulting engagements, Terraform is the default recommendation because it works everywhere. If a client is AWS-only, small, and wants minimal operational overhead, CloudFormation is perfectly fine — do not over-engineer with Terraform just because it is trendy. If a client's team is strong in TypeScript or Python and resistant to learning a DSL, consider Pulumi as a serious alternative rather than forcing HCL.
CI/CD Integration
Running Terraform in CI/CD pipelines is the standard operating model for teams. The pattern is simple: plan on merge request, apply on merge to main. This ensures changes are reviewed before being applied and that infrastructure changes follow the same review process as application code.
Standard pipeline pattern
GitHub Actions example
name: Terraform
on:
  pull_request:
    paths: ['infrastructure/**']
  push:
    branches: [main]
    paths: ['infrastructure/**']
jobs:
  plan:
    if: github.event_name == 'pull_request'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init
        working-directory: infrastructure/prod
      - id: plan
        run: terraform plan -no-color -out=tfplan
        working-directory: infrastructure/prod
      - uses: actions/github-script@v7
        env:
          # setup-terraform's wrapper exposes the plan's stdout as a step output
          PLAN: ${{ steps.plan.outputs.stdout }}
        with:
          script: |
            const output = `#### Terraform Plan
            \`\`\`
            ${process.env.PLAN}
            \`\`\``;
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: output
            })
  apply:
    if: github.ref == 'refs/heads/main' && github.event_name == 'push'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init
        working-directory: infrastructure/prod
      - run: terraform apply -auto-approve
        working-directory: infrastructure/prod
GitLab CI example
stages:
  - validate
  - plan
  - apply

plan:
  stage: plan
  image: hashicorp/terraform:1.14
  script:
    - cd infrastructure/prod
    - terraform init
    - terraform plan -out=tfplan
  artifacts:
    paths:
      - infrastructure/prod/tfplan
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
    - if: '$CI_COMMIT_BRANCH == "main"'

apply:
  stage: apply
  image: hashicorp/terraform:1.14
  script:
    - cd infrastructure/prod
    - terraform init
    # Applying a saved plan file never prompts for approval
    - terraform apply tfplan
  dependencies:
    - plan
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'
      when: manual
Atlantis
Atlantis is a self-hosted application that automates Terraform via pull request comments. Instead of building custom CI/CD pipelines, you deploy Atlantis and interact with Terraform through PR comments like atlantis plan and atlantis apply. Atlantis handles state locking, plan output, and apply execution. It is popular with teams that want a lightweight, GitOps-style workflow without the complexity of HCP Terraform.
HCP Terraform (formerly Terraform Cloud)
HashiCorp's hosted platform for running Terraform, renamed from Terraform Cloud to HCP Terraform in April 2024. Provides remote execution, state management, policy enforcement (Sentinel/OPA), VCS integration, a private module registry, and cost estimation. The free tier supports up to 500 managed resources. For teams that want a fully managed Terraform experience and are comfortable with HashiCorp vendor lock-in, HCP Terraform eliminates the need to build your own CI/CD pipeline for Terraform.
For most teams, GitHub Actions or GitLab CI with a simple plan/apply pipeline is sufficient. Atlantis is excellent for teams with many repositories and contributors who want self-service infrastructure changes. HCP Terraform is worth evaluating if the client needs policy enforcement, cost estimation, and does not want to manage CI/CD pipelines for infrastructure. Do not over-engineer the pipeline — the goal is reviewed, auditable infrastructure changes.
Best Practices & Security
.gitignore essentials
Every Terraform repository must have a proper .gitignore:
# Terraform .gitignore
*.tfstate
*.tfstate.*
*.tfvars # May contain secrets
.terraform/ # Downloaded providers and modules
crash.log
override.tf
override.tf.json
*_override.tf
*_override.tf.json
# DO commit these:
# .terraform.lock.hcl (provider lock file)
Secrets management
Do
- Use environment variables for provider credentials
- Store secrets in AWS Secrets Manager, Vault, or similar
- Use sensitive = true on variables containing secrets
- Encrypt state at rest (S3 SSE, GCS encryption)
- Restrict access to the state bucket/backend
- Use IAM roles instead of static credentials
Do not
- Commit
.tfstatefiles to Git - Hardcode credentials in
.tffiles - Share state files over Slack, email, or shared drives
- Use the same credentials for dev and prod
- Ignore that state files contain secrets in plaintext
- Commit .tfvars files with sensitive values
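Putting the "do" items together in HCL: mark the variable sensitive so plan output redacts it, or better, read the secret from a manager at apply time. The secret name and resources below are illustrative; note that the resolved value still lands in state, which is why the state backend itself must be protected:

```hcl
# Mark the variable sensitive so Terraform redacts it in plan/apply output
variable "db_password" {
  type      = string
  sensitive = true
}

# Better: fetch the secret at apply time instead of passing it in.
# The secret name "prod/db/password" is illustrative.
data "aws_secretsmanager_secret_version" "db" {
  secret_id = "prod/db/password"
}

resource "aws_db_instance" "main" {
  # ... other arguments ...
  password = data.aws_secretsmanager_secret_version.db.secret_string
}
```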
Least-privilege IAM for Terraform
Terraform typically needs broad permissions to create and manage infrastructure, but the permissions should be scoped per environment and per pipeline:
- Separate IAM roles per environment — the dev pipeline should not have permissions to touch prod resources
- Narrower permissions for plan — the MR pipeline needs read access to cloud APIs plus write access to the state backend (for refresh). Full create/update/delete permissions are only needed for apply
- Scope to specific services — if a Terraform project only manages networking, the IAM role should not have EC2 or RDS permissions
- Use OIDC federation — GitHub Actions and GitLab CI both support OIDC for assuming AWS/GCP/Azure roles without static credentials
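As a sketch of the OIDC point, the trust relationship itself can be managed in Terraform. The organization/repository in the subject condition is a placeholder; depending on your AWS provider version, thumbprint_list may also be required:

```hcl
# GitHub Actions OIDC federation into AWS — no static credentials (sketch)
resource "aws_iam_openid_connect_provider" "github" {
  url            = "https://token.actions.githubusercontent.com"
  client_id_list = ["sts.amazonaws.com"]
  # thumbprint_list may be required on older AWS provider versions
}

resource "aws_iam_role" "terraform_ci" {
  name = "terraform-ci"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Federated = aws_iam_openid_connect_provider.github.arn }
      Action    = "sts:AssumeRoleWithWebIdentity"
      Condition = {
        StringEquals = {
          "token.actions.githubusercontent.com:aud" = "sts.amazonaws.com"
        }
        StringLike = {
          # Only the main branch of this (placeholder) repo may assume the role
          "token.actions.githubusercontent.com:sub" = "repo:example-org/infra:ref:refs/heads/main"
        }
      }
    }]
  })
}
```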
Dependency pinning
terraform {
  required_version = "~> 1.14.0" # Pin Terraform version

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 6.0" # Pin provider version
    }
  }
}
# Always commit .terraform.lock.hcl to lock exact versions
Code review for plans
- Always review the plan output before approving an apply — this is your last line of defense
- Look for unexpected destroy or replace actions — these indicate breaking changes
- Watch for ~ (update in-place) on sensitive resources like databases or load balancers
- Verify that the number of changes matches expectations — a "simple tag change" that shows 47 resources changing is a red flag
- Use terraform plan -target=resource.name to limit scope when debugging specific changes
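The change-count review can be backed by a cheap CI guardrail. A sketch in shell, using a hard-coded summary line in place of real terraform plan -no-color output:

```shell
# Guardrail: fail loudly when a plan destroys anything.
# In CI you would capture real output: terraform plan -no-color | tee plan.txt
# Here a sample summary line stands in for a real plan.
echo 'Plan: 1 to add, 2 to change, 3 to destroy.' > plan.txt

# Extract the destroy count from the summary line
destroys=$(grep -Eo '[0-9]+ to destroy' plan.txt | grep -Eo '[0-9]+')
if [ "${destroys:-0}" -gt 0 ]; then
  echo "WARNING: plan destroys $destroys resource(s) - review before apply"
fi
```

In a real pipeline you might exit non-zero instead of warning, forcing a manual approval step for destructive plans.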
Operational hygiene
- Use terraform fmt in CI to enforce consistent formatting
- Use terraform validate to catch syntax errors before plan
- Use tflint for linting rules beyond what validate catches (deprecated arguments, naming conventions)
- Use checkov or trivy config for security scanning of Terraform configs (tfsec is deprecated and merged into Trivy)
- Use terraform-docs to auto-generate module documentation
- Use infracost to estimate cost impact of changes in PRs
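Most of these tools can be wired up through the widely used pre-commit-terraform hooks rather than bespoke CI steps. A sketch of .pre-commit-config.yaml (the pinned rev is illustrative):

```yaml
# .pre-commit-config.yaml — run hygiene tools via pre-commit (sketch)
repos:
  - repo: https://github.com/antonbabenko/pre-commit-terraform
    rev: v1.96.1   # illustrative pin; use the latest release
    hooks:
      - id: terraform_fmt
      - id: terraform_validate
      - id: terraform_tflint
      - id: terraform_docs
      - id: terraform_checkov
```

Running the same hooks locally and in CI keeps feedback fast while still enforcing the rules on every merge request.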
The Terraform state file contains every secret value Terraform manages — database passwords, API keys, TLS certificates, etc. — in plaintext JSON. Treat the state backend with the same security posture as your secrets manager. Encrypt at rest, restrict access to authorized pipelines and operators only, enable versioning for recovery, and audit access logs.
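For the common S3 case, this hardening maps to a few backend settings. A sketch, assuming an already-versioned bucket with a restrictive bucket policy (bucket name, key, and region are illustrative; use_lockfile requires Terraform 1.10+):

```hcl
# S3 backend with encryption and native state locking (sketch)
terraform {
  backend "s3" {
    bucket       = "example-tf-state"        # illustrative bucket name
    key          = "prod/terraform.tfstate"
    region       = "eu-west-1"
    encrypt      = true   # server-side encryption at rest
    use_lockfile = true   # S3-native locking, no DynamoDB table needed
  }
}
```

Enable bucket versioning separately so a corrupted or accidentally overwritten state can be rolled back.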
Consultant's Checklist
Use this checklist when assessing or setting up Terraform for a client engagement.
Foundation
- Remote state backend configured with encryption and locking
- State bucket/backend access restricted to CI/CD and authorized operators
- Terraform and provider versions pinned in required_version and required_providers
- .terraform.lock.hcl committed to version control
- .gitignore excludes state files, .terraform/, and sensitive .tfvars
- Directory structure established (modules, environments, or workspaces)
CI/CD
- Plan runs on every merge request / pull request
- Plan output posted as MR/PR comment for review
- Apply runs only on merge to main (or manual approval)
- OIDC federation used for cloud provider authentication (no static credentials)
- Separate IAM roles per environment
- State locking prevents concurrent applies
Code quality
- terraform fmt enforced in CI
- terraform validate run before plan
- tflint or equivalent linter configured
- Security scanner (checkov, trivy) in pipeline
- Modules documented with terraform-docs
- Variables have descriptions and type constraints
Security
- No credentials in .tf files or version control
- State encrypted at rest and access-controlled
- Sensitive variables marked with sensitive = true
- Secrets stored in Vault / Secrets Manager, not in Terraform variables
- Least-privilege IAM roles for Terraform execution
- Plan output reviewed before every apply — no blind auto-approve
Decision points
- Terraform vs OpenTofu? — If OSS licensing matters, use OpenTofu. Otherwise, either works identically.
- Terraform vs CloudFormation? — Multi-cloud = Terraform. AWS-only with zero state overhead preference = CloudFormation.
- Workspaces vs directories? — Small team with identical environments = workspaces. Larger teams or differing environments = directories.
- HCP Terraform vs self-hosted CI/CD? — If the client wants managed policy enforcement and cost estimation, HCP Terraform. Otherwise, GitHub Actions / GitLab CI is simpler and cheaper.
- Atlantis vs custom pipeline? — Many repos and contributors wanting self-service = Atlantis. Small team with few repos = custom pipeline.