Terraform & Infrastructure as Code

Declarative infrastructure management — HCL, state, modules, providers & operations

01

Overview

Terraform is an Infrastructure as Code (IaC) tool created by HashiCorp. It lets you define cloud and on-premise infrastructure in declarative configuration files written in HCL (HashiCorp Configuration Language), then provisions and manages that infrastructure through provider APIs. Terraform is the most widely adopted IaC tool in the industry, with support for hundreds of cloud services, SaaS platforms, and on-premise systems through its provider ecosystem.

Declarative vs imperative

Terraform uses a declarative approach: you describe the desired end state of your infrastructure, and Terraform figures out the sequence of API calls needed to reach that state. This is fundamentally different from imperative scripting (Bash, AWS CLI scripts) where you write step-by-step instructions for what to do. Ansible occupies a middle ground — individual modules are often declarative, but playbooks are executed procedurally in order. The declarative model means Terraform can determine what has changed, what needs to be created, updated, or destroyed, and in what order — all automatically.
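To make the contrast concrete, here is a minimal sketch (the resource and bucket name are hypothetical): the declarative HCL names only the end state, while the imperative equivalent must script every step and handle idempotency itself.

```hcl
# Declarative: state WHAT should exist; Terraform derives the API calls,
# including whether the bucket must be created, updated, or left alone.
resource "aws_s3_bucket" "logs" {
  bucket = "example-app-logs" # hypothetical name
}

# Imperative equivalent (shell): YOU script each step and check for
# existence manually:
#   aws s3api head-bucket --bucket example-app-logs 2>/dev/null \
#     || aws s3api create-bucket --bucket example-app-logs
```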

The BSL license change

In August 2023, HashiCorp changed Terraform's license from the Mozilla Public License 2.0 (MPL-2.0) to the Business Source License 1.1 (BSL 1.1, also called BUSL). This means Terraform is no longer open source by the OSI definition. The BSL prohibits using Terraform to build competing commercial products. For most end users deploying their own infrastructure, this change has no practical impact. For vendors building Terraform-based SaaS products, it matters significantly. This license change directly led to the creation of OpenTofu, a community fork under the Linux Foundation.

Strengths

  • Multi-cloud — single tool for AWS, Azure, GCP, Kubernetes, and hundreds more
  • Massive ecosystem — 3,000+ providers in the public registry
  • Dependency graph — automatic ordering of resource creation and destruction
  • Plan before apply — preview changes before making them
  • State tracking — knows what exists and what needs to change
  • Module system — reusable, composable infrastructure components
  • Mature tooling — IDE support, linters, testing frameworks, CI integrations

Considerations

  • State management complexity — state file must be stored securely and locked properly
  • BSL license — no longer open source; consider OpenTofu for OSS purity
  • No rollback — Terraform does not natively support rolling back to a previous state
  • Drift detection is manual — you must run terraform plan to detect drift
  • Learning curve — HCL, state, modules, and workspaces take time to master
  • Secrets in state — state file contains sensitive values in plaintext

Positioning

Terraform is the default choice for multi-cloud infrastructure provisioning. If a client is not locked into a single cloud provider and needs to manage infrastructure as code, Terraform (or OpenTofu) is the answer. For AWS-only shops, CloudFormation is a viable alternative that avoids state management overhead. For teams that prefer general-purpose languages over HCL, Pulumi and AWS CDK are worth considering.

02

How Terraform Works

Terraform follows a simple but powerful lifecycle: write configuration, init the working directory, plan the changes, and apply them. Understanding this lifecycle is essential for using Terraform effectively.

terraform init          terraform plan          terraform apply
      |                       |                       |
      v                       v                       v
+-----------+           +-----------+           +-----------+
| Download  |           | Read      |           | Execute   |
| providers |           | current   |           | planned   |
| & modules |           | state     |           | changes   |
+-----------+           +-----------+           +-----------+
      |                       |                       |
      v                       v                       v
+-----------+           +-----------+           +-----------+
| Init      |           | Diff vs   |           | Call      |
| backend   |           | desired   |           | provider  |
| (state    |           | config    |           | APIs      |
| storage)  |           +-----------+           +-----------+
+-----------+                 |                       |
                              v                       v
                        +-----------+           +-----------+
                        | Show      |           | Update    |
                        | execution |           | state     |
                        | plan      |           | file      |
                        +-----------+           +-----------+

The four core commands

Command | Purpose | What happens
terraform init | Initialize | Downloads provider plugins, initializes the backend (where state is stored), and downloads modules referenced in configuration. Must be run once per working directory (and again whenever providers or the backend change).
terraform plan | Preview | Reads the current state, compares it to the desired configuration, and produces an execution plan showing what will be created, modified, or destroyed. No changes are made. This is the safety net.
terraform apply | Execute | Runs the plan by calling provider APIs. Creates, updates, or deletes resources, then updates the state file with the new reality. Prompts for confirmation unless -auto-approve is used.
terraform destroy | Tear down | Destroys all resources managed by the current configuration. Equivalent to terraform apply -destroy. Prompts for confirmation. Use with extreme caution in production.

Provider plugins

Providers are the bridge between Terraform and the APIs of infrastructure platforms. When you run terraform init, Terraform downloads the provider binaries specified in your configuration from the Terraform Registry (or a mirror). Each provider implements CRUD operations for the resources it manages. For example, the aws provider knows how to create an EC2 instance by calling the AWS API, and the kubernetes provider knows how to create a Deployment by talking to the Kubernetes API server.

Dependency graph

Terraform builds a directed acyclic graph (DAG) of all resources and their dependencies. If resource B references an attribute of resource A, Terraform knows it must create A before B. This graph is also used to determine what can be created in parallel. You can visualize the graph with terraform graph | dot -Tpng > graph.png.
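As a sketch of how edges in that graph arise (resource names are illustrative): most dependencies are implicit, created by one resource referencing another's attribute, while depends_on adds an explicit edge when Terraform cannot infer the relationship from references alone.

```hcl
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

# Implicit edge: referencing aws_vpc.main.id means the VPC is created first
resource "aws_internet_gateway" "gw" {
  vpc_id = aws_vpc.main.id
}

# Explicit edge: depends_on covers relationships invisible to Terraform,
# e.g. an instance whose bootstrap script needs outbound internet access
resource "aws_instance" "app" {
  ami           = "ami-0abcdef1234567890" # hypothetical AMI
  instance_type = "t3.micro"
  depends_on    = [aws_internet_gateway.gw]
}
```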

Resource addressing

Every resource in Terraform has a unique address: resource_type.resource_name. For example, aws_instance.web or google_compute_network.vpc. When using modules, the address includes the module path: module.networking.aws_vpc.main. When using count or for_each, an index is appended: aws_instance.web[0] or aws_instance.web["us-east-1"]. These addresses are how Terraform tracks resources in state and how you reference them in CLI commands like terraform state mv.

The reconciliation loop

Terraform's core algorithm is a reconciliation loop:

  1. Read state — load the current state file to understand what resources Terraform believes exist
  2. Refresh — optionally query the real infrastructure (provider APIs) to update state with any out-of-band changes
  3. Diff — compare the desired configuration (HCL) against the refreshed state to determine what actions are needed
  4. Plan — produce an ordered list of create, update, and delete actions
  5. Apply — execute the actions in dependency order, updating state after each successful operation

Key insight

Terraform is not a continuous reconciliation controller like Kubernetes. It only checks state when you run plan or apply. Between runs, infrastructure can drift (someone manually changes a security group in the AWS console, for example). Terraform will only detect and correct this drift the next time you run it. This is why scheduled terraform plan runs in CI/CD are important for drift detection.
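One common way to wire up a scheduled drift check is terraform plan's -detailed-exitcode flag, which exits 0 when there are no changes, 2 when changes (i.e., drift) are present, and 1 on error. A sketch of a nightly CI job:

```shell
#!/bin/sh
# Nightly drift check: exit code 2 from `plan -detailed-exitcode`
# means real infrastructure no longer matches the configuration.
terraform plan -detailed-exitcode -no-color -out=drift.tfplan
case $? in
  0) echo "No drift detected" ;;
  2) echo "Drift detected - review drift.tfplan" ;;  # alert the team here
  *) echo "terraform plan failed" >&2; exit 1 ;;
esac
```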

03

State Management

The state file is the single most important concept in Terraform after the configuration itself. It is a JSON file that records the mapping between your Terraform resources and the real-world objects they represent. Without state, Terraform cannot know what it has previously created and would try to create everything from scratch on every run.

Why state matters

  • Maps config to reality — aws_instance.web in config maps to i-0abc123def456 in AWS
  • Tracks metadata — dependencies, resource ordering, provider information
  • Performance — caches resource attributes so Terraform does not need to query every resource on every plan
  • Enables collaboration — when stored remotely, multiple team members can work on the same infrastructure

Local vs remote state

Local state

Stored in terraform.tfstate in the working directory. Fine for learning and personal projects. Dangerous for teams because there is no locking, no shared access, and the file can be accidentally deleted or committed to Git (exposing secrets).

Remote state

Stored in a shared backend. Enables team collaboration, state locking (prevents concurrent modifications), encryption at rest, and versioning. This is the only acceptable option for production.

State backends

Backend | Locking | Encryption | Notes
S3 | Yes (native S3 or DynamoDB) | SSE-S3/KMS | The most common backend for AWS shops. Since Terraform 1.10, S3 supports native state locking via use_lockfile = true, eliminating the need for a DynamoDB table; DynamoDB-based locking is deprecated. Enable versioning on the S3 bucket.
GCS | Yes (native) | Google-managed | Google Cloud Storage. Built-in locking. Simple to configure for GCP-centric teams.
Azure Blob | Yes (blob lease) | Azure-managed | Azure Storage Account with blob leasing for locks. Standard for Azure shops.
HCP Terraform | Yes | HashiCorp-managed | Formerly Terraform Cloud (renamed April 2024). Free tier for up to 500 managed resources. Includes remote execution, policy checks, and VCS integration. Lock-in to the HashiCorp ecosystem.
pg (PostgreSQL) | Yes (advisory locks) | Depends on setup | Stores state in a PostgreSQL database. Useful for on-premise environments without cloud object storage.
consul | Yes | Depends on setup | HashiCorp Consul KV store. Less common now that HCP Terraform exists.

Backend configuration example

terraform {
  backend "s3" {
    bucket       = "mycompany-terraform-state"
    key          = "prod/networking/terraform.tfstate"
    region       = "us-east-1"
    use_lockfile = true   # Native S3 locking (Terraform 1.10+)
    encrypt      = true
  }
}

State locking

State locking prevents two people (or two CI pipelines) from running terraform apply at the same time on the same state. Without locking, concurrent applies can corrupt the state file or create conflicting infrastructure. Most remote backends support locking natively. If a lock is stuck (e.g., a pipeline crashed), you can force-unlock with terraform force-unlock LOCK_ID — but only after confirming no other operation is running.

terraform state commands

Command | Purpose
terraform state list | List all resources in state
terraform state show <addr> | Show attributes of a specific resource
terraform state mv <src> <dst> | Move/rename a resource in state. Prefer the declarative moved {} block in HCL (Terraform 1.1+) for versioned, reviewable refactors
terraform state rm <addr> | Remove a resource from state without destroying it
terraform state pull | Download remote state to stdout
terraform state push | Upload a local state file to the remote backend
terraform import <addr> <id> | Import an existing resource into state. Prefer the declarative import {} block in HCL (Terraform 1.5+), which can also auto-generate configuration with -generate-config-out

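The two declarative alternatives mentioned above look like this (resource and bucket names are hypothetical); because both live in .tf files, refactors and imports go through normal code review:

```hcl
# Terraform 1.1+: rename a resource in state without destroy/recreate
moved {
  from = aws_instance.web
  to   = aws_instance.frontend
}

# Terraform 1.5+: adopt an existing, unmanaged bucket into state
import {
  to = aws_s3_bucket.logs
  id = "mycompany-log-bucket" # hypothetical bucket name
}
```

Run terraform plan to preview the state operations before applying; with import blocks, terraform plan -generate-config-out=generated.tf can also write matching configuration for the imported resource.
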
Warning

Never manually edit the state file. It is a JSON file and technically editable, but manual edits are the #1 cause of state corruption. Use terraform state commands instead. If you must edit state (e.g., to recover from corruption), always back up the file first, and understand that one wrong edit can cause Terraform to destroy and recreate resources.
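A minimal backup routine before any state surgery, assuming a remote backend is configured:

```shell
# Snapshot the remote state locally before touching it
terraform state pull > "state-backup-$(date +%Y%m%d-%H%M%S).tfstate"

# ...perform the surgery, e.g.:
# terraform state rm aws_instance.orphaned

# If something goes wrong, restore the snapshot:
# terraform state push state-backup-<timestamp>.tfstate
```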

04

HCL Language

HCL (HashiCorp Configuration Language) is a domain-specific language designed for defining infrastructure. It is intentionally not a general-purpose programming language — it has no loops in the traditional sense, no classes, no exception handling. This is by design: it forces configurations to be declarative and readable.

Resources

Resources are the most important element. Each resource block declares a piece of infrastructure:

resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.micro"
  subnet_id     = aws_subnet.public.id

  tags = {
    Name        = "web-server"
    Environment = "production"
  }
}

Data sources

Data sources let you query existing infrastructure that Terraform does not manage:

data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"]  # Canonical

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
  }
}

resource "aws_instance" "web" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.micro"
}

Variables

# Input variable (variables.tf)
variable "environment" {
  description = "Deployment environment"
  type        = string
  default     = "dev"
  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "Environment must be dev, staging, or prod."
  }
}

variable "instance_count" {
  description = "Number of instances to create"
  type        = number
  default     = 2
}

# Output value (outputs.tf)
output "instance_ips" {
  description = "Public IPs of all instances"
  value       = aws_instance.web[*].public_ip
}

# Local value
locals {
  common_tags = {
    Environment = var.environment
    ManagedBy   = "terraform"
    Project     = "web-platform"
  }
}

for_each and count

# count - create N identical resources
resource "aws_instance" "web" {
  count         = var.instance_count
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.micro"
  tags          = { Name = "web-${count.index}" }
}

# for_each - create resources from a map or set
variable "subnets" {
  default = {
    "public-a"  = { cidr = "10.0.1.0/24", az = "us-east-1a" }
    "public-b"  = { cidr = "10.0.2.0/24", az = "us-east-1b" }
    "private-a" = { cidr = "10.0.3.0/24", az = "us-east-1a" }
  }
}

resource "aws_subnet" "this" {
  for_each          = var.subnets
  vpc_id            = aws_vpc.main.id
  cidr_block        = each.value.cidr
  availability_zone = each.value.az
  tags              = { Name = each.key }
}

Dynamic blocks

variable "ingress_rules" {
  default = [
    { port = 80,  cidr = "0.0.0.0/0" },
    { port = 443, cidr = "0.0.0.0/0" },
    { port = 22,  cidr = "10.0.0.0/8" },
  ]
}

resource "aws_security_group" "web" {
  name   = "web-sg"
  vpc_id = aws_vpc.main.id

  dynamic "ingress" {
    for_each = var.ingress_rules
    content {
      from_port   = ingress.value.port
      to_port     = ingress.value.port
      protocol    = "tcp"
      cidr_blocks = [ingress.value.cidr]
    }
  }
}

Key functions

Function | Example | Purpose
lookup | lookup(var.amis, var.region) | Map lookup with optional default
merge | merge(local.common_tags, { Name = "web" }) | Merge maps
join | join(",", var.subnets) | Join list to string
format | format("arn:aws:s3:::%s/*", var.bucket) | String formatting
cidrsubnet | cidrsubnet("10.0.0.0/16", 8, 1) | Calculate subnet CIDRs
templatefile | templatefile("init.sh", { env = var.env }) | Render a template file
try | try(var.config.setting, "default") | Try expressions with fallback
flatten | flatten([var.list_a, var.list_b]) | Flatten nested lists

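A small worked example combining cidrsubnet and merge from the table above: cidrsubnet("10.0.0.0/16", 8, i) adds 8 bits to the /16 prefix, producing /24 networks numbered by i.

```hcl
locals {
  # Produces ["10.0.0.0/24", "10.0.1.0/24", "10.0.2.0/24"]
  subnet_cidrs = [for i in range(3) : cidrsubnet("10.0.0.0/16", 8, i)]

  # Extend the shared tags from the locals example with a per-resource Name
  web_tags = merge(local.common_tags, { Name = "web" })
}
```
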
Tip

Prefer for_each over count for most use cases. With count, removing an item from the middle of a list causes all subsequent resources to be destroyed and recreated (because their index changes). With for_each, resources are keyed by map key or set value, so removing one item only affects that specific resource.
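A sketch of the safer pattern: convert a list of names to a set so each instance is keyed by name rather than by position.

```hcl
resource "aws_instance" "web" {
  for_each      = toset(["alpha", "bravo", "charlie"])
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.micro"
  tags          = { Name = "web-${each.key}" }
}

# Removing "bravo" later destroys only aws_instance.web["bravo"];
# "alpha" and "charlie" keep their addresses and are untouched.
```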

05

Modules

Modules are the primary mechanism for code reuse in Terraform. A module is simply a directory containing .tf files. Every Terraform configuration is a module — the top-level directory is the root module, and any modules it calls are child modules.

Module structure

modules/
  vpc/
    main.tf          # Resource definitions
    variables.tf     # Input variables
    outputs.tf       # Output values
    versions.tf      # Provider and Terraform version constraints
    README.md        # Documentation

Calling a module

# From a local path
module "vpc" {
  source = "./modules/vpc"

  vpc_cidr     = "10.0.0.0/16"
  environment  = var.environment
  project_name = var.project_name
}

# From the Terraform Registry
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name = "my-vpc"
  cidr = "10.0.0.0/16"
  azs  = ["us-east-1a", "us-east-1b", "us-east-1c"]

  private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]

  enable_nat_gateway = true
  single_nat_gateway = true
}

# From a Git repository
module "vpc" {
  source = "git::https://github.com/myorg/terraform-modules.git//vpc?ref=v2.1.0"
}

# Referencing module outputs
resource "aws_instance" "web" {
  subnet_id = module.vpc.public_subnet_ids[0]
}

Module sources

Source | Syntax | Best for
Local path | ./modules/vpc | Modules within the same repository
Terraform Registry | hashicorp/consul/aws | Community modules with versioning
GitHub | github.com/org/repo//subdir | Private organizational modules
Git (generic) | git::https://...?ref=v1.0 | Any Git repository
S3 bucket | s3::https://s3.amazonaws.com/bucket/module.zip | Air-gapped or private distribution

Module versioning

Always pin module versions in production. Use semantic versioning constraints:

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "5.2.0"       # Exact version (most conservative)
  # version = "~> 5.2"    # Allows 5.2.x but not 5.3.0
  # version = ">= 5.0, < 6.0"  # Range constraint
}

Module composition pattern

A well-structured Terraform project composes multiple modules together in the root module. Each module handles one concern — networking, compute, database, monitoring — and the root module wires them together through input variables and output references:

module "networking" {
  source      = "./modules/networking"
  environment = var.environment
  vpc_cidr    = var.vpc_cidr
}

module "database" {
  source     = "./modules/database"
  subnet_ids = module.networking.private_subnet_ids
  vpc_id     = module.networking.vpc_id
}

module "application" {
  source          = "./modules/application"
  subnet_ids      = module.networking.public_subnet_ids
  db_endpoint     = module.database.endpoint
  db_secret_arn   = module.database.secret_arn
}

Recommendation

Use the public Terraform Registry modules as a starting point, not as a final solution. Registry modules like terraform-aws-modules/vpc/aws are well-tested and cover common patterns, but they expose hundreds of variables you may not need. For organizations with specific standards, fork or wrap registry modules in a thin internal module that enforces your defaults (naming conventions, tagging, encryption settings).

06

Providers

Providers are plugins that let Terraform interact with specific infrastructure platforms and services. Each provider is a separate binary that implements the Terraform plugin protocol, translating HCL resource definitions into API calls.

Provider configuration

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 6.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 3.0"
    }
    proxmox = {
      source  = "bpg/proxmox"
      version = "~> 0.98"
    }
  }
}

provider "aws" {
  region = "us-east-1"
  # Authentication: uses AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY env vars,
  # shared credentials file, IAM role, or SSO
}

# Multiple provider instances with aliases
provider "aws" {
  alias  = "west"
  region = "us-west-2"
}

resource "aws_instance" "west_server" {
  provider      = aws.west
  ami           = "ami-0abcdef1234567890"
  instance_type = "t3.micro"
}

Key providers

Provider | Source | Use case
AWS | hashicorp/aws | EC2, S3, RDS, VPC, IAM, Lambda, EKS — the most feature-rich provider
Azure (azurerm) | hashicorp/azurerm | VMs, Storage, AKS, Azure AD, networking — requires subscription_id
Google Cloud | hashicorp/google | GCE, GKE, Cloud SQL, VPC, IAM — use google-beta for preview features
Kubernetes | hashicorp/kubernetes | Deployments, Services, ConfigMaps — works with any K8s cluster
Helm | hashicorp/helm | Deploy Helm charts via Terraform — useful for bootstrapping clusters
Proxmox | bpg/proxmox | VMs and containers on Proxmox VE — popular for homelab and on-prem
vSphere | hashicorp/vsphere | VMware vSphere VMs, datastores, networks — enterprise on-prem
Cloudflare | cloudflare/cloudflare | DNS records, WAF rules, Workers, tunnels

Authentication patterns

Environment variables

The preferred approach for CI/CD. Set AWS_ACCESS_KEY_ID, GOOGLE_CREDENTIALS, ARM_CLIENT_ID, etc. as environment variables. Keeps secrets out of code.

IAM roles / workload identity

Best for cloud-native execution. EC2 instance profiles, GKE workload identity, or Azure managed identity. No static credentials needed.

Shared credentials file

Uses ~/.aws/credentials or equivalent. Fine for local development. Never use in CI/CD or shared environments.

Hardcoded in config

Never put credentials directly in .tf files. They will end up in version control. Use environment variables or a secrets manager instead.

Provider lock file

Running terraform init generates a .terraform.lock.hcl file that records the exact provider versions and checksums used. Commit this file to version control. It ensures everyone on the team and CI/CD uses the same provider versions, preventing "works on my machine" issues. It is the equivalent of a package-lock.json or go.sum.
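One wrinkle: terraform init records checksums only for the platform it runs on. If developers work on macOS while CI runs Linux, pre-populate the lock file for every platform with terraform providers lock:

```shell
# Record provider checksums for all platforms the team and CI use
terraform providers lock \
  -platform=linux_amd64 \
  -platform=darwin_arm64 \
  -platform=darwin_amd64
```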

07

Workspaces & Environments

Managing multiple environments (dev, staging, prod) is one of the most common challenges in Terraform. There are two primary patterns: workspace-based and directory-based.

Terraform workspaces

Terraform workspaces allow you to maintain multiple state files from a single configuration directory. Each workspace has its own state, so resources created in the dev workspace are completely independent from those in prod.

# Create and switch workspaces
terraform workspace new dev
terraform workspace new staging
terraform workspace new prod
terraform workspace select prod

# List workspaces
terraform workspace list

# Use workspace name in configuration
# terraform.workspace returns the current workspace name
locals {
  env_config = {
    dev     = { instance_type = "t3.micro",  count = 1 }
    staging = { instance_type = "t3.small",  count = 2 }
    prod    = { instance_type = "t3.large",  count = 3 }
  }
  config = local.env_config[terraform.workspace]
}

resource "aws_instance" "web" {
  count         = local.config.count
  instance_type = local.config.instance_type
  ami           = data.aws_ami.ubuntu.id
  tags          = { Environment = terraform.workspace }
}

Directory-based pattern

Instead of workspaces, use separate directories for each environment, each with their own backend configuration and variable values:

infrastructure/
  modules/
    vpc/
    compute/
    database/
  environments/
    dev/
      main.tf          # Calls modules with dev settings
      terraform.tfvars # Dev variable values
      backend.tf       # Dev state backend
    staging/
      main.tf
      terraform.tfvars
      backend.tf
    prod/
      main.tf
      terraform.tfvars
      backend.tf
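Each environment directory carries its own backend definition, so state can never be cross-applied. A sketch reusing the S3 backend from the state management section (bucket and key names are hypothetical):

```hcl
# environments/dev/backend.tf
terraform {
  backend "s3" {
    bucket  = "mycompany-terraform-state"
    key     = "dev/terraform.tfstate" # prod/ uses its own key (or bucket)
    region  = "us-east-1"
    encrypt = true
  }
}
```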

When to use each

Aspect | Workspaces | Directory-based
Config differences | Same config, different variables | Can differ per environment
State isolation | Same backend, different state keys | Completely separate backends
CI/CD complexity | Simpler — one pipeline, switch workspace | More pipelines but clearer separation
Blast radius | Easier to accidentally apply to the wrong workspace | Harder to make cross-environment mistakes
Best for | Small teams, identical environments | Larger teams, environments with structural differences

Caution

Most experienced Terraform practitioners prefer the directory-based pattern for production. Workspaces have a foot-gun problem: there is nothing stopping you from being in the dev workspace and thinking you are in prod (or vice versa). The directory-based approach makes the environment explicit in the file path and makes it physically harder to run the wrong command against the wrong environment.

08

Terraform vs OpenTofu

OpenTofu is a fork of Terraform created in response to HashiCorp's BSL license change in August 2023. It is maintained by the Linux Foundation and is fully open source under the MPL-2.0 license. OpenTofu aims to be a drop-in replacement for Terraform, maintaining compatibility with existing Terraform configurations, providers, and modules.

Why OpenTofu exists

  • License freedom — MPL-2.0 allows unrestricted use, including building commercial products on top of it
  • Community governance — decisions made by the community and a steering committee, not a single company
  • Linux Foundation backing — provides organizational structure, funding, and legitimacy
  • Vendor neutrality — no single company controls the project's direction

Comparison

Aspect | Terraform | OpenTofu
License | BSL 1.1 (not open source) | MPL-2.0 (open source)
Governance | HashiCorp (single company) | Linux Foundation + steering committee
CLI command | terraform | tofu
Config language | HCL | HCL (identical syntax)
Provider compatibility | Full registry access | Full registry access (same providers)
State format | JSON state file | Compatible JSON state file
Unique features | HCP Terraform integration | Native state encryption, provider-defined functions, early -parallelism improvements
Maturity | 10+ years, battle-tested | Forked late 2023, rapidly maturing
Enterprise support | HashiCorp (paid) | Third-party vendors (Spacelift, env0, etc.)

Migration path

Migrating from Terraform to OpenTofu is straightforward for most projects:

  1. Install OpenTofu (tofu binary)
  2. Replace terraform with tofu in your commands
  3. Run tofu init to re-initialize (downloads the same providers)
  4. Run tofu plan to verify no changes are detected
  5. Update CI/CD pipelines to use tofu instead of terraform

State files are compatible in both directions. No state migration is needed.

Consultant guidance

For new projects, recommend OpenTofu if the client values open-source licensing and community governance. For existing Terraform deployments, there is no urgent need to migrate unless the BSL license is a legal concern (e.g., the client is building a competing IaC product). Both tools work identically for day-to-day infrastructure management. The ecosystem (providers, modules, documentation) is shared.

09

Terraform vs CloudFormation

For AWS-only environments, CloudFormation is the primary alternative to Terraform. The choice between them depends on multi-cloud requirements, team experience, and operational preferences.

Comparison

Aspect | Terraform | CloudFormation
Cloud support | Multi-cloud (AWS, Azure, GCP, 3,000+ providers) | AWS only
Language | HCL (purpose-built, readable) | JSON or YAML (verbose)
State management | Self-managed (S3 + DynamoDB, etc.) | AWS-managed (no state file to worry about)
Drift detection | Manual (terraform plan) | Built-in drift detection in the console
Rollback | No native rollback | Automatic rollback on stack failure
Preview changes | terraform plan | Change sets
Speed of new features | AWS provider updates within days/weeks | Same-day support for new AWS services
Modularity | Modules (mature, registry) | Nested stacks, StackSets (less flexible)
Cost | Free (BSL) / OpenTofu (free OSS) | Free (AWS service)
Learning curve | HCL + state + providers | YAML/JSON + AWS concepts

When to use which

Choose Terraform when

  • Multi-cloud or hybrid cloud strategy
  • Managing non-AWS resources (Kubernetes, Cloudflare, Datadog, etc.)
  • Team already knows HCL
  • Need strong module ecosystem for reuse
  • Want consistent tooling across all infrastructure

Choose CloudFormation when

  • 100% AWS-only and will stay that way
  • Want zero state management overhead
  • Need automatic rollback on failures
  • Using AWS-native features like StackSets for multi-account
  • Team is more comfortable with YAML than learning HCL

Other alternatives

Pulumi

Uses general-purpose languages (TypeScript, Python, Go, C#) instead of a DSL. Same multi-cloud, state-managed approach as Terraform but with full programming language capabilities — loops, conditionals, unit tests, IDE autocomplete. Growing adoption, especially among teams that dislike DSLs. State managed via Pulumi Cloud or self-hosted backends.

AWS CDK

AWS Cloud Development Kit — write AWS infrastructure in TypeScript, Python, Java, Go, or C#, and it synthesizes to CloudFormation templates. Best of both worlds: programming language ergonomics with CloudFormation's managed state and rollback. AWS-only. CDK for Terraform (CDKTF) brings the same concept to Terraform providers.

Positioning

For most consulting engagements, Terraform is the default recommendation because it works everywhere. If a client is AWS-only, small, and wants minimal operational overhead, CloudFormation is perfectly fine — do not over-engineer with Terraform just because it is trendy. If a client's team is strong in TypeScript or Python and resistant to learning a DSL, consider Pulumi as a serious alternative rather than forcing HCL.

10

CI/CD Integration

Running Terraform in CI/CD pipelines is the standard operating model for teams. The pattern is simple: plan on merge request, apply on merge to main. This ensures changes are reviewed before being applied and that infrastructure changes follow the same review process as application code.

Standard pipeline pattern

Developer pushes branch
          |
          v
+------------------+
| MR / PR created  |
+------------------+
          |
          v
+------------------+
| terraform init   |
| terraform plan   |
+------------------+
          |
          v
+------------------+
| Plan output      |
| posted as MR     |
| comment          |
+------------------+
          |
          v
+------------------+
| Review & approve |
+------------------+
          |
          v
+------------------+
| Merge to main    |
+------------------+
          |
          v
+------------------+
| terraform apply  |
| -auto-approve    |
+------------------+

GitHub Actions example

name: Terraform
on:
  pull_request:
    paths: ['infrastructure/**']
  push:
    branches: [main]
    paths: ['infrastructure/**']

jobs:
  plan:
    if: github.event_name == 'pull_request'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init
        working-directory: infrastructure/prod
      - run: terraform plan -no-color -out=tfplan
        id: plan
        working-directory: infrastructure/prod
      - uses: actions/github-script@v7
        env:
          # setup-terraform's wrapper exposes plan stdout as a step output
          PLAN: ${{ steps.plan.outputs.stdout }}
        with:
          script: |
            const output = `#### Terraform Plan
            \`\`\`
            ${process.env.PLAN}
            \`\`\``;
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: output
            })

  apply:
    if: github.ref == 'refs/heads/main' && github.event_name == 'push'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init
        working-directory: infrastructure/prod
      - run: terraform apply -auto-approve
        working-directory: infrastructure/prod

GitLab CI example

stages:
  - validate
  - plan
  - apply

plan:
  stage: plan
  image: hashicorp/terraform:1.14
  script:
    - cd infrastructure/prod
    - terraform init
    - terraform plan -out=tfplan
  artifacts:
    paths:
      - infrastructure/prod/tfplan
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
    - if: '$CI_COMMIT_BRANCH == "main"'

apply:
  stage: apply
  image: hashicorp/terraform:1.14
  script:
    - cd infrastructure/prod
    - terraform init
    - terraform apply tfplan   # saved plans apply without prompting
  dependencies:
    - plan
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'
  when: manual

Atlantis

Atlantis is a self-hosted application that automates Terraform via pull request comments. Instead of building custom CI/CD pipelines, you deploy Atlantis and interact with Terraform through PR comments like atlantis plan and atlantis apply. Atlantis handles state locking, plan output, and apply execution. It is popular with teams that want a lightweight, GitOps-style workflow without the complexity of HCP Terraform.
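Repository behavior is controlled by an atlantis.yaml file at the repo root — a minimal sketch (directory name and apply requirements are illustrative):

```yaml
version: 3
projects:
  - name: prod
    dir: infrastructure/prod
    autoplan:
      when_modified: ["*.tf", "*.tfvars"]
    apply_requirements: [approved, mergeable]
```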

HCP Terraform (formerly Terraform Cloud)

HashiCorp's hosted platform for running Terraform, renamed from Terraform Cloud to HCP Terraform in April 2024. Provides remote execution, state management, policy enforcement (Sentinel/OPA), VCS integration, a private module registry, and cost estimation. The free tier supports up to 500 managed resources. For teams that want a fully managed Terraform experience and are comfortable with HashiCorp vendor lock-in, HCP Terraform eliminates the need to build your own CI/CD pipeline for Terraform.
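
Wiring a configuration to HCP Terraform is done with a cloud block; the organization and workspace names below are placeholders:

```hcl
terraform {
  cloud {
    organization = "example-org"  # placeholder HCP Terraform organization
    workspaces {
      name = "prod"               # remote workspace holding state and runs
    }
  }
}
```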

Recommendation

For most teams, GitHub Actions or GitLab CI with a simple plan/apply pipeline is sufficient. Atlantis is excellent for teams with many repositories and contributors who want self-service infrastructure changes. HCP Terraform is worth evaluating if the client needs policy enforcement, cost estimation, and does not want to manage CI/CD pipelines for infrastructure. Do not over-engineer the pipeline — the goal is reviewed, auditable infrastructure changes.

11

Best Practices & Security

.gitignore essentials

Every Terraform repository must have a proper .gitignore:

# Terraform .gitignore
*.tfstate
*.tfstate.*
*.tfvars          # May contain secrets
.terraform/       # Downloaded providers and modules
crash.log
override.tf
override.tf.json
*_override.tf
*_override.tf.json

# DO commit these:
# .terraform.lock.hcl  (provider lock file)

Secrets management

Do

  • Use environment variables for provider credentials
  • Store secrets in AWS Secrets Manager, Vault, or similar
  • Use sensitive = true on variables containing secrets
  • Encrypt state at rest (S3 SSE, GCS encryption)
  • Restrict access to the state bucket/backend
  • Use IAM roles instead of static credentials

Do not

  • Commit .tfstate files to Git
  • Hardcode credentials in .tf files
  • Share state files over Slack, email, or shared drives
  • Use the same credentials for dev and prod
  • Ignore that state files contain secrets in plaintext
  • Commit .tfvars files with sensitive values
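
To sketch the "do" column in HCL, the fragment below marks a variable as sensitive and reads a database password from AWS Secrets Manager rather than a .tfvars file (the secret name and resource are illustrative):

```hcl
variable "api_token" {
  type      = string
  sensitive = true  # redacted in plan/apply output (but still plaintext in state)
}

# Look up an existing secret at plan time (secret name is an assumed example)
data "aws_secretsmanager_secret_version" "db_password" {
  secret_id = "prod/db-password"
}

resource "aws_db_instance" "main" {
  # ... other arguments ...
  password = data.aws_secretsmanager_secret_version.db_password.secret_string
}
```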

Least-privilege IAM for Terraform

Terraform typically needs broad permissions to create and manage infrastructure, but the permissions should be scoped per environment and per pipeline:

  • Separate IAM roles per environment — the dev pipeline should not have permissions to touch prod resources
  • Narrower permissions for plan — the MR pipeline needs read access to cloud APIs plus write access to the state backend (for refresh). Full create/update/delete permissions are only needed for apply
  • Scope to specific services — if a Terraform project only manages networking, the IAM role should not have EC2 or RDS permissions
  • Use OIDC federation — GitHub Actions and GitLab CI both support OIDC for assuming AWS/GCP/Azure roles without static credentials
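
For the OIDC point, a GitHub Actions job can assume an AWS role with no static keys; the role ARN and region below are placeholders for values you configure in IAM:

```yaml
permissions:
  id-token: write   # lets the job request an OIDC token from GitHub
  contents: read

steps:
  - uses: actions/checkout@v4
  - uses: aws-actions/configure-aws-credentials@v4
    with:
      role-to-assume: arn:aws:iam::123456789012:role/terraform-plan-dev  # placeholder
      aws-region: eu-west-1
  - uses: hashicorp/setup-terraform@v3
```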

Dependency pinning

terraform {
  required_version = "~> 1.14.0"  # allows 1.14.x patch releases only

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 6.0"          # allows any 6.x release, blocks 7.0
    }
  }
}

# Always commit .terraform.lock.hcl to lock exact versions and hashes

Code review for plans

  • Always review the plan output before approving an apply — this is your last line of defense
  • Look for unexpected destroy or replace actions — these indicate breaking changes
  • Watch for ~ (update in-place) on sensitive resources like databases or load balancers
  • Verify that the number of changes matches expectations — a "simple tag change" that shows 47 resources changing is a red flag
  • Use terraform plan -target=resource.name to limit scope when debugging specific changes
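
One way to sanity-check the change count, assuming a plan saved with -out=tfplan and jq available, is to summarize actions from the machine-readable plan:

```
# Summarize planned actions (create/update/delete/no-op) from a saved plan file
terraform show -json tfplan \
  | jq '[.resource_changes[].change.actions[]] | group_by(.) | map({(.[0]): length}) | add'
```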

Operational hygiene

  • Use terraform fmt in CI to enforce consistent formatting
  • Use terraform validate to catch syntax errors before plan
  • Use tflint for linting rules beyond what validate catches (deprecated arguments, naming conventions)
  • Use checkov or trivy config for security scanning of Terraform configs (tfsec is deprecated and merged into Trivy)
  • Use terraform-docs to auto-generate module documentation
  • Use infracost to estimate cost impact of changes in PRs
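
Tying these tools into the GitLab pipeline above, a validate-stage job might look like this (paths and image version are illustrative):

```yaml
validate:
  stage: validate
  image:
    name: hashicorp/terraform:1.14
    entrypoint: [""]                    # override the image's terraform entrypoint
  script:
    - cd infrastructure/prod
    - terraform fmt -check -recursive   # fail on unformatted files
    - terraform init -backend=false     # init without touching remote state
    - terraform validate
```
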

Security reminder

The Terraform state file contains every secret value Terraform manages — database passwords, API keys, TLS certificates, etc. — in plaintext JSON. Treat the state backend with the same security posture as your secrets manager. Encrypt at rest, restrict access to authorized pipelines and operators only, enable versioning for recovery, and audit access logs.

12

Consultant's Checklist

Use this checklist when assessing or setting up Terraform for a client engagement.

Foundation

  • Remote state backend configured with encryption and locking
  • State bucket/backend access restricted to CI/CD and authorized operators
  • Terraform and provider versions pinned in required_version and required_providers
  • .terraform.lock.hcl committed to version control
  • .gitignore excludes state files, .terraform/, and sensitive .tfvars
  • Directory structure established (modules, environments, or workspaces)
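
A backend satisfying the first two points might look like this on AWS (bucket name and region are placeholders; use_lockfile requires Terraform 1.10+):

```hcl
terraform {
  backend "s3" {
    bucket       = "example-tf-state"        # placeholder bucket name
    key          = "prod/terraform.tfstate"
    region       = "eu-west-1"
    encrypt      = true                      # server-side encryption at rest
    use_lockfile = true                      # S3-native state locking (Terraform >= 1.10)
  }
}
```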

CI/CD

  • Plan runs on every merge request / pull request
  • Plan output posted as MR/PR comment for review
  • Apply runs only on merge to main (or manual approval)
  • OIDC federation used for cloud provider authentication (no static credentials)
  • Separate IAM roles per environment
  • State locking prevents concurrent applies

Code quality

  • terraform fmt enforced in CI
  • terraform validate run before plan
  • tflint or equivalent linter configured
  • Security scanner (checkov, trivy) in pipeline
  • Modules documented with terraform-docs
  • Variables have descriptions and type constraints

Security

  • No credentials in .tf files or version control
  • State encrypted at rest and access-controlled
  • Sensitive variables marked with sensitive = true
  • Secrets stored in Vault / Secrets Manager, not in Terraform variables
  • Least-privilege IAM roles for Terraform execution
  • Plan output reviewed before every apply — no blind auto-approve

Decision points

  • Terraform vs OpenTofu? — If OSS licensing matters, use OpenTofu. Otherwise either works; configurations remain largely compatible today, though the projects are diverging.
  • Terraform vs CloudFormation? — Multi-cloud = Terraform. AWS-only with a preference for zero state-management overhead = CloudFormation.
  • Workspaces vs directories? — Small team with identical environments = workspaces. Larger teams or differing environments = directories.
  • HCP Terraform vs self-hosted CI/CD? — If the client wants managed policy enforcement and cost estimation, HCP Terraform. Otherwise, GitHub Actions / GitLab CI is simpler and cheaper.
  • Atlantis vs custom pipeline? — Many repos and contributors wanting self-service = Atlantis. Small team with few repos = custom pipeline.