Ansible Automation
Agentless IT automation — playbooks, inventory, roles, vault & CI/CD integration
Overview
Ansible is an open-source, agentless automation tool that uses SSH (or WinRM/SSH for Windows) to configure systems, deploy software, and orchestrate complex workflows. Everything is defined in YAML — no proprietary DSL, no compiled agents, no daemons running on managed nodes. You write a playbook, run it, and Ansible connects to your targets over SSH, executes tasks, and reports back.
The core design principle is idempotency: running the same playbook multiple times produces the same result. If a package is already installed, Ansible skips it. If a file already has the correct content, Ansible leaves it alone. This makes Ansible safe to re-run and suitable for drift correction.
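A minimal illustration of idempotency (package and path are arbitrary examples): both tasks report `changed` on the first run and `ok` on every run after, because the modules check current state before acting.

```yaml
# Safe to re-run: modules compare desired state to actual state
- name: Ensure nginx is installed
  apt:
    name: nginx
    state: present

- name: Ensure the deploy directory exists
  file:
    path: /opt/myapp          # hypothetical path
    state: directory
    mode: '0755'
```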
Red Hat Ansible vs community
Ansible exists in two forms. The community project (ansible-core) is free, open-source, and maintained on GitHub. Red Hat Ansible Automation Platform (AAP) is the commercial product that bundles ansible-core with AWX/AAP Controller (web UI, RBAC, scheduling), Automation Hub (curated content), and enterprise support. Most teams start with community Ansible and move to AAP when they need centralized management, audit trails, or role-based access for multiple teams.
Why Ansible is popular
Strengths
- Agentless — Nothing to install on managed nodes. Just SSH and Python.
- YAML-based — Playbooks are human-readable, version-controllable, and reviewable in PRs
- Low barrier to entry — A sysadmin can be productive in hours, not weeks
- Massive module library — Thousands of modules for cloud, networking, containers, databases, security
- Idempotent by default — Safe to re-run, enables drift correction
- Works everywhere — Linux, Windows, network devices, cloud APIs, containers
- Red Hat backing — Enterprise support, certified content, long-term roadmap
Considerations
- Performance at scale — SSH-based execution is slower than agent-based tools for 1000+ nodes
- State management — No built-in state file (unlike Terraform). You describe desired state, but Ansible does not track what it previously did.
- Error handling — YAML playbooks can get complex with deep conditional logic and error recovery
- Windows support — Works via WinRM or SSH (officially supported since ansible-core 2.18). Improving rapidly, especially with OpenSSH built into Windows Server 2025, but still not as mature as Linux support
- Secret management — Ansible Vault is basic; many teams pair it with HashiCorp Vault or cloud KMS
- No built-in drift detection — Ansible enforces state when run but does not continuously monitor for drift between runs. Event-Driven Ansible (EDA) can help by triggering remediation playbooks in response to external events.
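As a sketch of the EDA pattern (the webhook port, condition, and playbook path are hypothetical), a rulebook pairs an event source with a remediation action:

```yaml
# rulebook.yml - run with: ansible-rulebook -r rulebook.yml -i inventory
- name: Remediate service alerts
  hosts: all
  sources:
    - ansible.eda.webhook:        # listen for alert webhooks
        host: 0.0.0.0
        port: 5000
  rules:
    - name: Restart nginx when an alert arrives
      condition: event.payload.alert == "nginx_down"
      action:
        run_playbook:
          name: playbooks/restart-nginx.yml
```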
Ansible vs other tools
| Feature | Ansible | Terraform | Puppet | Chef |
|---|---|---|---|---|
| Architecture | Agentless (SSH/WinRM) | Agentless (API) | Agent-based (with agentless option) | Agent-based |
| Language | YAML | HCL | Puppet DSL (Ruby) | Ruby DSL |
| Primary use | Config mgmt + orchestration | Infrastructure provisioning | Config mgmt | Config mgmt |
| State | Stateless (desired state per run) | State file | Agent reports | Agent reports |
| Learning curve | Low | Medium | High | High |
| Idempotency | Module-level | Built-in | Built-in | Built-in |
Ansible and Terraform are complementary, not competing. Terraform provisions infrastructure (VMs, networks, load balancers). Ansible configures what runs on that infrastructure (packages, users, services, files). A common pattern is Terraform to create the VMs, then Ansible to configure them. Trying to use Ansible for cloud infrastructure provisioning or Terraform for OS-level configuration leads to pain.
How Ansible Works
Ansible follows a push-based model. You run ansible-playbook on a control node (your laptop, a CI runner, a bastion host), and it pushes configuration to managed nodes over SSH. There is no central server, no agent, no pull schedule. You decide when to run it.
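The same push model powers one-off ad-hoc commands, which are useful for quick checks before writing a playbook (inventory path is illustrative):

```shell
# "Ping" every host in the webservers group (tests SSH + Python, not ICMP)
ansible webservers -i inventory/hosts.yml -m ping

# Gather a subset of facts ad hoc
ansible all -i inventory/hosts.yml -m setup -a 'filter=ansible_distribution*'

# One-off service restart with privilege escalation (-b = become)
ansible dbservers -i inventory/hosts.yml -b -m service -a 'name=postgresql state=restarted'
```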
Architecture
Execution flow
When you run ansible-playbook site.yml, this is what happens under the hood:
- Parse — Ansible reads the playbook YAML, resolves variables, loads roles, and builds a list of plays
- Inventory — Reads the inventory file (or dynamic inventory script) to determine which hosts to target
- Fact gathering — Connects to each host via SSH and runs the setup module to collect system facts (OS, IP, memory, disk, etc.)
- Task execution — For each task in each play, Ansible generates a small Python script, copies it to the remote host via SFTP/SCP, executes it, captures the output, and removes the script
- Result collection — Each task returns JSON with status (changed, ok, failed, skipped). Ansible aggregates results and proceeds to the next task.
- Handler notification — If a task reports "changed" and notifies a handler (e.g., restart nginx), the handler runs at the end of the play
Modules and plugins
Modules are the units of work. Each task calls one module (e.g., apt, copy, service). Modules are idempotent — they check current state and only make changes if needed. Modules execute on the remote host.
Plugins extend Ansible's core behavior and run on the control node. Types include connection plugins (SSH, WinRM, Docker), lookup plugins (read from files, environment, Vault), callback plugins (custom output formatting), and filter plugins (Jinja2 filters for data transformation).
Python requirement
Ansible modules are Python scripts that execute on the target. Most modules require Python 3 on managed nodes (Python 2 support was dropped after ansible-core 2.16). The exact minimum Python version depends on your ansible-core release — check the ansible-core support matrix for details. Python is usually already present on modern Linux distributions. For minimal or embedded systems without Python, Ansible provides the raw module which sends raw shell commands without requiring Python, and the script module which copies and executes a script in any language.
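A common bootstrap pattern for such systems (this sketch assumes Debian-family targets; adjust the package manager for others) uses raw to install Python first, then gathers facts normally:

```yaml
- name: Bootstrap hosts that lack Python
  hosts: all
  gather_facts: false          # fact gathering itself requires Python
  become: true
  tasks:
    - name: Install Python 3 via raw (no Python needed on the target)
      raw: test -e /usr/bin/python3 || (apt-get update && apt-get install -y python3)
      changed_when: false

    - name: Gather facts now that Python exists
      setup:
```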
Ansible is fundamentally an SSH automation framework. Everything it does could be done manually by SSHing to each host and running commands. Ansible provides structure (playbooks), safety (idempotency), scale (parallel execution across hundreds of hosts), and repeatability (version-controlled YAML). If SSH works, Ansible works.
Inventory
The inventory defines which hosts Ansible manages and how to connect to them. It can be a static file (INI or YAML format), a dynamic script that queries a cloud API, or a plugin that reads from an external source. The inventory also defines groups, which let you target subsets of hosts with specific plays.
INI format (traditional)
# inventory/hosts.ini
[webservers]
web1.example.com
web2.example.com
web3.example.com ansible_port=2222
[dbservers]
db1.example.com ansible_user=postgres
db2.example.com ansible_user=postgres
[loadbalancers]
lb1.example.com
# Group of groups
[production:children]
webservers
dbservers
loadbalancers
# Variables for all hosts in a group
[webservers:vars]
http_port=8080
max_connections=1000
[all:vars]
ansible_python_interpreter=/usr/bin/python3
YAML format (preferred)
# inventory/hosts.yml
all:
vars:
ansible_python_interpreter: /usr/bin/python3
children:
production:
children:
webservers:
vars:
http_port: 8080
max_connections: 1000
hosts:
web1.example.com:
web2.example.com:
web3.example.com:
ansible_port: 2222
dbservers:
vars:
ansible_user: postgres
hosts:
db1.example.com:
db2.example.com:
loadbalancers:
hosts:
lb1.example.com:
Dynamic inventory
For cloud environments where hosts are ephemeral, static files become stale immediately. Dynamic inventory plugins query cloud APIs in real time to build the host list.
# inventory/aws_ec2.yml (dynamic inventory plugin)
plugin: amazon.aws.aws_ec2
regions:
- us-east-1
- us-west-2
keyed_groups:
- key: tags.Environment
prefix: env
- key: tags.Role
prefix: role
- key: placement.availability_zone
prefix: az
filters:
instance-state-name: running
"tag:ManagedBy": ansible
compose:
ansible_host: private_ip_address
# Test dynamic inventory
ansible-inventory -i inventory/aws_ec2.yml --graph
ansible-inventory -i inventory/aws_ec2.yml --list
group_vars and host_vars
Variables can be defined per-group or per-host in separate files. Ansible automatically loads them based on directory structure:
# Directory structure
inventory/
hosts.yml
group_vars/
all.yml # Variables for every host
webservers.yml # Variables for webservers group
dbservers.yml # Variables for dbservers group
production.yml # Variables for production group
host_vars/
web1.example.com.yml # Variables for this specific host
db1.example.com.yml
# inventory/group_vars/webservers.yml
nginx_version: "1.28"
ssl_certificate_path: /etc/ssl/certs/app.crt
worker_processes: auto
worker_connections: 4096
Inventory patterns
# Target specific groups or hosts
ansible-playbook site.yml -i inventory/ -l webservers # only webservers
ansible-playbook site.yml -i inventory/ -l 'webservers:&production' # intersection
ansible-playbook site.yml -i inventory/ -l 'webservers:!web3.example.com' # exclude
ansible-playbook site.yml -i inventory/ -l '*.example.com' # wildcard
Use YAML format for inventory — it is consistent with playbooks and supports complex data structures. Use group_vars/host_vars directories rather than inline variables in the inventory file. This keeps secrets separate (you can vault-encrypt individual var files) and makes the inventory readable. For cloud environments, always use dynamic inventory — static files for ephemeral VMs are a maintenance nightmare.
Playbooks
A playbook is a YAML file containing one or more plays. Each play targets a group of hosts and defines a list of tasks to execute. Tasks call modules, and the order of tasks in a play is the order of execution. Playbooks are the core of Ansible — they are the automation scripts that define your infrastructure as code.
Playbook structure
# site.yml - a realistic multi-task playbook
---
- name: Configure web servers
hosts: webservers
become: true
gather_facts: true
vars:
app_port: 8080
app_user: appuser
pre_tasks:
- name: Update apt cache
apt:
update_cache: true
cache_valid_time: 3600
tasks:
- name: Install required packages
apt:
name:
- nginx
- python3-pip
- certbot
state: present
- name: Create application user
user:
name: "{{ app_user }}"
shell: /bin/bash
create_home: true
system: true
- name: Deploy nginx configuration
template:
src: templates/nginx.conf.j2
dest: /etc/nginx/sites-available/default
owner: root
group: root
mode: '0644'
notify: Reload nginx
- name: Deploy application config
template:
src: templates/app.conf.j2
dest: "/home/{{ app_user }}/app.conf"
owner: "{{ app_user }}"
mode: '0600'
notify: Restart application
- name: Ensure nginx is enabled and running
service:
name: nginx
state: started
enabled: true
- name: Open firewall ports
ufw:
rule: allow
port: "{{ item }}"
proto: tcp
loop:
- '80'
- '443'
- "{{ app_port }}"
handlers:
- name: Reload nginx
service:
name: nginx
state: reloaded
- name: Restart application
systemd:
name: myapp
state: restarted
daemon_reload: true
- name: Configure database servers
hosts: dbservers
become: true
roles:
- role: geerlingguy.postgresql
vars:
postgresql_version: "16"
postgresql_databases:
- name: myapp
postgresql_users:
- name: myapp
password: "{{ vault_db_password }}"
Key playbook concepts
Tasks
Tasks are the individual actions. Each task calls one module. Tasks run in order, and Ansible stops on the first failure (unless ignore_errors: true is set). Tasks should have a descriptive name for readability in output.
Handlers
Handlers are tasks that only run when notified by another task that reported "changed". They run once at the end of the play, regardless of how many tasks notify them. Common use: restarting a service after config changes.
Become
become: true escalates privileges (sudo). Can be set at the play level or per-task. Use become_user to become a specific user. The connecting user must have sudo access on the target.
Tags
Tags let you run a subset of tasks. Add tags: [deploy, config] to tasks, then run with --tags deploy. Use --skip-tags to exclude. Tags are essential for large playbooks where you want to run only specific parts.
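A brief sketch of tagging (task content and paths are illustrative):

```yaml
- name: Deploy application code
  copy:
    src: app/
    dest: /opt/myapp/          # hypothetical path
  tags: [deploy]

- name: Render application config
  template:
    src: app.conf.j2
    dest: /etc/myapp/app.conf
  tags: [deploy, config]

# Then:
#   ansible-playbook site.yml --tags deploy       # only tasks tagged deploy
#   ansible-playbook site.yml --skip-tags config  # everything except config
```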
Includes and imports
# Import is static (resolved at parse time)
- import_tasks: tasks/common.yml
# Include is dynamic (resolved at runtime, supports loops and conditionals)
- include_tasks: "tasks/{{ ansible_os_family | lower }}.yml"
# Import a playbook
- import_playbook: webservers.yml
- import_playbook: dbservers.yml
import_* is static — resolved at playbook parse time. Tags and conditions on an import apply to all tasks inside it. include_* is dynamic — resolved at runtime. This means you can use variables in the filename, but tags on the include statement itself do not propagate to tasks within the included file. To push tags into included tasks, use the apply keyword on include_tasks. Use imports for static structure, includes for dynamic/conditional loading.
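Sketched concretely (the included filename is hypothetical), apply pushes keywords such as tags onto every task inside a dynamic include:

```yaml
- name: Include database tasks with tags applied to each inner task
  include_tasks:
    file: tasks/db.yml         # hypothetical file
    apply:
      tags: [db]
  tags: [db]                   # also tag the include itself so --tags db reaches it
```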
Variables & Facts
Variables in Ansible come from many sources, and understanding variable precedence is critical. Ansible has 22 levels of variable precedence. When the same variable is defined in multiple places, the highest-precedence source wins.
Variable precedence (simplified, highest wins)
| Priority | Source | Notes |
|---|---|---|
| Highest | --extra-vars (-e) | Command line. Always wins. Use for overrides and CI/CD. |
| High | Task vars (block/task level) | Scoped to specific tasks |
| High | include_vars / set_fact | Runtime-defined variables |
| Medium | Play vars, vars_files, vars_prompt | Defined in the playbook |
| Medium | Host facts (ansible_*) | Gathered from target system |
| Low-Med | host_vars/* | Per-host variable files |
| Low-Med | group_vars/* | Per-group variable files (child groups override parents) |
| Low | Inventory variables | Defined inline in inventory |
| Low | Role defaults (defaults/main.yml) | Designed to be overridden. Lowest role-level precedence. |
| Lowest | Command line defaults | Ansible configuration defaults |
Ansible facts
When gather_facts: true (the default), Ansible runs the setup module on each host to collect system information. Facts are available as variables prefixed with ansible_:
# Common facts
ansible_hostname # web1
ansible_fqdn # web1.example.com
ansible_distribution # Ubuntu
ansible_distribution_version # 22.04
ansible_os_family # Debian
ansible_memtotal_mb # 8192
ansible_processor_vcpus # 4
ansible_default_ipv4.address # 10.0.1.50
ansible_devices # disk info
ansible_mounts # mounted filesystems
# Use facts in templates and conditionals
- name: Install packages (Debian)
apt:
name: nginx
state: present
when: ansible_os_family == "Debian"
- name: Install packages (RedHat)
dnf:
name: nginx
state: present
when: ansible_os_family == "RedHat"
Registered variables
- name: Check if application is running
command: systemctl is-active myapp
register: app_status
ignore_errors: true
- name: Start application if not running
service:
name: myapp
state: started
when: app_status.rc != 0
- name: Debug output
debug:
msg: "App status: {{ app_status.stdout }}, return code: {{ app_status.rc }}"
Jinja2 templating
# Variable interpolation
message: "Hello {{ username }}"
# Filters
ip_list: "{{ groups['webservers'] | map('extract', hostvars, 'ansible_host') | list }}"
config_hash: "{{ lookup('file', 'app.conf') | hash('sha256') }}"
default_value: "{{ custom_port | default(8080) }}"
# Conditionals in templates (Jinja2)
{% if environment == 'production' %}
log_level: warn
{% else %}
log_level: debug
{% endif %}
# Loops in templates
{% for host in groups['webservers'] %}
server {{ hostvars[host]['ansible_host'] }}:{{ http_port }};
{% endfor %}
Magic variables
Ansible provides special built-in variables that are always available:
- inventory_hostname — The name of the current host as defined in inventory
- groups — Dictionary of all groups and their host lists
- hostvars — Dictionary of all host variables (access another host's vars)
- play_hosts — List of hosts in the current play
- ansible_play_batch — List of hosts in the current batch (respects serial)
- role_path — Path to the current role directory
When a variable has an unexpected value, use ansible-playbook site.yml -e @vars.yml --check -vvv to see where variables come from. The debug module with var: is your best friend. For complex precedence issues, remember: extra vars always win, role defaults always lose. Everything else is a spectrum in between.
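A quick debugging pattern (variable names are illustrative):

```yaml
- name: Show the final value a variable resolved to
  debug:
    var: http_port

- name: Show several values with context
  debug:
    msg: "host={{ inventory_hostname }} port={{ http_port | default('unset') }}"
```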
Roles & Galaxy
Roles are Ansible's mechanism for organizing playbook content into reusable, shareable units. A role bundles tasks, handlers, templates, files, variables, and defaults into a standard directory structure. Instead of a 500-line playbook, you have small, focused roles that can be composed together.
Role directory structure
roles/
webserver/
tasks/
main.yml # Entry point - task list
install.yml # Included by main.yml
configure.yml
handlers/
main.yml # Handlers (restart services, etc.)
templates/
nginx.conf.j2 # Jinja2 templates
vhost.conf.j2
files/
index.html # Static files to copy
vars/
main.yml # Role variables (high precedence)
defaults/
main.yml # Default variables (low precedence, meant to be overridden)
meta/
main.yml # Role metadata, dependencies, Galaxy info
tests/
test.yml # Test playbook
README.md
Using roles in playbooks
---
- name: Configure web servers
hosts: webservers
become: true
roles:
# Simple role inclusion
- webserver
# Role with variables
- role: webserver
vars:
nginx_port: 8080
ssl_enabled: true
# Role with conditional
- role: monitoring
when: enable_monitoring | default(true)
# Role with tags
- role: security
tags: [security, hardening]
Ansible Galaxy
Ansible Galaxy is the public repository for community-shared roles and collections. Instead of writing everything from scratch, you can use battle-tested roles from the community.
# Install a role from Galaxy
ansible-galaxy install geerlingguy.docker
ansible-galaxy install geerlingguy.postgresql
# Install a collection
ansible-galaxy collection install community.general
ansible-galaxy collection install amazon.aws
# Install from a requirements file
ansible-galaxy install -r requirements.yml
ansible-galaxy collection install -r requirements.yml
# requirements.yml
roles:
- name: geerlingguy.docker
version: "7.1.0"
- name: geerlingguy.postgresql
version: "4.0.3"
- name: geerlingguy.certbot
version: "5.1.0"
collections:
- name: community.general
version: ">=8.0.0"
- name: amazon.aws
version: ">=7.0.0"
- name: ansible.posix
version: ">=1.5.0"
Collections vs roles
Roles
- Bundle tasks, templates, handlers, and variables
- One role = one purpose (e.g., install nginx)
- Can contain custom modules (in library/) and plugins, but collections are the preferred distribution format for reusable modules/plugins
- Installed to ~/.ansible/roles/ or the project's roles/ directory
- Simpler, focused on playbook organization
Collections
- Bundle roles, modules, plugins, and playbooks together
- Namespaced: amazon.aws, community.general
- Can contain custom modules and plugins
- The modern distribution format for Ansible content
- Installed to ~/.ansible/collections/
Pin versions in requirements.yml. An unpinned Galaxy role can break your playbook when the author pushes a breaking change. Use version constraints (version: "7.1.0" or version: ">=7.0.0,<8.0.0") and test upgrades explicitly. Treat Galaxy roles the same way you treat library dependencies in application code — pin, test, upgrade deliberately.
Modules & Plugins
Ansible ships with thousands of modules. Knowing which module to use for a given task is the difference between clean, idempotent automation and fragile shell scripts wrapped in YAML. Here are the modules you will use most often.
Essential modules
| Module | Purpose | Example |
|---|---|---|
apt / yum / dnf | Package management | apt: name=nginx state=present |
copy | Copy files to remote | copy: src=app.conf dest=/etc/app.conf |
template | Deploy Jinja2 templates | template: src=nginx.conf.j2 dest=/etc/nginx/nginx.conf |
file | Manage files/dirs/links | file: path=/data state=directory mode='0755' |
service / systemd | Manage services | service: name=nginx state=started enabled=true |
user | Manage user accounts | user: name=deploy shell=/bin/bash |
lineinfile | Ensure a line in a file | lineinfile: path=/etc/hosts line="10.0.1.5 db1" |
uri | HTTP requests | uri: url=https://api.example.com method=GET |
command | Run a command (no shell) | command: /usr/bin/myapp --init |
shell | Run via shell (pipes, redirects) | shell: cat /etc/hosts | grep db |
When to use command/shell vs dedicated modules
Do not use command or shell when a dedicated module exists. For example, shell: apt-get install -y nginx is not idempotent — it runs every time. apt: name=nginx state=present checks first and only installs if needed. Use command/shell only when no module exists for your use case, and always add creates, removes, or when conditions to make them idempotent.
# BAD - not idempotent, runs every time
- name: Install nginx
shell: apt-get install -y nginx
# GOOD - idempotent, checks state first
- name: Install nginx
apt:
name: nginx
state: present
# ACCEPTABLE - command with idempotency guard
- name: Initialize the application database
command: /opt/myapp/bin/init-db.sh
args:
creates: /opt/myapp/data/.initialized # Skip if this file exists
# ACCEPTABLE - shell with conditional
- name: Check if cluster is healthy
shell: kubectl get nodes | grep -c Ready
register: node_count
changed_when: false # This is a read-only check
Template module deep dive
The template module is one of Ansible's most powerful features. It takes a Jinja2 template file and renders it with Ansible variables, then deploys the result to the remote host.
# templates/nginx.conf.j2
upstream app_servers {
{% for host in groups['webservers'] %}
server {{ hostvars[host]['ansible_host'] }}:{{ app_port }};
{% endfor %}
}
server {
listen {{ nginx_port | default(80) }};
server_name {{ server_name }};
location / {
proxy_pass http://app_servers;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
}
{% if ssl_enabled | default(false) %}
listen 443 ssl;
ssl_certificate {{ ssl_cert_path }};
ssl_certificate_key {{ ssl_key_path }};
{% endif %}
}
Custom modules
When no built-in module fits your needs, you can write custom modules in Python. Place them in a library/ directory next to your playbook or in a collection. Custom modules receive arguments as JSON, do their work, and return JSON results with changed, failed, and msg fields. Use the ansible.module_utils.basic.AnsibleModule class for argument parsing, check mode support, and result handling.
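As a stdlib-only sketch of the contract a module must satisfy — arguments in, JSON with changed/failed/msg out. A real module would use AnsibleModule as described above; the argument names (path, content) here are hypothetical.

```python
#!/usr/bin/env python3
"""Sketch of the JSON result contract for a custom module.

Real modules normally use ansible.module_utils.basic.AnsibleModule,
which handles argument parsing, check mode, and result formatting.
"""
import json
import os


def ensure_file(path: str, content: str) -> dict:
    """Idempotently ensure `path` exists with exactly `content`."""
    if os.path.exists(path):
        with open(path) as f:
            if f.read() == content:
                return {"changed": False, "msg": "already in desired state"}
    with open(path, "w") as f:
        f.write(content)
    return {"changed": True, "msg": "wrote " + path}


def run_module(args: dict) -> dict:
    """Entry point: never raise -- report failures in the result dict."""
    try:
        return ensure_file(args["path"], args["content"])
    except (KeyError, OSError) as exc:
        return {"failed": True, "msg": repr(exc)}


# Ansible would supply the arguments and parse the JSON printed to stdout;
# here we simulate a single invocation with hypothetical arguments:
print(json.dumps(run_module({"path": "/tmp/demo_module.txt", "content": "hi"})))
```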
Use ansible-doc <module_name> to see full documentation, examples, and return values for any module. For example, ansible-doc template shows all parameters, defaults, and usage examples. This is faster than searching the web and works offline.
CI/CD Integration
Running Ansible in CI/CD pipelines is the standard way to automate deployments. The pattern is straightforward: your pipeline checks out the playbook repo, installs Ansible, and runs ansible-playbook with the appropriate inventory and vault credentials. The challenge is managing SSH keys, secrets, and inventory in a CI environment.
GitHub Actions
# .github/workflows/deploy.yml
name: Deploy Application
on:
push:
branches: [main]
workflow_dispatch:
jobs:
deploy:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Install Ansible
run: |
pip install ansible boto3
- name: Install Galaxy dependencies
run: ansible-galaxy install -r requirements.yml
- name: Set up SSH key
run: |
mkdir -p ~/.ssh
echo "${{ secrets.SSH_PRIVATE_KEY }}" > ~/.ssh/id_rsa
chmod 600 ~/.ssh/id_rsa
ssh-keyscan -H ${{ secrets.DEPLOY_HOST }} >> ~/.ssh/known_hosts
- name: Run playbook
env:
ANSIBLE_VAULT_PASSWORD: ${{ secrets.VAULT_PASSWORD }}
run: |
echo "$ANSIBLE_VAULT_PASSWORD" > .vault_pass
ansible-playbook site.yml \
-i inventory/production/ \
--vault-password-file .vault_pass \
-e "app_version=${{ github.sha }}"
rm -f .vault_pass
GitLab CI
# .gitlab-ci.yml
stages:
- lint
- deploy
lint:
stage: lint
image: python:3.11
script:
- pip install ansible-lint
- ansible-lint site.yml
deploy_staging:
stage: deploy
image: python:3.11
environment:
name: staging
before_script:
- pip install ansible boto3
- ansible-galaxy install -r requirements.yml
- mkdir -p ~/.ssh
- echo "$SSH_PRIVATE_KEY" > ~/.ssh/id_rsa
- chmod 600 ~/.ssh/id_rsa
- echo "$VAULT_PASSWORD" > .vault_pass
script:
- ansible-playbook site.yml
-i inventory/staging/
--vault-password-file .vault_pass
-e "app_version=${CI_COMMIT_SHA}"
after_script:
- rm -f .vault_pass ~/.ssh/id_rsa
only:
- main
Ansible in Docker
# Dockerfile for an Ansible runner
FROM python:3.11-slim
RUN pip install --no-cache-dir \
ansible-core \
boto3 \
jmespath \
ansible-lint
COPY requirements.yml /ansible/
RUN ansible-galaxy install -r /ansible/requirements.yml && \
ansible-galaxy collection install -r /ansible/requirements.yml
WORKDIR /ansible
ENTRYPOINT ["ansible-playbook"]
# Run Ansible from Docker
docker run --rm \
-v $(pwd):/ansible \
-v ~/.ssh/id_rsa:/root/.ssh/id_rsa:ro \
-e ANSIBLE_HOST_KEY_CHECKING=false \
my-ansible-runner site.yml -i inventory/production/
AWX / Ansible Automation Platform
For teams that need a web UI, RBAC, job scheduling, and audit trails, AWX (open-source) or Ansible Automation Platform (Red Hat commercial) provides a centralized platform. AWX stores credentials securely, manages inventories, and provides a REST API for triggering playbook runs from other systems. It is essentially "Jenkins for Ansible" but purpose-built.
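As an illustration of that API (the hostname, job template ID, and token are placeholders), launching a job template from another system looks roughly like:

```shell
# Launch job template 42 on an AWX/AAP controller (all values are placeholders)
curl -s -X POST \
  -H "Authorization: Bearer $AWX_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"extra_vars": {"app_version": "1.2.3"}}' \
  https://awx.example.com/api/v2/job_templates/42/launch/
```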
Never store vault passwords or SSH keys in your repository. Use your CI/CD platform's secret management (GitHub Secrets, GitLab CI Variables, etc.). The vault password should be injected at runtime via an environment variable or a temporary file that is cleaned up after the run. For production, consider integrating with HashiCorp Vault or cloud KMS to fetch secrets dynamically during playbook execution using lookup plugins.
Ansible Vault
Ansible Vault provides encryption for sensitive data such as passwords, API keys, and certificates. It uses AES-256 symmetric encryption. You encrypt files or individual strings with a vault password, commit the encrypted content to version control, and provide the vault password at runtime to decrypt.
Encrypting files
# Encrypt an entire file
ansible-vault encrypt group_vars/production/secrets.yml
# Create a new encrypted file
ansible-vault create group_vars/production/secrets.yml
# Edit an encrypted file (decrypts in-place for editing)
ansible-vault edit group_vars/production/secrets.yml
# View encrypted file contents
ansible-vault view group_vars/production/secrets.yml
# Decrypt a file permanently
ansible-vault decrypt group_vars/production/secrets.yml
# Re-key (change the vault password)
ansible-vault rekey group_vars/production/secrets.yml
Encrypting individual strings
# Encrypt a single string (inline in a YAML file)
ansible-vault encrypt_string 'SuperSecretPassword123' --name 'db_password'
# Output (paste this into your vars file):
# db_password: !vault |
# $ANSIBLE_VAULT;1.1;AES256
# 62313365396662343061393464336163...
# group_vars/production/secrets.yml (mix of plain and encrypted)
app_environment: production
app_debug: false
db_password: !vault |
$ANSIBLE_VAULT;1.1;AES256
62313365396662343061393464336163383764356462376564656232...
api_key: !vault |
$ANSIBLE_VAULT;1.1;AES256
33356134653765633035313038376432336531303365616438...
Vault IDs (multiple passwords)
Vault IDs let you use different passwords for different environments or sensitivity levels:
# Encrypt with a vault ID
ansible-vault encrypt --vault-id prod@prompt group_vars/production/secrets.yml
ansible-vault encrypt --vault-id dev@prompt group_vars/staging/secrets.yml
# Use a password file per environment
ansible-vault encrypt --vault-id prod@.vault_pass_prod secrets.yml
# Run playbook with multiple vault IDs
ansible-playbook site.yml \
--vault-id dev@.vault_pass_dev \
--vault-id prod@.vault_pass_prod
Using vault in playbooks
# Provide vault password interactively
ansible-playbook site.yml --ask-vault-pass
# Provide vault password from a file
ansible-playbook site.yml --vault-password-file .vault_pass
# Provide vault password from an environment variable (CI/CD pattern)
echo "$VAULT_PASSWORD" > /tmp/vault_pass
ansible-playbook site.yml --vault-password-file /tmp/vault_pass
rm -f /tmp/vault_pass
# Or use a script that outputs the password
ansible-playbook site.yml --vault-password-file get_vault_pass.sh
Ansible Vault is file-level encryption, not a secrets manager. It does not support access control, audit logs, secret rotation, or dynamic secrets. For production environments, pair Vault-encrypted files for static config with a proper secrets manager (HashiCorp Vault, AWS Secrets Manager, Azure Key Vault) for dynamic secrets. Use the community.hashi_vault.hashi_vault lookup plugin to fetch secrets at runtime without storing them in files at all.
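A sketch of the runtime-lookup pattern, assuming the community.hashi_vault collection is installed, a KV v2 secret exists at the path shown, and Vault authentication is configured via environment variables:

```yaml
- name: Fetch the DB password from HashiCorp Vault at runtime
  hosts: dbservers
  vars:
    db_password: >-
      {{ lookup('community.hashi_vault.hashi_vault',
                'secret=secret/data/myapp:db_password url=https://vault.example.com:8200') }}
  tasks:
    - name: Use the secret without ever writing it to disk
      debug:
        msg: "password length: {{ db_password | length }}"
      no_log: true             # keep the value out of logs
```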
Best Practices
Recommended directory layout
ansible-project/
ansible.cfg # Project-level Ansible configuration
site.yml # Main playbook (imports others)
webservers.yml # Playbook for web tier
dbservers.yml # Playbook for database tier
requirements.yml # Galaxy role/collection dependencies
inventory/
production/
hosts.yml # Production inventory
group_vars/
all.yml
webservers.yml
dbservers/
main.yml
vault.yml # Encrypted secrets
host_vars/
staging/
hosts.yml
group_vars/
roles/
common/ # Shared role (NTP, users, packages)
webserver/ # Web server role
database/ # Database role
playbooks/ # Additional playbooks
rolling-update.yml
backup.yml
templates/ # Global templates (if not in roles)
files/ # Global static files
library/ # Custom modules
filter_plugins/ # Custom Jinja2 filters
Idempotency checklist
- Use dedicated modules instead of command/shell whenever possible
- When using command/shell, add creates:, removes:, or when: guards
- Mark read-only commands with changed_when: false
- Use state: present / state: absent instead of install/remove commands
- Test with --check mode — a properly idempotent playbook should show zero changes on the second run
Check mode and diff mode
# Dry run - show what WOULD change without making changes
ansible-playbook site.yml --check
# Diff mode - show the exact changes (file diffs)
ansible-playbook site.yml --check --diff
# Combine with limit for safe testing
ansible-playbook site.yml --check --diff --limit web1.example.com
# Some tasks don't support check mode - mark them:
# check_mode: false (always run, even in check mode)
# check_mode: true (only run in check mode)
Linting with ansible-lint
# Install
pip install ansible-lint
# Run against a playbook
ansible-lint site.yml
# Run against all YAML in the project
ansible-lint
# Common rules it catches:
# - Using command/shell instead of a dedicated module
# - Missing name on tasks
# - Using deprecated syntax
# - Trailing whitespace
# - Risky file permissions
# - Using bare variables in when clauses
# .ansible-lint (configuration file)
skip_list:
- yaml[line-length]
- name[casing]
warn_list:
- experimental
exclude_paths:
- .cache/
- .github/
- molecule/
Testing with Molecule
Molecule is the standard testing framework for Ansible roles. It creates ephemeral test instances (Docker containers, VMs, cloud instances), runs your role against them, and verifies the result with testinfra or ansible assertions.
# Initialize Molecule for an existing role
cd roles/webserver
molecule init scenario # configure driver in molecule.yml (default: docker)
# Run the full test lifecycle
molecule test
# This runs: create -> converge -> idempotence -> verify -> destroy
# Run individual steps for development
molecule create # Spin up test containers
molecule converge # Run the role
molecule idempotence # Run again, verify zero changes
molecule verify # Run verification tests
molecule destroy # Clean up
A good Ansible project should pass three tests: (1) ansible-lint reports no errors, (2) --check --diff on a configured system shows zero changes (proving idempotency), and (3) molecule test passes on a clean system (proving the role works from scratch). If all three pass, you have automation you can trust.
Consultant's Checklist
When assessing or setting up Ansible automation for a client, verify the following:
Foundation
- Playbooks and roles are in version control (Git)
- Inventory is organized by environment (production, staging, dev)
- Secrets are encrypted with Ansible Vault or external KMS
- SSH key management is centralized (no shared keys)
- ansible.cfg is project-scoped, not global
Quality
- ansible-lint runs in CI on every PR
- Roles have Molecule tests
- Playbooks are idempotent (second run = zero changes)
- No raw command/shell where modules exist
- Templates use the {{ ansible_managed }} header comment
Organization
- Roles are small and focused (one role = one concern)
- Galaxy dependencies are pinned in requirements.yml
- Variables follow naming conventions (role-prefixed)
- group_vars/host_vars are used instead of inline variables
- Tags are used for selective execution
Operations
- CI/CD pipeline runs playbooks (not humans from laptops)
- Rolling deployments use serial: to limit blast radius
- --check --diff is run before applying changes to production
- Callback plugins or AWX provide run history and audit trail
- Dynamic inventory is used for cloud environments
Level 1: Ad-hoc playbooks run manually from a developer's laptop. Level 2: Playbooks in Git, manual execution from a bastion host. Level 3: CI/CD runs playbooks automatically, ansible-lint in PR checks, Vault for secrets. Level 4: AWX/AAP for centralized management, Molecule tests for all roles, dynamic inventory, full audit trail. Most teams should aim for Level 3 as a baseline. Level 4 is for organizations with multiple teams sharing Ansible automation.