Ansible Automation
Agentless IT automation — playbooks, inventory, roles, vault & CI/CD integration
Overview
Ansible is an open-source, agentless automation tool that uses SSH (or WinRM/SSH for Windows) to configure systems, deploy software, and orchestrate complex workflows. Everything is defined in YAML — no proprietary DSL, no compiled agents, no daemons running on managed nodes. You write a playbook, run it, and Ansible connects to your targets over SSH, executes tasks, and reports back.
The core design principle is idempotency: running the same playbook multiple times produces the same result. If a package is already installed, Ansible skips it. If a file already has the correct content, Ansible leaves it alone. This makes Ansible safe to re-run and suitable for drift correction.
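A minimal illustration of idempotency (package and path are arbitrary examples): both tasks report `changed` on the first run and `ok` on every run after, because the modules check current state before acting.

```yaml
# Safe to re-run: modules compare desired state to actual state
- name: Ensure nginx is installed
  apt:
    name: nginx
    state: present

- name: Ensure the deploy directory exists
  file:
    path: /opt/myapp          # hypothetical path
    state: directory
    mode: '0755'
```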
Red Hat Ansible vs community
Ansible exists in two forms. The community project (ansible-core) is free, open-source, and maintained on GitHub. Red Hat Ansible Automation Platform (AAP) is the commercial product that bundles ansible-core with AWX/AAP Controller (web UI, RBAC, scheduling), Automation Hub (curated content), and enterprise support. Most teams start with community Ansible and move to AAP when they need centralized management, audit trails, or role-based access for multiple teams.
Why Ansible is popular
Strengths
- Agentless — Nothing to install on managed nodes. Just SSH and Python.
- YAML-based — Playbooks are human-readable, version-controllable, and reviewable in PRs
- Low barrier to entry — A sysadmin can be productive in hours, not weeks
- Massive module library — Thousands of modules for cloud, networking, containers, databases, security
- Idempotent by default — Safe to re-run, enables drift correction
- Works everywhere — Linux, Windows, network devices, cloud APIs, containers
- Red Hat backing — Enterprise support, certified content, long-term roadmap
Considerations
- Performance at scale — SSH-based execution is slower than agent-based tools for 1000+ nodes
- State management — No built-in state file (unlike Terraform). You describe desired state, but Ansible does not track what it previously did.
- Error handling — YAML playbooks can get complex with deep conditional logic and error recovery
- Windows support — Works via WinRM or SSH (officially supported since ansible-core 2.18). Improving rapidly, especially with OpenSSH built into Windows Server 2025, but still not as mature as Linux support
- Secret management — Ansible Vault is basic; many teams pair it with HashiCorp Vault or cloud KMS
- No built-in drift detection — Ansible enforces state when run but does not continuously monitor for drift between runs. Event-Driven Ansible (EDA) can help by triggering remediation playbooks in response to external events.
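As a sketch of the EDA pattern (the webhook port, condition, and playbook path are hypothetical), a rulebook pairs an event source with a remediation action:

```yaml
# rulebook.yml - run with: ansible-rulebook -r rulebook.yml -i inventory
- name: Remediate service alerts
  hosts: all
  sources:
    - ansible.eda.webhook:        # listen for alert webhooks
        host: 0.0.0.0
        port: 5000
  rules:
    - name: Restart nginx when an alert arrives
      condition: event.payload.alert == "nginx_down"
      action:
        run_playbook:
          name: playbooks/restart-nginx.yml
```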
Ansible vs other tools
| Feature | Ansible | Terraform | Puppet | Chef |
|---|---|---|---|---|
| Architecture | Agentless (SSH/WinRM) | Agentless (API) | Agent-based (with agentless option) | Agent-based |
| Language | YAML | HCL | Puppet DSL (Ruby) | Ruby DSL |
| Primary use | Config mgmt + orchestration | Infrastructure provisioning | Config mgmt | Config mgmt |
| State | Stateless (desired state per run) | State file | Agent reports | Agent reports |
| Learning curve | Low | Medium | High | High |
| Idempotency | Module-level | Built-in | Built-in | Built-in |
Ansible and Terraform are complementary, not competing. Terraform provisions infrastructure (VMs, networks, load balancers). Ansible configures what runs on that infrastructure (packages, users, services, files). A common pattern is Terraform to create the VMs, then Ansible to configure them. Trying to use Ansible for cloud infrastructure provisioning or Terraform for OS-level configuration leads to pain.
How Ansible Works
Ansible follows a push-based model. You run ansible-playbook on a control node (your laptop, a CI runner, a bastion host), and it pushes configuration to managed nodes over SSH. There is no central server, no agent, no pull schedule. You decide when to run it.
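The same push model powers one-off ad-hoc commands, which are useful for quick checks before writing a playbook (inventory path is illustrative):

```shell
# "Ping" every host in the webservers group (tests SSH + Python, not ICMP)
ansible webservers -i inventory/hosts.yml -m ping

# Gather a subset of facts ad hoc
ansible all -i inventory/hosts.yml -m setup -a 'filter=ansible_distribution*'

# One-off service restart with privilege escalation (-b = become)
ansible dbservers -i inventory/hosts.yml -b -m service -a 'name=postgresql state=restarted'
```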
Architecture
Execution flow
When you run ansible-playbook site.yml, this is what happens under the hood:
- Parse — Ansible reads the playbook YAML, resolves variables, loads roles, and builds a list of plays
- Inventory — Reads the inventory file (or dynamic inventory script) to determine which hosts to target
- Fact gathering — Connects to each host via SSH and runs the setup module to collect system facts (OS, IP, memory, disk, etc.)
- Task execution — For each task in each play, Ansible generates a small Python script, copies it to the remote host via SFTP/SCP, executes it, captures the output, and removes the script
- Result collection — Each task returns JSON with status (changed, ok, failed, skipped). Ansible aggregates results and proceeds to the next task.
- Handler notification — If a task reports "changed" and notifies a handler (e.g., restart nginx), the handler runs at the end of the play
Modules and plugins
Modules are the units of work. Each task calls one module (e.g., apt, copy, service). Modules are idempotent — they check current state and only make changes if needed. Modules execute on the remote host.
Plugins extend Ansible's core behavior and run on the control node. Types include connection plugins (SSH, WinRM, Docker), lookup plugins (read from files, environment, Vault), callback plugins (custom output formatting), and filter plugins (Jinja2 filters for data transformation).
Python requirement
Ansible modules are Python scripts that execute on the target. Most modules require Python 3 on managed nodes (Python 2 support was dropped after ansible-core 2.16). The exact minimum Python version depends on your ansible-core release — check the ansible-core support matrix for details. Python is usually already present on modern Linux distributions. For minimal or embedded systems without Python, Ansible provides the raw module which sends raw shell commands without requiring Python, and the script module which copies and executes a script in any language.
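A common bootstrap pattern for such systems (this sketch assumes Debian-family targets; adjust the package manager for others) uses raw to install Python first, then gathers facts normally:

```yaml
- name: Bootstrap hosts that lack Python
  hosts: all
  gather_facts: false          # fact gathering itself requires Python
  become: true
  tasks:
    - name: Install Python 3 via raw (no Python needed on the target)
      raw: test -e /usr/bin/python3 || (apt-get update && apt-get install -y python3)
      changed_when: false

    - name: Gather facts now that Python exists
      setup:
```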
Ansible is fundamentally an SSH automation framework. Everything it does could be done manually by SSHing to each host and running commands. Ansible provides structure (playbooks), safety (idempotency), scale (parallel execution across hundreds of hosts), and repeatability (version-controlled YAML). If SSH works, Ansible works.
Inventory
The inventory defines which hosts Ansible manages and how to connect to them. It can be a static file (INI or YAML format), a dynamic script that queries a cloud API, or a plugin that reads from an external source. The inventory also defines groups, which let you target subsets of hosts with specific plays.
INI format (traditional)
# inventory/hosts.ini
[webservers]
web1.example.com
web2.example.com
web3.example.com ansible_port=2222
[dbservers]
db1.example.com ansible_user=postgres
db2.example.com ansible_user=postgres
[loadbalancers]
lb1.example.com
# Group of groups
[production:children]
webservers
dbservers
loadbalancers
# Variables for all hosts in a group
[webservers:vars]
http_port=8080
max_connections=1000
[all:vars]
ansible_python_interpreter=/usr/bin/python3
YAML format (preferred)
# inventory/hosts.yml
all:
vars:
ansible_python_interpreter: /usr/bin/python3
children:
production:
children:
webservers:
vars:
http_port: 8080
max_connections: 1000
hosts:
web1.example.com:
web2.example.com:
web3.example.com:
ansible_port: 2222
dbservers:
vars:
ansible_user: postgres
hosts:
db1.example.com:
db2.example.com:
loadbalancers:
hosts:
lb1.example.com:
Dynamic inventory
For cloud environments where hosts are ephemeral, static files become stale immediately. Dynamic inventory plugins query cloud APIs in real time to build the host list.
# inventory/aws_ec2.yml (dynamic inventory plugin)
plugin: amazon.aws.aws_ec2
regions:
- us-east-1
- us-west-2
keyed_groups:
- key: tags.Environment
prefix: env
- key: tags.Role
prefix: role
- key: placement.availability_zone
prefix: az
filters:
instance-state-name: running
"tag:ManagedBy": ansible
compose:
ansible_host: private_ip_address
# Test dynamic inventory
ansible-inventory -i inventory/aws_ec2.yml --graph
ansible-inventory -i inventory/aws_ec2.yml --list
group_vars and host_vars
Variables can be defined per-group or per-host in separate files. Ansible automatically loads them based on directory structure:
# Directory structure
inventory/
hosts.yml
group_vars/
all.yml # Variables for every host
webservers.yml # Variables for webservers group
dbservers.yml # Variables for dbservers group
production.yml # Variables for production group
host_vars/
web1.example.com.yml # Variables for this specific host
db1.example.com.yml
# inventory/group_vars/webservers.yml
nginx_version: "1.28"
ssl_certificate_path: /etc/ssl/certs/app.crt
worker_processes: auto
worker_connections: 4096
Inventory patterns
# Target specific groups or hosts
ansible-playbook site.yml -i inventory/ -l webservers # only webservers
ansible-playbook site.yml -i inventory/ -l 'webservers:&production' # intersection
ansible-playbook site.yml -i inventory/ -l 'webservers:!web3.example.com' # exclude
ansible-playbook site.yml -i inventory/ -l '*.example.com' # wildcard
Use YAML format for inventory — it is consistent with playbooks and supports complex data structures. Use group_vars/host_vars directories rather than inline variables in the inventory file. This keeps secrets separate (you can vault-encrypt individual var files) and makes the inventory readable. For cloud environments, always use dynamic inventory — static files for ephemeral VMs are a maintenance nightmare.
Playbooks
A playbook is a YAML file containing one or more plays. Each play targets a group of hosts and defines a list of tasks to execute. Tasks call modules, and the order of tasks in a play is the order of execution. Playbooks are the core of Ansible — they are the automation scripts that define your infrastructure as code.
Playbook structure
# site.yml - a realistic multi-task playbook
---
- name: Configure web servers
hosts: webservers
become: true
gather_facts: true
vars:
app_port: 8080
app_user: appuser
pre_tasks:
- name: Update apt cache
apt:
update_cache: true
cache_valid_time: 3600
tasks:
- name: Install required packages
apt:
name:
- nginx
- python3-pip
- certbot
state: present
- name: Create application user
user:
name: "{{ app_user }}"
shell: /bin/bash
create_home: true
system: true
- name: Deploy nginx configuration
template:
src: templates/nginx.conf.j2
dest: /etc/nginx/sites-available/default
owner: root
group: root
mode: '0644'
notify: Reload nginx
- name: Deploy application config
template:
src: templates/app.conf.j2
dest: "/home/{{ app_user }}/app.conf"
owner: "{{ app_user }}"
mode: '0600'
notify: Restart application
- name: Ensure nginx is enabled and running
service:
name: nginx
state: started
enabled: true
- name: Open firewall ports
ufw:
rule: allow
port: "{{ item }}"
proto: tcp
loop:
- '80'
- '443'
- "{{ app_port }}"
handlers:
- name: Reload nginx
service:
name: nginx
state: reloaded
- name: Restart application
systemd:
name: myapp
state: restarted
daemon_reload: true
- name: Configure database servers
hosts: dbservers
become: true
roles:
- role: geerlingguy.postgresql
vars:
postgresql_version: "16"
postgresql_databases:
- name: myapp
postgresql_users:
- name: myapp
password: "{{ vault_db_password }}"
Key playbook concepts
Tasks
Tasks are the individual actions. Each task calls one module. Tasks run in order, and Ansible stops on the first failure (unless ignore_errors: true is set). Tasks should have a descriptive name for readability in output.
Handlers
Handlers are tasks that only run when notified by another task that reported "changed". They run once at the end of the play, regardless of how many tasks notify them. Common use: restarting a service after config changes.
Become
become: true escalates privileges (sudo). Can be set at the play level or per-task. Use become_user to become a specific user. The connecting user must have sudo access on the target.
Tags
Tags let you run a subset of tasks. Add tags: [deploy, config] to tasks, then run with --tags deploy. Use --skip-tags to exclude. Tags are essential for large playbooks where you want to run only specific parts.
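A brief sketch of tagging (task content and paths are illustrative):

```yaml
- name: Deploy application code
  copy:
    src: app/
    dest: /opt/myapp/          # hypothetical path
  tags: [deploy]

- name: Render application config
  template:
    src: app.conf.j2
    dest: /etc/myapp/app.conf
  tags: [deploy, config]

# Then:
#   ansible-playbook site.yml --tags deploy       # only tasks tagged deploy
#   ansible-playbook site.yml --skip-tags config  # everything except config
```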
Includes and imports
# Import is static (resolved at parse time)
- import_tasks: tasks/common.yml
# Include is dynamic (resolved at runtime, supports loops and conditionals)
- include_tasks: "tasks/{{ ansible_os_family | lower }}.yml"
# Import a playbook
- import_playbook: webservers.yml
- import_playbook: dbservers.yml
import_* is static — resolved at playbook parse time. Tags and conditions on an import apply to all tasks inside it. include_* is dynamic — resolved at runtime. This means you can use variables in the filename, but tags on the include statement itself do not propagate to tasks within the included file. To push tags into included tasks, use the apply keyword on include_tasks. Use imports for static structure, includes for dynamic/conditional loading.
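Sketched concretely (the included filename is hypothetical), apply pushes keywords such as tags onto every task inside a dynamic include:

```yaml
- name: Include database tasks with tags applied to each inner task
  include_tasks:
    file: tasks/db.yml         # hypothetical file
    apply:
      tags: [db]
  tags: [db]                   # also tag the include itself so --tags db reaches it
```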
Variables & Facts
Variables in Ansible come from many sources, and understanding variable precedence is critical. Ansible has 22 levels of variable precedence. When the same variable is defined in multiple places, the highest-precedence source wins.
Variable precedence (simplified, highest wins)
| Priority | Source | Notes |
|---|---|---|
| Highest | --extra-vars (-e) | Command line. Always wins. Use for overrides and CI/CD. |
| High | Task vars (block/task level) | Scoped to specific tasks |
| High | include_vars / set_fact | Runtime-defined variables |
| Medium | Play vars, vars_files, vars_prompt | Defined in the playbook |
| Medium | Host facts (ansible_*) | Gathered from target system |
| Low-Med | host_vars/* | Per-host variable files |
| Low-Med | group_vars/* | Per-group variable files (child groups override parents) |
| Low | Inventory variables | Defined inline in inventory |
| Low | Role defaults (defaults/main.yml) | Designed to be overridden. Lowest role-level precedence. |
| Lowest | Command line defaults | Ansible configuration defaults |
Ansible facts
When gather_facts: true (the default), Ansible runs the setup module on each host to collect system information. Facts are available as variables prefixed with ansible_:
# Common facts
ansible_hostname # web1
ansible_fqdn # web1.example.com
ansible_distribution # Ubuntu
ansible_distribution_version # 22.04
ansible_os_family # Debian
ansible_memtotal_mb # 8192
ansible_processor_vcpus # 4
ansible_default_ipv4.address # 10.0.1.50
ansible_devices # disk info
ansible_mounts # mounted filesystems
# Use facts in templates and conditionals
- name: Install packages (Debian)
apt:
name: nginx
state: present
when: ansible_os_family == "Debian"
- name: Install packages (RedHat)
dnf:
name: nginx
state: present
when: ansible_os_family == "RedHat"
Registered variables
- name: Check if application is running
command: systemctl is-active myapp
register: app_status
ignore_errors: true
- name: Start application if not running
service:
name: myapp
state: started
when: app_status.rc != 0
- name: Debug output
debug:
msg: "App status: {{ app_status.stdout }}, return code: {{ app_status.rc }}"
Jinja2 templating
# Variable interpolation
message: "Hello {{ username }}"
# Filters
ip_list: "{{ groups['webservers'] | map('extract', hostvars, 'ansible_host') | list }}"
config_hash: "{{ lookup('file', 'app.conf') | hash('sha256') }}"
default_value: "{{ custom_port | default(8080) }}"
# Conditionals in templates (Jinja2)
{% if environment == 'production' %}
log_level: warn
{% else %}
log_level: debug
{% endif %}
# Loops in templates
{% for host in groups['webservers'] %}
server {{ hostvars[host]['ansible_host'] }}:{{ http_port }};
{% endfor %}
Magic variables
Ansible provides special built-in variables that are always available:
- inventory_hostname — The name of the current host as defined in inventory
- groups — Dictionary of all groups and their host lists
- hostvars — Dictionary of all host variables (access another host's vars)
- play_hosts — List of hosts in the current play
- ansible_play_batch — List of hosts in the current batch (respects serial)
- role_path — Path to the current role directory
When a variable has an unexpected value, use ansible-playbook site.yml -e @vars.yml --check -vvv to see where variables come from. The debug module with var: is your best friend. For complex precedence issues, remember: extra vars always win, role defaults always lose. Everything else is a spectrum in between.
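A quick debugging pattern (variable names are illustrative):

```yaml
- name: Show the final value a variable resolved to
  debug:
    var: http_port

- name: Show several values with context
  debug:
    msg: "host={{ inventory_hostname }} port={{ http_port | default('unset') }}"
```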
Roles & Galaxy
Roles are Ansible's mechanism for organizing playbook content into reusable, shareable units. A role bundles tasks, handlers, templates, files, variables, and defaults into a standard directory structure. Instead of a 500-line playbook, you have small, focused roles that can be composed together.
Role directory structure
roles/
webserver/
tasks/
main.yml # Entry point - task list
install.yml # Included by main.yml
configure.yml
handlers/
main.yml # Handlers (restart services, etc.)
templates/
nginx.conf.j2 # Jinja2 templates
vhost.conf.j2
files/
index.html # Static files to copy
vars/
main.yml # Role variables (high precedence)
defaults/
main.yml # Default variables (low precedence, meant to be overridden)
meta/
main.yml # Role metadata, dependencies, Galaxy info
tests/
test.yml # Test playbook
README.md
Using roles in playbooks
---
- name: Configure web servers
hosts: webservers
become: true
roles:
# Simple role inclusion
- webserver
# Role with variables
- role: webserver
vars:
nginx_port: 8080
ssl_enabled: true
# Role with conditional
- role: monitoring
when: enable_monitoring | default(true)
# Role with tags
- role: security
tags: [security, hardening]
Ansible Galaxy
Ansible Galaxy is the public repository for community-shared roles and collections. Instead of writing everything from scratch, you can use battle-tested roles from the community.
# Install a role from Galaxy
ansible-galaxy install geerlingguy.docker
ansible-galaxy install geerlingguy.postgresql
# Install a collection
ansible-galaxy collection install community.general
ansible-galaxy collection install amazon.aws
# Install from a requirements file
ansible-galaxy install -r requirements.yml
ansible-galaxy collection install -r requirements.yml
# requirements.yml
roles:
- name: geerlingguy.docker
version: "7.1.0"
- name: geerlingguy.postgresql
version: "4.0.3"
- name: geerlingguy.certbot
version: "5.1.0"
collections:
- name: community.general
version: ">=8.0.0"
- name: amazon.aws
version: ">=7.0.0"
- name: ansible.posix
version: ">=1.5.0"
Collections vs roles
Roles
- Bundle tasks, templates, handlers, and variables
- One role = one purpose (e.g., install nginx)
- Can contain custom modules (in library/) and plugins, but collections are the preferred distribution format for reusable modules/plugins
- Installed to ~/.ansible/roles/ or the project's roles/ directory
- Simpler, focused on playbook organization
Collections
- Bundle roles, modules, plugins, and playbooks together
- Namespaced: amazon.aws, community.general
- Can contain custom modules and plugins
- The modern distribution format for Ansible content
- Installed to ~/.ansible/collections/
Pin versions in requirements.yml. An unpinned Galaxy role can break your playbook when the author pushes a breaking change. Use version constraints (version: "7.1.0" or version: ">=7.0.0,<8.0.0") and test upgrades explicitly. Treat Galaxy roles the same way you treat library dependencies in application code — pin, test, upgrade deliberately.
Modules & Plugins
Ansible ships with thousands of modules. Knowing which module to use for a given task is the difference between clean, idempotent automation and fragile shell scripts wrapped in YAML. Here are the modules you will use most often.
Essential modules
| Module | Purpose | Example |
|---|---|---|
apt / yum / dnf | Package management | apt: name=nginx state=present |
copy | Copy files to remote | copy: src=app.conf dest=/etc/app.conf |
template | Deploy Jinja2 templates | template: src=nginx.conf.j2 dest=/etc/nginx/nginx.conf |
file | Manage files/dirs/links | file: path=/data state=directory mode='0755' |
service / systemd | Manage services | service: name=nginx state=started enabled=true |
user | Manage user accounts | user: name=deploy shell=/bin/bash |
lineinfile | Ensure a line in a file | lineinfile: path=/etc/hosts line="10.0.1.5 db1" |
uri | HTTP requests | uri: url=https://api.example.com method=GET |
command | Run a command (no shell) | command: /usr/bin/myapp --init |
shell | Run via shell (pipes, redirects) | shell: cat /etc/hosts | grep db |
When to use command/shell vs dedicated modules
Do not use command or shell when a dedicated module exists. For example, shell: apt-get install -y nginx is not idempotent — it runs every time. apt: name=nginx state=present checks first and only installs if needed. Use command/shell only when no module exists for your use case, and always add creates, removes, or when conditions to make them idempotent.
# BAD - not idempotent, runs every time
- name: Install nginx
shell: apt-get install -y nginx
# GOOD - idempotent, checks state first
- name: Install nginx
apt:
name: nginx
state: present
# ACCEPTABLE - command with idempotency guard
- name: Initialize the application database
command: /opt/myapp/bin/init-db.sh
args:
creates: /opt/myapp/data/.initialized # Skip if this file exists
# ACCEPTABLE - shell with conditional
- name: Check if cluster is healthy
shell: kubectl get nodes | grep -c Ready
register: node_count
changed_when: false # This is a read-only check
Template module deep dive
The template module is one of Ansible's most powerful features. It takes a Jinja2 template file and renders it with Ansible variables, then deploys the result to the remote host.
# templates/nginx.conf.j2
upstream app_servers {
{% for host in groups['webservers'] %}
server {{ hostvars[host]['ansible_host'] }}:{{ app_port }};
{% endfor %}
}
server {
listen {{ nginx_port | default(80) }};
server_name {{ server_name }};
location / {
proxy_pass http://app_servers;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
}
{% if ssl_enabled | default(false) %}
listen 443 ssl;
ssl_certificate {{ ssl_cert_path }};
ssl_certificate_key {{ ssl_key_path }};
{% endif %}
}
Custom modules
When no built-in module fits your needs, you can write custom modules in Python. Place them in a library/ directory next to your playbook or in a collection. Custom modules receive arguments as JSON, do their work, and return JSON results with changed, failed, and msg fields. Use the ansible.module_utils.basic.AnsibleModule class for argument parsing, check mode support, and result handling.
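As a stdlib-only sketch of the contract a module must satisfy — arguments in, JSON with changed/failed/msg out. A real module would use AnsibleModule as described above; the argument names (path, content) here are hypothetical.

```python
#!/usr/bin/env python3
"""Sketch of the JSON result contract for a custom module.

Real modules normally use ansible.module_utils.basic.AnsibleModule,
which handles argument parsing, check mode, and result formatting.
"""
import json
import os


def ensure_file(path: str, content: str) -> dict:
    """Idempotently ensure `path` exists with exactly `content`."""
    if os.path.exists(path):
        with open(path) as f:
            if f.read() == content:
                return {"changed": False, "msg": "already in desired state"}
    with open(path, "w") as f:
        f.write(content)
    return {"changed": True, "msg": "wrote " + path}


def run_module(args: dict) -> dict:
    """Entry point: never raise -- report failures in the result dict."""
    try:
        return ensure_file(args["path"], args["content"])
    except (KeyError, OSError) as exc:
        return {"failed": True, "msg": repr(exc)}


# Ansible would supply the arguments and parse the JSON printed to stdout;
# here we simulate a single invocation with hypothetical arguments:
print(json.dumps(run_module({"path": "/tmp/demo_module.txt", "content": "hi"})))
```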
Use ansible-doc <module_name> to see full documentation, examples, and return values for any module. For example, ansible-doc template shows all parameters, defaults, and usage examples. This is faster than searching the web and works offline.
CI/CD Integration
Running Ansible in CI/CD pipelines is the standard way to automate deployments. The pattern is straightforward: your pipeline checks out the playbook repo, installs Ansible, and runs ansible-playbook with the appropriate inventory and vault credentials. The challenge is managing SSH keys, secrets, and inventory in a CI environment.
GitHub Actions
# .github/workflows/deploy.yml
name: Deploy Application
on:
push:
branches: [main]
workflow_dispatch:
jobs:
deploy:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Install Ansible
run: |
pip install ansible boto3
- name: Install Galaxy dependencies
run: ansible-galaxy install -r requirements.yml
- name: Set up SSH key
run: |
mkdir -p ~/.ssh
echo "${{ secrets.SSH_PRIVATE_KEY }}" > ~/.ssh/id_rsa
chmod 600 ~/.ssh/id_rsa
ssh-keyscan -H ${{ secrets.DEPLOY_HOST }} >> ~/.ssh/known_hosts
- name: Run playbook
env:
ANSIBLE_VAULT_PASSWORD: ${{ secrets.VAULT_PASSWORD }}
run: |
echo "$ANSIBLE_VAULT_PASSWORD" > .vault_pass
ansible-playbook site.yml \
-i inventory/production/ \
--vault-password-file .vault_pass \
-e "app_version=${{ github.sha }}"
rm -f .vault_pass
GitLab CI
# .gitlab-ci.yml
stages:
- lint
- deploy
lint:
stage: lint
image: python:3.11
script:
- pip install ansible-lint
- ansible-lint site.yml
deploy_staging:
stage: deploy
image: python:3.11
environment:
name: staging
before_script:
- pip install ansible boto3
- ansible-galaxy install -r requirements.yml
- mkdir -p ~/.ssh
- echo "$SSH_PRIVATE_KEY" > ~/.ssh/id_rsa
- chmod 600 ~/.ssh/id_rsa
- echo "$VAULT_PASSWORD" > .vault_pass
script:
- ansible-playbook site.yml
-i inventory/staging/
--vault-password-file .vault_pass
-e "app_version=${CI_COMMIT_SHA}"
after_script:
- rm -f .vault_pass ~/.ssh/id_rsa
only:
- main
Ansible in Docker
# Dockerfile for an Ansible runner
FROM python:3.11-slim
RUN pip install --no-cache-dir \
ansible-core \
boto3 \
jmespath \
ansible-lint
COPY requirements.yml /ansible/
RUN ansible-galaxy install -r /ansible/requirements.yml && \
ansible-galaxy collection install -r /ansible/requirements.yml
WORKDIR /ansible
ENTRYPOINT ["ansible-playbook"]
# Run Ansible from Docker
docker run --rm \
-v $(pwd):/ansible \
-v ~/.ssh/id_rsa:/root/.ssh/id_rsa:ro \
-e ANSIBLE_HOST_KEY_CHECKING=false \
my-ansible-runner site.yml -i inventory/production/
AWX / Ansible Automation Platform
For teams that need a web UI, RBAC, job scheduling, and audit trails, AWX (open-source) or Ansible Automation Platform (Red Hat commercial) provides a centralized platform. AWX stores credentials securely, manages inventories, and provides a REST API for triggering playbook runs from other systems. It is essentially "Jenkins for Ansible" but purpose-built.
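As an illustration of that API (the hostname, job template ID, and token are placeholders), launching a job template from another system looks roughly like:

```shell
# Launch job template 42 on an AWX/AAP controller (all values are placeholders)
curl -s -X POST \
  -H "Authorization: Bearer $AWX_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"extra_vars": {"app_version": "1.2.3"}}' \
  https://awx.example.com/api/v2/job_templates/42/launch/
```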
Never store vault passwords or SSH keys in your repository. Use your CI/CD platform's secret management (GitHub Secrets, GitLab CI Variables, etc.). The vault password should be injected at runtime via an environment variable or a temporary file that is cleaned up after the run. For production, consider integrating with HashiCorp Vault or cloud KMS to fetch secrets dynamically during playbook execution using lookup plugins.
Ansible Vault
Ansible Vault provides encryption for sensitive data such as passwords, API keys, and certificates. It uses AES-256 symmetric encryption. You encrypt files or individual strings with a vault password, commit the encrypted content to version control, and provide the vault password at runtime to decrypt.
Encrypting files
# Encrypt an entire file
ansible-vault encrypt group_vars/production/secrets.yml
# Create a new encrypted file
ansible-vault create group_vars/production/secrets.yml
# Edit an encrypted file (decrypts in-place for editing)
ansible-vault edit group_vars/production/secrets.yml
# View encrypted file contents
ansible-vault view group_vars/production/secrets.yml
# Decrypt a file permanently
ansible-vault decrypt group_vars/production/secrets.yml
# Re-key (change the vault password)
ansible-vault rekey group_vars/production/secrets.yml
Encrypting individual strings
# Encrypt a single string (inline in a YAML file)
ansible-vault encrypt_string 'SuperSecretPassword123' --name 'db_password'
# Output (paste this into your vars file):
# db_password: !vault |
# $ANSIBLE_VAULT;1.1;AES256
# 62313365396662343061393464336163...
# group_vars/production/secrets.yml (mix of plain and encrypted)
app_environment: production
app_debug: false
db_password: !vault |
$ANSIBLE_VAULT;1.1;AES256
62313365396662343061393464336163383764356462376564656232...
api_key: !vault |
$ANSIBLE_VAULT;1.1;AES256
33356134653765633035313038376432336531303365616438...
Vault IDs (multiple passwords)
Vault IDs let you use different passwords for different environments or sensitivity levels:
# Encrypt with a vault ID
ansible-vault encrypt --vault-id prod@prompt group_vars/production/secrets.yml
ansible-vault encrypt --vault-id dev@prompt group_vars/staging/secrets.yml
# Use a password file per environment
ansible-vault encrypt --vault-id prod@.vault_pass_prod secrets.yml
# Run playbook with multiple vault IDs
ansible-playbook site.yml \
--vault-id dev@.vault_pass_dev \
--vault-id prod@.vault_pass_prod
Using vault in playbooks
# Provide vault password interactively
ansible-playbook site.yml --ask-vault-pass
# Provide vault password from a file
ansible-playbook site.yml --vault-password-file .vault_pass
# Provide vault password from an environment variable (CI/CD pattern)
echo "$VAULT_PASSWORD" > /tmp/vault_pass
ansible-playbook site.yml --vault-password-file /tmp/vault_pass
rm -f /tmp/vault_pass
# Or use a script that outputs the password
ansible-playbook site.yml --vault-password-file get_vault_pass.sh
Ansible Vault is file-level encryption, not a secrets manager. It does not support access control, audit logs, secret rotation, or dynamic secrets. For production environments, pair Vault-encrypted files for static config with a proper secrets manager (HashiCorp Vault, AWS Secrets Manager, Azure Key Vault) for dynamic secrets. Use the community.hashi_vault.hashi_vault lookup plugin to fetch secrets at runtime without storing them in files at all.
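A sketch of the runtime-lookup pattern, assuming the community.hashi_vault collection is installed, a KV v2 secret exists at the path shown, and Vault authentication is configured via environment variables:

```yaml
- name: Fetch the DB password from HashiCorp Vault at runtime
  hosts: dbservers
  vars:
    db_password: >-
      {{ lookup('community.hashi_vault.hashi_vault',
                'secret=secret/data/myapp:db_password url=https://vault.example.com:8200') }}
  tasks:
    - name: Use the secret without ever writing it to disk
      debug:
        msg: "password length: {{ db_password | length }}"
      no_log: true             # keep the value out of logs
```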
Best Practices
Recommended directory layout
ansible-project/
ansible.cfg # Project-level Ansible configuration
site.yml # Main playbook (imports others)
webservers.yml # Playbook for web tier
dbservers.yml # Playbook for database tier
requirements.yml # Galaxy role/collection dependencies
inventory/
production/
hosts.yml # Production inventory
group_vars/
all.yml
webservers.yml
dbservers/
main.yml
vault.yml # Encrypted secrets
host_vars/
staging/
hosts.yml
group_vars/
roles/
common/ # Shared role (NTP, users, packages)
webserver/ # Web server role
database/ # Database role
playbooks/ # Additional playbooks
rolling-update.yml
backup.yml
templates/ # Global templates (if not in roles)
files/ # Global static files
library/ # Custom modules
filter_plugins/ # Custom Jinja2 filters
Idempotency checklist
- Use dedicated modules instead of command/shell whenever possible
- When using command/shell, add creates:, removes:, or when: guards
- Mark read-only commands with changed_when: false
- Use state: present / state: absent instead of install/remove commands
- Test with --check mode — a properly idempotent playbook should show zero changes on the second run
Check mode and diff mode
# Dry run - show what WOULD change without making changes
ansible-playbook site.yml --check
# Diff mode - show the exact changes (file diffs)
ansible-playbook site.yml --check --diff
# Combine with limit for safe testing
ansible-playbook site.yml --check --diff --limit web1.example.com
# Some tasks don't support check mode - mark them:
# check_mode: false (always run, even in check mode)
# check_mode: true (only run in check mode)
Linting with ansible-lint
# Install
pip install ansible-lint
# Run against a playbook
ansible-lint site.yml
# Run against all YAML in the project
ansible-lint
# Common rules it catches:
# - Using command/shell instead of a dedicated module
# - Missing name on tasks
# - Using deprecated syntax
# - Trailing whitespace
# - Risky file permissions
# - Using bare variables in when clauses
# .ansible-lint (configuration file)
skip_list:
- yaml[line-length]
- name[casing]
warn_list:
- experimental
exclude_paths:
- .cache/
- .github/
- molecule/
Testing with Molecule
Molecule is the standard testing framework for Ansible roles. It creates ephemeral test instances (Docker containers, VMs, cloud instances), runs your role against them, and verifies the result with testinfra or ansible assertions.
# Initialize Molecule for an existing role
cd roles/webserver
molecule init scenario # configure driver in molecule.yml (default: docker)
# Run the full test lifecycle
molecule test
# This runs: create -> converge -> idempotence -> verify -> destroy
# Run individual steps for development
molecule create # Spin up test containers
molecule converge # Run the role
molecule idempotence # Run again, verify zero changes
molecule verify # Run verification tests
molecule destroy # Clean up
A good Ansible project should pass three tests: (1) ansible-lint reports no errors, (2) --check --diff on a configured system shows zero changes (proving idempotency), and (3) molecule test passes on a clean system (proving the role works from scratch). If all three pass, you have automation you can trust.
Consultant's Checklist
When assessing or setting up Ansible automation for a client, verify the following:
Foundation
- Playbooks and roles are in version control (Git)
- Inventory is organized by environment (production, staging, dev)
- Secrets are encrypted with Ansible Vault or external KMS
- SSH key management is centralized (no shared keys)
- ansible.cfg is project-scoped, not global
Quality
- ansible-lint runs in CI on every PR
- Roles have Molecule tests
- Playbooks are idempotent (second run = zero changes)
- No raw command/shell where modules exist
- Templates use the {{ ansible_managed }} header comment
Organization
- Roles are small and focused (one role = one concern)
- Galaxy dependencies are pinned in requirements.yml
- Variables follow naming conventions (role-prefixed)
- group_vars/host_vars are used instead of inline variables
- Tags are used for selective execution
Operations
- CI/CD pipeline runs playbooks (not humans from laptops)
- Rolling deployments use serial: to limit blast radius
- --check --diff is run before applying changes to production
- Callback plugins or AWX provide run history and audit trail
- Dynamic inventory is used for cloud environments
Level 1: Ad-hoc playbooks run manually from a developer's laptop. Level 2: Playbooks in Git, manual execution from a bastion host. Level 3: CI/CD runs playbooks automatically, ansible-lint in PR checks, Vault for secrets. Level 4: AWX/AAP for centralized management, Molecule tests for all roles, dynamic inventory, full audit trail. Most teams should aim for Level 3 as a baseline. Level 4 is for organizations with multiple teams sharing Ansible automation.