Functional Infrastructure Testing for Ansible
Validating Multi-Environment IaC with Docker Compose
Testing Ansible is deceptively hard.
Most teams test playbooks. What usually breaks in production, however, is how roles interact with real environments, real variables, and real lifecycle decisions.
This article describes a Functional Infrastructure Testing (FIT) approach that uses Docker Compose to simulate multiple environments and validate Ansible roles as they are actually used — without cloud infrastructure or slow VM pipelines.
Why Testing Ansible Roles Is Hard
Traditional testing approaches have fundamental limitations:
Syntax and Dry-Run Validation
ansible-playbook site.yml --syntax-check
ansible-playbook site.yml --check
Syntax checking and check mode validate YAML structure and task arguments, but they tell you nothing about whether your nginx configuration actually starts nginx, or whether your firewall rules lock you out.
Single-Container Tests
# molecule/default/molecule.yml
platforms:
  - name: instance
    image: debian:12
Molecule with a single container tests roles in isolation. But production issues arise from:
- Role interactions
- Variable precedence across groups
- Environment-specific configurations
- Network segmentation effects
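Consider variable precedence: a single-container test never evaluates a group hierarchy, so an override like the following goes unnoticed until a real deployment (file and variable names are illustrative):

# group_vars/all/10-base.yml - customer-wide default
container_tls_verify: true

# group_vars/dev/20-security.yml - wins on dev hosts, because a
# specific group takes precedence over 'all'
container_tls_verify: false

A playbook that is only ever tested against one flat inventory never loads the second file, so the dev-only override first surfaces in a live environment.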
Cloud-Based E2E Tests
Spinning up real AWS/Azure instances for every test run is:
- Slow (5-10 minutes startup)
- Expensive ($0.10-$1.00 per test run)
- Complex to maintain
- Not suitable for rapid development
What's missing is testing that validates:
- Multiple environments (prod, stage, dev, emergency)
- Different security postures per environment
- Different container runtimes across hosts
- Real SSH behavior and connection handling
- Real variable scoping and precedence
- Role lifecycle transitions
Functional Infrastructure Testing (FIT)
FIT means:
Test infrastructure the same way customers consume it.
This isn't about testing individual tasks or roles in isolation. It's about validating the entire stack as it would be deployed:
- Real inventories with group hierarchies
- Real group_vars and host_vars trees
- Real lifecycle modes (install/update/remove)
- Real SSH connections with key authentication
- Real network segmentation between environments
- Real execution order and dependencies
The key insight: treat test infrastructure like production infrastructure.
Docker Compose as a Test Substrate
Docker Compose strikes a practical balance between speed and realism for infrastructure testing:
Why Docker Compose Works
- Fast startup: Containers launch in seconds
- Reproducibility: Same behavior every time
- Isolated networks: Real network segmentation
- SSH accessibility: Containers can run SSH daemons
- Resource efficiency: 10 environments on a laptop
- CI-friendly: Works in any pipeline
Key Distinction
We're not containerizing applications. We're using containers as lightweight VMs to simulate infrastructure nodes.
Each container:
- Runs an SSH daemon
- Has Python installed
- Accepts Ansible connections
- Simulates a minimal Linux system
Multi-Environment Architecture
The test setup simulates five distinct environments, each with its own characteristics:
Environment Matrix
| Environment | Security Level | Purpose | Network |
|---|---|---|---|
| test | Legacy | Compatibility testing | 172.25.0.0/24 |
| prod | Maximum | Full security stack | 172.25.1.0/24 |
| stage | High | Prod-like, manual updates | 172.25.2.0/24 |
| dev | Minimal | Rapid development | 172.25.3.0/24 |
| emerg | Hardened | Break-glass access | 172.25.4.0/24 |
Host Distribution
Each environment contains:
- 2 application nodes
- Different container runtime assignments
- Environment-specific configurations
This creates a realistic test matrix covering the combinations seen in production.
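The security levels in the matrix translate directly into per-environment group_vars, laid out in the testing-customer tree shown later in this article. A sketch of the two extremes (variable names are illustrative):

# group_vars/c_00000_acme/prod/20-security.yml - maximum hardening
security_firewall_enabled: true
security_ssh_password_auth: false
security_audit_logging: true

# group_vars/c_00000_acme/dev/20-security.yml - minimal friction
security_firewall_enabled: false
security_ssh_password_auth: true
security_audit_logging: false

Because every environment runs the same roles and differs only in these variables, a failing test points at configuration rather than code.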
Docker Compose Implementation
Network Isolation
networks:
  test_network:
    driver: bridge
    ipam:
      config:
        - subnet: 172.25.0.0/24
  prod_network:
    driver: bridge
    ipam:
      config:
        - subnet: 172.25.1.0/24
  # ... other networks
Node Configuration
services:
  # Production Node 01
  101---prod-node-01:
    image: debian:bookworm
    hostname: 101---prod-node-01
    container_name: fit-101---prod-node-01
    networks:
      prod_network:
        ipv4_address: 172.25.1.11
    ports:
      - "2111:22"  # SSH access, matching the inventory's ansible_port
    environment:
      - DEBIAN_FRONTEND=noninteractive
    volumes:
      # Mock systemctl for role compatibility
      - ./systemd-mock:/usr/bin/systemctl:ro
      # SSH key for Ansible access
      - ${SSH_KEY_PATH}:/root/.ssh/authorized_keys:ro
    command: |
      sh -c '
        apt-get update &&
        apt-get install -y openssh-server python3 sudo &&
        mkdir -p /run/sshd &&
        echo "PermitRootLogin yes" >> /etc/ssh/sshd_config &&
        /usr/sbin/sshd -D
      '
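Instead of sleeping and polling ports, each node can also advertise SSH readiness itself; docker compose up --wait then blocks until every container reports healthy. A sketch of a healthcheck to add under each service (an optional refinement, not part of the original file):

    healthcheck:
      # bash's /dev/tcp probe avoids installing netcat in the image
      test: ["CMD", "bash", "-c", "exec 3<>/dev/tcp/127.0.0.1/22"]
      interval: 2s
      timeout: 2s
      retries: 30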
Complete Test Stack
version: '3.8'

services:
  # Test Environment
  001---test-node-01:
    extends:
      file: docker-compose.base.yml
      service: base-debian-node
    networks:
      test_network:
        ipv4_address: 172.25.0.11
    ports:
      - "2011:22"

  002---test-node-02:
    extends:
      file: docker-compose.base.yml
      service: base-debian-node
    networks:
      test_network:
        ipv4_address: 172.25.0.12
    ports:
      - "2012:22"

  # Production Environment
  101---prod-node-01:
    extends:
      file: docker-compose.base.yml
      service: base-debian-node
    networks:
      prod_network:
        ipv4_address: 172.25.1.11
    ports:
      - "2111:22"

  # ... continue for all environments
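The base-debian-node service referenced by extends collects the shared settings from the single-node example above into one place. A sketch of docker-compose.base.yml, assembled from those fragments (the real file may differ):

# docker-compose.base.yml - shared template for all test nodes
services:
  base-debian-node:
    image: debian:bookworm
    environment:
      - DEBIAN_FRONTEND=noninteractive
    volumes:
      - ./systemd-mock:/usr/bin/systemctl:ro
      - ${SSH_KEY_PATH}:/root/.ssh/authorized_keys:ro
    command: |
      sh -c '
        apt-get update &&
        apt-get install -y openssh-server python3 sudo &&
        mkdir -p /run/sshd &&
        echo "PermitRootLogin yes" >> /etc/ssh/sshd_config &&
        /usr/sbin/sshd -D
      '

Each concrete node then declares only what makes it unique: name, network, IP address, and published SSH port.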
systemd Simulation Strategy
Containers don't run systemd, but Ansible roles expect it. Instead of complex workarounds, we use a lightweight mock:
The Mock Script
#!/bin/bash
# systemd-mock - Minimal systemctl simulator for testing
ACTION="${1:-status}"
SERVICE="${2:-unknown}"

case "$ACTION" in
  start)
    echo "[Mock] Starting $SERVICE"
    touch "/tmp/mock-${SERVICE}.started"
    exit 0
    ;;
  stop)
    echo "[Mock] Stopping $SERVICE"
    rm -f "/tmp/mock-${SERVICE}.started"
    exit 0
    ;;
  restart|reload)
    echo "[Mock] Restarting $SERVICE"
    touch "/tmp/mock-${SERVICE}.started"
    exit 0
    ;;
  enable)
    echo "[Mock] Enabling $SERVICE"
    touch "/tmp/mock-${SERVICE}.enabled"
    exit 0
    ;;
  disable)
    echo "[Mock] Disabling $SERVICE"
    rm -f "/tmp/mock-${SERVICE}.enabled"
    exit 0
    ;;
  is-active)
    if [ -f "/tmp/mock-${SERVICE}.started" ]; then
      echo "active"
      exit 0
    else
      echo "inactive"
      exit 3
    fi
    ;;
  is-enabled)
    if [ -f "/tmp/mock-${SERVICE}.enabled" ]; then
      echo "enabled"
      exit 0
    else
      echo "disabled"
      exit 1
    fi
    ;;
  status)
    echo "[Mock] Status of $SERVICE"
    exit 0
    ;;
  daemon-reload)
    echo "[Mock] Reloading systemd manager"
    exit 0
    ;;
  *)
    echo "[Mock] Unknown action: $ACTION"
    exit 0
    ;;
esac
This preserves role behavior without requiring a real init system, allowing us to test service management tasks.
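Because the mock sits at /usr/bin/systemctl, unmodified role tasks exercise it transparently, as long as the role drives services through the verbs the mock implements. A representative handler:

- name: Ensure docker service is running
  service:
    name: docker
    state: started
    enabled: true

After this task runs, /tmp/mock-docker.started and /tmp/mock-docker.enabled exist and systemctl is-active docker reports active, so follow-up assertions behave as they would on a real host.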
The Testing Customer Pattern
All tests operate through a dedicated customer structure:
Directory Structure
infrastructure/
  inventories/
    c_00000_acme.ini            # Test customer inventory
  group_vars/
    c_00000_acme/
      all/
        10-base.yml             # Customer defaults
      test/
        20-security.yml         # Test environment config
      prod/
        20-security.yml         # Prod environment config
      stage/
        20-security.yml         # Stage environment config
  host_vars/
    c_00000_acme/
      test/
        001---test-node-01.yml
        002---test-node-02.yml
      prod/
        101---prod-node-01.yml
        102---prod-node-02.yml
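The host files themselves stay deliberately thin, carrying only identity and genuine per-host overrides; everything shared lives in group_vars. A sketch (contents illustrative):

# host_vars/c_00000_acme/prod/101---prod-node-01.yml
node_id: 101
node_description: "Primary production application node"

Anything that appears here overrides every group level, so keeping host files small keeps variable precedence predictable.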
Test Inventory
# inventories/c_00000_acme.ini
# Nodes are reached through the SSH ports published on the Docker host;
# the 172.25.x.x addresses stay internal to the Compose networks.
[all:vars]
ansible_user=root

[all:children]
test
prod
stage
dev
emerg

# Test Environment
[test:children]
test_nodes

[test_nodes]
001---test-node-01 ansible_host=127.0.0.1 ansible_port=2011
002---test-node-02 ansible_host=127.0.0.1 ansible_port=2012

# Production Environment
[prod:children]
prod_nodes

[prod_nodes]
101---prod-node-01 ansible_host=127.0.0.1 ansible_port=2111
102---prod-node-02 ansible_host=127.0.0.1 ansible_port=2112

# ... stage, dev, and emerg follow the same pattern

# Container Runtime Distribution
[container_runtime_docker]
001---test-node-01
101---prod-node-01
201---stage-node-01

[container_runtime_podman]
002---test-node-02
102---prod-node-02
202---stage-node-02
This customer mirrors real production usage — no special test paths or mocked variables.
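One practical detail: the containers are destroyed and rebuilt constantly, so their SSH host keys change on every run. Rather than mocking anything, connection options can be pinned in group_vars; a sketch of what we would expect under group_vars/c_00000_acme/all (an assumption, this file is not shown in the tree above):

# group_vars/c_00000_acme/all/05-connection.yml
ansible_ssh_common_args: >-
  -o StrictHostKeyChecking=no
  -o UserKnownHostsFile=/dev/null

The emerg group can then extend these options with its ProxyJump directive, which the bastion assertion later in this article checks for.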
Test Execution Workflow
1. Environment Setup
# Start all test containers
docker-compose up -d

# Wait for SSH readiness on every published port
# (adjust the list to the ports actually published in docker-compose.yml)
for port in 2011 2012 2111 2112 2211 2212 2311 2312 2411 2412; do
  timeout 30 bash -c "until nc -z localhost $port; do sleep 1; done"
done
2. Ansible Connectivity Test
# Verify all nodes are reachable
ansible -i inventories/c_00000_acme.ini all -m ping
3. Role Installation Tests
# Test fresh installation
ansible-playbook \
  -i inventories/c_00000_acme.ini \
  site.yml \
  --limit prod \
  -e "default_role_mode=install"

# Verify installation
ansible prod -i inventories/c_00000_acme.ini -m shell -a "docker --version"
ansible prod -i inventories/c_00000_acme.ini -m shell -a "test -f /etc/docker/daemon.json"
4. Role Update Tests
# Test that updates preserve state: seed data that must survive
ansible prod -i inventories/c_00000_acme.ini -m shell -a "echo test-data > /var/lib/docker/test"

ansible-playbook \
  -i inventories/c_00000_acme.ini \
  site.yml \
  --limit prod \
  -e "default_role_mode=update"

# Verify data preserved
ansible prod -i inventories/c_00000_acme.ini -m shell -a "cat /var/lib/docker/test"
5. Role Removal Tests
# Test clean removal
ansible-playbook \
  -i inventories/c_00000_acme.ini \
  site.yml \
  --limit prod \
  -e "container_runtime_role_mode=remove"

# Verify removal
ansible prod -i inventories/c_00000_acme.ini -m shell -a "! which docker"
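These ad hoc checks can also be captured as a playbook, which is the shape a run-assertions wrapper typically invokes (a sketch; the file name is hypothetical):

# assert-removal.yml
- name: Verify container runtime removal
  hosts: prod
  tasks:
    - name: Docker binary must be gone
      command: which docker
      register: docker_binary
      failed_when: docker_binary.rc == 0
      changed_when: false

    - name: Docker data directory must be gone
      stat:
        path: /var/lib/docker
      register: docker_dir
      failed_when: docker_dir.stat.exists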
Automated Test Orchestration
A lightweight CLI provides consistent test execution:
Test Runner Implementation
#!/usr/bin/env python3
# platform-test.py
import subprocess
import time
from pathlib import Path


class FunctionalTest:
    def __init__(self):
        self.project_root = Path(__file__).parent.parent
        self.compose_file = self.project_root / "tests/docker-compose.yml"

    def setup(self):
        """Start test infrastructure"""
        print("Starting test environment...")
        subprocess.run(
            ["docker-compose", "-f", str(self.compose_file), "up", "-d", "--build"],
            check=True,
        )
        # Wait for SSH daemons to accept connections
        print("Waiting for SSH services...")
        time.sleep(5)

    def run_playbook(self, limit=None, extra_vars=None):
        """Execute the site playbook against the test inventory"""
        cmd = ["ansible-playbook", "-i", "inventories/c_00000_acme.ini", "site.yml"]
        if limit:
            cmd.extend(["--limit", limit])
        if extra_vars:
            for key, value in extra_vars.items():
                cmd.extend(["-e", f"{key}={value}"])
        return subprocess.run(
            cmd, capture_output=True, text=True, cwd=self.project_root
        )

    def verify(self, hosts, command):
        """Run a verification command on hosts; True if it succeeds everywhere"""
        cmd = [
            "ansible", hosts,
            "-i", "inventories/c_00000_acme.ini",
            "-m", "shell",
            "-a", command,
        ]
        result = subprocess.run(
            cmd, capture_output=True, text=True, cwd=self.project_root
        )
        return result.returncode == 0

    def teardown(self):
        """Stop test infrastructure and remove volumes"""
        print("Cleaning up test environment...")
        subprocess.run(
            ["docker-compose", "-f", str(self.compose_file), "down", "-v"],
            check=True,
        )
Test Scenarios
def test_container_runtime_lifecycle():
    """Test container runtime role lifecycle"""
    test = FunctionalTest()
    try:
        test.setup()

        # Test installation
        result = test.run_playbook(
            limit="prod",
            extra_vars={"container_runtime_role_mode": "install"},
        )
        assert result.returncode == 0
        assert test.verify("prod", "docker --version")

        # Test update
        result = test.run_playbook(
            limit="prod",
            extra_vars={"container_runtime_role_mode": "update"},
        )
        assert result.returncode == 0

        # Test removal
        result = test.run_playbook(
            limit="prod",
            extra_vars={"container_runtime_role_mode": "remove"},
        )
        assert result.returncode == 0
        assert not test.verify("prod", "which docker")

        print("✅ Container runtime lifecycle tests passed")
    finally:
        test.teardown()
Performance Characteristics
The FIT approach keeps the entire test cycle fast:
Timing Breakdown
- Environment startup: ~10 seconds (all containers)
- SSH readiness: ~5 seconds
- Full test suite: ~50 seconds
- Teardown: ~2 seconds
Resource Usage
- Memory: ~2GB for 10 containers
- CPU: Minimal (mostly idle)
- Disk: ~500MB (base images cached)
Cost Comparison
| Approach | Time | Cost | Feedback Loop |
|---|---|---|---|
| FIT (Docker Compose) | 50s | $0 | Immediate |
| Cloud VMs | 10-15min | $0.50 | Slow |
| Local VMs | 5-10min | $0 | Medium |
Advanced Testing Patterns
Network Segmentation Validation
# Test that prod cannot reach dev
- name: Verify network isolation
  hosts: prod
  tasks:
    - name: Prod cannot ping dev
      shell: "ping -c 1 -W 1 172.25.3.11"
      register: ping_result
      failed_when: ping_result.rc == 0
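An isolation check like this passes vacuously if ping is missing from the image or the target node is down, so it is worth pairing with a positive control inside the same environment (a sketch; addresses taken from the environment matrix above):

- name: Verify intra-environment connectivity (control case)
  hosts: prod
  tasks:
    - name: Prod node 01 can still reach prod node 02
      shell: "ping -c 1 -W 1 172.25.1.12"
      when: inventory_hostname == '101---prod-node-01'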
Security Posture Verification
# Verify environment-specific security
- name: Check security settings
  hosts: all
  tasks:
    - name: Verify firewall in prod
      shell: iptables -L -n
      when: inventory_hostname in groups['prod']

    - name: Verify no firewall in dev
      shell: "! which iptables"
      when: inventory_hostname in groups['dev']
Cross-Environment Dependencies
# Test emergency environment bastion access
- name: Emergency access pattern
  hosts: emerg
  tasks:
    - name: Only accessible via bastion
      assert:
        that:
          - ansible_ssh_common_args is defined
          - "'ProxyJump' in ansible_ssh_common_args"
What This Approach Does Not Test
It's important to understand the boundaries:
Not Tested
- Kernel behavior: Container kernels differ from VMs
- Hardware features: No real device access
- Full systemd: unit state machines and dependency ordering
- Performance: Containers have different I/O patterns
- Network latency: Local networks are too fast
Where These Belong
These aspects require higher-level testing:
- Integration tests: Real VMs in cloud
- Performance tests: Production-like hardware
- Security audits: Full system validation
FIT handles the 80% of issues that break deployments. The remaining 20% need specialized testing.
Integration with CI/CD
GitLab CI Example
test:ansible:functional:
  stage: test
  image: ansible-runner:latest
  services:
    - docker:dind
  script:
    - cd tests
    - docker-compose up -d
    - ./wait-for-ssh.sh
    - ansible-playbook -i inventories/c_00000_acme.ini site.yml
    - ./run-assertions.sh
  after_script:
    # after_script starts a fresh shell in the project root
    - cd tests && docker-compose down -v
  artifacts:
    when: on_failure
    paths:
      - tests/logs/
GitHub Actions
name: Functional Tests
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Start test environment
        run: |
          cd tests
          docker-compose up -d
          ./wait-for-ssh.sh

      - name: Run Ansible tests
        run: |
          ansible-playbook -i inventories/c_00000_acme.ini site.yml

      - name: Verify deployment
        run: |
          cd tests
          ./run-assertions.sh

      - name: Cleanup
        if: always()
        run: docker-compose down -v
        working-directory: tests
Real-World Benefits
After implementing FIT across multiple projects:
Development Velocity
- Before: 15-20 minute feedback loop (cloud VMs)
- After: 50 second feedback loop
- Impact: 10x more iterations per day
Bug Detection
- Found 37 environment-specific bugs in first month
- Caught variable precedence issues missed by unit tests
- Identified network assumptions in role design
Cost Savings
- Before: $200-300/month in test VM costs
- After: $0 (runs on developer machines)
- CI costs: Reduced by 80%
Confidence
- Every commit tested across all environments
- Role interactions validated continuously
- Production deployments became routine
Lessons Learned
1. Environment Semantics Matter
Single-node tests miss the majority of production issues. Multi-environment testing catches what matters.
2. Real Inventories Find Real Bugs
Using production-like inventories exposes variable precedence issues and group membership bugs.
3. Fast Feedback Loops Change Behavior
When tests run in 50 seconds instead of 15 minutes, developers actually run them.
4. Lifecycle Testing Prevents Regressions
Testing install → update → remove cycles catches state management bugs early.
5. Network Isolation Tests Are Critical
Many production issues come from network assumptions. Test them.
Best Practices
1. Keep Base Images Minimal
FROM debian:bookworm-slim
RUN apt-get update && apt-get install -y \
      openssh-server \
      python3-minimal \
      sudo \
    && rm -rf /var/lib/apt/lists/*
2. Cache Everything Possible
services:
  base:
    image: test-base:latest
    build:
      context: .
      cache_from:
        - test-base:latest
3. Parallelize Test Execution
# Run environment tests in parallel
parallel -j 4 ::: \
  "ansible-playbook -i inventories/c_00000_acme.ini site.yml --limit test" \
  "ansible-playbook -i inventories/c_00000_acme.ini site.yml --limit prod" \
  "ansible-playbook -i inventories/c_00000_acme.ini site.yml --limit stage" \
  "ansible-playbook -i inventories/c_00000_acme.ini site.yml --limit dev"
4. Make Assertions Explicit
# Don't just run playbooks, verify outcomes
- name: Verify nginx configuration
  hosts: webservers
  tasks:
    - name: Config file exists
      stat:
        path: /etc/nginx/nginx.conf
      register: nginx_config

    - name: Validate config
      assert:
        that:
          - nginx_config.stat.exists
          - nginx_config.stat.size > 0
          - nginx_config.stat.mode == '0644'
Conclusion
Functional Infrastructure Testing with Docker Compose bridges the gap between fast but unrealistic unit tests and accurate but slow integration tests.
By treating test infrastructure like production infrastructure — with real inventories, real environments, and real execution patterns — we can validate the actual behavior that matters in production.
The result is a testing approach that is:
- Fast enough for rapid development
- Realistic enough to catch real bugs
- Simple enough to maintain
- Cheap enough to run everywhere
For teams managing complex Ansible deployments, FIT provides the confidence to deploy frequently without the fear of environment-specific failures.
Technical Deep Dive
For a detailed technical breakdown of the Docker-based test environment, including Dockerfile and docker-compose.yml samples, see the companion article: Inside a Docker-Compose-Based Test Environment for Ansible IaC.
This testing approach has been refined across multiple production Ansible deployments, from small startups to large enterprise environments. The patterns shown here have caught hundreds of environment-specific bugs before they reached production.