Functional Infrastructure Testing for Ansible

Tags: DevOps · Ansible · Testing · Docker · Infrastructure as Code · CI/CD

Validating Multi-Environment IaC with Docker Compose

Testing Ansible is harder than it looks.

Most teams test playbooks. What usually breaks in production, however, is the interaction of roles with real environments, real variables, and real lifecycle decisions.

This article describes a Functional Infrastructure Testing (FIT) approach that uses Docker Compose to simulate multiple environments and validate Ansible roles as they are actually used — without cloud infrastructure or slow VM pipelines.

Why Testing Ansible Roles Is Hard

Traditional testing approaches have fundamental limitations:

Syntax-Only Validation

ansible-playbook site.yml --check

This validates YAML syntax and task structure but tells you nothing about whether your nginx configuration actually starts nginx, or if your firewall rules lock you out.
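
For example, this hypothetical task passes --check as long as the template renders, yet only a real run shows whether nginx accepts the result:

# Check mode validates that the template renders — not that nginx
# can actually start with the rendered configuration
- name: Deploy nginx vhost (illustrative sketch)
  ansible.builtin.template:
    src: vhost.conf.j2
    dest: /etc/nginx/conf.d/app.conf
  notify: restart nginx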

Single-Container Tests

# molecule/default/molecule.yml
platforms:
  - name: instance
    image: debian:12

Molecule with a single container tests roles in isolation. But production issues arise from:

  • Role interactions
  • Variable precedence across groups
  • Environment-specific configurations
  • Network segmentation effects

Cloud-Based E2E Tests

Spinning up real AWS/Azure instances for every test run is:

  • Slow (5-10 minutes startup)
  • Expensive ($0.10-$1.00 per test run)
  • Complex to maintain
  • Not suitable for rapid development

What's missing is testing that validates:

  • Multiple environments (prod, stage, dev, emergency)
  • Different security postures per environment
  • Different container runtimes across hosts
  • Real SSH behavior and connection handling
  • Real variable scoping and precedence
  • Role lifecycle transitions

Functional Infrastructure Testing (FIT)

FIT means:

Test infrastructure the same way customers consume it.

This isn't about testing individual tasks or roles in isolation. It's about validating the entire stack as it would be deployed:

  • Real inventories with group hierarchies
  • Real group_vars and host_vars trees
  • Real lifecycle modes (install/update/remove)
  • Real SSH connections with key authentication
  • Real network segmentation between environments
  • Real execution order and dependencies

The key insight: treat test infrastructure like production infrastructure.

Docker Compose as a Test Substrate

Docker Compose provides the perfect balance for infrastructure testing:

Why Docker Compose Works

  • Fast startup: Containers launch in seconds
  • Reproducibility: Same behavior every time
  • Isolated networks: Real network segmentation
  • SSH accessibility: Containers can run SSH daemons
  • Resource efficiency: 10 environments on a laptop
  • CI-friendly: Works in any pipeline

Key Distinction

We're not containerizing applications. We're using containers as lightweight VMs to simulate infrastructure nodes.

Each container:

  • Runs an SSH daemon
  • Has Python installed
  • Accepts Ansible connections
  • Simulates a minimal Linux system

Multi-Environment Architecture

The test setup simulates five distinct environments, each with its own characteristics:

Environment Matrix

Environment | Security Level | Purpose                    | Network
----------- | -------------- | -------------------------- | -------------
test        | Legacy         | Compatibility testing      | 172.25.0.0/24
prod        | Maximum        | Full security stack        | 172.25.1.0/24
stage       | High           | Prod-like, manual updates  | 172.25.2.0/24
dev         | Minimal        | Rapid development          | 172.25.3.0/24
emerg       | Hardened       | Break-glass access         | 172.25.4.0/24

Host Distribution

Each environment contains:

  • 2 application nodes
  • Different container runtime assignments
  • Environment-specific configurations

This creates a realistic test matrix covering the combinations seen in production.

Docker Compose Implementation

Network Isolation

networks:
  test_network:
    driver: bridge
    ipam:
      config:
        - subnet: 172.25.0.0/24
  prod_network:
    driver: bridge
    ipam:
      config:
        - subnet: 172.25.1.0/24
  # ... other networks

Node Configuration

services:
  # Production Node 01
  101---prod-node-01:
    image: debian:bookworm
    hostname: 101---prod-node-01
    container_name: fit-101---prod-node-01
    networks:
      prod_network:
        ipv4_address: 172.25.1.11
    ports:
      - "2211:22"  # SSH access
    environment:
      - DEBIAN_FRONTEND=noninteractive
    volumes:
      # Mock systemctl for role compatibility
      - ./systemd-mock:/usr/bin/systemctl:ro
      # SSH key for Ansible access
      - ${SSH_KEY_PATH}:/root/.ssh/authorized_keys:ro
    command: |
      sh -c '
        apt-get update &&
        apt-get install -y openssh-server python3 sudo &&
        mkdir -p /run/sshd &&
        echo "PermitRootLogin yes" >> /etc/ssh/sshd_config &&
        /usr/sbin/sshd -D
      '

Complete Test Stack

# No "version" key: "extends" was dropped from the 3.x file format,
# so this file targets the versionless Compose specification (Compose v2)

services:
  # Test Environment
  001---test-node-01:
    extends:
      file: docker-compose.base.yml
      service: base-debian-node
    networks:
      test_network:
        ipv4_address: 172.25.0.11
    ports:
      - "2011:22"
      
  002---test-node-02:
    extends:
      file: docker-compose.base.yml
      service: base-debian-node
    networks:
      test_network:
        ipv4_address: 172.25.0.12
    ports:
      - "2012:22"
      
  # Production Environment  
  101---prod-node-01:
    extends:
      file: docker-compose.base.yml
      service: base-debian-node
    networks:
      prod_network:
        ipv4_address: 172.25.1.11
    ports:
      - "2111:22"
      
  # ... continue for all environments
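
The stack extends a shared base service from docker-compose.base.yml, which is not shown above; a minimal sketch of that base definition, mirroring the node example earlier (image and package set assumed):

# docker-compose.base.yml — shared node definition (sketch)
services:
  base-debian-node:
    image: debian:bookworm
    environment:
      - DEBIAN_FRONTEND=noninteractive
    volumes:
      # Mock systemctl for role compatibility (script must be executable)
      - ./systemd-mock:/usr/bin/systemctl:ro
      # SSH key for Ansible access
      - ${SSH_KEY_PATH}:/root/.ssh/authorized_keys:ro
    command: |
      sh -c '
        apt-get update &&
        apt-get install -y openssh-server python3 sudo &&
        mkdir -p /run/sshd &&
        echo "PermitRootLogin yes" >> /etc/ssh/sshd_config &&
        /usr/sbin/sshd -D
      '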

systemd Simulation Strategy

Containers don't run systemd, but Ansible roles expect it. Instead of complex workarounds, we use a lightweight mock:

The Mock Script

#!/bin/bash
# systemd-mock - Minimal systemctl simulator for testing

# Invocation: systemctl <action> <service>
ACTION="${1:-status}"
SERVICE="${2:-unknown}"

case "$ACTION" in
  start)
    echo "[Mock] Starting $SERVICE"
    touch /tmp/mock-$SERVICE.started
    exit 0
    ;;
  stop)
    echo "[Mock] Stopping $SERVICE"
    rm -f /tmp/mock-$SERVICE.started
    exit 0
    ;;
  restart|reload)
    echo "[Mock] Restarting $SERVICE"
    touch /tmp/mock-$SERVICE.started
    exit 0
    ;;
  enable)
    echo "[Mock] Enabling $SERVICE"
    touch /tmp/mock-$SERVICE.enabled
    exit 0
    ;;
  disable)
    echo "[Mock] Disabling $SERVICE"
    rm -f /tmp/mock-$SERVICE.enabled
    exit 0
    ;;
  is-active)
    if [ -f /tmp/mock-$SERVICE.started ]; then
      echo "active"
      exit 0
    else
      echo "inactive"
      exit 3
    fi
    ;;
  is-enabled)
    if [ -f /tmp/mock-$SERVICE.enabled ]; then
      echo "enabled"
      exit 0
    else
      echo "disabled"
      exit 1
    fi
    ;;
  status)
    echo "[Mock] Status of $SERVICE"
    exit 0
    ;;
  daemon-reload)
    echo "[Mock] Reloading systemd manager"
    exit 0
    ;;
  *)
    echo "[Mock] Unknown action: $ACTION"
    exit 0
    ;;
esac

This preserves role behavior without requiring a real init system, allowing us to test service management tasks.
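
Because the mock records every action as a marker file, tests can assert on service management directly; the marker path below follows the script's /tmp/mock-$SERVICE.started convention:

# Verify that a role actually started the docker service via the mock
- name: Check the mock's start marker
  hosts: prod
  tasks:
    - name: Stat the marker written by "systemctl start docker"
      ansible.builtin.stat:
        path: /tmp/mock-docker.started
      register: mock_marker

    - name: Assert the service was started
      ansible.builtin.assert:
        that: mock_marker.stat.exists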

The Testing Customer Pattern

All tests operate through a dedicated customer structure:

Directory Structure

infrastructure/
  inventories/
    c_00000_acme.ini        # Test customer inventory
  group_vars/
    c_00000_acme/
      all/
        10-base.yml         # Customer defaults
      test/
        20-security.yml     # Test environment config
      prod/
        20-security.yml     # Prod environment config
      stage/
        20-security.yml     # Stage environment config
  host_vars/
    c_00000_acme/
      test/
        001---test-node-01.yml
        002---test-node-02.yml
      prod/
        101---prod-node-01.yml
        102---prod-node-02.yml
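
The numeric prefixes control load order within a directory, while the environment directories layer on top of all/; a sketch of how the layers could interact (variable names are illustrative):

# group_vars/c_00000_acme/all/10-base.yml — customer-wide defaults
firewall_enabled: false
ssh_hardening_level: none

# group_vars/c_00000_acme/prod/20-security.yml — prod tightens the defaults
firewall_enabled: true
ssh_hardening_level: maximum

# group_vars/c_00000_acme/test/20-security.yml — test stays permissive
firewall_enabled: false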

Test Inventory

# inventories/c_00000_acme.ini

[all:children]
test
prod
stage
dev
emerg

# Test Environment
[test:children]
test_nodes

[test_nodes]
# SSH goes through the ports published on the Docker host; the container
# IPs from the compose file are not directly routable on macOS/Windows
001---test-node-01 ansible_host=127.0.0.1 ansible_port=2011
002---test-node-02 ansible_host=127.0.0.1 ansible_port=2012

# Production Environment
[prod:children]
prod_nodes

[prod_nodes]
101---prod-node-01 ansible_host=127.0.0.1 ansible_port=2111
102---prod-node-02 ansible_host=127.0.0.1 ansible_port=2112

# Container Runtime Distribution
[container_runtime_docker]
001---test-node-01
101---prod-node-01
201---stage-node-01

[container_runtime_podman]
002---test-node-02
102---prod-node-02
202---stage-node-02

This customer mirrors real production usage — no special test paths or mocked variables.
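
The container_runtime_* groups typically carry exactly one variable that roles branch on; an illustrative wiring (file names and the variable are assumptions):

# group_vars/container_runtime_docker.yml (sketch)
container_runtime: docker

# group_vars/container_runtime_podman.yml (sketch)
container_runtime: podman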

Test Execution Workflow

1. Environment Setup

# Start all test containers
docker-compose up -d

# Wait for SSH readiness on every published port (keep this list in
# sync with docker-compose.yml; stage/dev/emerg ports are assumed)
for port in 2011 2012 2111 2112 2211 2212 2311 2312 2411 2412; do
  timeout 30 bash -c "until nc -z localhost $port; do sleep 1; done"
done
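
The same wait can be expressed Ansible-natively with wait_for, which is handy when a shell loop is unavailable (the port list must mirror the compose mappings):

# wait-for-ssh.yml — Ansible-native SSH readiness check (sketch)
- name: Wait for all mapped SSH ports
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Poll each published port until it accepts connections
      ansible.builtin.wait_for:
        host: 127.0.0.1
        port: "{{ item }}"
        timeout: 30
      loop: [2011, 2012, 2111, 2112]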

2. Ansible Connectivity Test

# Verify all nodes are reachable
ansible -i inventories/c_00000_acme.ini all -m ping

3. Role Installation Tests

# Test fresh installation
ansible-playbook \
  -i inventories/c_00000_acme.ini \
  site.yml \
  --limit prod \
  -e "default_role_mode=install"

# Verify installation
ansible -i inventories/c_00000_acme.ini prod -m shell -a "docker --version"
ansible -i inventories/c_00000_acme.ini prod -m shell -a "test -f /etc/docker/daemon.json"

4. Role Update Tests

# Test updates preserve state
echo "test-data" | ansible prod -m shell -a "tee /var/lib/docker/test"

ansible-playbook \
  -i inventories/c_00000_acme.ini \
  site.yml \
  --limit prod \
  -e "default_role_mode=update"

# Verify data preserved
ansible -i inventories/c_00000_acme.ini prod -m shell -a "cat /var/lib/docker/test"

5. Role Removal Tests

# Test clean removal
ansible-playbook \
  -i inventories/c_00000_acme.ini \
  site.yml \
  --limit prod \
  -e "container_runtime_role_mode=remove"

# Verify removal
ansible -i inventories/c_00000_acme.ini prod -m shell -a "! which docker"
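
The *_role_mode variables above imply that each role dispatches on a mode value; one way such a dispatch could be wired (a sketch — the exact mechanism is assumed):

# roles/container_runtime/tasks/main.yml — lifecycle dispatch (sketch)
- name: Include the tasks for the requested lifecycle mode
  ansible.builtin.include_tasks: "{{ container_runtime_role_mode | default(default_role_mode | default('install')) }}.yml"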

Automated Test Orchestration

A lightweight CLI provides consistent test execution:

Test Runner Implementation

#!/usr/bin/env python3
# platform-test.py

import socket
import subprocess
import time
from pathlib import Path

class FunctionalTest:
    def __init__(self):
        self.project_root = Path(__file__).parent.parent
        self.compose_file = self.project_root / "tests/docker-compose.yml"

    def setup(self):
        """Start test infrastructure"""
        print("Starting test environment...")
        subprocess.run([
            "docker-compose", "-f", str(self.compose_file),
            "up", "-d", "--build"
        ], check=True)

        # Poll the mapped SSH ports instead of sleeping a fixed interval
        print("Waiting for SSH services...")
        self.wait_for_ssh([2011, 2012, 2111, 2112])  # extend per environment

    def wait_for_ssh(self, ports, timeout=30):
        """Block until every given SSH port accepts TCP connections."""
        deadline = time.time() + timeout
        for port in ports:
            while True:
                try:
                    with socket.create_connection(("127.0.0.1", port), timeout=1):
                        break
                except OSError:
                    if time.time() > deadline:
                        raise TimeoutError(f"SSH port {port} not ready")
                    time.sleep(1)

    def run_playbook(self, limit=None, extra_vars=None):
        """Execute Ansible playbook"""
        cmd = [
            "ansible-playbook",
            "-i", "inventories/c_00000_acme.ini",
            "site.yml"
        ]

        if limit:
            cmd.extend(["--limit", limit])

        if extra_vars:
            for key, value in extra_vars.items():
                cmd.extend(["-e", f"{key}={value}"])

        # Run from the project root so the relative inventory path resolves
        return subprocess.run(cmd, capture_output=True, text=True,
                              cwd=self.project_root)

    def verify(self, hosts, command):
        """Run verification command on hosts"""
        cmd = [
            "ansible", hosts,
            "-i", "inventories/c_00000_acme.ini",
            "-m", "shell",
            "-a", command
        ]
        result = subprocess.run(cmd, capture_output=True, text=True,
                                cwd=self.project_root)
        return result.returncode == 0

    def teardown(self):
        """Stop test infrastructure"""
        print("Cleaning up test environment...")
        subprocess.run([
            "docker-compose", "-f", str(self.compose_file),
            "down", "-v"
        ], check=True)

Test Scenarios

def test_container_runtime_lifecycle():
    """Test container runtime role lifecycle"""
    test = FunctionalTest()
    
    try:
        test.setup()
        
        # Test installation
        result = test.run_playbook(
            limit="prod",
            extra_vars={"container_runtime_role_mode": "install"}
        )
        assert result.returncode == 0
        assert test.verify("prod", "docker --version")
        
        # Test update
        result = test.run_playbook(
            limit="prod", 
            extra_vars={"container_runtime_role_mode": "update"}
        )
        assert result.returncode == 0
        
        # Test removal
        result = test.run_playbook(
            limit="prod",
            extra_vars={"container_runtime_role_mode": "remove"}
        )
        assert result.returncode == 0
        assert not test.verify("prod", "which docker")
        
        print("✅ Container runtime lifecycle tests passed")
        
    finally:
        test.teardown()


if __name__ == "__main__":
    test_container_runtime_lifecycle()

Performance Characteristics

The FIT approach delivers impressive performance:

Timing Breakdown

  • Environment startup: ~10 seconds (all containers)
  • SSH readiness: ~5 seconds
  • Full test suite: ~50 seconds
  • Teardown: ~2 seconds

Resource Usage

  • Memory: ~2GB for 10 containers
  • CPU: Minimal (mostly idle)
  • Disk: ~500MB (base images cached)

Cost Comparison

Approach             | Time     | Cost  | Feedback Loop
-------------------- | -------- | ----- | -------------
FIT (Docker Compose) | 50s      | $0    | Immediate
Cloud VMs            | 10-15min | $0.50 | Slow
Local VMs            | 5-10min  | $0    | Medium

Advanced Testing Patterns

Network Segmentation Validation

# Test that prod cannot reach dev
- name: Verify network isolation
  hosts: prod
  tasks:
    - name: Prod cannot ping dev
      shell: "ping -c 1 -W 2 172.25.3.11"
      register: ping_result
      failed_when: ping_result.rc == 0
      changed_when: false
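
A positive control keeps this test honest: if nothing could ping anything, the isolation check would pass vacuously (peer IP taken from the compose file; assumes iputils-ping is installed on the nodes):

# Positive control: connectivity inside one environment must still work
- name: Verify intra-environment connectivity
  hosts: prod
  tasks:
    - name: Prod node can reach its peer
      shell: "ping -c 1 -W 2 172.25.1.12"
      changed_when: false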

Security Posture Verification

# Verify environment-specific security
- name: Check security settings
  hosts: all
  tasks:
    - name: Verify firewall in prod
      shell: iptables -L -n
      when: inventory_hostname in groups['prod']
      
    - name: Verify no firewall in dev
      shell: "! which iptables"
      when: inventory_hostname in groups['dev']

Cross-Environment Dependencies

# Test emergency environment bastion access
- name: Emergency access pattern
  hosts: emerg
  tasks:
    - name: Only accessible via bastion
      assert:
        that:
          - ansible_ssh_common_args is defined
          - "'ProxyJump' in ansible_ssh_common_args"

What This Approach Does Not Test

It's important to understand the boundaries:

Not Tested

  • Kernel behavior: Container kernels differ from VMs
  • Hardware features: No real device access
  • Full systemd: State machines and dependencies
  • Performance: Containers have different I/O patterns
  • Network latency: Local networks are too fast

Where These Belong

These aspects require higher-level testing:

  • Integration tests: Real VMs in cloud
  • Performance tests: Production-like hardware
  • Security audits: Full system validation

FIT handles the 80% of issues that break deployments. The remaining 20% need specialized testing.

Integration with CI/CD

GitLab CI Example

test:ansible:functional:
  stage: test
  image: ansible-runner:latest
  services:
    - docker:dind
  variables:
    # Point the Docker CLI at the dind service (TLS disabled for brevity)
    DOCKER_HOST: tcp://docker:2375
    DOCKER_TLS_CERTDIR: ""
  script:
    - cd tests
    - docker-compose up -d
    - ./wait-for-ssh.sh
    - ansible-playbook -i inventories/c_00000_acme.ini site.yml
    - ./run-assertions.sh
  after_script:
    - docker-compose down -v
  artifacts:
    when: on_failure
    paths:
      - tests/logs/

GitHub Actions

name: Functional Tests

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      
      - name: Start test environment
        run: |
          cd tests
          docker compose up -d
          ./wait-for-ssh.sh
          
      - name: Run Ansible tests
        run: |
          ansible-playbook -i inventories/c_00000_acme.ini site.yml
          
      - name: Verify deployment
        run: |
          cd tests
          ./run-assertions.sh
          
      - name: Cleanup
        if: always()
        run: |
          cd tests
          docker compose down -v

Real-World Benefits

After implementing FIT across multiple projects:

Development Velocity

  • Before: 15-20 minute feedback loop (cloud VMs)
  • After: 50 second feedback loop
  • Impact: 10x more iterations per day

Bug Detection

  • Found 37 environment-specific bugs in first month
  • Caught variable precedence issues missed by unit tests
  • Identified network assumptions in role design

Cost Savings

  • Before: $200-300/month in test VM costs
  • After: $0 (runs on developer machines)
  • CI costs: Reduced by 80%

Confidence

  • Every commit tested across all environments
  • Role interactions validated continuously
  • Production deployments became routine

Lessons Learned

1. Environment Semantics Matter

Single-node tests miss the majority of production issues. Multi-environment testing catches what matters.

2. Real Inventories Find Real Bugs

Using production-like inventories exposes variable precedence issues and group membership bugs.

3. Fast Feedback Loops Change Behavior

When tests run in 50 seconds instead of 15 minutes, developers actually run them.

4. Lifecycle Testing Prevents Regressions

Testing install → update → remove cycles catches state management bugs early.

5. Network Isolation Tests Are Critical

Many production issues come from network assumptions. Test them.

Best Practices

1. Keep Base Images Minimal

FROM debian:bookworm-slim
# python3-minimal keeps the image small; switch to python3 if your roles
# need modules missing from the reduced standard library
RUN apt-get update && apt-get install -y \
    openssh-server \
    python3-minimal \
    sudo \
    && rm -rf /var/lib/apt/lists/*

2. Cache Everything Possible

services:
  base:
    image: test-base:latest
    build:
      context: .
      cache_from:
        - test-base:latest

3. Parallelize Test Execution

# Run environment tests in parallel with GNU parallel
parallel -j 4 ::: \
  "ansible-playbook -i inventories/c_00000_acme.ini site.yml --limit test" \
  "ansible-playbook -i inventories/c_00000_acme.ini site.yml --limit prod" \
  "ansible-playbook -i inventories/c_00000_acme.ini site.yml --limit stage" \
  "ansible-playbook -i inventories/c_00000_acme.ini site.yml --limit dev"

4. Make Assertions Explicit

# Don't just run playbooks, verify outcomes
- name: Verify nginx configuration
  hosts: webservers
  tasks:
    - name: Config file exists
      stat:
        path: /etc/nginx/nginx.conf
      register: nginx_config
      
    - name: Validate config
      assert:
        that:
          - nginx_config.stat.exists
          - nginx_config.stat.size > 0
          - nginx_config.stat.mode == '0644'

Conclusion

Functional Infrastructure Testing with Docker Compose bridges the gap between fast but unrealistic unit tests and accurate but slow integration tests.

By treating test infrastructure like production infrastructure — with real inventories, real environments, and real execution patterns — we can validate the actual behavior that matters in production.

The result is a testing approach that is:

  • Fast enough for rapid development
  • Realistic enough to catch real bugs
  • Simple enough to maintain
  • Cheap enough to run everywhere

For teams managing complex Ansible deployments, FIT provides the confidence to deploy frequently without the fear of environment-specific failures.

Technical Deep Dive

For a detailed technical breakdown of the Docker-based test environment, including Dockerfile and docker-compose.yml samples, see the companion article: Inside a Docker-Compose-Based Test Environment for Ansible IaC.


This testing approach has been refined across multiple production Ansible deployments, from small startups to large enterprise environments. The patterns shown here have caught hundreds of environment-specific bugs before they reached production.
