Functional Infrastructure Testing for Ansible

Tags: DevOps · Ansible · Testing · Docker · Infrastructure as Code · CI/CD

Validating Multi-Environment IaC with Docker Compose

Testing Ansible is harder than it looks.

Most teams test playbooks. What usually breaks in production, however, is the interaction of roles with real environments, real variables, and real lifecycle decisions.

This article describes a Functional Infrastructure Testing (FIT) approach that uses Docker Compose to simulate multiple environments and validate Ansible roles as they are actually used — without cloud infrastructure or slow VM pipelines.

Why Testing Ansible Roles Is Hard

Traditional testing approaches have fundamental limitations:

Syntax-Only Validation

ansible-playbook site.yml --check

This validates YAML syntax and task structure but tells you nothing about whether your nginx configuration actually starts nginx, or if your firewall rules lock you out.
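
For example, this hypothetical task passes --check as long as the template renders, yet only a real run shows whether nginx accepts the result:

# Check mode validates that the template renders — not that nginx
# can actually start with the rendered configuration
- name: Deploy nginx vhost (illustrative sketch)
  ansible.builtin.template:
    src: vhost.conf.j2
    dest: /etc/nginx/conf.d/app.conf
  notify: restart nginx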

Single-Container Tests

# molecule/default/molecule.yml
platforms:
  - name: instance
    image: debian:12

Molecule with a single container tests roles in isolation. But production issues arise from:

  • Role interactions
  • Variable precedence across groups
  • Environment-specific configurations
  • Network segmentation effects

Cloud-Based E2E Tests

Spinning up real AWS/Azure instances for every test run is:

  • Slow (5-10 minutes startup)
  • Expensive ($0.10-$1.00 per test run)
  • Complex to maintain
  • Not suitable for rapid development

What's missing is testing that validates:

  • Multiple environments (prod, stage, dev, emergency)
  • Different security postures per environment
  • Different container runtimes across hosts
  • Real SSH behavior and connection handling
  • Real variable scoping and precedence
  • Role lifecycle transitions

Functional Infrastructure Testing (FIT)

FIT means:

Test infrastructure the same way customers consume it.

This isn't about testing individual tasks or roles in isolation. It's about validating the entire stack as it would be deployed:

  • Real inventories with group hierarchies
  • Real group_vars and host_vars trees
  • Real lifecycle modes (install/update/remove)
  • Real SSH connections with key authentication
  • Real network segmentation between environments
  • Real execution order and dependencies

The key insight: treat test infrastructure like production infrastructure.

Docker Compose as a Test Substrate

Docker Compose provides the perfect balance for infrastructure testing:

Why Docker Compose Works

  • Fast startup: Containers launch in seconds
  • Reproducibility: Same behavior every time
  • Isolated networks: Real network segmentation
  • SSH accessibility: Containers can run SSH daemons
  • Resource efficiency: 10 environments on a laptop
  • CI-friendly: Works in any pipeline

Key Distinction

We're not containerizing applications. We're using containers as lightweight VMs to simulate infrastructure nodes.

Each container:

  • Runs an SSH daemon
  • Has Python installed
  • Accepts Ansible connections
  • Simulates a minimal Linux system

Multi-Environment Architecture

The test setup simulates five distinct environments, each with its own characteristics:

Environment Matrix

Environment | Security Level | Purpose                    | Network
----------- | -------------- | -------------------------- | -------------
test        | Legacy         | Compatibility testing      | 172.25.0.0/24
prod        | Maximum        | Full security stack        | 172.25.1.0/24
stage       | High           | Prod-like, manual updates  | 172.25.2.0/24
dev         | Minimal        | Rapid development          | 172.25.3.0/24
emerg       | Hardened       | Break-glass access         | 172.25.4.0/24

Host Distribution

Each environment contains:

  • 2 application nodes
  • Different container runtime assignments
  • Environment-specific configurations

This creates a realistic test matrix covering the combinations seen in production.

Docker Compose Implementation

Network Isolation

networks:
  test_network:
    driver: bridge
    ipam:
      config:
        - subnet: 172.25.0.0/24
  prod_network:
    driver: bridge
    ipam:
      config:
        - subnet: 172.25.1.0/24
  # ... other networks

Node Configuration

services:
  # Production Node 01
  101---prod-node-01:
    image: debian:bookworm
    hostname: 101---prod-node-01
    container_name: fit-101---prod-node-01
    networks:
      prod_network:
        ipv4_address: 172.25.1.11
    ports:
      - "2211:22"  # SSH access
    environment:
      - DEBIAN_FRONTEND=noninteractive
    volumes:
      # Mock systemctl for role compatibility
      - ./systemd-mock:/usr/bin/systemctl:ro
      # SSH key for Ansible access
      - ${SSH_KEY_PATH}:/root/.ssh/authorized_keys:ro
    command: |
      sh -c '
        apt-get update &&
        apt-get install -y openssh-server python3 sudo &&
        mkdir -p /run/sshd &&
        echo "PermitRootLogin yes" >> /etc/ssh/sshd_config &&
        /usr/sbin/sshd -D
      '

Complete Test Stack

# No "version" key: "extends" was dropped from the 3.x file format,
# so this file targets the versionless Compose specification (Compose v2)

services:
  # Test Environment
  001---test-node-01:
    extends:
      file: docker-compose.base.yml
      service: base-debian-node
    networks:
      test_network:
        ipv4_address: 172.25.0.11
    ports:
      - "2011:22"
      
  002---test-node-02:
    extends:
      file: docker-compose.base.yml
      service: base-debian-node
    networks:
      test_network:
        ipv4_address: 172.25.0.12
    ports:
      - "2012:22"
      
  # Production Environment  
  101---prod-node-01:
    extends:
      file: docker-compose.base.yml
      service: base-debian-node
    networks:
      prod_network:
        ipv4_address: 172.25.1.11
    ports:
      - "2111:22"
      
  # ... continue for all environments
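
The stack extends a shared base service from docker-compose.base.yml, which is not shown above; a minimal sketch of that base definition, mirroring the node example earlier (image and package set assumed):

# docker-compose.base.yml — shared node definition (sketch)
services:
  base-debian-node:
    image: debian:bookworm
    environment:
      - DEBIAN_FRONTEND=noninteractive
    volumes:
      # Mock systemctl for role compatibility (script must be executable)
      - ./systemd-mock:/usr/bin/systemctl:ro
      # SSH key for Ansible access
      - ${SSH_KEY_PATH}:/root/.ssh/authorized_keys:ro
    command: |
      sh -c '
        apt-get update &&
        apt-get install -y openssh-server python3 sudo &&
        mkdir -p /run/sshd &&
        echo "PermitRootLogin yes" >> /etc/ssh/sshd_config &&
        /usr/sbin/sshd -D
      '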

systemd Simulation Strategy

Containers don't run systemd, but Ansible roles expect it. Instead of complex workarounds, we use a lightweight mock:

The Mock Script

#!/bin/bash
# systemd-mock - Minimal systemctl simulator for testing

# Invocation: systemctl <action> <service>
ACTION="${1:-status}"
SERVICE="${2:-unknown}"

case "$ACTION" in
  start)
    echo "[Mock] Starting $SERVICE"
    touch /tmp/mock-$SERVICE.started
    exit 0
    ;;
  stop)
    echo "[Mock] Stopping $SERVICE"
    rm -f /tmp/mock-$SERVICE.started
    exit 0
    ;;
  restart|reload)
    echo "[Mock] Restarting $SERVICE"
    touch /tmp/mock-$SERVICE.started
    exit 0
    ;;
  enable)
    echo "[Mock] Enabling $SERVICE"
    touch /tmp/mock-$SERVICE.enabled
    exit 0
    ;;
  disable)
    echo "[Mock] Disabling $SERVICE"
    rm -f /tmp/mock-$SERVICE.enabled
    exit 0
    ;;
  is-active)
    if [ -f /tmp/mock-$SERVICE.started ]; then
      echo "active"
      exit 0
    else
      echo "inactive"
      exit 3
    fi
    ;;
  is-enabled)
    if [ -f /tmp/mock-$SERVICE.enabled ]; then
      echo "enabled"
      exit 0
    else
      echo "disabled"
      exit 1
    fi
    ;;
  status)
    echo "[Mock] Status of $SERVICE"
    exit 0
    ;;
  daemon-reload)
    echo "[Mock] Reloading systemd manager"
    exit 0
    ;;
  *)
    echo "[Mock] Unknown action: $ACTION"
    exit 0
    ;;
esac

This preserves role behavior without requiring a real init system, allowing us to test service management tasks.
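
Because the mock records every action as a marker file, tests can assert on service management directly; the marker path below follows the script's /tmp/mock-$SERVICE.started convention:

# Verify that a role actually started the docker service via the mock
- name: Check the mock's start marker
  hosts: prod
  tasks:
    - name: Stat the marker written by "systemctl start docker"
      ansible.builtin.stat:
        path: /tmp/mock-docker.started
      register: mock_marker

    - name: Assert the service was started
      ansible.builtin.assert:
        that: mock_marker.stat.exists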

The Testing Customer Pattern

All tests operate through a dedicated customer structure:

Directory Structure

infrastructure/
  inventories/
    c_00000_acme.ini        # Test customer inventory
  group_vars/
    c_00000_acme/
      all/
        10-base.yml         # Customer defaults
      test/
        20-security.yml     # Test environment config
      prod/
        20-security.yml     # Prod environment config
      stage/
        20-security.yml     # Stage environment config
  host_vars/
    c_00000_acme/
      test/
        001---test-node-01.yml
        002---test-node-02.yml
      prod/
        101---prod-node-01.yml
        102---prod-node-02.yml
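
The numeric prefixes control load order within a directory, while the environment directories layer on top of all/; a sketch of how the layers could interact (variable names are illustrative):

# group_vars/c_00000_acme/all/10-base.yml — customer-wide defaults
firewall_enabled: false
ssh_hardening_level: none

# group_vars/c_00000_acme/prod/20-security.yml — prod tightens the defaults
firewall_enabled: true
ssh_hardening_level: maximum

# group_vars/c_00000_acme/test/20-security.yml — test stays permissive
firewall_enabled: false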

Test Inventory

# inventories/c_00000_acme.ini

[all:children]
test
prod
stage
dev
emerg

# Test Environment
[test:children]
test_nodes

[test_nodes]
# SSH goes through the ports published on the Docker host; the container
# IPs from the compose file are not directly routable on macOS/Windows
001---test-node-01 ansible_host=127.0.0.1 ansible_port=2011
002---test-node-02 ansible_host=127.0.0.1 ansible_port=2012

# Production Environment
[prod:children]
prod_nodes

[prod_nodes]
101---prod-node-01 ansible_host=127.0.0.1 ansible_port=2111
102---prod-node-02 ansible_host=127.0.0.1 ansible_port=2112

# Container Runtime Distribution
[container_runtime_docker]
001---test-node-01
101---prod-node-01
201---stage-node-01

[container_runtime_podman]
002---test-node-02
102---prod-node-02
202---stage-node-02

This customer mirrors real production usage — no special test paths or mocked variables.
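
The container_runtime_* groups typically carry exactly one variable that roles branch on; an illustrative wiring (file names and the variable are assumptions):

# group_vars/container_runtime_docker.yml (sketch)
container_runtime: docker

# group_vars/container_runtime_podman.yml (sketch)
container_runtime: podman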

Test Execution Workflow

1. Environment Setup

# Start all test containers
docker-compose up -d

# Wait for SSH readiness on every published port (keep this list in
# sync with docker-compose.yml; stage/dev/emerg ports are assumed)
for port in 2011 2012 2111 2112 2211 2212 2311 2312 2411 2412; do
  timeout 30 bash -c "until nc -z localhost $port; do sleep 1; done"
done
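
The same wait can be expressed Ansible-natively with wait_for, which is handy when a shell loop is unavailable (the port list must mirror the compose mappings):

# wait-for-ssh.yml — Ansible-native SSH readiness check (sketch)
- name: Wait for all mapped SSH ports
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Poll each published port until it accepts connections
      ansible.builtin.wait_for:
        host: 127.0.0.1
        port: "{{ item }}"
        timeout: 30
      loop: [2011, 2012, 2111, 2112]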

2. Ansible Connectivity Test

# Verify all nodes are reachable
ansible -i inventories/c_00000_acme.ini all -m ping

3. Role Installation Tests

# Test fresh installation
ansible-playbook \
  -i inventories/c_00000_acme.ini \
  site.yml \
  --limit prod \
  -e "default_role_mode=install"

# Verify installation
ansible -i inventories/c_00000_acme.ini prod -m shell -a "docker --version"
ansible -i inventories/c_00000_acme.ini prod -m shell -a "test -f /etc/docker/daemon.json"

4. Role Update Tests

# Test updates preserve state
echo "test-data" | ansible prod -m shell -a "tee /var/lib/docker/test"

ansible-playbook \
  -i inventories/c_00000_acme.ini \
  site.yml \
  --limit prod \
  -e "default_role_mode=update"

# Verify data preserved
ansible -i inventories/c_00000_acme.ini prod -m shell -a "cat /var/lib/docker/test"

5. Role Removal Tests

# Test clean removal
ansible-playbook \
  -i inventories/c_00000_acme.ini \
  site.yml \
  --limit prod \
  -e "container_runtime_role_mode=remove"

# Verify removal
ansible -i inventories/c_00000_acme.ini prod -m shell -a "! which docker"
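
The *_role_mode variables above imply that each role dispatches on a mode value; one way such a dispatch could be wired (a sketch — the exact mechanism is assumed):

# roles/container_runtime/tasks/main.yml — lifecycle dispatch (sketch)
- name: Include the tasks for the requested lifecycle mode
  ansible.builtin.include_tasks: "{{ container_runtime_role_mode | default(default_role_mode | default('install')) }}.yml"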

Automated Test Orchestration

A lightweight CLI provides consistent test execution:

Test Runner Implementation

#!/usr/bin/env python3
# platform-test.py

import socket
import subprocess
import time
from pathlib import Path

class FunctionalTest:
    def __init__(self):
        self.project_root = Path(__file__).parent.parent
        self.compose_file = self.project_root / "tests/docker-compose.yml"

    def setup(self):
        """Start test infrastructure"""
        print("Starting test environment...")
        subprocess.run([
            "docker-compose", "-f", str(self.compose_file),
            "up", "-d", "--build"
        ], check=True)

        # Poll the mapped SSH ports instead of sleeping a fixed interval
        print("Waiting for SSH services...")
        self.wait_for_ssh([2011, 2012, 2111, 2112])  # extend per environment

    def wait_for_ssh(self, ports, timeout=30):
        """Block until every given SSH port accepts TCP connections."""
        deadline = time.time() + timeout
        for port in ports:
            while True:
                try:
                    with socket.create_connection(("127.0.0.1", port), timeout=1):
                        break
                except OSError:
                    if time.time() > deadline:
                        raise TimeoutError(f"SSH port {port} not ready")
                    time.sleep(1)

    def run_playbook(self, limit=None, extra_vars=None):
        """Execute Ansible playbook"""
        cmd = [
            "ansible-playbook",
            "-i", "inventories/c_00000_acme.ini",
            "site.yml"
        ]

        if limit:
            cmd.extend(["--limit", limit])

        if extra_vars:
            for key, value in extra_vars.items():
                cmd.extend(["-e", f"{key}={value}"])

        # Run from the project root so the relative inventory path resolves
        return subprocess.run(cmd, capture_output=True, text=True,
                              cwd=self.project_root)

    def verify(self, hosts, command):
        """Run verification command on hosts"""
        cmd = [
            "ansible", hosts,
            "-i", "inventories/c_00000_acme.ini",
            "-m", "shell",
            "-a", command
        ]
        result = subprocess.run(cmd, capture_output=True, text=True,
                                cwd=self.project_root)
        return result.returncode == 0

    def teardown(self):
        """Stop test infrastructure"""
        print("Cleaning up test environment...")
        subprocess.run([
            "docker-compose", "-f", str(self.compose_file),
            "down", "-v"
        ], check=True)

Test Scenarios

def test_container_runtime_lifecycle():
    """Test container runtime role lifecycle"""
    test = FunctionalTest()
    
    try:
        test.setup()
        
        # Test installation
        result = test.run_playbook(
            limit="prod",
            extra_vars={"container_runtime_role_mode": "install"}
        )
        assert result.returncode == 0
        assert test.verify("prod", "docker --version")
        
        # Test update
        result = test.run_playbook(
            limit="prod", 
            extra_vars={"container_runtime_role_mode": "update"}
        )
        assert result.returncode == 0
        
        # Test removal
        result = test.run_playbook(
            limit="prod",
            extra_vars={"container_runtime_role_mode": "remove"}
        )
        assert result.returncode == 0
        assert not test.verify("prod", "which docker")
        
        print("✅ Container runtime lifecycle tests passed")
        
    finally:
        test.teardown()


if __name__ == "__main__":
    test_container_runtime_lifecycle()

Performance Characteristics

The FIT approach delivers impressive performance:

Timing Breakdown

  • Environment startup: ~10 seconds (all containers)
  • SSH readiness: ~5 seconds
  • Full test suite: ~50 seconds
  • Teardown: ~2 seconds

Resource Usage

  • Memory: ~2GB for 10 containers
  • CPU: Minimal (mostly idle)
  • Disk: ~500MB (base images cached)

Cost Comparison

Approach             | Time     | Cost  | Feedback Loop
-------------------- | -------- | ----- | -------------
FIT (Docker Compose) | 50s      | $0    | Immediate
Cloud VMs            | 10-15min | $0.50 | Slow
Local VMs            | 5-10min  | $0    | Medium

Advanced Testing Patterns

Network Segmentation Validation

# Test that prod cannot reach dev
- name: Verify network isolation
  hosts: prod
  tasks:
    - name: Prod cannot ping dev
      shell: "ping -c 1 -W 2 172.25.3.11"
      register: ping_result
      failed_when: ping_result.rc == 0
      changed_when: false
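
A positive control keeps this test honest: if nothing could ping anything, the isolation check would pass vacuously (peer IP taken from the compose file; assumes iputils-ping is installed on the nodes):

# Positive control: connectivity inside one environment must still work
- name: Verify intra-environment connectivity
  hosts: prod
  tasks:
    - name: Prod node can reach its peer
      shell: "ping -c 1 -W 2 172.25.1.12"
      changed_when: false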

Security Posture Verification

# Verify environment-specific security
- name: Check security settings
  hosts: all
  tasks:
    - name: Verify firewall in prod
      shell: iptables -L -n
      when: inventory_hostname in groups['prod']
      
    - name: Verify no firewall in dev
      shell: "! which iptables"
      when: inventory_hostname in groups['dev']

Cross-Environment Dependencies

# Test emergency environment bastion access
- name: Emergency access pattern
  hosts: emerg
  tasks:
    - name: Only accessible via bastion
      assert:
        that:
          - ansible_ssh_common_args is defined
          - "'ProxyJump' in ansible_ssh_common_args"

What This Approach Does Not Test

It's important to understand the boundaries:

Not Tested

  • Kernel behavior: Container kernels differ from VMs
  • Hardware features: No real device access
  • Full systemd: State machines and dependencies
  • Performance: Containers have different I/O patterns
  • Network latency: Local networks are too fast

Where These Belong

These aspects require higher-level testing:

  • Integration tests: Real VMs in cloud
  • Performance tests: Production-like hardware
  • Security audits: Full system validation

FIT handles the 80% of issues that break deployments. The remaining 20% need specialized testing.

Integration with CI/CD

GitLab CI Example

test:ansible:functional:
  stage: test
  image: ansible-runner:latest
  services:
    - docker:dind
  variables:
    # Point the Docker CLI at the dind service (TLS disabled for brevity)
    DOCKER_HOST: tcp://docker:2375
    DOCKER_TLS_CERTDIR: ""
  script:
    - cd tests
    - docker-compose up -d
    - ./wait-for-ssh.sh
    - ansible-playbook -i inventories/c_00000_acme.ini site.yml
    - ./run-assertions.sh
  after_script:
    - docker-compose down -v
  artifacts:
    when: on_failure
    paths:
      - tests/logs/

GitHub Actions

name: Functional Tests

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      
      - name: Start test environment
        run: |
          cd tests
          docker compose up -d
          ./wait-for-ssh.sh
          
      - name: Run Ansible tests
        run: |
          ansible-playbook -i inventories/c_00000_acme.ini site.yml
          
      - name: Verify deployment
        run: |
          cd tests
          ./run-assertions.sh
          
      - name: Cleanup
        if: always()
        run: |
          cd tests
          docker compose down -v

Real-World Benefits

After implementing FIT across multiple projects:

Development Velocity

  • Before: 15-20 minute feedback loop (cloud VMs)
  • After: 50 second feedback loop
  • Impact: 10x more iterations per day

Bug Detection

  • Found 37 environment-specific bugs in first month
  • Caught variable precedence issues missed by unit tests
  • Identified network assumptions in role design

Cost Savings

  • Before: $200-300/month in test VM costs
  • After: $0 (runs on developer machines)
  • CI costs: Reduced by 80%

Confidence

  • Every commit tested across all environments
  • Role interactions validated continuously
  • Production deployments became routine

Lessons Learned

1. Environment Semantics Matter

Single-node tests miss the majority of production issues. Multi-environment testing catches what matters.

2. Real Inventories Find Real Bugs

Using production-like inventories exposes variable precedence issues and group membership bugs.

3. Fast Feedback Loops Change Behavior

When tests run in 50 seconds instead of 15 minutes, developers actually run them.

4. Lifecycle Testing Prevents Regressions

Testing install → update → remove cycles catches state management bugs early.

5. Network Isolation Tests Are Critical

Many production issues come from network assumptions. Test them.

Best Practices

1. Keep Base Images Minimal

FROM debian:bookworm-slim
# python3-minimal keeps the image small; switch to python3 if your roles
# need modules missing from the reduced standard library
RUN apt-get update && apt-get install -y \
    openssh-server \
    python3-minimal \
    sudo \
    && rm -rf /var/lib/apt/lists/*

2. Cache Everything Possible

services:
  base:
    image: test-base:latest
    build:
      context: .
      cache_from:
        - test-base:latest

3. Parallelize Test Execution

# Run environment tests in parallel with GNU parallel
parallel -j 4 ::: \
  "ansible-playbook -i inventories/c_00000_acme.ini site.yml --limit test" \
  "ansible-playbook -i inventories/c_00000_acme.ini site.yml --limit prod" \
  "ansible-playbook -i inventories/c_00000_acme.ini site.yml --limit stage" \
  "ansible-playbook -i inventories/c_00000_acme.ini site.yml --limit dev"

4. Make Assertions Explicit

# Don't just run playbooks, verify outcomes
- name: Verify nginx configuration
  hosts: webservers
  tasks:
    - name: Config file exists
      stat:
        path: /etc/nginx/nginx.conf
      register: nginx_config
      
    - name: Validate config
      assert:
        that:
          - nginx_config.stat.exists
          - nginx_config.stat.size > 0
          - nginx_config.stat.mode == '0644'

Conclusion

Functional Infrastructure Testing with Docker Compose bridges the gap between fast but unrealistic unit tests and accurate but slow integration tests.

By treating test infrastructure like production infrastructure — with real inventories, real environments, and real execution patterns — we can validate the actual behavior that matters in production.

The result is a testing approach that is:

  • Fast enough for rapid development
  • Realistic enough to catch real bugs
  • Simple enough to maintain
  • Cheap enough to run everywhere

For teams managing complex Ansible deployments, FIT provides the confidence to deploy frequently without the fear of environment-specific failures.

Technical Deep Dive

For a detailed technical breakdown of the Docker-based test environment, including Dockerfile and docker-compose.yml samples, see the companion article: Inside a Docker-Compose-Based Test Environment for Ansible IaC.


This testing approach has been refined across multiple production Ansible deployments, from small startups to large enterprise environments. The patterns shown here have caught hundreds of environment-specific bugs before they reached production.
