Inside a Docker-Compose-Based Test Environment for Ansible IaC

DevOps · Docker · Ansible · Testing · Infrastructure as Code · Technical Deep Dive

Building OS-Like Test Nodes for Functional Infrastructure Testing

Functional Infrastructure Testing (FIT) for Ansible lives or dies by the quality of its test substrate.

If your test nodes do not resemble real systems closely enough, you only validate syntax and happy paths. This article provides a technical deep dive into the Docker-based test environment used for FIT, explaining why each decision was made, what it enables, and where its deliberate limits are.

This article complements the higher-level FIT concept and focuses purely on implementation mechanics.

Purpose of the Test Environment

The goal of this environment is not to test containers.

It is designed to provide:

  • OS-like targets
  • reachable via real SSH
  • driven by real inventories
  • capable of running real Ansible roles
  • fast enough for local and CI execution

In short:

Test infrastructure behavior, not container behavior.

Why Docker Compose (and not Molecule)

Docker Compose was chosen deliberately for its transparency and control:

Direct Control

  • Predictable networking with static IPs
  • Explicit port mappings you can see
  • Simple orchestration without magic
  • No hidden abstractions or wrappers

Real-World Fidelity

Unlike Molecule, this setup:

  • Does not invent a new testing DSL
  • Does not wrap Ansible execution
  • Does not hide network topology
  • Does not special-case inventories

Everything that Ansible sees looks like a real deployment.

The Dockerfile: Building an OS-Like Ansible Target

The Dockerfile defines a minimal Debian-based node that behaves like a remote server from Ansible's point of view.

Design Goals

  • SSH-first access (like production)
  • Python available for Ansible modules
  • Minimal userspace for speed
  • No init system assumptions
  • Fast build and startup time

Key Implementation

FROM debian:bookworm

ENV DEBIAN_FRONTEND=noninteractive

# Install required packages
RUN apt-get update && \
    apt-get install -y \
        systemd \
        systemd-sysv \
        openssh-server \
        sudo \
        python3 \
        python3-apt \
        curl \
        ca-certificates \
        gnupg \
        lsb-release \
        iproute2 \
        iputils-ping \
        netcat-openbsd \
        procps && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# Configure SSH
RUN mkdir -p /var/run/sshd && \
    mkdir -p /root/.ssh && \
    chmod 700 /root/.ssh && \
    sed -i 's/#PermitRootLogin prohibit-password/PermitRootLogin yes/' /etc/ssh/sshd_config && \
    sed -i 's/#PasswordAuthentication yes/PasswordAuthentication no/' /etc/ssh/sshd_config

# Remove unnecessary systemd services for container
RUN rm -f /lib/systemd/system/multi-user.target.wants/* && \
    rm -f /etc/systemd/system/*.wants/* && \
    rm -f /lib/systemd/system/local-fs.target.wants/* && \
    rm -f /lib/systemd/system/sockets.target.wants/*udev* && \
    rm -f /lib/systemd/system/sockets.target.wants/*initctl*

# Enable SSH service
RUN systemctl enable ssh.service

EXPOSE 22

CMD ["/sbin/init"]
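
As a quick sanity check before wiring the image into Compose, it can be built and its sshd configuration validated on its own. A minimal sketch, assuming an arbitrary local tag acme-test-node:

# Build the node image from the directory containing the Dockerfile
docker build -t acme-test-node .

# Validate the sshd configuration and host keys without booting systemd
docker run --rm acme-test-node /usr/sbin/sshd -t && echo "sshd config OK"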

Notable Choices

  1. Full systemd: Unlike the simplified example in the overview article, this uses real systemd in privileged containers
  2. SSH key only: Password authentication disabled for realism
  3. Minimal services: Only SSH and essential services enabled
  4. Debian base: Matches common production targets

systemd Strategy: Real Init in Privileged Containers

This implementation takes a different approach than the mock shown in the overview:

Real systemd

privileged: true
cap_add:
  - ALL
volumes:
  - /sys/fs/cgroup:/sys/fs/cgroup:ro

This allows:

  • Real service management
  • Actual systemctl commands
  • Service dependency handling
  • More realistic testing

Trade-offs

  • Requires privileged containers
  • Slightly slower startup
  • More resource usage
  • Platform-specific (Linux hosts)

For environments where privileged containers aren't acceptable, fall back to the mock approach.
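
A minimal sketch of that fallback, assuming a plain foreground sshd and a no-op systemctl shim are acceptable (file names are illustrative):

# Dockerfile.mock (no systemd, no privileged mode required)
FROM debian:bookworm
RUN apt-get update && apt-get install -y openssh-server sudo python3 && \
    mkdir -p /var/run/sshd /root/.ssh && chmod 700 /root/.ssh

# No-op shim so roles that call systemctl do not hard-fail
RUN printf '#!/bin/sh\nexit 0\n' > /usr/local/bin/systemctl && \
    chmod +x /usr/local/bin/systemctl

EXPOSE 22
CMD ["/usr/sbin/sshd", "-D"]

Service state obviously cannot be asserted against such a shim; it only keeps role execution moving in environments that forbid privileged containers.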

docker-compose.yml: Modeling Multi-Environment Infrastructure

Each container represents a node, not an application.

Naming Convention

<envID>---<env>-node-<index>

Examples:

  • 101---prod-node-01 (Production node 1)
  • 201---stage-node-01 (Staging node 1)
  • 301---dev-node-01 (Development node 1)

This convention:

  • Matches inventory hostnames exactly
  • Keeps logs and audits readable
  • Allows easy environment filtering
  • Supports numeric sorting
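
Because the environment name and numeric ID are embedded in every container name, filtering and sorting need nothing beyond standard Docker tooling:

# List only production nodes (the name filter matches substrings)
docker ps --filter "name=---prod-node-" --format "{{.Names}}"

# Numeric IDs keep a full listing sorted by environment
docker ps --format "{{.Names}}" | sort -n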

Complete Environment Example

services:
  # Production Environment Nodes
  prod-node-01:
    build: 
      context: .
      dockerfile: Dockerfile
    container_name: 101---prod-node-01
    hostname: prod-node-01
    privileged: true
    cap_add:
      - ALL
    security_opt:
      - apparmor:unconfined
      - seccomp:unconfined
    sysctls:
      - net.ipv4.ip_forward=1
      - net.ipv4.conf.all.rp_filter=0
    networks:
      acme-test:
        ipv4_address: 172.25.1.11
    ports:
      - "2211:22"
    volumes:
      - ../../.vault/.ssh/customers/c_00000_acme/test/id_ed25519.pub:/root/.ssh/authorized_keys:ro
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
    environment:
      - ENVIRONMENT=prod
    restart: unless-stopped

Network Architecture

networks:
  acme-test:
    driver: bridge
    ipam:
      config:
        - subnet: 172.25.0.0/16
          gateway: 172.25.0.1

Each environment gets its own /24 address range within the larger /16 network:

  • Test: 172.25.0.0/24
  • Prod: 172.25.1.0/24
  • Stage: 172.25.2.0/24
  • Dev: 172.25.3.0/24
  • Emergency: 172.25.4.0/24
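
Following that convention, a staging node lands in the next /24 with the matching SSH port. A sketch of the relevant service fragment (privileged/systemd settings omitted; they are identical to the prod node above):

  stage-node-01:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: 201---stage-node-01
    hostname: stage-node-01
    networks:
      acme-test:
        ipv4_address: 172.25.2.11
    ports:
      - "2221:22"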

SSH Key Management for Testing

The test environment uses a pragmatic approach to SSH key distribution:

Directory Structure

.vault/
  .ssh/
    customers/
      c_00000_acme/
        test/
          id_ed25519
          id_ed25519.pub

Key Distribution

volumes:
  - ../../.vault/.ssh/customers/c_00000_acme/test/id_ed25519.pub:/root/.ssh/authorized_keys:ro
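
The key pair itself only needs to be generated once per test environment. A sketch, assuming the .vault/ layout shown above and run from the repository root:

# Dedicated, passphrase-less test key pair; never reuse production keys
mkdir -p .vault/.ssh/customers/c_00000_acme/test
ssh-keygen -t ed25519 -N "" -C "acme-test" \
  -f .vault/.ssh/customers/c_00000_acme/test/id_ed25519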

Important Security Note

⚠️ Testing Environment Only
This approach of storing SSH keys in .vault/ is designed for ephemeral test environments only.

  • The .vault/ directory is git-ignored
  • Keys are test-only and regularly rotated
  • This mirrors the existing Ansible structure

For production, use proper secret management (HashiCorp Vault, AWS Secrets Manager, etc.).

Port Mapping Strategy

Each node gets a unique SSH port on the host:

Port Assignment Pattern

22<env><node>

Each port uses a fixed 22 prefix, one digit for the environment (0 = test, 1 = prod, 2 = stage, 3 = dev, 4 = emergency), and one digit for the node index:

  • Test environment: 2201-2202
  • Prod environment: 2211-2212
  • Stage environment: 2221-2222
  • Dev environment: 2231-2232
  • Emergency environment: 2241-2242

This enables:

# Direct SSH access
ssh -p 2211 root@localhost

# Ansible inventory configuration
101---prod-node-01 ansible_host=localhost ansible_port=2211

Environment-Specific Configuration

Each environment models different operational characteristics; a group_vars sketch follows the profiles below:

Production (101-102)

  • Full security stack
  • All hardening roles active
  • Restrictive firewall rules
  • Complete monitoring

Stage (201-202)

  • Production-like configuration
  • Manual update processes
  • Testing ground for changes

Development (301-302)

  • Minimal security
  • Fast iteration
  • Developer conveniences

Emergency (401-402)

  • Hardened baseline
  • Bastion-only access
  • Break-glass procedures

Test/Legacy (001-002)

  • Backward compatibility
  • Legacy system simulation
  • Migration testing
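
One way to encode these differences without extra tooling is plain Ansible group_vars keyed to the inventory groups. A sketch with illustrative variable names (not part of the reference implementation):

# group_vars/prod.yml
hardening_enabled: true
firewall_default_policy: deny
unattended_upgrades: true

# group_vars/dev.yml
hardening_enabled: false
firewall_default_policy: allow
unattended_upgrades: false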

Practical Usage Patterns

Starting the Environment

# Start all environments
docker-compose up -d

# Start specific environment
docker-compose up -d prod-node-01 prod-node-02

# View logs
docker-compose logs -f prod-node-01

Ansible Integration

# inventory/c_00000_acme.ini
[prod]
101---prod-node-01 ansible_host=localhost ansible_port=2211
102---prod-node-02 ansible_host=localhost ansible_port=2212

[stage]
201---stage-node-01 ansible_host=localhost ansible_port=2221
202---stage-node-02 ansible_host=localhost ansible_port=2222
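
For the SSH connection itself, the inventory (or ansible.cfg) also has to point at the test key and the root user; host key checking is typically disabled because the localhost ports receive fresh host keys on every rebuild. A sketch, assuming the key path from the .vault/ layout above (adjust the relative path to wherever the inventory lives):

[all:vars]
ansible_user=root
ansible_ssh_private_key_file=../../.vault/.ssh/customers/c_00000_acme/test/id_ed25519
ansible_ssh_common_args=-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null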

Running Tests

# Test connectivity
ansible -i inventory/c_00000_acme.ini all -m ping

# Run playbook
ansible-playbook -i inventory/c_00000_acme.ini site.yml --limit prod

Performance Optimizations

Layer Caching

The Dockerfile is structured for optimal caching:

  1. Package installation (changes rarely)
  2. SSH configuration (static)
  3. systemd cleanup (static)
  4. Service enabling (static)

Parallel Startup

All containers start in parallel:

# Time to full environment
real    0m12.847s
user    0m1.234s
sys     0m0.456s

Resource Limits

deploy:
  resources:
    limits:
      cpus: '0.5'
      memory: 512M
    reservations:
      memory: 256M

Debugging Failed Tests

Container Shell Access

# Get shell in running container
docker exec -it 101---prod-node-01 bash

# Check systemd status
docker exec 101---prod-node-01 systemctl status

# View SSH logs
docker exec 101---prod-node-01 journalctl -u ssh

Ansible Debugging

# Verbose output
ansible-playbook -i inventory/c_00000_acme.ini site.yml -vvv

# Step through tasks
ansible-playbook -i inventory/c_00000_acme.ini site.yml --step

Network Debugging

# Check connectivity between nodes
docker exec 101---prod-node-01 ping 172.25.2.11

# Verify port access
docker exec 101---prod-node-01 nc -zv 172.25.2.11 22

CI/CD Integration

GitHub Actions Example

name: Infrastructure Tests
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      
      - name: Start test infrastructure
        run: |
          cd tests/docker
          docker-compose up -d
          
      - name: Wait for SSH
        run: |
          for port in 2201 2202 2211 2212 2221 2222 2231 2232 2241 2242; do
            timeout 30 bash -c "until nc -z localhost $port; do sleep 1; done"
          done
          
      - name: Run tests
        run: |
          ansible-playbook -i inventory/c_00000_acme.ini site.yml
          
      - name: Cleanup
        if: always()
        run: |
          cd tests/docker
          docker-compose down -v

Limitations and Boundaries

What This Environment Does NOT Test

  1. Kernel-specific behavior

    • Container kernel ≠ VM kernel
    • Kernel modules not available
    • Some sysctl values read-only
  2. Hardware interactions

    • No real network interfaces
    • No block devices
    • No hardware crypto
  3. Full systemd complexity

    • Some unit types unsupported
    • Resource limits differ
    • cgroup v2 differences
  4. Performance characteristics

    • Different I/O patterns
    • Memory behavior varies
    • CPU scheduling differences

When to Use Higher-Level Testing

Use real VMs or cloud instances for:

  • Kernel module testing
  • Network performance validation
  • Storage subsystem testing
  • Full security audits

Download the Reference Implementation

These files provide a complete reference implementation for multi-environment Ansible testing. Adapt them to your specific requirements and constraints.

Best Practices for Production Use

  1. Separate test keys from production keys

    • Use dedicated test CA
    • Rotate regularly
    • Never commit to git
  2. Resource limits in CI

    • Set memory limits
    • Limit CPU usage
    • Use --parallel flag carefully
  3. Container registry (see the sketch after this list)

    • Build base image once
    • Push to registry
    • Pull in CI for speed
  4. Test data management

    • Reset between test runs
    • Use volumes for persistence
    • Clean up after tests
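
For the registry point above (item 3), the pattern is a one-time build and push, with CI pulling the prebuilt image instead of rebuilding it. A sketch, assuming a hypothetical registry path registry.example.com/infra/ansible-test-node:

# Build and push the base node image once (locally or in a scheduled job)
docker build -t registry.example.com/infra/ansible-test-node:bookworm .
docker push registry.example.com/infra/ansible-test-node:bookworm

# In CI, pull the prebuilt image; the compose file can then use image: instead of build:
docker pull registry.example.com/infra/ansible-test-node:bookworm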

Closing Thoughts

Infrastructure testing fails when abstraction hides reality.

This Docker-Compose-based environment keeps abstraction low and behavior explicit. It trades completeness for speed, clarity, and reproducibility — the right trade-off for functional infrastructure testing.

The privileged systemd approach provides more realism than mocks, while the network isolation and SSH access ensure tests exercise the same code paths as production.

For teams serious about Ansible testing, this environment provides a foundation that scales from developer laptops to CI pipelines, catching real issues before they reach real infrastructure.


This implementation has been battle-tested across multiple Ansible projects, from small startups to large enterprises. The patterns shown here represent the sweet spot between realism and practicality for infrastructure testing.
