Inside a Docker-Compose-Based Test Environment for Ansible IaC
Building OS-Like Test Nodes for Functional Infrastructure Testing
Functional Infrastructure Testing (FIT) for Ansible lives or dies with the quality of its test substrate.
If your test nodes do not resemble real systems closely enough, you only validate syntax and happy paths. This article provides a technical deep dive into the Docker-based test environment used for FIT, explaining why each decision was made, what it enables, and where its deliberate limits are.
This article complements the higher-level FIT concept and focuses purely on implementation mechanics.
Purpose of the Test Environment
The goal of this environment is not to test containers.
It is designed to provide:
- OS-like targets
- reachable via real SSH
- driven by real inventories
- capable of running real Ansible roles
- fast enough for local and CI execution
In short:
Test infrastructure behavior, not container behavior.
Why Docker Compose (and not Molecule)
Docker Compose was chosen deliberately for its transparency and control:
Direct Control
- Predictable networking with static IPs
- Explicit port mappings you can see
- Simple orchestration without magic
- No hidden abstractions or wrappers
Real-World Fidelity
Unlike Molecule, this setup:
- Does not invent a new testing DSL
- Does not wrap Ansible execution
- Does not hide network topology
- Does not special-case inventories
Everything that Ansible sees looks like a real deployment.
The Dockerfile: Building an OS-Like Ansible Target
The Dockerfile defines a minimal Debian-based node that behaves like a remote server from Ansible's point of view.
Design Goals
- SSH-first access (like production)
- Python available for Ansible modules
- Minimal userspace for speed
- No init system assumptions
- Fast build and startup time
Key Implementation
FROM debian:bookworm
ENV DEBIAN_FRONTEND=noninteractive
# Install required packages
RUN apt-get update && \
    apt-get install -y \
        systemd \
        systemd-sysv \
        openssh-server \
        sudo \
        python3 \
        python3-apt \
        curl \
        ca-certificates \
        gnupg \
        lsb-release \
        iproute2 \
        procps && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*
# Configure SSH
RUN mkdir -p /var/run/sshd && \
    mkdir -p /root/.ssh && \
    chmod 700 /root/.ssh && \
    sed -i 's/#PermitRootLogin prohibit-password/PermitRootLogin yes/' /etc/ssh/sshd_config && \
    sed -i 's/#PasswordAuthentication yes/PasswordAuthentication no/' /etc/ssh/sshd_config
# Remove unnecessary systemd services for container use
RUN rm -f /lib/systemd/system/multi-user.target.wants/* && \
    rm -f /etc/systemd/system/*.wants/* && \
    rm -f /lib/systemd/system/local-fs.target.wants/* && \
    rm -f /lib/systemd/system/sockets.target.wants/*udev* && \
    rm -f /lib/systemd/system/sockets.target.wants/*initctl*
# Enable SSH service
RUN systemctl enable ssh.service
EXPOSE 22
CMD ["/sbin/init"]
Notable Choices
- Full systemd: Unlike the simplified example in the overview article, this uses real systemd in privileged containers
- SSH key only: Password authentication disabled for realism
- Minimal services: Only SSH and essential services enabled
- Debian base: Matches common production targets
systemd Strategy: Real Init in Privileged Containers
This implementation takes a different approach than the mock shown in the overview:
Real systemd
privileged: true
cap_add:
  - ALL
volumes:
  - /sys/fs/cgroup:/sys/fs/cgroup:ro
This allows:
- Real service management
- Actual systemctl commands
- Service dependency handling
- More realistic testing
Trade-offs
- Requires privileged containers
- Slightly slower startup
- More resource usage
- Platform-specific (Linux hosts)
For environments where privileged containers aren't acceptable, fall back to the mock approach.
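A minimal sketch of such a fallback, assuming you drop systemd entirely and run sshd as PID 1 (illustrative only, not necessarily identical to the overview's mock; service-related tasks then have to be stubbed or skipped in the roles under test):
# Fallback node without systemd -- illustrative sketch only
FROM debian:bookworm
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && \
    apt-get install -y openssh-server sudo python3 python3-apt && \
    mkdir -p /var/run/sshd /root/.ssh && chmod 700 /root/.ssh && \
    apt-get clean && rm -rf /var/lib/apt/lists/*
EXPOSE 22
# Run sshd in the foreground as the container's main process
CMD ["/usr/sbin/sshd", "-D", "-e"]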
docker-compose.yml: Modeling Multi-Environment Infrastructure
Each container represents a node, not an application.
Naming Convention
<envID>---<env>-node-<index>
Examples:
- 101---prod-node-01 (Production node 1)
- 201---stage-node-01 (Staging node 1)
- 301---dev-node-01 (Development node 1)
This convention:
- Matches inventory hostnames exactly
- Keeps logs and audits readable
- Allows easy environment filtering
- Supports numeric sorting
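Because container names carry both the environment and a numeric prefix, standard tooling is enough for filtering and ordering, for example:
# List only production nodes
docker ps --filter "name=prod" --format "{{.Names}}"
# Numeric prefixes keep multi-environment listings sorted
docker ps --format "{{.Names}}" | sort -n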
Complete Environment Example
services:
  # Production Environment Nodes
  prod-node-01:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: 101---prod-node-01
    hostname: prod-node-01
    privileged: true
    cap_add:
      - ALL
    security_opt:
      - apparmor:unconfined
      - seccomp:unconfined
    sysctls:
      - net.ipv4.ip_forward=1
      - net.ipv4.conf.all.rp_filter=0
    networks:
      acme-test:
        ipv4_address: 172.25.1.11
    ports:
      - "2211:22"
    volumes:
      - ../../.vault/.ssh/customers/c_00000_acme/test/id_ed25519.pub:/root/.ssh/authorized_keys:ro
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
    environment:
      - ENVIRONMENT=prod
    restart: unless-stopped
Network Architecture
networks:
  acme-test:
    driver: bridge
    ipam:
      config:
        - subnet: 172.25.0.0/16
          gateway: 172.25.0.1
Each environment gets its own subnet within the larger network:
- Test: 172.25.0.0/24
- Prod: 172.25.1.0/24
- Stage: 172.25.2.0/24
- Dev: 172.25.3.0/24
- Emergency: 172.25.4.0/24
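Only the prod node's full service definition is shown above; nodes in other environments follow the same pattern with their subnet and SSH port shifted accordingly. A hedged fragment for the first staging node, assuming it mirrors the prod definition:
  # under the same services: key as prod-node-01
  stage-node-01:
    container_name: 201---stage-node-01
    hostname: stage-node-01
    # ... same build, privileges, sysctls and volumes as prod-node-01 ...
    networks:
      acme-test:
        ipv4_address: 172.25.2.11   # stage subnet (172.25.2.0/24)
    ports:
      - "2221:22"
    environment:
      - ENVIRONMENT=stage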
SSH Key Management for Testing
The test environment uses a pragmatic approach to SSH key distribution:
Directory Structure
.vault/
  .ssh/
    customers/
      c_00000_acme/
        test/
          id_ed25519
          id_ed25519.pub
Key Distribution
volumes:
- ../../.vault/.ssh/customers/c_00000_acme/test/id_ed25519.pub:/root/.ssh/authorized_keys:ro
Important Security Note
⚠️ Testing Environment Only
This approach of storing SSH keys in .vault/ is designed for ephemeral test environments only.
- The .vault/ directory is git-ignored
- Keys are test-only and regularly rotated
- This mirrors the existing Ansible structure
For production, use proper secret management (HashiCorp Vault, AWS Secrets Manager, etc.)
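Provisioning such a throwaway key pair is straightforward; the paths follow the directory layout above, and the comment string is illustrative:
# Generate an ephemeral, passphrase-less test key -- never reuse it outside the test environment
mkdir -p .vault/.ssh/customers/c_00000_acme/test
ssh-keygen -t ed25519 -N "" \
  -f .vault/.ssh/customers/c_00000_acme/test/id_ed25519 \
  -C "fit-test-key"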
Port Mapping Strategy
Each node gets a unique SSH port on the host:
Port Assignment Pattern
2<env><node>
- Test environment: 2201-2202
- Prod environment: 2211-2212
- Stage environment: 2221-2222
- Dev environment: 2231-2232
- Emergency environment: 2241-2242
This enables:
# Direct SSH access
ssh -p 2211 root@localhost
# Ansible inventory configuration
101---prod-node-01 ansible_host=localhost ansible_port=2211
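For interactive work, the same mapping can optionally live in your SSH client configuration. This entry is a local convenience, not part of the reference files; adjust the key path to your checkout:
Host prod-node-01
    HostName localhost
    Port 2211
    User root
    # Path is illustrative -- point it at your local .vault copy
    IdentityFile ~/work/ansible/.vault/.ssh/customers/c_00000_acme/test/id_ed25519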
Environment-Specific Configuration
Each environment models different operational characteristics:
Production (101-102)
- Full security stack
- All hardening roles active
- Restrictive firewall rules
- Complete monitoring
Stage (201-202)
- Production-like configuration
- Manual update processes
- Testing ground for changes
Development (301-302)
- Minimal security
- Fast iteration
- Developer conveniences
Emergency (401-402)
- Hardened baseline
- Bastion-only access
- Break-glass procedures
Test/Legacy (001-002)
- Backward compatibility
- Legacy system simulation
- Migration testing
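In Ansible terms, these differences typically end up in per-environment group variables. A hedged sketch assuming a conventional group_vars layout (the variable names are illustrative, not taken from the reference implementation):
# group_vars/prod.yml -- illustrative only
hardening_enabled: true
firewall_default_policy: deny
monitoring_enabled: true

# group_vars/dev.yml -- illustrative only
hardening_enabled: false
firewall_default_policy: allow
monitoring_enabled: false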
Practical Usage Patterns
Starting the Environment
# Start all environments
docker-compose up -d
# Start specific environment
docker-compose up -d prod-node-01 prod-node-02
# View logs
docker-compose logs -f prod-node-01
Ansible Integration
# inventory/c_00000_acme.ini
[prod]
101---prod-node-01 ansible_host=localhost ansible_port=2211
102---prod-node-02 ansible_host=localhost ansible_port=2212
[stage]
201---stage-node-01 ansible_host=localhost ansible_port=2221
202---stage-node-02 ansible_host=localhost ansible_port=2222
Running Tests
# Test connectivity
ansible -i inventory/c_00000_acme.ini all -m ping
# Run playbook
ansible-playbook -i inventory/c_00000_acme.ini site.yml --limit prod
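Functional checks usually go one step further than ping and assert observable state on the nodes. A minimal assertion-style play, illustrative only, using the service and group names from the examples above:
# verify_ssh.yml -- minimal functional assertion (illustrative)
- hosts: prod
  gather_facts: false
  tasks:
    - name: Check that the SSH service is active
      ansible.builtin.command: systemctl is-active ssh
      register: ssh_state
      changed_when: false

    - name: Assert SSH is running
      ansible.builtin.assert:
        that:
          - ssh_state.stdout == "active"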
Performance Optimizations
Layer Caching
The Dockerfile is structured for optimal caching:
- Package installation (changes rarely)
- SSH configuration (static)
- systemd cleanup (static)
- Service enabling (static)
Parallel Startup
All containers start in parallel:
# Time to full environment
real 0m12.847s
user 0m1.234s
sys 0m0.456s
Resource Limits
deploy:
  resources:
    limits:
      cpus: '0.5'
      memory: 512M
    reservations:
      memory: 256M
Debugging Failed Tests
Container Shell Access
# Get shell in running container
docker exec -it 101---prod-node-01 bash
# Check systemd status
docker exec 101---prod-node-01 systemctl status
# View SSH logs
docker exec 101---prod-node-01 journalctl -u ssh
Ansible Debugging
# Verbose output
ansible-playbook -i inventory/c_00000_acme.ini site.yml -vvv
# Step through tasks
ansible-playbook -i inventory/c_00000_acme.ini site.yml --step
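# Optional: dry run with per-task diffs (no changes applied)
ansible-playbook -i inventory/c_00000_acme.ini site.yml --check --diff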
Network Debugging
# Check connectivity between nodes
docker exec 101---prod-node-01 ping 172.25.2.11
# Verify port access
docker exec 101---prod-node-01 nc -zv 172.25.2.11 22
CI/CD Integration
GitHub Actions Example
name: Infrastructure Tests
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Start test infrastructure
        run: |
          cd tests/docker
          docker-compose up -d

      - name: Wait for SSH
        run: |
          # Only the ports that are actually mapped (see the port assignment above)
          for port in 2201 2202 2211 2212 2221 2222 2231 2232 2241 2242; do
            timeout 30 bash -c "until nc -z localhost $port; do sleep 1; done"
          done

      - name: Run tests
        run: |
          ansible-playbook -i inventory/c_00000_acme.ini site.yml

      - name: Cleanup
        if: always()
        run: cd tests/docker && docker-compose down -v
Limitations and Boundaries
What This Environment Does NOT Test
- Kernel-specific behavior
  - Container kernel ≠ VM kernel
  - Kernel modules not available
  - Some sysctl values read-only
- Hardware interactions
  - No real network interfaces
  - No block devices
  - No hardware crypto
- Full systemd complexity
  - Some unit types unsupported
  - Resource limits differ
  - cgroup v2 differences
- Performance characteristics
  - Different I/O patterns
  - Memory behavior varies
  - CPU scheduling differences
When to Use Higher-Level Testing
Use real VMs or cloud instances for:
- Kernel module testing
- Network performance validation
- Storage subsystem testing
- Full security audits
Download the Reference Implementation
These files provide a complete reference implementation for multi-environment Ansible testing. Adapt them to your specific requirements and constraints.
Best Practices for Production Use
- Separate test keys from production keys
  - Use a dedicated test CA
  - Rotate regularly
  - Never commit to git
- Resource limits in CI
  - Set memory limits
  - Limit CPU usage
  - Use the --parallel flag carefully
- Container registry (see the sketch below)
  - Build the base image once
  - Push to a registry
  - Pull in CI for speed
- Test data management
  - Reset between test runs
  - Use volumes for persistence
  - Clean up after tests
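A minimal sketch of that registry workflow; the registry host and image name are placeholders, and the compose services would switch from build: to image: once a registry is in play:
# Build and publish the base node image once (names are illustrative)
docker build -t registry.example.com/fit/debian-node:bookworm tests/docker
docker push registry.example.com/fit/debian-node:bookworm

# In CI, pull the prebuilt image instead of rebuilding it on every run
docker pull registry.example.com/fit/debian-node:bookworm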
Closing Thoughts
Infrastructure testing fails when abstraction hides reality.
This Docker-Compose-based environment keeps abstraction low and behavior explicit. It trades completeness for speed, clarity, and reproducibility — the right trade-off for functional infrastructure testing.
The privileged systemd approach provides more realism than mocks, while the network isolation and SSH access ensure tests exercise the same code paths as production.
For teams serious about Ansible testing, this environment provides a foundation that scales from developer laptops to CI pipelines, catching real issues before they reach real infrastructure.
This implementation has been battle-tested across multiple Ansible projects, from small startups to large enterprises. The patterns shown here represent the sweet spot between realism and practicality for infrastructure testing.