Inside a Docker-Compose-Based Test Environment for Ansible IaC
Building OS-Like Test Nodes for Functional Infrastructure Testing
Functional Infrastructure Testing (FIT) for Ansible lives or dies with the quality of its test substrate.
If your test nodes do not resemble real systems closely enough, you only validate syntax and happy paths. This article provides a technical deep dive into the Docker-based test environment used for FIT, explaining why each decision was made, what it enables, and where its deliberate limits are.
This article complements the higher-level FIT concept and focuses purely on implementation mechanics.
Purpose of the Test Environment
The goal of this environment is not to test containers.
It is designed to provide:
- OS-like targets
- reachable via real SSH
- driven by real inventories
- capable of running real Ansible roles
- fast enough for local and CI execution
In short:
Test infrastructure behavior, not container behavior.
Why Docker Compose (and not Molecule)
Docker Compose was chosen deliberately for its transparency and control:
Direct Control
- Predictable networking with static IPs
- Explicit port mappings you can see
- Simple orchestration without magic
- No hidden abstractions or wrappers
Real-World Fidelity
Unlike Molecule, this setup:
- Does not invent a new testing DSL
- Does not wrap Ansible execution
- Does not hide network topology
- Does not special-case inventories
Everything that Ansible sees looks like a real deployment.
The Dockerfile: Building an OS-Like Ansible Target
The Dockerfile defines a minimal Debian-based node that behaves like a remote server from Ansible's point of view.
Design Goals
- SSH-first access (like production)
- Python available for Ansible modules
- Minimal userspace for speed
- No init system assumptions
- Fast build and startup time
Key Implementation
FROM debian:bookworm
ENV DEBIAN_FRONTEND=noninteractive
# Install required packages
RUN apt-get update && \
    apt-get install -y \
        systemd \
        systemd-sysv \
        openssh-server \
        sudo \
        python3 \
        python3-apt \
        curl \
        ca-certificates \
        gnupg \
        lsb-release \
        iproute2 \
        procps && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*
# Configure SSH
RUN mkdir -p /var/run/sshd && \
    mkdir -p /root/.ssh && \
    chmod 700 /root/.ssh && \
    sed -i 's/#PermitRootLogin prohibit-password/PermitRootLogin yes/' /etc/ssh/sshd_config && \
    sed -i 's/#PasswordAuthentication yes/PasswordAuthentication no/' /etc/ssh/sshd_config
# Remove unnecessary systemd services for container use
RUN rm -f /lib/systemd/system/multi-user.target.wants/* && \
    rm -f /etc/systemd/system/*.wants/* && \
    rm -f /lib/systemd/system/local-fs.target.wants/* && \
    rm -f /lib/systemd/system/sockets.target.wants/*udev* && \
    rm -f /lib/systemd/system/sockets.target.wants/*initctl*
# Enable SSH service
RUN systemctl enable ssh.service
EXPOSE 22
CMD ["/sbin/init"]
Notable Choices
- Full systemd: Unlike the simplified example in the overview article, this uses real systemd in privileged containers
- SSH key only: Password authentication disabled for realism
- Minimal services: Only SSH and essential services enabled
- Debian base: Matches common production targets
systemd Strategy: Real Init in Privileged Containers
This implementation takes a different approach than the mock shown in the overview:
Real systemd
privileged: true
cap_add:
  - ALL
volumes:
  - /sys/fs/cgroup:/sys/fs/cgroup:ro
This allows:
- Real service management
- Actual systemctl commands
- Service dependency handling
- More realistic testing
Trade-offs
- Requires privileged containers
- Slightly slower startup
- More resource usage
- Platform-specific (Linux hosts)
For environments where privileged containers aren't acceptable, fall back to the mock approach.
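A minimal sketch of such a fallback, assuming you drop systemd entirely and run sshd as PID 1 (illustrative only, not necessarily identical to the overview's mock; service-related tasks then have to be stubbed or skipped in the roles under test):
# Fallback node without systemd -- illustrative sketch only
FROM debian:bookworm
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && \
    apt-get install -y openssh-server sudo python3 python3-apt && \
    mkdir -p /var/run/sshd /root/.ssh && chmod 700 /root/.ssh && \
    apt-get clean && rm -rf /var/lib/apt/lists/*
EXPOSE 22
# Run sshd in the foreground as the container's main process
CMD ["/usr/sbin/sshd", "-D", "-e"]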
docker-compose.yml: Modeling Multi-Environment Infrastructure
Each container represents a node, not an application.
Naming Convention
<envID>---<env>-node-<index>
Examples:
- 101---prod-node-01 (Production node 1)
- 201---stage-node-01 (Staging node 1)
- 301---dev-node-01 (Development node 1)
This convention:
- Matches inventory hostnames exactly
- Keeps logs and audits readable
- Allows easy environment filtering
- Supports numeric sorting
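Because container names carry both the environment and a numeric prefix, standard tooling is enough for filtering and ordering, for example:
# List only production nodes
docker ps --filter "name=prod" --format "{{.Names}}"
# Numeric prefixes keep multi-environment listings sorted
docker ps --format "{{.Names}}" | sort -n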
Complete Environment Example
services:
  # Production Environment Nodes
  prod-node-01:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: 101---prod-node-01
    hostname: prod-node-01
    privileged: true
    cap_add:
      - ALL
    security_opt:
      - apparmor:unconfined
      - seccomp:unconfined
    sysctls:
      - net.ipv4.ip_forward=1
      - net.ipv4.conf.all.rp_filter=0
    networks:
      acme-test:
        ipv4_address: 172.25.1.11
    ports:
      - "2211:22"
    volumes:
      - ../../.vault/.ssh/customers/c_00000_acme/test/id_ed25519.pub:/root/.ssh/authorized_keys:ro
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
    environment:
      - ENVIRONMENT=prod
    restart: unless-stopped
Network Architecture
networks:
  acme-test:
    driver: bridge
    ipam:
      config:
        - subnet: 172.25.0.0/16
          gateway: 172.25.0.1
Each environment gets its own subnet within the larger network:
- Test: 172.25.0.0/24
- Prod: 172.25.1.0/24
- Stage: 172.25.2.0/24
- Dev: 172.25.3.0/24
- Emergency: 172.25.4.0/24
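Only the prod node's full service definition is shown above; nodes in other environments follow the same pattern with their subnet and SSH port shifted accordingly. A hedged fragment for the first staging node, assuming it mirrors the prod definition:
  # under the same services: key as prod-node-01
  stage-node-01:
    container_name: 201---stage-node-01
    hostname: stage-node-01
    # ... same build, privileges, sysctls and volumes as prod-node-01 ...
    networks:
      acme-test:
        ipv4_address: 172.25.2.11   # stage subnet (172.25.2.0/24)
    ports:
      - "2221:22"
    environment:
      - ENVIRONMENT=stage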
SSH Key Management for Testing
The test environment uses a pragmatic approach to SSH key distribution:
Directory Structure
.vault/
  .ssh/
    customers/
      c_00000_acme/
        test/
          id_ed25519
          id_ed25519.pub
Key Distribution
volumes:
- ../../.vault/.ssh/customers/c_00000_acme/test/id_ed25519.pub:/root/.ssh/authorized_keys:ro
Important Security Note
⚠️ Testing Environment Only
This approach of storing SSH keys in .vault/ is designed for ephemeral test environments only.
- The .vault/ directory is git-ignored
- Keys are test-only and regularly rotated
- This mirrors the existing Ansible structure
For production, use proper secret management (HashiCorp Vault, AWS Secrets Manager, etc.)
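Provisioning such a throwaway key pair is straightforward; the paths follow the directory layout above, and the comment string is illustrative:
# Generate an ephemeral, passphrase-less test key -- never reuse it outside the test environment
mkdir -p .vault/.ssh/customers/c_00000_acme/test
ssh-keygen -t ed25519 -N "" \
  -f .vault/.ssh/customers/c_00000_acme/test/id_ed25519 \
  -C "fit-test-key"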
Port Mapping Strategy
Each node gets a unique SSH port on the host:
Port Assignment Pattern
2<env><node>
- Test environment: 2201-2202
- Prod environment: 2211-2212
- Stage environment: 2221-2222
- Dev environment: 2231-2232
- Emergency environment: 2241-2242
This enables:
# Direct SSH access
ssh -p 2211 root@localhost
# Ansible inventory configuration
101---prod-node-01 ansible_host=localhost ansible_port=2211
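For interactive work, the same mapping can optionally live in your SSH client configuration. This entry is a local convenience, not part of the reference files; adjust the key path to your checkout:
Host prod-node-01
    HostName localhost
    Port 2211
    User root
    # Path is illustrative -- point it at your local .vault copy
    IdentityFile ~/work/ansible/.vault/.ssh/customers/c_00000_acme/test/id_ed25519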
Environment-Specific Configuration
Each environment models different operational characteristics:
Production (101-102)
- Full security stack
- All hardening roles active
- Restrictive firewall rules
- Complete monitoring
Stage (201-202)
- Production-like configuration
- Manual update processes
- Testing ground for changes
Development (301-302)
- Minimal security
- Fast iteration
- Developer conveniences
Emergency (401-402)
- Hardened baseline
- Bastion-only access
- Break-glass procedures
Test/Legacy (001-002)
- Backward compatibility
- Legacy system simulation
- Migration testing
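In Ansible terms, these differences typically end up in per-environment group variables. A hedged sketch assuming a conventional group_vars layout (the variable names are illustrative, not taken from the reference implementation):
# group_vars/prod.yml -- illustrative only
hardening_enabled: true
firewall_default_policy: deny
monitoring_enabled: true

# group_vars/dev.yml -- illustrative only
hardening_enabled: false
firewall_default_policy: allow
monitoring_enabled: false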
Practical Usage Patterns
Starting the Environment
# Start all environments
docker-compose up -d
# Start specific environment
docker-compose up -d prod-node-01 prod-node-02
# View logs
docker-compose logs -f prod-node-01
Ansible Integration
# inventory/c_00000_acme.ini
[prod]
101---prod-node-01 ansible_host=localhost ansible_port=2211
102---prod-node-02 ansible_host=localhost ansible_port=2212
[stage]
201---stage-node-01 ansible_host=localhost ansible_port=2221
202---stage-node-02 ansible_host=localhost ansible_port=2222
Running Tests
# Test connectivity
ansible -i inventory/c_00000_acme.ini all -m ping
# Run playbook
ansible-playbook -i inventory/c_00000_acme.ini site.yml --limit prod
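Functional checks usually go one step further than ping and assert observable state on the nodes. A minimal assertion-style play, illustrative only, using the service and group names from the examples above:
# verify_ssh.yml -- minimal functional assertion (illustrative)
- hosts: prod
  gather_facts: false
  tasks:
    - name: Check that the SSH service is active
      ansible.builtin.command: systemctl is-active ssh
      register: ssh_state
      changed_when: false

    - name: Assert SSH is running
      ansible.builtin.assert:
        that:
          - ssh_state.stdout == "active"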
Performance Optimizations
Layer Caching
The Dockerfile is structured for optimal caching:
- Package installation (changes rarely)
- SSH configuration (static)
- systemd cleanup (static)
- Service enabling (static)
Parallel Startup
All containers start in parallel:
# Time to full environment
real 0m12.847s
user 0m1.234s
sys 0m0.456s
Resource Limits
deploy:
  resources:
    limits:
      cpus: '0.5'
      memory: 512M
    reservations:
      memory: 256M
Debugging Failed Tests
Container Shell Access
# Get shell in running container
docker exec -it 101---prod-node-01 bash
# Check systemd status
docker exec 101---prod-node-01 systemctl status
# View SSH logs
docker exec 101---prod-node-01 journalctl -u ssh
Ansible Debugging
# Verbose output
ansible-playbook -i inventory/c_00000_acme.ini site.yml -vvv
# Step through tasks
ansible-playbook -i inventory/c_00000_acme.ini site.yml --step
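# Optional: dry run with per-task diffs (no changes applied)
ansible-playbook -i inventory/c_00000_acme.ini site.yml --check --diff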
Network Debugging
# Check connectivity between nodes
docker exec 101---prod-node-01 ping 172.25.2.11
# Verify port access
docker exec 101---prod-node-01 nc -zv 172.25.2.11 22
CI/CD Integration
GitHub Actions Example
name: Infrastructure Tests
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Start test infrastructure
        run: |
          cd tests/docker
          docker-compose up -d

      - name: Wait for SSH
        run: |
          # Only the ports that are actually mapped (see the port assignment above)
          for port in 2201 2202 2211 2212 2221 2222 2231 2232 2241 2242; do
            timeout 30 bash -c "until nc -z localhost $port; do sleep 1; done"
          done

      - name: Run tests
        run: |
          ansible-playbook -i inventory/c_00000_acme.ini site.yml

      - name: Cleanup
        if: always()
        run: cd tests/docker && docker-compose down -v
Limitations and Boundaries
What This Environment Does NOT Test
- Kernel-specific behavior
  - Container kernel ≠ VM kernel
  - Kernel modules not available
  - Some sysctl values read-only
- Hardware interactions
  - No real network interfaces
  - No block devices
  - No hardware crypto
- Full systemd complexity
  - Some unit types unsupported
  - Resource limits differ
  - cgroup v2 differences
- Performance characteristics
  - Different I/O patterns
  - Memory behavior varies
  - CPU scheduling differences
When to Use Higher-Level Testing
Use real VMs or cloud instances for:
- Kernel module testing
- Network performance validation
- Storage subsystem testing
- Full security audits
Download the Reference Implementation
These files provide a complete reference implementation for multi-environment Ansible testing. Adapt them to your specific requirements and constraints.
Best Practices for Production Use
- Separate test keys from production keys
  - Use a dedicated test CA
  - Rotate regularly
  - Never commit to git
- Resource limits in CI
  - Set memory limits
  - Limit CPU usage
  - Use the --parallel flag carefully
- Container registry (see the sketch below)
  - Build the base image once
  - Push to a registry
  - Pull in CI for speed
- Test data management
  - Reset between test runs
  - Use volumes for persistence
  - Clean up after tests
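A minimal sketch of that registry workflow; the registry host and image name are placeholders, and the compose services would switch from build: to image: once a registry is in play:
# Build and publish the base node image once (names are illustrative)
docker build -t registry.example.com/fit/debian-node:bookworm tests/docker
docker push registry.example.com/fit/debian-node:bookworm

# In CI, pull the prebuilt image instead of rebuilding it on every run
docker pull registry.example.com/fit/debian-node:bookworm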
Closing Thoughts
Infrastructure testing fails when abstraction hides reality.
This Docker-Compose-based environment keeps abstraction low and behavior explicit. It trades completeness for speed, clarity, and reproducibility — the right trade-off for functional infrastructure testing.
The privileged systemd approach provides more realism than mocks, while the network isolation and SSH access ensure tests exercise the same code paths as production.
For teams serious about Ansible testing, this environment provides a foundation that scales from developer laptops to CI pipelines, catching real issues before they reach real infrastructure.
This implementation has been battle-tested across multiple Ansible projects, from small startups to large enterprises. The patterns shown here represent the sweet spot between realism and practicality for infrastructure testing.