ADR-036 — Testing strategy: pytest with three tiers

Status · Accepted
Date · 2026-04-27
TA anchor · /components
Related RFC · None (engineering choice)

Context

Chemigram has heterogeneous testability concerns. The XMP synthesizer is pure logic — fast, deterministic, easy to unit test. The render pipeline invokes a darktable-cli subprocess against real raws — slow, environment-dependent, hard to run in CI. The MCP server adapts the engine — naturally tested via integration tests with real .dtstyle files. A naive single-tier test approach either runs slowly (full E2E in every iteration) or has poor coverage (only fast tests).

Decision

Use pytest as the test framework with a three-tier structure:

tests/
├── unit/                    # pure logic, no I/O, no subprocess
├── integration/             # exercise XMP synthesis with real .dtstyle files, temp filesystem
└── e2e/                     # invoke darktable-cli, validate rendered output

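Test selection is per tier, shown here by directory path: pytest tests/unit (fast iteration), pytest tests/integration (CI), pytest tests/e2e (local pre-release, not in CI for v1). The pytest markers are registered in pyproject.toml (see Implementation notes).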
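
The commands above select by directory, while the decision also speaks of markers. One way to keep the two in step is a hook in the root tests/conftest.py that applies a marker named after each tier directory. The following is a minimal sketch, not the project's actual conftest: the marker names unit, integration, and e2e and the helper tier_for are assumptions for illustration, and item.path assumes pytest 7 or newer.

```python
# tests/conftest.py (sketch; marker names and tier_for are assumptions)
from pathlib import Path

TIERS = ("unit", "integration", "e2e")


def tier_for(path, tests_root):
    """Return the tier directory a test file lives in, or None."""
    try:
        rel = Path(path).relative_to(Path(tests_root))
    except ValueError:
        # The file is not under tests_root at all.
        return None
    first = rel.parts[0] if rel.parts else None
    return first if first in TIERS else None


def pytest_collection_modifyitems(config, items):
    # pytest calls this hook once, after collection, with every collected test.
    tests_root = Path(__file__).parent
    for item in items:
        tier = tier_for(item.path, tests_root)
        if tier is not None:
            # add_marker accepts a marker name as a plain string.
            item.add_marker(tier)
```

With a hook like this, pytest -m unit and pytest tests/unit would select the same tests, provided the marker names are registered so that strict marker checking does not reject them.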

Rationale

- pytest is the de-facto standard for Python testing. Better fixtures, better assertions, better plugins than unittest.
- Three tiers separate concerns by speed and dependency:
  - Unit tests stay under 1 second total. Run on every save during development. Cover synthesizer logic, dtstyle parser, versioning DAG operations, mask registry, manifest validation.
  - Integration tests run in CI (no darktable required). Cover XMP synthesis with real .dtstyle files in temp directories, full chemigram_core API exercised against real fixtures.
  - E2E tests require a darktable installation. Cover the actual render pipeline producing JPEGs from raws. Run locally before releases; not in CI for v1.
- Coverage targets are pragmatic, not strict. High coverage on the synthesizer (pure logic, easy). Lower coverage acceptable on subprocess-handling code (focus on integration tests for those). No "must hit 90%" gate.
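
Because the E2E tier depends on a darktable installation, a run on a machine without it would fail rather than skip. One possible guard, which this ADR does not prescribe, is a setup hook that skips E2E tests when darktable-cli is not on PATH. In the sketch below, the helper missing_darktable_reason and the file location are assumptions; shutil.which and pytest.skip are standard, and item.path assumes pytest 7 or newer.

```python
# tests/e2e/conftest.py (sketch; helper name and location are assumptions)
import shutil

DARKTABLE_CLI = "darktable-cli"


def missing_darktable_reason(which=shutil.which):
    """Return a skip reason if darktable-cli is not on PATH, else None."""
    if which(DARKTABLE_CLI) is None:
        return f"{DARKTABLE_CLI} not found on PATH; skipping E2E test"
    return None


def pytest_runtest_setup(item):
    # Called by pytest before each test. The path check keeps the guard
    # scoped to the e2e tier even if this hook is moved to a root conftest.py.
    if "e2e" not in item.path.parts:
        return
    reason = missing_darktable_reason()
    if reason is not None:
        import pytest  # local import so the helper stays usable without pytest

        pytest.skip(reason)
```

The which parameter exists only so the check can be exercised without touching the real PATH. A pre-release run would likely want the opposite behaviour (fail loudly if darktable is missing), so a guard like this suits ad hoc local runs more than the release checklist.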

Alternatives considered

- unittest: stdlib and bulletproof but verbose; pytest's fixtures and assertion rewriting genuinely improve test ergonomics.
- Single-tier (all tests run together): simpler structure but forces every test run to be slow; loses the "fast iteration" workflow.
- Two-tier (unit + integration only): considered but the E2E tier is genuinely different (requires darktable), worth separating.
- Snapshot testing of rendered JPEGs: considered for E2E but too brittle (Apple Silicon vs Linux render differences, darktable version drift); spot-check assertions on rendered JPEGs (file exists, expected dimensions, color near expected) are more durable.
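
The spot-check assertions named in the last alternative can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's test helpers: the function names jpeg_dimensions, rgb_close, and spot_check_render and the default tolerance are hypothetical. Dimensions are read from the JPEG start-of-frame segment; sampling an actual pixel colour would additionally need an image decoder, which is left out here.

```python
import struct
from pathlib import Path

# Start-of-frame markers: 0xC0..0xCF minus DHT (0xC4), JPG (0xC8), DAC (0xCC).
SOF_MARKERS = set(range(0xC0, 0xD0)) - {0xC4, 0xC8, 0xCC}


def jpeg_dimensions(data):
    """Return (width, height) read from the first SOF segment of a JPEG."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("corrupt JPEG: expected a marker")
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker
            pos += 1
            continue
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker in SOF_MARKERS:
            if pos + 9 > len(data):
                break
            # Segment payload: precision (1 byte), height (2), width (2).
            height, width = struct.unpack(">HH", data[pos + 5:pos + 9])
            return width, height
        pos += 2 + length  # skip marker bytes plus the segment
    raise ValueError("no SOF segment found")


def rgb_close(actual, expected, tolerance=10):
    """True if every channel is within tolerance of the expected value."""
    return all(abs(a - e) <= tolerance for a, e in zip(actual, expected))


def spot_check_render(path, expected_size):
    """Assert the render exists, is non-empty, and has the expected size."""
    p = Path(path)
    assert p.is_file(), f"render missing: {p}"
    data = p.read_bytes()
    assert data, f"render is empty: {p}"
    assert jpeg_dimensions(data) == expected_size
```

Checks of this kind tolerate small platform and darktable-version differences in pixel values, which is the durability argument made against byte-for-byte snapshots.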

Consequences

Positive:
- Fast unit feedback loop during development
- CI runs cover the engine without darktable as a CI dependency
- E2E tier validates the full pipeline before releases
- Clear conventions about where each test type lives

Negative:
- Three test directories add a small amount of structure to learn (mitigation: documented in CONTRIBUTING.md)
- Keeping E2E tests out of CI means a darktable upgrade can break things invisibly until pre-release validation (mitigation: pre-release E2E run is part of the release checklist)

Implementation notes

pyproject.toml configures pytest under [tool.pytest.ini_options] with markers, addopts, and testpaths. Fixtures shared across tiers go in tests/conftest.py; tier-specific fixtures in tests/<tier>/conftest.py. Pre-release script (scripts/pre-release-check.sh) runs all three tiers including E2E.