# ADR-022: Flaky Test Defense
- Status: Accepted
- Date: 2026-01-11
- Authors: Podcast Scraper Team
- Related RFCs: RFC-025
## Context & Problem Statement
ML-based tests (Whisper, NER) and complex integration tests often fail intermittently ("flakiness") due to timing sensitivity, hardware variation, or model non-determinism. These failures break CI unpredictably and waste developer time triaging failures that are not real regressions.
## Decision
We implement a Flaky Test Defense strategy:
- Automated Retries: Critical tests are run with `pytest-rerunfailures` (up to 3 retries) in CI; see the sketch after this list.
- Visibility: All failures, including those that pass on retry, are recorded in JUnit XML and surfaced in the dashboard.
- Thresholds: Any test failing more than 10% of the time is tagged `@pytest.mark.flaky` and moved to a separate health-tracking tier.
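A minimal sketch of how these pieces fit together, assuming `pytest-rerunfailures` is installed; the test name and the simulated failure are hypothetical stand-ins:

```python
# test_ml_smoke.py -- illustrative only: the test name and the random
# failure stand in for a real non-deterministic Whisper/NER assertion.
import random

import pytest

# Per this ADR: a test over the 10% failure threshold carries the flaky
# marker (provided by pytest-rerunfailures) and is retried up to 3 times.
@pytest.mark.flaky(reruns=3, reruns_delay=1)
def test_model_smoke():
    assert random.random() > 0.2  # fails ~20% of the time
```

In CI, the stable tier gates PRs while the flaky tier runs separately, e.g. `pytest -m "not flaky" --junitxml=reports/stable.xml` and `pytest -m flaky --reruns 3 --junitxml=reports/flaky.xml` (report paths are hypothetical). The `--reruns`/`--reruns-delay` flags set suite-wide retry defaults, and `--junitxml` captures results for the dashboard.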
## Rationale
- CI Stability: Prevents "false alarms" from blocking PRs.
- Actionable Data: Instead of simply ignoring flakiness, we track failure rates over time to identify which modules need stabilization work (see the sketch below).
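To make the >10% threshold actionable, a sketch of a failure-rate report, assuming each CI run archives one JUnit XML file under a `reports/` directory (the layout is hypothetical; only final outcomes are counted, not individual reruns):

```python
#!/usr/bin/env python3
"""Flag tests whose failure rate across archived CI runs exceeds 10%."""
from collections import Counter
from pathlib import Path
import xml.etree.ElementTree as ET

THRESHOLD = 0.10  # per this ADR: >10% failure rate => tag @pytest.mark.flaky

runs = Counter()      # test id -> number of runs seen
failures = Counter()  # test id -> number of failed/errored runs

# Hypothetical layout: one JUnit XML file per CI run under reports/.
for report in Path("reports").glob("*.xml"):
    for case in ET.parse(report).iter("testcase"):
        test_id = f"{case.get('classname')}::{case.get('name')}"
        runs[test_id] += 1
        if case.find("failure") is not None or case.find("error") is not None:
            failures[test_id] += 1

for test_id, total in sorted(runs.items()):
    rate = failures[test_id] / total
    if rate > THRESHOLD:
        print(f"{test_id}: failed {rate:.0%} of {total} runs -> tag as flaky")
```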
## Alternatives Considered
- Ignore Failures: Rejected as it compromises the integrity of the test suite.
- Increase Timeouts: Adopted only as a secondary measure; longer timeouts absorb hardware variation but do not address model non-determinism (see the sketch below).
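Where longer timeouts serve as that secondary measure, one option (an assumption on our part; the ADR does not name a plugin) is `pytest-timeout` stacked with retries; the test name is hypothetical:

```python
import pytest

# Hypothetical slow integration test: a generous per-test timeout absorbs
# hardware variation, while a small retry budget handles residual
# model non-determinism.
@pytest.mark.timeout(120)     # requires pytest-timeout
@pytest.mark.flaky(reruns=2)  # requires pytest-rerunfailures
def test_full_episode_pipeline():
    ...
```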
## Consequences
- Positive: Stable PR builds; clear visibility into test health.
- Negative: Can hide real (but intermittent) bugs if retries are used too aggressively.