
ADR-022: Flaky Test Defense

  • Status: Accepted
  • Date: 2026-01-11
  • Authors: Podcast Scraper Team
  • Related RFCs: RFC-025

Context & Problem Statement

ML-based tests (Whisper, NER) and complex integration tests fail intermittently due to timing sensitivity, hardware variation, and model non-determinism. This "flakiness" causes CI to fail at random, wasting developer time on triaging failures that do not indicate real regressions.

Decision

We implement a Flaky Test Defense strategy:

  1. Automated Retries: Critical tests are run with pytest-rerunfailures (max 3 retries) in CI, as sketched after this list.
  2. Visibility: All failures, including those that pass on retry, are recorded in JUnit XML and surfaced in the dashboard.
  3. Thresholds: Any test failing in more than 10% of CI runs is tagged @pytest.mark.flaky and moved to a separate health-tracking tier.
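
A minimal sketch of how the retry policy can be wired up with pytest-rerunfailures; the test name and retry parameters are illustrative, not taken from our suite. The plugin supports retries per test via its flaky marker, or suite-wide via the --reruns CLI flag:

```python
# Illustrative only: the test name and retry parameters are examples,
# not part of the actual suite.
import pytest

# Per-test retries via the marker provided by pytest-rerunfailures:
# this test is rerun up to 3 times, with a 2-second pause between
# attempts, before CI reports it as failed.
@pytest.mark.flaky(reruns=3, reruns_delay=2)
def test_whisper_transcription_is_stable():
    ...

# Suite-wide alternative, recording results for the dashboard:
#   pytest --reruns 3 --junitxml=results.xml
```

The marker keeps retries scoped to the tests that genuinely need them, while the CLI flag applies the policy to an entire CI job.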

Rationale

  • CI Stability: Prevents "false alarms" from blocking PRs.
  • Actionable Data: Instead of just ignoring flakiness, we track it over time to identify which modules need stabilization work (see the sketch below).
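
As a sketch of the tracking step, the snippet below aggregates JUnit XML reports from past CI runs and flags any test whose failure rate crosses the 10% threshold. The reports/ directory layout and the helper names are assumptions for illustration; it counts a run against a test when the testcase entry carries a <failure> or <error> element.

```python
# Hypothetical flakiness tracker: the paths and helper names are
# illustrative assumptions, not our actual tooling.
import glob
import xml.etree.ElementTree as ET
from collections import defaultdict

FLAKY_THRESHOLD = 0.10  # tests failing in >10% of runs get the flaky tier

def collect_outcomes(report_glob="reports/*.xml"):
    """Count runs and failures per test across historical JUnit reports."""
    runs = defaultdict(int)
    failures = defaultdict(int)
    for path in glob.glob(report_glob):
        root = ET.parse(path).getroot()
        for case in root.iter("testcase"):
            test_id = f'{case.get("classname")}::{case.get("name")}'
            runs[test_id] += 1
            # A <failure> or <error> child marks a failed run for this test.
            if case.find("failure") is not None or case.find("error") is not None:
                failures[test_id] += 1
    return runs, failures

def flaky_candidates(runs, failures):
    """Return test IDs whose failure rate exceeds FLAKY_THRESHOLD."""
    return sorted(
        test_id
        for test_id, n in runs.items()
        if n > 0 and failures[test_id] / n > FLAKY_THRESHOLD
    )

if __name__ == "__main__":
    runs, failures = collect_outcomes()
    for test_id in flaky_candidates(runs, failures):
        rate = failures[test_id] / runs[test_id]
        print(f"{test_id}: failed {failures[test_id]}/{runs[test_id]} runs "
              f"({rate:.0%}); tag @pytest.mark.flaky")
```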

Alternatives Considered

  1. Ignore Failures: Rejected as it compromises the integrity of the test suite.
  2. Increase Timeouts: Adopted as a secondary measure, but longer timeouts do not address model non-determinism.

Consequences

  • Positive: Stable PR builds; clear visibility into test health.
  • Negative: Can hide real (but intermittent) bugs if retries are used too aggressively.
