📊 Test Metrics Dashboard

⚠️ Alerts

📈 Test run history

How CI tracks this: each point is one GitHub Actions metrics deploy. The workflow pulls prior rows from gh-pages into history-ci.jsonl or history-nightly.jsonl, generates latest-*.json, then appends one compact JSON line for the current run. The chart uses date + short commit on the x-axis so several runs on the same calendar day do not collapse into one dot. Left: total pytest wall time (s) for that run. Right: combined line coverage (%). The summary cards above show only the latest snapshot; test counts, flaky count, and docstring coverage are not plotted here.
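The pull-then-append step above can be sketched as follows. This is a hedged illustration, not the workflow's actual code: the field names in the snapshot dict (`date`, `commit`, `wall_time_s`, `coverage_pct`) and the helper names are assumptions for the example.

```python
# Illustrative sketch of the history-append step; snapshot field names
# are hypothetical, not the real schema used by the workflow.
import json
import tempfile
from pathlib import Path

def append_history(history_path: Path, snapshot: dict) -> None:
    """Append one compact JSON line (no extra whitespace) for this run."""
    line = json.dumps(snapshot, separators=(",", ":"), sort_keys=True)
    with history_path.open("a", encoding="utf-8") as fh:
        fh.write(line + "\n")

def load_history(history_path: Path) -> list[dict]:
    """Read prior rows back, skipping blank lines defensively."""
    if not history_path.exists():
        return []
    text = history_path.read_text(encoding="utf-8")
    return [json.loads(ln) for ln in text.splitlines() if ln.strip()]

with tempfile.TemporaryDirectory() as tmp:
    history = Path(tmp) / "history-ci.jsonl"
    append_history(history, {"date": "2024-05-01", "commit": "abc1234",
                             "wall_time_s": 412.7, "coverage_pct": 91.3})
    rows = load_history(history)
```

One JSON object per line (JSONL) keeps the append cheap in CI: the workflow never rewrites prior rows, it only adds one line per deploy.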

📊 Code quality history

How CI tracks this: the same history-*.jsonl points as Test run history, one row per metrics deploy. For each run, generate_metrics.py reads the radon outputs in reports/ (complexity.json, maintainability.json) and stores package-wide averages in the snapshot. Left: mean cyclomatic complexity. Right: mean maintainability index (MI). This line chart is not wily's per-commit file graph; for that, run make complexity-track locally (see CI → Code quality trends). Docstring %, dead-code count, and spelling results from the same run appear in the summary cards only, not on this chart.
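The averaging step above could look roughly like this. A hedged sketch: the JSON shapes are assumed from radon's documented output (`radon cc -j` maps each file to a list of blocks with a `complexity` key; `radon mi -j` maps each file to an object with an `mi` key), and the sample data is invented for illustration.

```python
# Hedged sketch of computing package-wide averages from radon JSON output;
# the input shapes are assumptions based on radon's -j formats.
from statistics import mean

def mean_complexity(cc_json: dict) -> float:
    """Average cyclomatic complexity across all blocks in all files."""
    scores = [block["complexity"]
              for blocks in cc_json.values() for block in blocks]
    return mean(scores) if scores else 0.0

def mean_mi(mi_json: dict) -> float:
    """Average maintainability index across files."""
    scores = [entry["mi"] for entry in mi_json.values()]
    return mean(scores) if scores else 0.0

# Invented sample data in the assumed shapes:
cc = {"pkg/a.py": [{"name": "f", "complexity": 3}, {"name": "g", "complexity": 5}],
      "pkg/b.py": [{"name": "h", "complexity": 4}]}
mi = {"pkg/a.py": {"mi": 72.5, "rank": "B"},
      "pkg/b.py": {"mi": 85.1, "rank": "A"}}

avg_cc = mean_complexity(cc)
avg_mi = mean_mi(mi)
```

Storing only these two scalars per run is what lets the chart stay a simple line plot, in contrast to wily's richer per-file history.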

🐌 Slowest Tests (Top 10)

⚠️ Flaky Tests

Tests that failed on the first attempt but passed on rerun.
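That definition reduces to a set intersection: a test counts as flaky when it is in both the first attempt's failed set and the rerun's passed set. A minimal sketch (the test IDs are invented; how the two sets are collected from the actual run is not shown here):

```python
# Flaky = failed first, then passed on rerun. Test IDs below are illustrative.
def find_flaky(first_attempt_failed: set[str], rerun_passed: set[str]) -> set[str]:
    """Return test IDs that failed initially but passed when rerun."""
    return first_attempt_failed & rerun_passed

flaky = find_flaky(
    {"tests/test_io.py::test_timeout", "tests/test_api.py::test_auth"},
    {"tests/test_io.py::test_timeout"},
)
# test_auth failed and stayed failed, so only test_timeout is flaky.
```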