Metrics documentation and dashboard: redesign (WIP)
Status: Partially implemented (2026-03): dashboard-data.json, consolidate_dashboard_data.py, unified HTML bundle-first fetch + legacy fallback, workflows + make metrics-preview-check, doc/nav updates. 2026-03-31: slowest + flaky metrics pipeline fixes documented in ci/METRICS.md (see Slowest / flaky fixes and Verify metrics fixes locally).
Goal: One robust story for test & CI metrics (dashboard + local preview), clear pairing of the two CI docs you care about, and room for additional pages where needed — plus validation so bad local files fail fast instead of “mystery charts.”
1. Documentation pairs (corrected)
Primary pair (this redesign focuses here):
| Page | Role |
|---|---|
| ci/METRICS.md | Unified dashboard (GitHub Pages): CI vs nightly snapshots, latest-*.json, history-*.jsonl, what the charts/cards mean |
| ci/CODE_QUALITY_TRENDS.md | Wily + radon over git history: local make complexity-track, per-file trends, how this differs from the dashboard’s “code quality history” chart |
They already cross-link; the redesign keeps two separate pages (or more if we add a tiny index) — no need to merge them into one long doc.
Separate product (leave as its own page):
| Page | Role |
|---|---|
| guides/METRICS_GUIDE.md | Experiment / eval metrics (run_experiment, metrics.json, scorer) — not the CI dashboard |
Problem this WIP still solves: The dashboard depends on four files and strict JSONL; local copies often break that contract. That is independent of how many MkDocs pages we have.
2. Data sources we actually have (test / CI dashboard)
| Source | Where it lives | Format | Notes |
|---|---|---|---|
| GitHub Pages | gh-pages branch metrics/ | index.html + latest-*.json + history-*.jsonl | Canonical public view; JSONL one compact object per line |
| CI workflow output | Artifacts: metrics zip, pytest-* JSON | Per-job + merged pytest.json | Metrics job generates latest-ci.json; history merged from prior gh-pages |
| Local preview | artifacts/dashboard-preview/ | Copy of CI bundle + metrics/ nightly | Built by build_local_metrics_preview.sh |
| Local dev copies | metrics/*.json, *.jsonl, index.html | Often wrong (pretty-printed blob in .jsonl) | Ignored by git on main; populate with make fetch-* / preview builds; see ci/METRICS.md, Local metrics/ in your clone |
Nightly vs CI is a logical split (two latest-* / history-* pairs), not two different schemas — same generate_metrics.py shape.
3. Root causes of pain (so we fix the right layer)
- JSONL contract is strict; humans/tools save one pretty JSON into history-*.jsonl → parser recovers one object → one chart point.
- Local preview mixes downloaded run-* (CI) with metrics/ (nightly) without a single validated bundle step.
- Browser loads four URLs; cache + toggle order caused confusion; partially mitigated with cache: 'no-store' and load sequencing.
- Optional nav clarity: MkDocs nav could group CI metrics (METRICS + CODE_QUALITY_TRENDS) next to each other so the pair is obvious; experiment guide stays under Guides.
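The first root cause can be reproduced in a few lines. The sketch below is illustrative only: the record fields (run, passed) and both function names are hypothetical, and the whole-file fallback is an assumption about how a tolerant loader might behave, not the project's actual parser.

```python
import json


def parse_strict_jsonl(text: str) -> list[dict]:
    """Strict JSONL: every non-empty line must be one complete JSON object."""
    records = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        try:
            value = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_number}: {exc.msg}") from exc
        if not isinstance(value, dict):
            raise ValueError(f"line {line_number}: expected a JSON object")
        records.append(value)
    return records


def parse_with_whole_file_fallback(text: str) -> list[dict]:
    """Hypothetical tolerant loader: try JSONL, else read the file as one document."""
    try:
        return parse_strict_jsonl(text)
    except ValueError:
        document = json.loads(text)
        return document if isinstance(document, list) else [document]


# Three compact records, one per line: the shape the dashboard expects.
compact = '{"run": 1, "passed": 10}\n{"run": 2, "passed": 11}\n{"run": 3, "passed": 12}\n'
# One pretty-printed object saved under a .jsonl name: the common local mistake.
pretty = json.dumps({"run": 3, "passed": 12}, indent=2)

print(len(parse_with_whole_file_fallback(compact)))  # 3 records, 3 chart points
print(len(parse_with_whole_file_fallback(pretty)))   # 1 record, 1 chart point
```

The pretty-printed file still loads, which is exactly why the failure is silent: the chart renders, but with a single point.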
4. Proposed documentation information architecture
Keep separate pages (your preference: two or more is fine).
| Change | Detail |
|---|---|
| METRICS.md | Optional subtitle or intro line: “Companion: Code quality trends (wily / git history).” Already points there for wily; the wording can be tightened. |
| CODE_QUALITY_TRENDS.md | Already contrasts itself with METRICS.md; keep. |
| METRICS_GUIDE.md | One-line pointer at top to Test dashboard — done. |
| mkdocs.yml | Optional: nest under CI or rename nav labels for clarity, e.g. “Test dashboard (GitHub Pages)” and “Code quality trends (wily)”. |
Optional: Short docs/ci/README.md or index bullet list linking METRICS + CODE_QUALITY_TRENDS only — only if you want a third navigation hop without merging content.
5. Proposed technical architecture (robust preview + parity with CI)
5.1 Normalization pipeline (single entry point)
Introduce a Python step used by both local preview and (optionally) CI before deploy:
Inputs:
- CI: newest artifacts/ci-metrics-runs/run-*/ or metrics/latest-ci.json + history-ci.jsonl
- Nightly: metrics/latest-nightly.json + history-nightly.jsonl

Behavior:
- Load history with metrics_jsonl.load_metrics_history (already tolerates some legacy shapes).
- If history parses to ≤1 record and file has multiple physical lines of {-heavy content → emit warning or error: “File looks like pretty JSON, not JSONL.”
- Emit a single artifact for the browser, e.g. dashboard-data.json:
{
"generated_at": "...",
"ci": { "latest": {...}, "history": [ ... ] },
"nightly": { "latest": {...}, "history": [ ... ] }
}
- Dashboard JS performs one fetch('dashboard-data.json') and switches source in memory: no four-file drift, no JSONL parsing in the browser for history.
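As a rough illustration of the behavior listed above, the following is a minimal sketch and not the contents of consolidate_dashboard_data.py. The loader is a simplified stand-in for metrics_jsonl.load_metrics_history (whose real signature is not shown here), the pretty-JSON heuristic and its threshold are assumptions, and all function names are hypothetical.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def load_history_stand_in(text: str) -> list[dict]:
    """Stand-in loader (assumption): keep each line that parses as a JSON object."""
    records = []
    for line in text.splitlines():
        try:
            value = json.loads(line)
        except json.JSONDecodeError:
            continue  # blank lines and fragments of pretty-printed JSON land here
        if isinstance(value, dict):
            records.append(value)
    return records


def looks_like_pretty_json(text: str, record_count: int) -> bool:
    """Assumed heuristic: at most one record, yet several lines that start
    with JSON structure characters."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    structural = sum(1 for line in lines if line[0] in '{}[]"')
    return record_count <= 1 and len(lines) > 1 and structural * 2 >= len(lines)


def build_source(latest_text: str, history_text: str, label: str) -> dict:
    """Build the {"latest": ..., "history": [...]} half of the bundle."""
    history = load_history_stand_in(history_text)
    if looks_like_pretty_json(history_text, len(history)):
        raise ValueError(f"{label}: File looks like pretty JSON, not JSONL.")
    return {"latest": json.loads(latest_text), "history": history}


def consolidate(metrics_dir: Path, output_path: Path) -> dict:
    """Read the four files and write one dashboard-data.json style bundle."""
    def read(name: str) -> str:
        return (metrics_dir / name).read_text(encoding="utf-8")

    bundle = {
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "ci": build_source(read("latest-ci.json"), read("history-ci.jsonl"), "history-ci.jsonl"),
        "nightly": build_source(
            read("latest-nightly.json"), read("history-nightly.jsonl"), "history-nightly.jsonl"
        ),
    }
    output_path.write_text(json.dumps(bundle), encoding="utf-8")
    return bundle
```

Raising on the pretty-JSON case is one choice; the text above also allows a warning, which would suit a non-blocking local preview.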
CI / gh-pages: Same generator runs in workflow; deploy dashboard-data.json next to index.html. Keep individual latest-*.json / history-*.jsonl for backward compatibility and append_metrics_history_line.py, or deprecate after one release (decision below).
5.2 Validation target
make metrics-preview-check (name TBD):
- Runs after normalization (or as part of build_local_metrics_preview)
- Exits non-zero if:
  - history-*.jsonl fails the JSONL repair contract, or
  - parsed history count is below the expected minimum for “chart smoke” (optional flag)
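A check of this kind could look roughly like the sketch below. It is a hedged example only: the --min-records flag, the function names, and the strict one-object-per-line rule are assumptions standing in for the real "JSONL repair contract", which is not spelled out here.

```python
import argparse
import json
import sys
from pathlib import Path


def check_history_text(name: str, text: str, min_records: int = 0) -> list[str]:
    """Return a list of problems for one history file; an empty list means OK."""
    problems = []
    count = 0
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        try:
            value = json.loads(line)
        except json.JSONDecodeError:
            problems.append(f"{name}:{line_number}: line is not a complete JSON value")
            continue
        if isinstance(value, dict):
            count += 1
        else:
            problems.append(f"{name}:{line_number}: expected a JSON object")
    if count < min_records:
        problems.append(f"{name}: {count} records parsed, expected at least {min_records}")
    return problems


def main(argv: list[str]) -> int:
    """Check every history-*.jsonl in a directory; return a process exit code."""
    parser = argparse.ArgumentParser(description="Check history-*.jsonl shape")
    parser.add_argument("metrics_dir", type=Path)
    parser.add_argument("--min-records", type=int, default=0)  # hypothetical flag
    args = parser.parse_args(argv)

    problems = []
    for path in sorted(args.metrics_dir.glob("history-*.jsonl")):
        text = path.read_text(encoding="utf-8")
        problems += check_history_text(path.name, text, args.min_records)
    for problem in problems:
        print(problem, file=sys.stderr)
    return 1 if problems else 0


# In a script entry point, wire this up with: sys.exit(main(sys.argv[1:]))
```

Returning the exit code from main rather than calling sys.exit inside it keeps the check easy to call from a Makefile target, a CI job, or a unit test.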
5.3 Slowest tests / pytest shards (updated 2026-03-31)
Implemented (not only planned):
- Shard-first pytest JSON for slowest when shards exist; plus all junit*.xml under reports/ are always parsed and merged (dedupe by test name, keep max duration). Avoids “5 slow tests only” when JSON has sparse xdist timings but JUnit has full testcase@time data.
- CI (python-app.yml): per-job --junitxml=reports/junit-….xml; artifacts include those files; coverage-unified copies junit*.xml from downloaded coverage artifacts into reports/ before generate_metrics.py.
- Nightly (nightly.yml): same --junitxml pattern on unit / integration / E2E so merged nightly reports/ matches CI behavior.
- generate_metrics.py: --slowest-top-n (default 10, aligned with the dashboard table); log line prints how many slow rows were written.
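The "dedupe by test name, keep max duration" merge described in the first bullet can be sketched as follows. This is an assumption-laden illustration, not the code in generate_metrics.py: the classname::name key and all function names are hypothetical, and only the standard JUnit testcase attributes (classname, name, time) are relied on.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def durations_from_root(root: ET.Element) -> dict[str, float]:
    """Collect testcase@time per test, keeping the max if a name repeats."""
    durations: dict[str, float] = {}
    for case in root.iter("testcase"):
        raw = case.get("time")
        if raw is None:
            continue
        try:
            seconds = float(raw)
        except ValueError:
            continue
        # Key format is an assumption; any stable test identifier would do.
        name = f'{case.get("classname", "")}::{case.get("name", "")}'
        durations[name] = max(seconds, durations.get(name, 0.0))
    return durations


def merge_slowest(sources: list[dict[str, float]], top_n: int = 10) -> list[tuple[str, float]]:
    """Merge several duration maps: dedupe by name, keep max, return the top N."""
    merged: dict[str, float] = {}
    for durations in sources:
        for name, seconds in durations.items():
            if seconds > merged.get(name, float("-inf")):
                merged[name] = seconds
    ranked = sorted(merged.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


def slowest_from_reports(reports_dir: Path, top_n: int = 10) -> list[tuple[str, float]]:
    """Parse every junit*.xml under reports_dir and rank the slowest tests."""
    sources = [
        durations_from_root(ET.parse(path).getroot())
        for path in sorted(reports_dir.glob("junit*.xml"))
    ]
    return merge_slowest(sources, top_n)
```

Keeping the maximum rather than the sum means a test that ran in several shards or jobs is reported once, at its worst observed duration.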
Flaky: extract_test_metrics_from_reports_dir merges all pytest-*.json + pytest.json by nodeid so rerun-pass signals are not lost when merged JSON looks like a clean pass.
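To show why merging by nodeid matters, here is a small sketch. It is not the body of extract_test_metrics_from_reports_dir: it assumes each report is a dict with a tests list whose entries carry nodeid and outcome, and that a retried test shows up in a shard report with the outcome "rerun". Both the report shape and the function names are assumptions.

```python
from collections import defaultdict


def merge_outcomes_by_nodeid(reports: list[dict]) -> dict[str, set[str]]:
    """Union every outcome seen for each nodeid across all reports."""
    outcomes: dict[str, set[str]] = defaultdict(set)
    for report in reports:
        for test in report.get("tests", []):
            outcomes[test["nodeid"]].add(test["outcome"])
    return dict(outcomes)


def flaky_nodeids(reports: list[dict]) -> list[str]:
    """A test counts as flaky here if it was rerun at least once and also passed."""
    merged = merge_outcomes_by_nodeid(reports)
    return sorted(
        nodeid
        for nodeid, seen in merged.items()
        if "passed" in seen and "rerun" in seen
    )


# Hypothetical data: the shard report remembers the rerun, the merged one does not.
shard_report = {"tests": [{"nodeid": "tests/test_x.py::test_x", "outcome": "rerun"}]}
merged_report = {"tests": [{"nodeid": "tests/test_x.py::test_x", "outcome": "passed"}]}

print(flaky_nodeids([merged_report]))                # []
print(flaky_nodeids([shard_report, merged_report]))  # ['tests/test_x.py::test_x']
```

Reading only the merged report hides the rerun, so the test looks like a clean pass; combining all reports keeps the signal.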
Local verification: See ci/METRICS.md § Verify metrics fixes locally: run generate_metrics.py on a real reports/ tree, plus the two unit test modules listed in the checklist in section 9.
6. Phased rollout (recommended)
| Phase | Scope | Risk |
|---|---|---|
| P0 | Small doc tweaks: METRICS ↔ CODE_QUALITY_TRENDS cross-promotion; optional one-line pointer on METRICS_GUIDE; optional mkdocs nav labels | Low |
| P1 | consolidate_dashboard_data.py + extend build_local_metrics_preview.sh to write dashboard-data.json; HTML uses single fetch (feature-flag or cutover) | Medium |
| P2 | Wire same generator into python-app.yml / gh-pages deploy; optional retention of old files | Medium |
| P3 | make metrics-preview-check in CI optional job or pre-commit | Low |
7. Open decisions
- Single-file cutover: Drop four separate fetches on GitHub Pages immediately vs keep both during transition.
- History source of truth: Continue appending JSONL on CI vs move to a small SQLite / parquet (probably not needed).
- Experiment metrics: Leave in eval pipeline only, or add a future read-only panel in the same HTML (out of scope unless product asks).
8. Success criteria
- Local: make build-metrics-dashboard-preview warns or fails on bad history-*.jsonl shape.
- Local: Nightly chart shows as many points as parsed history records (given real gh-pages history).
- Docs: Under CI/CD, Test dashboard (GitHub Pages) vs Code quality trends (wily) read as a deliberate pair; experiment metrics stay under Guides (METRICS_GUIDE.md).
9. Checklist: what to run locally after pulling fixes
| Check | Command / action |
|---|---|
| Slowest + flaky extraction | python scripts/dashboard/generate_metrics.py --reports-dir reports --output /tmp/m.json with a populated reports/ (see METRICS.md) |
| Unit tests | python -m pytest tests/unit/scripts/dashboard/test_generate_metrics_slowest.py tests/unit/scripts/dashboard/test_generate_metrics_flaky.py -q --no-cov |
| Dashboard HTML | make build-metrics-dashboard-preview (with metrics/*.jsonl / fetched bundles as you already use) |
| JSONL shape | make metrics-preview-check |