# RFC-026: Metrics Consumption and Dashboards
- Status: ✅ Completed (Phases 0-3 complete, Phase 4 extracted to RFC-040)
- Related ADRs:
  - ADR-023: Public Operational Metrics
- Authors:
- Stakeholders: Maintainers, developers, CI/CD pipeline maintainers
- Completed: 2026-01-07
- Related PRDs:
  - `docs/prd/PRD-001-transcript-pipeline.md` (core pipeline)
- Related RFCs:
  - `docs/rfc/RFC-025-test-metrics-and-health-tracking.md` (metrics collection - prerequisite)
  - `docs/rfc/RFC-024-test-execution-optimization.md` (test execution optimization)
  - `docs/rfc/RFC-040-automated-metrics-alerts.md` (Phase 4 - extracted for independent evolution)
- Related Documents:
  - `docs/architecture/TESTING_STRATEGY.md` - Overall testing strategy and test categories
  - `docs/guides/DEVELOPMENT_GUIDE.md` - Development workflow and testing requirements
  - `.github/workflows/python-app.yml` - CI test jobs

**🚨 DEPENDENCY NOTE:** RFC-026 assumes RFC-024 and RFC-025 are implemented. This RFC builds on the test execution optimization (RFC-024) and metrics collection (RFC-025) foundations. Ensure those RFCs are implemented before proceeding with metrics consumption and dashboards.
## Abstract
This RFC defines a strategy for consuming and visualizing test metrics to enable quick deviation detection and trend analysis. The strategy focuses on:
- Easy access: Multiple consumption methods (browser, API, PR checks)
- Quick detection: Identify deviations in < 60 seconds
- Visual dashboards: Human-readable charts and alerts
- Machine-readable API: JSON endpoints for automation
- Zero infrastructure: Uses GitHub Pages (free, no setup)
Key Principle: Metrics are only valuable if they can be consumed quickly. Enable < 60 second deviation detection through multiple access patterns.
## System Overview
This RFC is part of a three-RFC system (RFC-024, RFC-025, RFC-026) that optimizes test execution, metrics collection, and consumption. The complete flow:
```text
├─ PR: Fast tests (Tier 0 + Tier 1 fast)
├─ Main: All tests (Tier 0 + Tier 1 + Tier 2)
└─ Nightly: Full suite + comprehensive metrics
        ↓
Artifacts Generated
├─ JUnit XML (test results, timing)
├─ Coverage reports (XML, HTML, terminal)
└─ JSON metrics (structured data)
        ↓
Consumption Methods
├─ Job Summary (PR authors, 0s)
├─ metrics.json (automation, 5s)
└─ Dashboard (maintainers, 10s)
```

**See also:**
- RFC-024: Test execution optimization (pytest + markers → CI tiers)
- RFC-025: Metrics collection (artifacts generation)
## Core Principles
These principles are shared across RFC-024, RFC-025, and RFC-026:
- **Developer flow > completeness** - Fast feedback loops protect developer state and enable rapid iteration
- **Metrics must be cheap to collect** - Automated collection with zero manual work required
- **Humans consume summaries, machines consume JSON** - Job summaries for quick checks, JSON API for automation
## Problem Statement
**Current Issues:**
1. **No Easy Metrics Access**
   - Metrics exist in CI artifacts but require manual download
   - No public dashboard for quick checks
   - No machine-readable API for automation
   - Historical trends not easily accessible
2. **Slow Deviation Detection**
   - Manual comparison of artifacts takes minutes
   - No automatic alerts for regressions
   - Difficult to spot trends without visualization
   - No quick way to check if metrics are degrading
**Impact:**
- Developers don't check metrics regularly (too much effort)
- Regressions go undetected until they become severe
- No visibility into long-term trends
- Difficult to make data-driven optimization decisions
## Goals
### Primary Goal
**Quick Metrics Consumption:**
- Enable deviation detection in **< 60 seconds**
- Multiple access methods (browser, API, PR checks)
- Automatic alerts for regressions
- Visual dashboards for trend analysis
- Zero infrastructure overhead (GitHub Pages)
### Success Criteria
- ✅ Metrics accessible via public URL (< 10 seconds to view)
- ✅ JSON API for automation (< 5 seconds to query)
- ✅ Deviation detection in < 60 seconds
- ✅ Visual dashboards with trend charts
- ✅ Automatic alerts for regressions
- ✅ Historical data available for analysis
## Solution: GitHub Pages Unified Metrics Dashboard
**Approach:** Publish metrics to GitHub Pages as both human-readable unified dashboard and machine-readable JSON.
**Benefits:**
- ✅ **Always accessible**: Public URL (e.g., `https://chipi.github.io/podcast_scraper/metrics/`)
- ✅ **No authentication**: Anyone can view metrics
- ✅ **Auto-updated**: Metrics published after each CI run and nightly schedule
- ✅ **Unified interface**: Single dashboard with data source selector (CI or Nightly)
- ✅ **Quick consumption**: View dashboard in browser (< 10 seconds)
- ✅ **Machine-readable**: JSON API for automation (separate files for CI and Nightly)
- ✅ **Historical trends**: Visual charts showing deviations (last 30 runs per source)
- ✅ **Zero infrastructure**: Uses GitHub Pages (free, no setup)
**Dashboard Features:**
- **Data Source Selector**: Dropdown to switch between CI Metrics and Nightly Metrics
- **Auto-detection**: Automatically loads available data source on page load
- **Dynamic Loading**: JavaScript fetches appropriate JSON files based on selection
- **Same Features**: All dashboard features work for both data sources (charts, alerts, slowest tests, etc.)
## Implementation Strategy
### Phase 0: Minimum Viable Consumption (Mandatory, Before Dashboards)
**🚨 CRITICAL: This phase must be completed before any dashboard work.**
**Goal:** Enable basic metrics consumption without visual dashboards.
**Deliverables:**
- ✅ **GitHub Actions job summaries** - Display key metrics in PR checks (0 seconds to view)
- ✅ **`metrics/latest-ci.json` and `metrics/latest-nightly.json` published** - Machine-readable metrics available via GitHub Pages
- ❌ **No charts** - Visual dashboards are not required in this phase
- ❌ **No history UI** - Historical visualization is not required in this phase
**Rationale:**
- **Summaries ≫ dashboards** - Job summaries provide immediate value with zero infrastructure
- **Dashboards are earned, not required** - Visual dashboards come after basic consumption is proven
- **Focus on consumption, not visualization** - Enable metrics access first, add visuals later
**Success Criteria:**
- ✅ Job summaries show key metrics (runtime, coverage, pass rate) in every PR
- ✅ `metrics/latest-ci.json` and `metrics/latest-nightly.json` are accessible via public URL
- ✅ Metrics can be consumed via `curl` + `jq` in < 5 seconds
- ✅ No visual dashboard required
**Status:** 🚧 To Be Implemented (prerequisite for all other phases)
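GitHub Actions renders any Markdown appended to the file named by the `GITHUB_STEP_SUMMARY` environment variable as the job summary. This is what makes the 0-second view in PR checks possible. Below is a minimal sketch of a Phase 0 summary step. It assumes a metrics dict in the `latest-ci.json` format described in the next section. `render_summary` and `write_summary` are hypothetical names, not existing project functions.

```python
import os


def render_summary(metrics: dict) -> str:
    """Render key metrics (runtime, coverage, pass rate) as a Markdown table."""
    m = metrics["metrics"]
    health = m["test_health"]
    rows = [
        ("Runtime", f"{m['runtime']['total']:.1f}s"),
        ("Coverage", f"{m['coverage']['overall']:.1f}%"),
        (
            "Pass rate",
            f"{health['pass_rate'] * 100:.1f}% ({health['passed']}/{health['total']})",
        ),
    ]
    lines = ["## Test Metrics", "", "| Metric | Value |", "| --- | --- |"]
    lines += [f"| {name} | {value} |" for name, value in rows]
    # Alerts are informational only: list them, never fail the job
    for alert in metrics.get("alerts", []):
        lines += ["", f"> ⚠️ {alert['severity']}: {alert['message']}"]
    return "\n".join(lines) + "\n"


def write_summary(metrics: dict) -> None:
    """Append the summary to the job summary file, or print it outside CI."""
    text = render_summary(metrics)
    path = os.environ.get("GITHUB_STEP_SUMMARY")
    if path:
        with open(path, "a", encoding="utf-8") as fh:
            fh.write(text)
    else:
        print(text)
```

Appending, rather than overwriting, lets several steps in the same job contribute to one summary.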
### 1. Metrics JSON API (Machine-Readable)
**Locations:**
- CI Metrics: `https://chipi.github.io/podcast_scraper/metrics/latest-ci.json`
- Nightly Metrics: `https://chipi.github.io/podcast_scraper/metrics/latest-nightly.json`
- CI History: `https://chipi.github.io/podcast_scraper/metrics/history-ci.jsonl`
- Nightly History: `https://chipi.github.io/podcast_scraper/metrics/history-nightly.jsonl`
**Format:**
```json
{
"timestamp": "2024-12-28T20:00:00Z",
"commit": "def456",
"branch": "main",
"workflow_run": "https://github.com/chipi/podcast_scraper/actions/runs/12345",
"metrics": {
"runtime": {
"unit_tests": 2.1,
"integration_tests": 33.6,
"e2e_tests": 0,
"total": 35.7
},
"test_health": {
"total": 250,
"passed": 248,
"failed": 0,
"skipped": 2,
"flaky": 0,
"pass_rate": 0.992
},
"coverage": {
"overall": 65.3,
"by_module": {
"podcast_scraper": 65.3,
"podcast_scraper.workflow": 72.1
}
},
"performance": {
"tests_per_second": 7.0,
"parallel_efficiency": 0.95
},
"slowest_tests": [
{"name": "test_full_pipeline", "duration": 12.3},
{"name": "test_transcription", "duration": 8.7}
]
},
"trends": {
"runtime_change": "+0.5s",
"coverage_change": "+0.2%",
"test_count_change": "+1"
},
"alerts": [
{
"type": "regression",
"metric": "runtime",
"severity": "warning",
"message": "Runtime increased by 15% compared to last 5 runs"
}
]
}
```

**Usage:**

```bash
# Fetch CI metrics
curl -s https://chipi.github.io/podcast_scraper/metrics/latest-ci.json | jq '.metrics.runtime.total'
# Fetch nightly metrics
curl -s https://chipi.github.io/podcast_scraper/metrics/latest-nightly.json | jq '.metrics.runtime.total'
# Check for regressions (CI)
curl -s https://chipi.github.io/podcast_scraper/metrics/latest-ci.json | jq '.alerts[]'
# Check for regressions (Nightly)
curl -s https://chipi.github.io/podcast_scraper/metrics/latest-nightly.json | jq '.alerts[]'
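```

### 2. HTML Dashboard (Human-Readable)

**Location:** `https://chipi.github.io/podcast_scraper/metrics/`

**Features:**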
- **Current metrics** (latest run)
- **Trend charts** (last 30 runs)
- **Deviation alerts** (highlighted in red/yellow)
- **Quick comparison** (vs. previous run, vs. baseline)
- **Slowest tests** (top 10)
- **Coverage trends** (visual chart)
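The "quick comparison vs. previous run" corresponds to the `trends` block in the JSON format above. Below is a sketch of how that block could be derived. It assumes both runs use the nested `latest-*.json` structure. `compute_trends` is a hypothetical helper.

```python
def compute_trends(current: dict, previous: dict) -> dict:
    """Diff two latest-*.json documents into the "trends" block format."""
    cur, prev = current["metrics"], previous["metrics"]
    runtime_delta = cur["runtime"]["total"] - prev["runtime"]["total"]
    coverage_delta = cur["coverage"]["overall"] - prev["coverage"]["overall"]
    count_delta = cur["test_health"]["total"] - prev["test_health"]["total"]
    # The "+" format flag keeps an explicit sign, e.g. "+0.5s" or "-1.2s"
    return {
        "runtime_change": f"{runtime_delta:+.1f}s",
        "coverage_change": f"{coverage_delta:+.1f}%",
        "test_count_change": f"{count_delta:+d}",
    }
```

A comparison against a baseline works the same way, with a stored baseline document passed as `previous`.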
**Visual Elements:**
- ✅ Green: Metrics within normal range
- ⚠️ Yellow: Minor deviation (5-10%)
- 🔴 Red: Significant deviation (> 10%)
- 📊 Charts: Line graphs for trends
**Example Dashboard:**
```html
<!-- Simplified example -->
<div class="metrics-dashboard">
<h1>Test Metrics Dashboard</h1>
<div class="current-metrics">
<h2>Latest Run (2024-12-28 20:00:00)</h2>
<div class="metric">
<span>Runtime:</span> 35.7s <span class="trend up">+0.5s</span>
</div>
<div class="metric">
<span>Coverage:</span> 65.3% <span class="trend up">+0.2%</span>
</div>
<div class="metric">
<span>Tests:</span> 250 <span class="status pass">248 passed</span>
</div>
</div>
<div class="trends">
<h2>Trends (Last 30 Runs)</h2>
<canvas id="runtime-chart"></canvas>
<canvas id="coverage-chart"></canvas>
</div>
<div class="alerts">
<h2>Alerts</h2>
<div class="alert warning">
Runtime increased by 15% compared to baseline
</div>
</div>
</div>
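```

### 3. Historical Data (JSONL)

**Format:** one JSON object per run, one run per line (`history-ci.jsonl`, `history-nightly.jsonl`):

```json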
{"timestamp":"2024-12-28T19:00:00Z","commit":"abc123","runtime":35.2,"coverage":65.1,"passed":248}
{"timestamp":"2024-12-28T20:00:00Z","commit":"def456","runtime":35.7,"coverage":65.3,"passed":248}
```

**Benefits:**
- Can append without rewriting entire file
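The append-only property is what keeps JSONL cheap: each run writes one line and never touches earlier ones. Below is a minimal sketch, assuming flat records like the example above. `append_history` and `load_recent` are hypothetical helpers. `load_recent` streams the file and keeps only the last 30 runs the dashboard charts need.

```python
import json
from collections import deque
from pathlib import Path


def append_history(path: Path, record: dict) -> None:
    """Append one run as a single compact JSON line; earlier lines are untouched."""
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, separators=(",", ":")) + "\n")


def load_recent(path: Path, n: int = 30) -> list:
    """Return the last n records, parsing the file line by line."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as fh:
        # deque(maxlen=n) discards older records as newer ones stream in
        recent = deque((json.loads(line) for line in fh if line.strip()), maxlen=n)
    return list(recent)
```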
### 4. GitHub Actions Integration
**Workflow Step:**
```yaml
- name: Generate and publish metrics
  if: always() && github.ref == 'refs/heads/main'
  run: |
    # Extract metrics from JUnit XML and coverage
    python scripts/dashboard/generate_metrics.py \
      --junit reports/junit.xml \
      --coverage reports/coverage.xml \
      --output metrics/
    # Generate HTML dashboard
    python scripts/dashboard/generate_dashboard.py \
      --metrics metrics/latest.json \
      --history metrics/history.jsonl \
      --output metrics/index.html
    # Publish to gh-pages branch (runners have no default commit identity)
    git config user.name "github-actions[bot]"
    git config user.email "github-actions[bot]@users.noreply.github.com"
    git checkout gh-pages || git checkout --orphan gh-pages
    git add metrics/
    git commit -m "Update metrics: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
    git push origin gh-pages
```

## Consumption Methods

### Method 1: HTML Dashboard (10 seconds)

1. Open `https://chipi.github.io/podcast_scraper/metrics/`
2. Check trend charts for spikes
### Method 2: JSON API (5 seconds)
```bash
# Check latest metrics
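curl -s https://chipi.github.io/podcast_scraper/metrics/latest-ci.json | jq '.alerts'
# Compare with previous run
curl -s https://chipi.github.io/podcast_scraper/metrics/latest-ci.json | jq '.trends'
```

### Method 3: GitHub Actions Job Summary (0 seconds)

- Key metrics (runtime, coverage, pass rate) shown directly in PR checks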
- No external access needed
### Method 4: Automated Alerts (0 seconds)
- GitHub Actions can comment on PRs with metric changes
- Slack/Discord webhooks for significant deviations
- Email notifications (optional)
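This method is Phase 4 scope, now tracked in RFC-040. The sketch below illustrates how automation can consume the JSON API. It turns the `alerts` array of `latest-ci.json` into a chat message and posts it as `{"text": ...}`, the payload shape Slack incoming webhooks accept (Discord expects a `content` field instead). All function names are hypothetical, and no real webhook URL is assumed.

```python
import json
import urllib.request
from typing import Optional

METRICS_URL = "https://chipi.github.io/podcast_scraper/metrics/latest-ci.json"


def fetch_metrics(url: str = METRICS_URL) -> dict:
    """Download and parse a latest-*.json document from GitHub Pages."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def format_alerts(metrics: dict) -> Optional[str]:
    """Build a short chat message from the alerts array, or None if it is empty."""
    alerts = metrics.get("alerts", [])
    if not alerts:
        return None
    lines = [f"Metrics alerts for commit {metrics.get('commit', 'unknown')}:"]
    lines += [f"- [{a['severity']}] {a['metric']}: {a['message']}" for a in alerts]
    return "\n".join(lines)


def post_to_webhook(webhook_url: str, text: str) -> None:
    """POST {"text": ...} as JSON, the shape Slack incoming webhooks accept."""
    body = json.dumps({"text": text}).encode("utf-8")
    request = urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10):
        pass  # A 4xx/5xx response raises urllib.error.HTTPError
```

Because `format_alerts(fetch_metrics())` returns `None` when the latest run has no alerts, a scheduled job can skip posting entirely.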
## Deviation Detection Logic
### Thresholds
- **Minor deviation**: 5-10% change
- **Significant deviation**: > 10% change
- **Critical deviation**: > 20% change
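These thresholds map directly onto a classification function. The sketch assumes the change is measured relative to a baseline value, such as the median of recent runs. It ignores direction, because whether an increase is a regression depends on the metric (slower runtime vs. higher coverage). `classify_deviation` is hypothetical.

```python
def classify_deviation(current: float, baseline: float) -> str:
    """Map the relative change against a baseline onto this RFC's thresholds."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    change_pct = abs(current - baseline) * 100 / abs(baseline)
    if change_pct > 20:
        return "critical"     # > 20%
    if change_pct > 10:
        return "significant"  # > 10%, red on the dashboard
    if change_pct >= 5:
        return "minor"        # 5-10%, yellow on the dashboard
    return "normal"           # < 5%, green on the dashboard
```

For example, a runtime of 38.0s against a 35.0s baseline is an 8.6% change and classifies as `minor`.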
### Alert Behavior
**🚨 CRITICAL: Alerts are informational initially (no CI failures)**
- Alerts are displayed in job summaries and dashboards
- Alerts do NOT cause CI failures or block merges
- Alerts are informational only - they highlight potential issues for review
- This prevents teams from fearing noise and disabling alerts
**Future Enhancement:** After alerts are proven useful and accurate, consider optional CI gates (opt-in per team).
### Metrics to Monitor
1. **Runtime**: Compare against last 5 runs (median)
2. **Coverage**: Compare against last 10 runs (trend)
3. **Test count**: Alert if tests added/removed
4. **Slowest tests**: Alert if new slow tests appear
5. **Flaky tests**: Alert if flaky count increases
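The runtime and coverage checks are covered by the example that follows. The remaining three monitored metrics can be compared run-over-run against the previous `latest-*.json` document. The sketch below works under that assumption. `detect_suite_changes`, the `"change"` alert type, and the 5-second `slow_threshold` are hypothetical choices, not values defined by this RFC. As with all alerts here, the results are informational only.

```python
def detect_suite_changes(current: dict, previous: dict, slow_threshold: float = 5.0) -> list:
    """Informational alerts for test count, new slow tests, and flaky tests.

    Both arguments are latest-*.json documents; slow_threshold is in seconds.
    """
    cur, prev = current["metrics"], previous["metrics"]
    alerts = []

    def alert(kind: str, metric: str, message: str) -> None:
        alerts.append({"type": kind, "metric": metric, "severity": "warning", "message": message})

    # Test count: alert when tests are added or removed
    delta = cur["test_health"]["total"] - prev["test_health"]["total"]
    if delta:
        alert("change", "test_count", f"Test count changed by {delta:+d}")

    # Slowest tests: slow tests that were absent from the previous slowest list
    known = {t["name"] for t in prev.get("slowest_tests", [])}
    for test in cur.get("slowest_tests", []):
        if test["duration"] >= slow_threshold and test["name"] not in known:
            alert("regression", "slowest_tests",
                  f"New slow test: {test['name']} ({test['duration']:.1f}s)")

    # Flaky tests: alert when the flaky count increases
    flaky_delta = cur["test_health"].get("flaky", 0) - prev["test_health"].get("flaky", 0)
    if flaky_delta > 0:
        alert("regression", "flaky_tests", f"Flaky tests increased by {flaky_delta}")

    return alerts
```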
### Example Detection
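```python
from statistics import mean, median


def detect_deviations(current, history):
    """Return informational alerts for runtime and coverage regressions.

    `current` and each `history` entry are flat records as stored in the
    history JSONL files (keys: "runtime", "coverage", ...).
    """
    alerts = []
    if not history:
        return alerts  # No baseline yet (first run)

    # Runtime deviation: more than 10% above the median of the last 5 runs
    median_runtime = median(r["runtime"] for r in history[-5:])
    if median_runtime > 0 and current["runtime"] > median_runtime * 1.1:
        alerts.append({
            "type": "regression",
            "metric": "runtime",
            "severity": "warning",
            "message": f"Runtime increased by {(current['runtime'] / median_runtime - 1) * 100:.1f}%",
        })

    # Coverage drop: more than 1 percentage point below the mean of the last 10 runs
    avg_coverage = mean(r["coverage"] for r in history[-10:])
    if current["coverage"] < avg_coverage - 1.0:
        alerts.append({
            "type": "regression",
            "metric": "coverage",
            "severity": "error",
            "message": f"Coverage dropped by {avg_coverage - current['coverage']:.1f} percentage points",
        })

    return alerts
```

## Implementation Phases

### Phase 1: Basic Metrics Publishing

- Simple deviation detection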
**Deliverables:**
- `scripts/dashboard/generate_metrics.py` - Extract metrics from JUnit/coverage
- GitHub Actions step to publish to gh-pages
- `metrics/latest-ci.json` and `metrics/latest-nightly.json` accessible via GitHub Pages
### Phase 2: HTML Dashboard (2-3 days)
- Generate HTML dashboard
- Add trend charts (using Chart.js or similar)
- Visual alerts and highlights
**Deliverables:**
- `scripts/dashboard/generate_dashboard.py` - Generate HTML dashboard
- `metrics/index.html` with charts and alerts
- CSS styling for visual indicators
### Phase 3: Historical Tracking (1-2 days)
- Append to `history.jsonl` on each run
- Load history into dashboard
- Show trend lines
**Deliverables:**
- Append logic to `generate_metrics.py`
- Dashboard loads and displays historical data
- Trend charts show last 30 runs
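The dashboard is static HTML with Chart.js (see Design Decisions). The generator therefore only needs to turn the last 30 history records into a chart configuration and emit one `<script>` tag per `<canvas>`. The sketch below assumes Chart.js v3 or later (for the `scales.y` option shape) and the flat history records shown earlier. `build_trend_chart` and `chart_script` are hypothetical. The canvas ids match the simplified dashboard example.

```python
import json


def build_trend_chart(history: list, metric: str, label: str, last_n: int = 30) -> dict:
    """Build a Chart.js line-chart config for one metric over the last N runs."""
    recent = history[-last_n:]
    return {
        "type": "line",
        "data": {
            # Label each point with its date and short commit hash
            "labels": [f"{r['timestamp'][:10]} {r['commit'][:7]}" for r in recent],
            "datasets": [{"label": label, "data": [r[metric] for r in recent], "fill": False}],
        },
        "options": {"scales": {"y": {"beginAtZero": False}}},
    }


def chart_script(canvas_id: str, config: dict) -> str:
    """Emit a <script> snippet that renders the chart into an existing <canvas>."""
    return (
        f"<script>new Chart(document.getElementById({json.dumps(canvas_id)}), "
        f"{json.dumps(config)});</script>"
    )
```

For example, `chart_script("runtime-chart", build_trend_chart(history, "runtime", "Runtime (s)"))` would fill the runtime canvas, and the coverage chart works the same way with `"coverage"`.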
### Phase 4: Automated Alerts (MOVED TO RFC-040)
**Status:** ⏭️ **Extracted to RFC-040** for independent evolution
Phase 4 (automated alerts) has been extracted to a separate RFC to enable:
- Independent evolution of alerting strategy
- Clear completion milestone for RFC-026 (Phases 0-3)
- Focused implementation tracking in v2.7 milestone
**See:** [RFC-040: Automated Metrics Alerts](RFC-040-automated-metrics-alerts.md)
**Original scope (now in RFC-040):**
- PR comments on metric changes
- Webhook notifications (optional)
- Email alerts (optional)
**Rationale for extraction:**
- Phases 0-3 are complete and production-ready
- Phase 4 is substantial work (~1 day) not yet started
- Separating allows RFC-026 to be marked as complete
- Issue #216 now tracks RFC-040 implementation
## Access Patterns
### Consumption Methods by Audience
| Method | Audience | Use Case | Access Time |
| -------- | ---------- | ---------- | ------------- |
| **Job Summary** | PR authors | "Did I break something?" | 0s (view in PR checks) |
| **JSON API** | Automation | Gates, scripts, CI integration | 5s (`curl` + `jq`) |
| **Unified Dashboard** | Maintainers | Trend spotting, historical analysis, compare CI vs Nightly | 10s (browser) |
**Rationale:**
- **Job Summary** - Immediate feedback for PR authors checking if their changes broke tests
- **JSON API** - Machine-readable for automation, gates, and scripts (separate endpoints for CI and Nightly)
- **Unified Dashboard** - Visual tool for maintainers to spot trends, analyze historical data, and compare CI vs Nightly metrics using the data source selector
### For Quick Checks (< 60 seconds)
1. **GitHub Actions Job Summary** (0s) - View in PR checks
2. **JSON API** (5s) - `curl` + `jq` for automation
3. **HTML Dashboard** (10s) - Browser for visual inspection
### For Deep Analysis
- Download `history.jsonl` for custom analysis
- Use JSON API for integration with other tools
- Export to CSV for spreadsheet analysis
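Every history line is a flat JSON object, so CSV export is a small transformation. Below is a sketch using only the standard library. `history_to_csv` is a hypothetical helper. Columns are the union of keys in first-seen order, so older records that lack a newer field simply get an empty cell.

```python
import csv
import json
from pathlib import Path


def history_to_csv(jsonl_path: Path, csv_path: Path) -> int:
    """Convert a history-*.jsonl file to CSV and return the number of rows written."""
    records = []
    with jsonl_path.open(encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                records.append(json.loads(line))

    # Union of keys across all records, preserving first-seen order
    fieldnames = []
    for rec in records:
        for key in rec:
            if key not in fieldnames:
                fieldnames.append(key)

    with csv_path.open("w", newline="", encoding="utf-8") as out:
        writer = csv.DictWriter(out, fieldnames=fieldnames)  # missing keys -> ""
        writer.writeheader()
        writer.writerows(records)
    return len(records)
```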
## Design Decisions
### 1. GitHub Pages vs. External Service
**Decision:** Use GitHub Pages for metrics publishing
**Rationale:**
- Zero infrastructure overhead
- Free and always available
- No authentication required
- Version-controlled history
- Easy to set up and maintain
**Future:** Can migrate to external service (Datadog, Grafana) if needed
### 2. JSONL vs. CSV for History
**Decision:** Use JSONL (JSON Lines) format
**Rationale:**
- Easy to append (no file rewrite)
- Machine-readable (JSON)
- Efficient for streaming
- Can parse line-by-line
- More flexible than CSV
### 3. Dashboard Technology
**Decision:** Static HTML with Chart.js (or similar)
**Rationale:**
- No server-side rendering needed
- Works with GitHub Pages (static hosting)
- Lightweight and fast
- Easy to customize
- No dependencies on external services
## Benefits
### Developer Experience
- ✅ **Quick access**: View metrics in < 10 seconds
- ✅ **Visual insights**: Charts show trends at a glance
- ✅ **Automatic alerts**: Regressions highlighted automatically
- ✅ **Multiple methods**: Browser, API, or PR checks
### Automation
- ✅ **JSON API**: Easy integration with other tools
- ✅ **Webhooks**: Can trigger alerts on deviations
- ✅ **CI integration**: Metrics published automatically
- ✅ **Historical data**: Available for custom analysis
### Maintenance
- ✅ **Zero infrastructure**: Uses GitHub Pages
- ✅ **Auto-updates**: Metrics published after each CI run
- ✅ **Version-controlled**: History stored in git
- ✅ **Low maintenance**: Minimal ongoing work
## Related Files
- `.github/workflows/python-app.yml`: CI test jobs
- `scripts/dashboard/generate_metrics.py`: Extract metrics from test artifacts (to be created)
- `scripts/dashboard/generate_dashboard.py`: Generate HTML dashboard (to be created)
- `docs/rfc/RFC-025-test-metrics-and-health-tracking.md`: Metrics collection (prerequisite)
## Notes
- Requires RFC-025 Phase 1 (Basic Metrics Collection) to be completed first
- GitHub Pages must be enabled for the repository
- Metrics are public (no authentication)
- Historical data grows over time (consider retention policy)