RFC-071: Corpus Intelligence Dashboard (GI/KG Viewer)
- Status: Completed (v2.6.0); retrospective RFC. The Dashboard tab and `/api/corpus/*` metrics routes shipped before this document was added.
- Authors: Podcast Scraper Team
- Stakeholders: Maintainers, operators using `podcast serve` + viewer, Playwright E2E owners
- Related PRDs:
  - PRD-025: Corpus intelligence dashboard (viewer)
  - PRD-016: Operational observability (adjacent: CI/GitHub/monitor/profiles); this RFC is viewer-local corpus analytics only
  - PRD-017, PRD-019, PRD-021, PRD-022, PRD-023
- Related ADRs:
  - ADR-074: Multi-feed corpus parent layout and manifest (manifest and parent-root semantics for Pipeline charts)
  - ADR-064: Canonical server layer
  - ADR-065: Vue 3 + Vite + Cytoscape
  - ADR-066: Playwright for UI E2E
- Related RFCs:
  - RFC-062: GI/KG viewer v2 (SPA shell); Dashboard is a first-class main tab (see Delivered scope table there + this RFC for detail)
  - RFC-063: Multi-feed corpus (`corpus_manifest.json`, `run.json` layout consumed for Pipeline charts)
  - RFC-061: Semantic corpus search (`/api/index/stats` shape)
  - RFC-067: `GET /api/corpus/feeds` (feeds-in-index vs catalog)
  - RFC-068: `GET /api/corpus/digest?compact=true` glance line
- Related UX specs:
  - UXS-006: Dashboard (Dashboard tab charts)
  - VIEWER_IA: Viewer information architecture (shell IA)
  - UXS-001: GI/KG viewer (shared tokens and visual chrome)
- Related Documents:
  - E2E surface map
  - Server guide (HTTP overview)
- Updated: 2026-04-19 (Align with shipped Dashboard: Coverage / Intelligence / Pipeline; VIEWER_IA + status bar artifacts; remove stale `CorpusDataWorkspace` narrative)
Abstract
The Dashboard view in web/gi-kg-viewer aggregates pipeline execution signals (corpus
manifest, discovered run.json files, stage timings, episode outcomes) and content
intelligence signals (FAISS index stats, optional digest snapshot, GI/KG artifact mtimes, catalog
publish-month histogram vs list counts, loaded-graph node types vs vector doc_type counts). The
browser composes Chart.js panels inside Coverage, Intelligence, and Pipeline sub-tabs,
calling FastAPI corpus_metrics routes under /api/ plus the existing index, digest, and
library endpoints. This RFC records the as-built architecture and boundaries relative to
RFC-062 (shell) and PRD-025 (product intent).
Problem Statement
Operators needed corpus-scale answers (runs, feeds, index health, artifact freshness) without exporting data to separate BI tools or reading raw JSON trees. The former API · Data left-panel cards were retired in favor of status bar corpus operations (List / Load into graph) plus Dashboard briefing and Coverage / Intelligence / Pipeline charts. Without a written RFC, the split between RFC-062 (monolithic viewer RFC) and corpus_metrics behavior was hard to navigate for contributors.
Delivered architecture
Frontend (web/gi-kg-viewer/)
| Piece | Role |
|---|---|
| `DashboardView.vue` | Fetches runs summary, coverage, feeds, digest, top persons; wires Pinia `indexStats` / `dashboardNav`; hosts Coverage / Intelligence / Pipeline sub-tab UI. |
| `BriefingCard.vue` (and related) | Briefing strip + handoffs; tab panels per UXS-006. |
| Chart / panel components | `ArtifactActivityChart`, `CoverageByMonthChart`, `FeedCoverageTable`, `IndexStatusCard`, `IntelligenceSnapshot`, `PipelineRunHistoryStrip`, `PipelineStageChart`, `TopicClustersStatusBlock`, `TopicLandscape`, `TopVoices`, `VerticalBarChart`. |
| `api/corpusMetricsApi.ts` | `fetchCorpusRunsSummary` (and related run helpers as used). |
| `api/corpusCoverageApi.ts` | `fetchCorpusCoverage`. |
| `api/corpusLibraryApi.ts` | `fetchCorpusFeeds`. |
| `api/corpusPersonsApi.ts` | `fetchCorpusTopPersons`. |
| `api/digestApi.ts` | Compact digest for dashboard one-liner. |
| `utils/artifactMtimeBuckets.ts` | Client-side GI/KG mtime bucketing (caps documented in code). |
Behavioral rules (refresh generation, loading flags, error handling) belong in this RFC; visual density, tokens, and aria labels for the Dashboard row belong in UXS-006 (tokens per UXS-001).
Backend (src/podcast_scraper/server/routes/corpus_metrics.py)
Mounted under the app /api prefix:
| Method | Path | Purpose |
|---|---|---|
| GET | `/corpus/stats` | `CorpusStatsResponse`: feeds, episodes, digest topic config, publish-month rollups, optional list counts when catalog builder runs. |
| GET | `/corpus/documents/manifest` | Parsed `corpus_manifest.json` document for throughput bars. |
| GET | `/corpus/documents/run-summary` | Single-run style summary helper (when used). |
| GET | `/corpus/runs/summary` | `CorpusRunsSummaryResponse`: bounded scan of `run.json` under corpus root (cap 150 files in module). |
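The bounded `run.json` scan behind `/corpus/runs/summary` can be pictured with a minimal sketch. This is not the module's actual code: the function name `find_run_files`, the constant name `RUN_FILE_CAP`, and the sorted traversal order are assumptions made for illustration. Only the cap value of 150 comes from this RFC.

```python
import os
from pathlib import Path
from typing import List

RUN_FILE_CAP = 150  # cap value stated in this RFC; the constant name is hypothetical


def find_run_files(corpus_root: Path, cap: int = RUN_FILE_CAP) -> List[Path]:
    """Collect at most `cap` run.json paths under corpus_root."""
    found: List[Path] = []
    for dirpath, dirnames, filenames in os.walk(corpus_root):
        # Sorting in place makes os.walk descend in a deterministic order.
        dirnames.sort()
        if "run.json" in filenames:
            found.append(Path(dirpath) / "run.json")
            if len(found) >= cap:
                break  # stop walking the tree once the cap is reached
    return found
```

Breaking out of the walk keeps filesystem work roughly proportional to the cap rather than to corpus size, which is the property a "bounded scan" is meant to give on large multi-feed corpora.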
Related routers (not defined in corpus_metrics.py but consumed by the same view):
- `GET /api/index/stats`, `POST /api/index/rebuild`: index routes (RFC-061).
- `GET /api/corpus/digest?compact=true`: RFC-068.
- `GET /api/corpus/feeds`: RFC-067 (feeds in index vs catalog bars).
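For quick manual checks outside the viewer, the GET routes listed in this section can be collected in one place. The sketch below uses only the paths named in this RFC. The dictionary keys, the helper names `dashboard_urls` and `fetch_json`, and the base URL in the usage note are assumptions. Any query parameters the server may require beyond `compact=true` are not described here and are left out.

```python
import json
from typing import Any, Dict
from urllib.parse import urljoin
from urllib.request import urlopen

# Paths as listed in this RFC; the keys are illustrative labels only.
DASHBOARD_GET_PATHS: Dict[str, str] = {
    "stats": "/api/corpus/stats",
    "manifest": "/api/corpus/documents/manifest",
    "run_summary": "/api/corpus/documents/run-summary",
    "runs_summary": "/api/corpus/runs/summary",
    "index_stats": "/api/index/stats",
    "digest": "/api/corpus/digest?compact=true",
    "feeds": "/api/corpus/feeds",
}


def dashboard_urls(base_url: str) -> Dict[str, str]:
    """Join each documented path onto the server base URL."""
    return {name: urljoin(base_url, path) for name, path in DASHBOARD_GET_PATHS.items()}


def fetch_json(url: str, timeout: float = 10.0) -> Any:
    """GET a URL and decode its JSON body (no retries or auth; sketch only)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

A call such as `fetch_json(dashboard_urls("http://127.0.0.1:8000")["runs_summary"])` would then return the decoded response, assuming a server is listening at that (hypothetical) address.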
Data sources
- Filesystem under resolved corpus root: `run.json`, `corpus_manifest.json`, metadata trees (RFC-063).
- In-memory / client: merged GI/KG artifact list from `GET /api/artifacts` + loaded JSON for graph metrics and mtime timelines (subject to client caps).
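The mtime timelines mentioned above come from bucketing artifact modification times on the client. The shipped helper is TypeScript (`utils/artifactMtimeBuckets.ts`). The Python sketch below only illustrates the idea and rests on assumptions: mtimes are taken to be epoch seconds, buckets are UTC calendar days, and `MAX_ARTIFACTS` is a made-up cap rather than the value documented in the code.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Dict, Iterable

MAX_ARTIFACTS = 500  # hypothetical cap; the real caps live in the TypeScript helper


def bucket_mtimes_by_day(
    mtimes: Iterable[float], max_items: int = MAX_ARTIFACTS
) -> Dict[str, int]:
    """Count artifacts per UTC calendar day from epoch-second mtimes."""
    counts: Counter = Counter()
    for i, ts in enumerate(mtimes):
        if i >= max_items:
            break  # ignore anything past the cap so chart work stays bounded
        day = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")
        counts[day] += 1
    # Sort by ISO date string so the buckets come out in chronological order.
    return dict(sorted(counts.items()))
```

Capping before bucketing keeps the cost predictable for very large artifact lists, at the price of a timeline that may under-count once the cap is hit.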
Non-goals
- Not implementing new chart types or ML-based anomaly detection in this RFC’s scope.
- Not merging with Streamlit run-compare or RFC-064 profile YAML.
- Not adding Postgres for dashboard queries (RFC-051).
Testing
- E2E: `web/gi-kg-viewer/e2e/dashboard.spec.ts` (and related mocks); see E2E surface map Dashboard row.
- Server: extend `tests/unit/podcast_scraper/server/` and `tests/integration/server/` when changing `corpus_metrics` response shapes (existing tests may already cover stats).
Relationship to RFC-062
RFC-062 remains the umbrella viewer + server seed RFC. RFC-071 is a focused slice for the Dashboard product surface so PRD/RFC indexes and cross-links stay precise. Prefer editing RFC-071 for Dashboard-only API or chart-behavior notes; edit RFC-062 / VIEWER_IA when shell navigation, status bar corpus flows, or shared stores change across tabs.