PRD-025: Corpus Intelligence Dashboard (GI/KG Viewer)
- Status: Implemented (v2.6.0); documented retrospectively. Behavior shipped in the viewer Dashboard tab and the supporting `/api/corpus/*` aggregates.
- Authors: Podcast Scraper Team
- Related RFCs:
  - RFC-071: Corpus intelligence dashboard (viewer); technical design and API consumption
  - RFC-062: GI/KG viewer v2; SPA shell, Dashboard main tab
  - RFC-063: Multi-feed corpus; manifest / `run.json` layout feeding Pipeline charts
  - RFC-061: Semantic corpus search (FAISS); vector index stats in Content intelligence
  - RFC-067 / RFC-068: catalog + digest data mixed into dashboard copy
- Related PRDs (adjacent product surfaces):
  - PRD-016: Operational observability. Distinct from this PRD: PRD-016 covers CI metrics, GitHub Pages, frozen profiles (RFC-064/066), and the live monitor (RFC-065). This PRD is corpus-local analytics in the viewer only.
  - PRD-017, PRD-019: GI/KG artifacts summarized in charts
  - PRD-021: index footprint and doc-type bars
  - PRD-022, PRD-023: feeds catalog + digest glance strings
- Related UX specs:
  - VIEWER_IA: Viewer information architecture; shell IA (status bar corpus flows)
  - UXS-006: Dashboard; Dashboard tab (charts) layout, tokens, accessibility targets
  - UXS-001: GI/KG viewer; shared design system
- Related Documents:
  - E2E surface map: Dashboard row (Playwright contract)
- Updated: 2026-04-19 (VIEWER_IA + status bar; remove `CorpusDataWorkspace` / left API · Data narrative; Coverage / Intelligence / Pipeline sub-tabs)
Summary
Operators and developers need a single in-viewer place to see whether a corpus is healthy:
pipeline runs (manifest throughput, latest run.json stages, episode outcomes), and content
intelligence (vector index vs catalog, digest one-liner, GI/KG write-time timelines, publish-month
histograms, graph node types vs indexed doc types). The Dashboard main tab in web/gi-kg-viewer
delivers Chart.js-based panels under Coverage, Intelligence, and Pipeline sub-tabs, fed by FastAPI
/api/corpus/* routes and client-side merges with the loaded graph and index stores.
Background
Before the Dashboard tab, operators relied on the API · Data left-panel cards (since retired), on
ad hoc cat, or on external tools to correlate run.json, corpus_manifest.json, catalog stats, and
index health. Health and the corpus List / Load into graph controls now live on the status bar
(VIEWER_IA), while the Dashboard focuses on the briefing and the Coverage / Intelligence / Pipeline
charts. The Dashboard does not replace RFC-064 frozen profiles or RFC-066 Streamlit run compare; it
answers “what does this corpus root look like right now?” inside the same session as graph and search.
Goals
- At-a-glance corpus summary: feeds, episodes, digest topic bands, coverage rollups, GI list counts when artifacts are listed (from `/api/corpus/*` aggregates + `GET /api/index/stats` + client context).
- Pipeline visibility: manifest document, run summaries, cumulative growth, latest run stage bars, episode outcome bars from `run.json` discovery (RFC-063).
- Content intelligence: vector index glance (`GET /api/index/stats`), optional compact digest (`GET /api/corpus/digest?compact=true`), GI/KG mtime timelines (client-bucketed), publish-month catalog vs histogram insight, graph node-type vs index doc-type bars.
- Trust and navigation: blurbs point operators to status bar corpus controls and Dashboard Coverage / Intelligence / Pipeline for corrective actions (reindex, refresh catalog, inspect runs).
- Testable contract: Playwright `dashboard.spec.ts` and E2E surface map.
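The client-bucketed GI/KG mtime timelines can be illustrated with a short sketch. It is written in Python for brevity, although the viewer does this step in the browser. The month granularity, the epoch-second inputs, and the names `month_key` and `bucket_timelines` are assumptions made for illustration, not the shipped implementation.

```python
from collections import Counter
from datetime import datetime, timezone


def month_key(epoch_seconds: float) -> str:
    """Return the UTC calendar month ("YYYY-MM") for a file mtime."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m")


def bucket_timelines(gi_mtimes, kg_mtimes):
    """Bucket GI and KG artifact mtimes onto one shared month axis.

    Hypothetical helper: inputs are iterables of epoch seconds, and the
    result holds chart labels plus one count series per artifact kind.
    """
    gi = Counter(month_key(t) for t in gi_mtimes)
    kg = Counter(month_key(t) for t in kg_mtimes)
    # Union of months, sorted; "YYYY-MM" strings sort chronologically.
    months = sorted(set(gi) | set(kg))
    return {
        "labels": months,
        # Counter returns 0 for months where one artifact kind has no writes.
        "gi": [gi[m] for m in months],
        "kg": [kg[m] for m in months],
    }
```

Sharing one sorted label axis keeps the two series aligned, so a month with only KG writes still shows a zero GI bar.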
Non-goals
- Not a replacement for Streamlit run comparison (RFC-047, RFC-066) or frozen release profiles (RFC-064).
- Not real-time pipeline monitoring during a run; that is RFC-065 (`--monitor`).
- Not arbitrary SQL or Postgres; RFC-051 remains separate.
- Not natural-language dashboard queries; structured charts and labels only.
User-facing requirements
| Area | Requirement |
|---|---|
| Entry | Dashboard appears in Main views navigation with Digest, Library, Graph. |
| Summary strip | When API + corpus path healthy, show Corpus summary counts (`role="group"`, `aria-label="Corpus summary counts"`). |
| Sections | Dashboard sections tablist: Pipeline vs Content intelligence; one hint line under tabs. |
| Pipeline charts | Manifest-related bars, run duration, cumulative growth, latest-run stage stacked bars, episode-outcome horizontal bars when data exists. |
| Content intelligence | Vector index and digest glance region; GI+KG timelines (subject to client caps); publish-month bars + catalog vs bar-sum insight; node-type and doc-type bars with optional % of vectors. |
| Loading / errors | Optional loading copy; errors surfaced without breaking the shell. |
| Visual contract | UXS-006 Dashboard tab (charts) (tokens per UXS-001). |
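Two of the Content intelligence requirements involve simple arithmetic: the catalog vs bar-sum insight and the optional % of vectors. The sketch below shows one plausible way to compute both. The function names, the input shapes, the doc-type labels, and the message wording are all hypothetical; the actual copy and payloads are defined by RFC-071 and UXS-006, not here.

```python
from __future__ import annotations


def bar_sum_insight(catalog_total: int, month_counts: dict[str, int]) -> str:
    """Compare the catalog episode total with the sum of publish-month bars.

    Hypothetical wording; a positive gap could mean episodes without a
    usable publish date (an assumption, not a documented cause).
    """
    bar_sum = sum(month_counts.values())
    gap = catalog_total - bar_sum
    if gap == 0:
        return f"Catalog and publish-month bars agree ({bar_sum} episodes)."
    if gap > 0:
        return f"{gap} of {catalog_total} catalog episodes are missing from the publish-month bars."
    return f"Bars sum to {bar_sum}, which exceeds the catalog total of {catalog_total}."


def vector_shares(doc_type_counts: dict[str, int]) -> dict[str, float]:
    """Return each doc type's share of all indexed vectors, in percent."""
    total = sum(doc_type_counts.values())
    if total == 0:
        # Empty index: report zero shares instead of dividing by zero.
        return {name: 0.0 for name in doc_type_counts}
    return {
        name: round(100 * count / total, 1)
        for name, count in doc_type_counts.items()
    }
```

For example, `vector_shares({"transcript": 750, "summary": 250})` yields 75.0 and 25.0, which could be appended to the doc-type bar labels when the percentage option is enabled.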
Success criteria
- With a healthy server and corpus path, an operator can open Dashboard and see Pipeline and Content intelligence without loading the graph canvas.
- Charts use the same corpus root as the status bar field and Dashboard corpus workspace (same logical root as the former API · Data cards) and respect multi-feed layout where applicable.
- `make test-ui-e2e` covers Dashboard surfaces per E2E map.
- Documentation chain: PRD-025 (this) → RFC-071 → UXS-006.