
PRD-025: Corpus Intelligence Dashboard (GI/KG Viewer)

Summary

Operators and developers need a single in-viewer place to see whether a corpus is healthy: pipeline runs (manifest throughput, latest run.json stages, episode outcomes), and content intelligence (vector index vs catalog, digest one-liner, GI/KG write-time timelines, publish-month histograms, graph node types vs indexed doc types). The Dashboard main tab in web/gi-kg-viewer delivers Chart.js-based panels under Coverage, Intelligence, and Pipeline sub-tabs, fed by FastAPI /api/corpus/* routes and client-side merges with the loaded graph and index stores.

Background

Before the Dashboard tab, operators relied on the API · Data left-panel cards (now retired), on ad hoc cat, or on external tools to correlate run.json, corpus_manifest.json, catalog stats, and index health. Health and the corpus List / Load into graph controls now live on the status bar (VIEWER_IA), while Dashboard focuses on briefing plus the Coverage / Intelligence / Pipeline charts. The Dashboard does not replace RFC-064 frozen profiles or RFC-066 Streamlit run compare; it answers “what does this corpus root look like right now?” inside the same session as graph and search.

Goals

  1. At-a-glance corpus summary — feeds, episodes, digest topic bands, coverage rollups, GI list counts when artifacts are listed (from /api/corpus/* aggregates + GET /api/index/stats + client context).
  2. Pipeline visibility — manifest document, run summaries, cumulative growth, latest run stage bars, episode outcome bars from run.json discovery (RFC-063).
  3. Content intelligence — vector index glance (GET /api/index/stats), optional compact digest (GET /api/corpus/digest?compact=true), GI/KG mtime timelines (client-bucketed), publish-month catalog vs histogram insight, graph node-type vs index doc-type bars.
  4. Trust and navigation — blurbs point operators to status bar corpus controls and Dashboard Coverage / Intelligence / Pipeline for corrective actions (reindex, refresh catalog, inspect runs).
  5. Testable contract — Playwright dashboard.spec.ts and E2E surface map.
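
The client-bucketed GI/KG mtime timelines in goal 3 can be pictured with a minimal sketch. It assumes each listed artifact exposes a kind and a write time in epoch milliseconds; the names ArtifactStamp, TimelineBucket, bucketByDay, and the maxArtifacts cap are hypothetical and not taken from the viewer code.

```typescript
// Hypothetical shape: one entry per listed GI or KG artifact.
interface ArtifactStamp {
  kind: "gi" | "kg";
  mtimeMs: number; // write time in epoch milliseconds (assumed unit)
}

interface TimelineBucket {
  day: string; // UTC calendar day, YYYY-MM-DD
  gi: number;
  kg: number;
}

// Group artifact write times into UTC day buckets, sorted by day ascending.
// maxArtifacts is an assumed client cap: only the newest N stamps are counted.
function bucketByDay(stamps: ArtifactStamp[], maxArtifacts = 5000): TimelineBucket[] {
  const newest = [...stamps]
    .sort((a, b) => b.mtimeMs - a.mtimeMs)
    .slice(0, maxArtifacts);

  const buckets = new Map<string, TimelineBucket>();
  for (const stamp of newest) {
    const day = new Date(stamp.mtimeMs).toISOString().slice(0, 10);
    const bucket = buckets.get(day) ?? { day, gi: 0, kg: 0 };
    bucket[stamp.kind] += 1;
    buckets.set(day, bucket);
  }
  return [...buckets.values()].sort((a, b) => a.day.localeCompare(b.day));
}
```

Each returned bucket could feed one x-axis point with two series (GI and KG). Bucketing in UTC is a choice made for this sketch; a viewer that shows local time would need a different day key.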

Non-goals

  • Not a replacement for Streamlit run comparison (RFC-047, RFC-066) or frozen release profiles (RFC-064).
  • Not real-time pipeline monitoring during a run — that is RFC-065 (--monitor).
  • Not arbitrary SQL or Postgres — RFC-051 remains separate.
  • Not natural-language dashboard queries; structured charts and labels only.

User-facing requirements

| Area | Requirement |
| --- | --- |
| Entry | Dashboard appears in Main views navigation with Digest, Library, Graph. |
| Summary strip | When API + corpus path healthy, show Corpus summary counts (role="group", aria-label="Corpus summary counts"). |
| Sections | Dashboard sections tablist: Pipeline vs Content intelligence; one hint line under tabs. |
| Pipeline charts | Manifest-related bars, run duration, cumulative growth, latest-run stage stacked bars, episode-outcome horizontal bars when data exists. |
| Content intelligence | Vector index and digest glance region; GI+KG timelines (subject to client caps); publish-month bars + catalog vs bar-sum insight; node-type and doc-type bars with optional % of vectors. |
| Loading / errors | Optional loading copy; errors surfaced without breaking the shell. |
| Visual contract | UXS-006 Dashboard tab (charts) (tokens per UXS-001). |
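
Two of the Content intelligence requirements are simple arithmetic: the catalog vs bar-sum insight and the optional % of vectors. The sketch below shows one way to compute them. MonthBar, catalogVsBarSum, percentOfVectors, and the label wording are hypothetical, and the reading of a non-zero gap (catalog episodes without a usable publish month) is an assumption rather than something this PRD specifies.

```typescript
// Hypothetical shape for one publish-month bar.
interface MonthBar {
  month: string; // YYYY-MM
  episodes: number;
}

// Compare the catalog episode total with the sum of the publish-month bars.
// Assumption: a positive gap means some catalog episodes are missing from the bars.
function catalogVsBarSum(
  catalogTotal: number,
  bars: MonthBar[],
): { barSum: number; gap: number; label: string } {
  const barSum = bars.reduce((acc, bar) => acc + bar.episodes, 0);
  const gap = catalogTotal - barSum;
  const label =
    gap === 0
      ? `Histogram matches catalog (${barSum} episodes)`
      : `Catalog ${catalogTotal} vs histogram ${barSum} (difference ${gap})`;
  return { barSum, gap, label };
}

// Share of all indexed vectors per doc type, as a percentage rounded to one decimal.
function percentOfVectors(counts: Record<string, number>): Record<string, number> {
  const total = Object.values(counts).reduce((acc, n) => acc + n, 0);
  const shares: Record<string, number> = {};
  for (const [docType, n] of Object.entries(counts)) {
    shares[docType] = total === 0 ? 0 : Math.round((n / total) * 1000) / 10;
  }
  return shares;
}
```

Returning 0 for every doc type when the index is empty avoids a division by zero and keeps the bars renderable; whether the real viewer hides the percentage in that case is not stated here.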

Success criteria

  1. With a healthy server and corpus path, an operator can open Dashboard and see Pipeline and Content intelligence without loading the graph canvas.
  2. Charts use the same corpus root as the status bar field and Dashboard corpus workspace (same logical root as the former API · Data cards) and respect multi-feed layout where applicable.
  3. make test-ui-e2e covers Dashboard surfaces per E2E map.
  4. Documentation chain: PRD-025 (this) → RFC-071 → UXS-006.

References