
RFC-071: Corpus Intelligence Dashboard (GI/KG Viewer)

Abstract

The Dashboard view in web/gi-kg-viewer aggregates two kinds of signals. Pipeline execution signals cover the corpus manifest, discovered run.json files, stage timings, and episode outcomes. Content intelligence signals cover FAISS index stats, an optional digest snapshot, GI/KG artifact mtimes, the catalog publish-month histogram compared with list counts, and loaded-graph node types compared with vector doc_type counts. The browser composes Chart.js panels inside the Coverage, Intelligence, and Pipeline sub-tabs by calling the FastAPI corpus_metrics routes under /api/, plus the existing index, digest, and library endpoints. This RFC records the as-built architecture and its boundaries relative to RFC-062 (shell) and PRD-025 (product intent).

Problem Statement

Operators needed corpus-scale answers (runs, feeds, index health, artifact freshness) without exporting data to separate BI tools or reading raw JSON trees. The former API · Data left-panel cards were retired in favor of status bar corpus operations (List / Load into graph) plus Dashboard briefing and Coverage / Intelligence / Pipeline charts. Without a written RFC, the split between RFC-062 (monolithic viewer RFC) and corpus_metrics behavior was hard to navigate for contributors.

Delivered architecture

Frontend (web/gi-kg-viewer/)

| Piece | Role |
| --- | --- |
| DashboardView.vue | Fetches runs summary, coverage, feeds, digest, top persons; wires Pinia indexStats / dashboardNav; hosts Coverage / Intelligence / Pipeline sub-tab UI. |
| BriefingCard.vue (and related) | Briefing strip + handoffs; tab panels per UXS-006. |
| Chart / panel components | ArtifactActivityChart, CoverageByMonthChart, FeedCoverageTable, IndexStatusCard, IntelligenceSnapshot, PipelineRunHistoryStrip, PipelineStageChart, TopicClustersStatusBlock, TopicLandscape, TopVoices, VerticalBarChart. |
| api/corpusMetricsApi.ts | fetchCorpusRunsSummary (and related run helpers as used). |
| api/corpusCoverageApi.ts | fetchCorpusCoverage. |
| api/corpusLibraryApi.ts | fetchCorpusFeeds. |
| api/corpusPersonsApi.ts | fetchCorpusTopPersons. |
| api/digestApi.ts | Compact digest for dashboard one-liner. |
| utils/artifactMtimeBuckets.ts | Client-side GI/KG mtime bucketing (caps documented in code). |
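
The mtime bucketing attributed to utils/artifactMtimeBuckets.ts can be illustrated independently of the viewer. The sketch below is written in Python rather than the viewer's TypeScript. The function name, the day-sized buckets, and the cap value are all assumptions made for illustration; the real caps are the ones documented in the code.

```python
from collections import Counter
from datetime import datetime, timezone


def bucket_mtimes_by_day(mtimes_epoch_s, max_items=5000):
    """Count artifacts per UTC calendar day, reading at most max_items entries.

    max_items is a hypothetical cap, not the value used by the viewer.
    """
    counts = Counter()
    for ts in list(mtimes_epoch_s)[:max_items]:
        # Convert epoch seconds to an ISO date string such as "2024-05-01".
        day = datetime.fromtimestamp(ts, tz=timezone.utc).date().isoformat()
        counts[day] += 1
    # Sort by day so a chart can plot the buckets left to right.
    return dict(sorted(counts.items()))
```

Capping the input before bucketing keeps the work bounded when a corpus holds many artifacts, at the cost of an incomplete timeline once the cap is hit.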

Behavioral rules (refresh generation, loading flags, error handling) belong in this RFC; visual density, tokens, and aria labels for the Dashboard row belong in UXS-006 (tokens per UXS-001).
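
One common way to implement a refresh generation together with loading and error flags is to stamp each refresh with a counter and discard any response whose stamp is no longer current. The sketch below shows that pattern in Python with asyncio. It is not the DashboardView.vue implementation, and the class and attribute names are hypothetical.

```python
import asyncio


class DashboardLoader:
    """Hypothetical loader: only the most recent refresh may publish results."""

    def __init__(self):
        self.generation = 0
        self.loading = False
        self.error = None
        self.data = None

    async def refresh(self, fetch):
        # Stamp this refresh; any later call makes this stamp stale.
        self.generation += 1
        my_generation = self.generation
        self.loading = True
        self.error = None
        try:
            result = await fetch()
        except Exception as exc:
            # Only the current refresh may report an error or clear the flag.
            if my_generation == self.generation:
                self.error = str(exc)
                self.loading = False
            return
        if my_generation != self.generation:
            return  # A newer refresh started; drop this stale response.
        self.data = result
        self.loading = False
```

With this shape, a slow response from an earlier refresh cannot overwrite data or flags set by a later one.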

Backend (src/podcast_scraper/server/routes/corpus_metrics.py)

Mounted under the app /api prefix:

| Method | Path | Purpose |
| --- | --- | --- |
| GET | /corpus/stats | CorpusStatsResponse: feeds, episodes, digest topic config, publish-month rollups, optional list counts when catalog builder runs. |
| GET | /corpus/documents/manifest | Parsed corpus_manifest.json document for throughput bars. |
| GET | /corpus/documents/run-summary | Single-run style summary helper (when used). |
| GET | /corpus/runs/summary | CorpusRunsSummaryResponse: bounded scan of run.json under corpus root (cap 150 files in module). |
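
A bounded scan of run.json files, as described for /corpus/runs/summary, might look like the sketch below. It is an illustration only and not the code in corpus_metrics.py: the function and constant names are hypothetical, and the real module may walk, order, or count files differently.

```python
import json
from pathlib import Path

MAX_RUN_FILES = 150  # the cap stated in this RFC; the constant name is hypothetical


def scan_run_files(corpus_root, limit=MAX_RUN_FILES):
    """Return (parsed run.json documents, truncated flag) from under corpus_root."""
    runs = []
    truncated = False
    # Sorting gives a stable order but enumerates every match first;
    # a real implementation may prefer to stop walking the tree early.
    for index, path in enumerate(sorted(Path(corpus_root).rglob("run.json"))):
        if index >= limit:
            truncated = True
            break
        try:
            runs.append(json.loads(path.read_text(encoding="utf-8")))
        except (OSError, json.JSONDecodeError):
            # Skip unreadable or malformed files instead of failing the scan.
            continue
    return runs, truncated
```

Returning a truncated flag alongside the results lets a caller show that the summary covers only part of a large corpus.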

Related routers (not defined in corpus_metrics.py but consumed by the same view):

  • GET /api/index/stats, POST /api/index/rebuild: index routes (RFC-061).
  • GET /api/corpus/digest?compact=true: RFC-068.
  • GET /api/corpus/feeds: RFC-067 (feeds in index vs catalog bars).
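
To make the route layout concrete, the sketch below composes the full GET URLs for the paths named in this section. The helper name and the server origin used in the usage line are assumptions. Paths for the coverage and top-persons endpoints are not given in this RFC, so they are left out.

```python
from urllib.parse import urlencode, urljoin


def dashboard_request_urls(base_url):
    """Build the GET URLs named in this section for a given server origin."""

    def url(path, **query):
        # Join origin and path, then append a query string when one is given.
        full = urljoin(base_url.rstrip("/") + "/", path.lstrip("/"))
        return f"{full}?{urlencode(query)}" if query else full

    return {
        "runs_summary": url("/api/corpus/runs/summary"),
        "stats": url("/api/corpus/stats"),
        "index_stats": url("/api/index/stats"),
        "digest": url("/api/corpus/digest", compact="true"),
        "feeds": url("/api/corpus/feeds"),
    }
```

For example, `dashboard_request_urls("http://localhost:8000")["digest"]` yields the compact digest URL, where the host and port are placeholders for wherever the server is running.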

Data sources

  • Filesystem under resolved corpus root: run.json, corpus_manifest.json, metadata trees (RFC-063).
  • In-memory / client: merged GI/KG artifact list from GET /api/artifacts + loaded JSON for graph metrics and mtime timelines (subject to client caps).

Non-goals

  • Not implementing new chart types or ML-based anomaly detection in this RFC’s scope.
  • Not merging with Streamlit run-compare or RFC-064 profile YAML.
  • Not adding Postgres for dashboard queries (RFC-051).

Testing

  • E2E: web/gi-kg-viewer/e2e/dashboard.spec.ts (and related mocks) — see E2E surface map Dashboard row.
  • Server: extend tests/unit/podcast_scraper/server/ and tests/integration/server/ when changing corpus_metrics response shapes (existing tests may already cover stats).

Relationship to RFC-062

RFC-062 remains the umbrella viewer + server seed RFC. RFC-071 is a focused slice for the Dashboard product surface so PRD/RFC indexes and cross-links stay precise. Prefer editing RFC-071 for Dashboard-only API or chart-behavior notes; edit RFC-062 / VIEWER_IA when shell navigation, status bar corpus flows, or shared stores change across tabs.

References