
ADR-076: Streamlit for Operator Run Comparison and Performance Views

Context & Problem Statement

ADR-065 and RFC-062 standardize Vue 3 + Vite for the GI/KG viewer served by FastAPI. Separately, RFC-047 introduced a Streamlit app over data/eval/ artifacts for ML run comparison. RFC-066 extended that app with a Performance page joining eval runs and frozen YAML profiles (ADR-075).

Without an explicit decision, contributors might duplicate run-compare or performance charts inside the Vue app (splitting maintenance, auth, and data loading) or deprecate Streamlit prematurely.

Decision

  1. Streamlit remains the home for operator-facing eval tooling: everything under tools/run_compare/ (quality comparisons, diagnostics, and the Performance page) stays on Streamlit + Plotly behind the optional [compare] extra, not in web/gi-kg-viewer/.
  2. Vue viewer scope: The SPA focuses on corpus exploration (graph, search, library, digest, dashboard) against a resolved corpus root and /api/* — not on batch eval directory workflows.
  3. Join semantics: When a UI needs both eval metrics and frozen profiles, the release tag is the primary join key (RFC-066); the implementation stays in tools/run_compare/.
  4. Optional extra: Keeping Streamlit behind [compare] preserves lean installs for users who never open eval tools (RFC-047).
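
The join in item 3 can be illustrated with a minimal sketch. It assumes eval runs and frozen profiles have already been loaded into plain dictionaries; the function name and the field names (release_tag, run_id, score, peak_rss_mb) are hypothetical and are not taken from RFC-066 or from tools/run_compare/data.py.

```python
from typing import Any


def join_by_release_tag(
    eval_runs: list[dict[str, Any]],
    profiles: list[dict[str, Any]],
    key: str = "release_tag",  # hypothetical field name
) -> list[dict[str, Any]]:
    """Left-join eval runs to frozen profiles on the release tag.

    Runs without a matching profile are kept with "profile" set to None,
    so a missing profile stays visible instead of being dropped.
    """
    # Index profiles by release tag; profiles lacking the key are skipped.
    profiles_by_tag = {p[key]: p for p in profiles if key in p}
    joined = []
    for run in eval_runs:
        tag = run.get(key)
        joined.append({**run, "profile": profiles_by_tag.get(tag)})
    return joined


# Illustrative records only; real artifacts will have different fields.
runs = [
    {"run_id": "run-a", "release_tag": "v1.0.0", "score": 0.81},
    {"run_id": "run-b", "release_tag": "v1.1.0", "score": 0.84},
]
profiles = [{"release_tag": "v1.0.0", "peak_rss_mb": 512}]

rows = join_by_release_tag(runs, profiles)
```

A left join is used here so that an eval run whose release has no frozen profile still appears in the comparison; whether the real Performance page behaves this way is not stated in this ADR.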

Rationale

  • Different data roots: Eval runs live under data/eval/; the viewer consumes live corpus roots — merging them in one SPA would couple unrelated release cycles.
  • Velocity: Streamlit is fast for internal Plotly dashboards; the viewer stack optimizes for Cytoscape, Pinia, and Playwright E2E.
  • Clear ownership: ML operators use make run-compare; corpus operators use podcast serve + viewer.

Alternatives Considered

  1. Rebuild run compare in Vue + FastAPI: Rejected; it would duplicate a large amount of chart, file-scanner, and session-state code and slow iteration on eval workflows.
  2. Single “mega” Streamlit for viewer + eval: Rejected; loses Cytoscape-first UX, typed API contracts, and ADR-064 server architecture.
  3. Jupyter-only notebooks for comparison: Rejected because notebooks make onboarding harder; Streamlit gives operators one command and a shared README entry point.

Consequences

  • Positive: Stable split of stacks; RFC-047/066 remain authoritative for Streamlit behavior.
  • Negative: Two UI stacks to maintain (Python extras vs Node); acceptable given distinct users.
  • Neutral: Links from docs may point operators to both make run-compare and make serve.

Implementation Notes

  • Module: tools/run_compare/ (app.py, data.py, README).
  • Install: pip install -e ".[compare]"; run make run-compare or streamlit run tools/run_compare/app.py.
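
Because Streamlit and Plotly sit behind the optional [compare] extra, a launcher could check for them before starting the app. The sketch below is one possible way to do that with the standard library; the helper names are hypothetical and nothing here is claimed to exist in tools/run_compare/.

```python
import importlib.util


def missing_compare_deps(
    modules: tuple[str, ...] = ("streamlit", "plotly"),
) -> list[str]:
    """Return the top-level modules that cannot be found on this install."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in modules if importlib.util.find_spec(name) is None]


def install_hint(missing: list[str]) -> str:
    """Build a message pointing at the [compare] extra, or "" if all present."""
    if not missing:
        return ""
    return (
        "Missing optional dependencies: "
        + ", ".join(missing)
        + '. Install them with: pip install -e ".[compare]"'
    )


if __name__ == "__main__":
    hint = install_hint(missing_compare_deps())
    if hint:
        raise SystemExit(hint)
    print("Optional [compare] dependencies are available.")
```

Checking with find_spec avoids importing Streamlit just to test for its presence, which keeps the check cheap on lean installs.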

References