Release v2.6.0 — Corpus Intelligence UI, Semantic Search, and Operator Tooling
Release Date: April 2026
Type: Minor Release
Last Updated: April 2026
Summary
v2.6.0 is a minor release that ships a full browser surface for exploring processed corpora alongside the pipeline: a Vue 3 + Vite GI/KG viewer on a FastAPI server, semantic corpus search (FAISS) exposed in the UI and CLI, Corpus Library, Digest, and Dashboard experiences, and a graph exploration toolkit for Cytoscape-based GI/KG views. On the provider and evaluation side, the release tightens how you compare quality and cost: the Run Comparison Streamlit app gains a Performance tab tied to frozen YAML profiles (RFC-064), sitting on top of the seven LLM providers and hybrid ML stack delivered in v2.5.0.
The Python library surface (Config, run_pipeline, service.run) stays backward compatible. HTTP, the SPA, and server-only routes are additive behind pip install -e '.[dev]'. GIL and KG reach Completed status for single-layer artifacts and consumption (RFC-049 / RFC-050, RFC-055 / RFC-056); cross-layer identity remains future work (Draft RFC-072). Multi-feed corpus layout, manifest, and unified indexing ship per RFC-063.
Spotlight — GI/KG viewer and corpus UI
v2.5.0 expanded who summarizes and detects speakers (providers). v2.6.0 expands where you inspect the results: a dedicated GI/KG viewer aligned with ADR-064–ADR-066 and RFC-062.
Stack and entrypoints
- FastAPI — podcast serve (optional [dev] extra), CORS and static SPA, OpenAPI at /docs.
- Vue 3 + Vite + Cytoscape — SPA under web/gi-kg-viewer, shell corpusPath, shared design rules in UXS-001 and feature UXSs (Digest, Library, Graph, Search, Dashboard, …).
- Playwright — UI E2E in web/gi-kg-viewer/e2e/; contract summarized in e2e/E2E_SURFACE_MAP.md.
Tabs and flows (high level)
| Area | What you get | PRD / RFC |
|---|---|---|
| Library | Feeds and episodes from disk, pagination and filters, episode detail (summary bullets, GI/KG paths), FAISS similar episodes when indexed, handoffs to Graph and Search | PRD-022, RFC-067 |
| Digest | Rolling digest of recent work across feeds, 24h glance from Library, optional semantic topic bands when a vector index exists | PRD-023, RFC-068 |
| Dashboard | Pipeline vs Content intelligence Chart.js panels, corpus stats, manifest awareness, capped run.json discovery, timelines for index / digest / GI-KG | PRD-025, RFC-071 |
| Graph | Zoom (100% / numeric), Shift+drag box zoom, minimap v1, degree-bucket filter, built-in layouts, edge filters | PRD-024, RFC-069 |
| Search | Semantic search over the corpus in the UI, consistent with CLI podcast search / indexing | PRD-021, RFC-061, RFC-062 |
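The degree-bucket filter listed for the Graph tab can be illustrated with a small sketch. This is not the viewer's code (the viewer is a Vue + Cytoscape SPA); it only shows the idea in Python. The bucket boundaries, labels, and function names are hypothetical, chosen for illustration.

```python
from collections import Counter

# Hypothetical bucket boundaries (inclusive); the viewer's real buckets may differ.
BUCKETS = [(0, 0, "isolated"), (1, 2, "low"), (3, 9, "medium"), (10, None, "high")]


def node_degrees(nodes, edges):
    """Count how many edge endpoints touch each node (undirected degree)."""
    degrees = Counter({node: 0 for node in nodes})
    for source, target in edges:
        degrees[source] += 1
        degrees[target] += 1
    return degrees


def bucket_for(degree):
    """Return the label of the first bucket whose range contains the degree."""
    for low, high, label in BUCKETS:
        if degree >= low and (high is None or degree <= high):
            return label
    raise ValueError(f"no bucket for degree {degree}")


def filter_by_bucket(nodes, edges, wanted):
    """Keep nodes whose bucket is in `wanted`, and edges whose two ends both survive."""
    degrees = node_degrees(nodes, edges)
    kept = {n for n in nodes if bucket_for(degrees[n]) in wanted}
    kept_edges = [(s, t) for s, t in edges if s in kept and t in kept]
    return kept, kept_edges
```

Degrees are computed once on the full graph, so hiding a bucket does not shift the remaining nodes into other buckets.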
HTTP surface
- /api/corpus/* — Library and Dashboard data (feeds, episodes, detail, similar, aggregates as documented).
- GET /api/corpus/digest — Digest JSON for the Digest tab and health discovery.
- GET /api/search — Semantic search API used by the viewer (Server Guide).
- POST /api/index/rebuild / GET /api/index/stats — Background vector index rebuild (202 / 409 semantics) and staleness-oriented stats for operators.
- GET /api/health — May include corpus_library_api, corpus_digest_api, and related flags for capability discovery (Migration Guide).
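An operator script might handle the 202 / 409 responses of POST /api/index/rebuild roughly as sketched below. Only the route and the two status codes come from these notes. The interpretation (202 means the background rebuild was accepted, 409 means it was refused because of a conflict, such as a rebuild already running) follows the standard HTTP meanings and is an assumption here, as are the function names, the empty request body, and the example base URL. The Server Guide remains the authority.

```python
import urllib.request
from http import HTTPStatus


def build_rebuild_request(base_url: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the index rebuild route.

    Assumption: the route needs no request body.
    """
    url = base_url.rstrip("/") + "/api/index/rebuild"
    return urllib.request.Request(url, method="POST")


def describe_rebuild_response(status_code: int) -> str:
    """Map a rebuild response status code to a coarse outcome label."""
    if status_code == HTTPStatus.ACCEPTED:  # 202: rebuild started in the background
        return "accepted"
    if status_code == HTTPStatus.CONFLICT:  # 409: assumed to mean a conflicting rebuild
        return "conflict"
    if 200 <= status_code < 300:
        return "ok"
    return "error"
```

After an "accepted" outcome, a script would typically poll GET /api/index/stats rather than wait on the POST itself, since the rebuild runs in the background.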
Design and UX references
- Server Guide — full route table and behavior notes.
- E2E Testing Guide — Playwright workflow.
- Development Guide — local serve, SERVE_OUTPUT_DIR, viewer development.
- docs/uxs/index.md — UX specifications for shared tokens and feature surfaces.
Spotlight — Semantic search, Grounded Insights (GIL), and Knowledge Graph (KG)
v2.6.0 ships the retrieval and structured artifact layers that the viewer tabs sit on: vector search over transcript chunks, GIL (gi.json) for grounded quotes and insights, and KG (kg.json) for entities, topics, and relationships. Together they explain why Library can show GI/KG paths, Digest can show topic bands when indexed, Graph can render Cytoscape views, and Search can return semantic hits.
Semantic corpus search
- FAISS — FaissVectorStore implements the vector-store contract (ADR-060); embed, index, and query paths are specified in RFC-061 and PRD-021.
- CLI — podcast index and podcast search for building and querying the corpus index; semantic exploration hooks for gi explore where documented in the CLI and guides.
- HTTP — GET /api/search for the viewer Search panel (same corpus root as the shell); index rebuild and stats under /api/index/* (see Server Guide).
- Viewer — Semantic search wired to the API (left query column in the current shell; similar episodes in Library depend on the same index when present). Post–v2.6 viewer work moved corpus artifacts + Data cards onto Dashboard — see RFC-062 and UXS-006.
- After v2.6.0 — Draft RFC-070 tracks optional backends (Qdrant, pgvector, and so on); not part of the v2.6.0 FAISS ship.
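To make the embed, index, and query path concrete, here is a minimal brute-force sketch of what a vector store does conceptually. It is not FaissVectorStore and does not use FAISS: it swaps in plain cosine similarity over toy 3-dimensional vectors, where a real index would hold model embeddings of transcript chunks. The class name, method names, and chunk ids are hypothetical.

```python
import math


def normalize(vector):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


class ToyVectorIndex:
    """Brute-force stand-in for a vector store (illustration only)."""

    def __init__(self):
        self._ids = []
        self._vectors = []

    def add(self, chunk_id, vector):
        """Store one chunk embedding under its id."""
        self._ids.append(chunk_id)
        self._vectors.append(normalize(vector))

    def search(self, query, top_k=3):
        """Return up to top_k (chunk_id, score) pairs, best match first."""
        q = normalize(query)
        scored = [
            (chunk_id, sum(a * b for a, b in zip(q, vec)))
            for chunk_id, vec in zip(self._ids, self._vectors)
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]
```

A library such as FAISS provides the same add and search shape but scales to large corpora, which is why the similar-episodes and topic-band features depend on an index being present.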
Grounded Insight Layer (GIL)
- Completed (single layer) — RFC-049 (core model, gi.json, grounding contract) and RFC-050 (consumption patterns, CLI and product use cases without cross-layer bridge).
- What you can do — Inspect grounded insights and quotes per episode; drive Insight-oriented flows in the viewer and documentation under PRD-017 for the single-layer slice.
- Not in v2.6.0 — Canonical identity and bridge.json for cross-layer joins remain Draft (RFC-072); PRD-017 stays Partial until that work lands.
Knowledge Graph (KG)
- Completed (single layer) — RFC-055 (ontology, kg.json, separation from GIL) and RFC-056 (roll-ups, kg CLI patterns, export-oriented use cases).
- What you can do — Graph tab and related viewer flows consume KG artifacts alongside GIL for exploration (PRD-019 single-layer scope; RFC-069 for interaction toolkit).
- Same boundary as GIL — Cross-layer alignment is RFC-072, not shipped in v2.6.0.
Guides
- Semantic Search Guide — indexing, GET /api/search, chunking, and operator notes.
- Grounded Insights Guide — gi.json, schema, CLI.
- Knowledge Graph Guide — kg.json, entities, relationships, bridge placeholder for future CIL.
Spotlight — Comparing providers and runs
v2.5.0 added five cloud LLM families plus Ollama on top of OpenAI and Gemini, with unified config and CLI flags. v2.6.0 improves how you compare them in practice—not only by reading docs, but by joining quality metrics with frozen resource profiles.
1. Provider landscape (carried forward from v2.5.0, still the reference in v2.6.0)
- Local ML — Whisper, spaCy, Transformers (default path).
- Hybrid ML (RFC-042) — MAP–REDUCE summarization with configurable REDUCE backends (transformers, Ollama, llama.cpp).
- Cloud LLM — OpenAI, Gemini, Anthropic, Mistral, DeepSeek, Grok, plus Ollama as local LLM host.
2. Documentation-first comparison
- AI Provider Comparison Guide — decision matrices, cost and quality framing, “which provider?” narrative.
- Provider Deep Dives — per-provider cards and quadrant-style comparisons.
- Evaluation reports — methodology (ROUGE, BLEU, embeddings, and related metrics) and report index.
- ML Model Comparison Guide — local and hybrid model tradeoffs.
3. Tooling new in the v2.6.0 track
- Run Comparison — Performance tab (RFC-066) — Streamlit ?page=performance joins run metrics from experiments with frozen RFC-064 YAML profiles so you can relate summary quality (eval runs under data/eval/) to resource shape (RSS, CPU, wall time by stage) on comparable fixtures.
- Performance profiling framework (RFC-064) — config/profiles/, captured artifacts under data/profiles/, make profile-freeze / make profile-diff, scripts described in Performance profile guide.
- AutoResearch closure (RFC-057 / ADR-073) — optimization loop and eval matrix work brought to a documented closure; silver references and broad config sweeps support evidence-backed model and prompt choices.
4. Live pipeline visibility (developers)
- Live Pipeline Monitor (RFC-065) — --monitor, .pipeline_status.json, terminal or log-friendly status; optional [monitor] extras (memray, py-spy). See Live Pipeline Monitor guide.
Together, the v2.5.0 provider breadth and v2.6.0 Performance tab + frozen profiles + eval library give a coherent story: pick a provider or model, run smoke or benchmark evals, capture profiles on the same corpus shape, and inspect quality vs cost in one place.
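The quality-versus-cost join can be pictured with a short sketch. It does not reproduce the Streamlit app. It assumes eval results and frozen profiles have already been loaded into dictionaries keyed by a shared identifier. The key scheme and the field names used in the usage note (rouge_l, peak_rss_mb, wall_time_s) are hypothetical.

```python
def join_quality_with_profiles(eval_runs, profiles):
    """Pair each eval run with the frozen profile that shares its key.

    Both arguments are dicts keyed by a shared identifier (hypothetical here).
    Runs without a matching profile are reported separately rather than dropped.
    """
    joined, unmatched = [], []
    for key, quality in sorted(eval_runs.items()):
        profile = profiles.get(key)
        if profile is None:
            unmatched.append(key)
            continue
        # One flat row per run: quality metrics next to resource metrics.
        joined.append({"key": key, **quality, **profile})
    return joined, unmatched
```

For example, joining `{"ollama-small": {"rouge_l": 0.31}}` with `{"ollama-small": {"peak_rss_mb": 2100, "wall_time_s": 84.0}}` yields one row that carries all three metrics. Returning the unmatched keys makes it visible when a run was evaluated but never profiled on the same fixture.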
Multi-feed corpus, manifest, and indexing
- RFC-063 — Multiple feeds, append/resume semantics, layout A, unified index behavior, corpus_manifest.json and run-summary hooks. See CORPUS_MULTI_FEED_ARTIFACTS.md.
Pipeline download resilience and run metrics
- Configurable HTTP retries for media, transcripts, and RSS (http_*, rss_* on Config), plus application-level episode retries (episode_retry_max, episode_retry_delay_sec) after urllib3 exhaustion.
- CLI — --http-retry-total, --http-backoff-factor, --rss-retry-total, --rss-backoff-factor, --episode-retry-max, --episode-retry-delay-sec (CLI).
- metrics.json — http_urllib3_retry_events, episode_download_retries, episode_download_retry_sleep_seconds (Experiment Guide).
- Optional Issue #522-class extensions — per-host throttling, Retry-After, circuit breaker, RSS conditional GET; fields and flags documented under CONFIGURATION — Download resilience.
- failure_summary in run.json when episodes fail (counts by error type, failed episode identifiers).
- Download resilience: documented canonically under CONFIGURATION.md — Download resilience (inline YAML presets; no separate example file required).
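The two retry layers and the resulting counters can be sketched as follows. This illustrates the described behavior and is not the pipeline's implementation. The names episode_retry_max, episode_retry_delay_sec, episode_download_retries, and episode_download_retry_sleep_seconds come from the notes above. The function names, the injected `download` callable, and the summary keys (by_error_type, failed_episodes) are hypothetical.

```python
import time
from collections import Counter


def download_with_episode_retries(download, episode_id, episode_retry_max,
                                  episode_retry_delay_sec, metrics, sleep=time.sleep):
    """Call download(episode_id), retrying up to episode_retry_max extra times.

    `download` is assumed to raise only after its own HTTP-level (urllib3)
    retries are exhausted, so this loop is the application-level layer.
    """
    attempt = 0
    while True:
        try:
            return download(episode_id)
        except Exception:
            if attempt >= episode_retry_max:
                raise  # out of episode-level retries: surface the last error
            attempt += 1
            metrics["episode_download_retries"] += 1
            metrics["episode_download_retry_sleep_seconds"] += episode_retry_delay_sec
            sleep(episode_retry_delay_sec)


def build_failure_summary(failures):
    """Summarize (episode_id, exception) pairs: counts by error type plus failed ids."""
    return {
        "by_error_type": dict(Counter(type(exc).__name__ for _, exc in failures)),
        "failed_episodes": [episode_id for episode_id, _ in failures],
    }
```

Passing `metrics=Counter()` lets the counters start at zero without setup, and injecting `sleep` keeps the loop testable without real delays.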
Operational observability (partial PRD-016)
Shipped in this release train: test metrics and GitHub Pages dashboards (RFC-025 / RFC-026), the live monitor (RFC-065), and frozen profiles with the Run Comparison Performance tab (RFC-064 / RFC-066). RFC-027 items (for example CSV export) remain open.
Documentation
- API overview — HTTP / viewer
- Server Guide
- Migration Guide — v2.6.0
- E2E Testing Guide
- Experiment Guide — pipeline metrics.json and download resilience
- RFC index — v2.6.0 rows — RFC-049, 050, 055, 056, 057, 061–071
Upgrade notes
- Library users — No code changes required for run_pipeline / service.run.
- Viewer or HTTP consumers — Install [dev], run podcast serve with a valid output directory, and align viewer corpusPath with your corpus root. See Migration Guide.
- Health JSON — Prefer explicit corpus_digest_api once on a current server build; older servers may omit it (see migration notes for Digest behavior).
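A client that talks to both current and older servers could read the health flags defensively, as in this sketch. It assumes the flags are JSON booleans and treats a missing flag as "unknown" rather than "off". The helper names and the three state labels are hypothetical. What to do in the unknown case is covered by the migration notes, not decided here.

```python
def capability(health: dict, flag: str):
    """Return True or False when the server states the flag, None when it omits it."""
    value = health.get(flag)
    return value if isinstance(value, bool) else None


def digest_tab_state(health: dict) -> str:
    """Classify Digest support from a parsed GET /api/health payload."""
    state = capability(health, "corpus_digest_api")
    if state is True:
        return "enabled"
    if state is False:
        return "disabled"
    return "unknown"  # flag absent, for example on an older server build
```

Keeping "unknown" distinct from "disabled" avoids hiding the Digest tab on servers that support it but predate the explicit flag.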
Related release
- v2.5.0 — LLM provider expansion, MPS exclusive mode, entity reconciliation, run manifests, LLM metrics.