RFC-062: GI/KG Viewer v2 — Semantic Search UI & Web Application Architecture
- Status: Draft
- Authors: Podcast Scraper Team
- Stakeholders: Core team, GIL/KG consumers, viewer users
- Related PRDs:
  - docs/prd/PRD-021-semantic-corpus-search.md (semantic search, surfaced in viewer)
  - docs/prd/PRD-017-grounded-insight-layer.md (GIL, primary visualization content)
  - docs/prd/PRD-019-knowledge-graph-layer.md (KG, primary visualization content)
- Related RFCs:
  - docs/rfc/RFC-061-semantic-corpus-search.md (search backend this viewer consumes)
  - docs/rfc/RFC-049-grounded-insight-layer-core.md (GIL artifact format)
  - docs/rfc/RFC-050-grounded-insight-layer-use-cases.md (UC4/UC5: semantic QA, Insight Explorer)
  - docs/rfc/RFC-055-knowledge-graph-layer-core.md (KG artifact format)
  - docs/rfc/RFC-056-knowledge-graph-layer-use-cases.md (KG use cases)
  - docs/rfc/RFC-051-database-projection-gil-kg.md (future structured query backend)
- Related UX specs:
  - docs/uxs/UXS-001-gi-kg-viewer.md (visual and token contract for viewer v1/v2)
- Related Documents:
  - GitHub #489: Implementation issue for this RFC
  - GitHub #445: Viewer v1 implementation
  - GitHub #484: Semantic Corpus Search Phase 1
  - GitHub #466: GI + KG depth roadmap
  - GitHub #50: Simple UI + server (v2.7)
  - GitHub #347: UI to access output from DB (v2.7)
  - GitHub #46: Docker architecture (v2.7)
  - docs/architecture/PLATFORM_ARCHITECTURE_BLUEPRINT.md: Platform architecture vision
Abstract
This RFC proposes a full rebuild of the GI/KG viewer (web/gi-kg-viz/) as a proper
Vue 3 web application backed by the project's first FastAPI server layer. The viewer v2
replaces the current collection of vanilla-JS pages with a single-page application
featuring semantic search integration (RFC-061), a Cytoscape.js graph engine, typed
state management via Pinia, and a REST API layer wrapping the VectorStore,
gi explore, and artifact loading capabilities of the core Python library.
The FastAPI backend is intentionally placed in src/podcast_scraper/server/ — not
viewer/ — because it is the seed of the platform API. The viewer is the first
consumer of this server; the platform UI (#50,
#347, megasketch Part A) is the
second. One server, pluggable route groups, feature flags to activate platform routes
when ready.
Architecture Alignment: The server is a consumption and coordination layer — it
does not modify artifacts, pipeline stages, or CLI commands. It wraps existing Python
APIs (VectorStore.search(), gi explore, artifact loading) behind REST endpoints.
The frontend is a static SPA served by the same process, with a file-picker fallback
for offline use. This follows the megasketch constraint A.2: "One pipeline core,
multiple shells" — the CLI is one shell, the server is another.
Problem Statement
The viewer v1 (web/gi-kg-viz/, #445) was built as an exploratory prototype to
visualize GI and KG artifacts. It has the following limitations:
- No search capability: no text search, no semantic search, no query interface. The only "filtering" is by node type or file name substring. With the semantic search layer landing in #484 (RFC-061), the viewer has no way to surface the most powerful new capability in the stack.
- Duplicate graph engines: both vis-network and Cytoscape.js render the same data in parallel pages (graph-vis.html, graph-cyto.html). This was useful for comparison during prototyping but doubles maintenance for no ongoing value.
- No shared state: each HTML page manages its own global state via window.GiKgViz IIFEs. There is no reactive data layer, no store, no event bus. Adding features requires duplicating logic across pages.
- No framework: vanilla JS with CDN script tags. No bundler, no type checking, no component model. The codebase is fragile at ~153 KB of hand-managed JavaScript.
- No backend integration: the viewer loads files via the browser File API or an optional dev server (gi_kg_viz_server.py). It cannot call Python APIs (search, explore, index stats) without a proper backend layer.
- Multiple entry points: index.html, graph-vis.html, graph-cyto.html, and json-only.html fragment the experience instead of offering a unified application.
Semantic Corpus Search (RFC-061) adds podcast search, podcast index --stats, and
upgraded gi explore --topic / gi query with semantic matching. These capabilities
need a visual consumption layer that preserves GIL provenance (grounding, quotes,
timestamps) and enables cross-feed discovery — the same use cases RFC-061 targets but
through a graphical interface.
Use Cases:
- Visual Semantic Search: Type a natural-language query, see ranked results with type badges, scores, episode attribution, and grounding status, then click a result to focus the corresponding node in the graph
- Corpus Dashboard: At-a-glance view of index health (vector counts, type breakdown, feed coverage, freshness, model info), equivalent to podcast index --stats but always visible
- Semantic Explore/QA: Ask "what do my podcasts say about X?" and get the UC4 answer envelope rendered as a card with supporting evidence, not just terminal JSON
- Graph + Search Integration: Search results and graph exploration in the same view. Click a search result to see its neighborhood; double-click a node to search for similar content
- Offline Fallback: Load gi.json / kg.json files from disk when no backend is running. This is a degraded mode (no search, no explore), but graph and metrics still work
Goals
- Single-page Vue 3 application replacing the multi-page vanilla-JS prototype
- Cytoscape.js as sole graph engine with proper Vue component wrapping
- Semantic search panel consuming the RFC-061 VectorStore.search() API via a FastAPI endpoint
- Index health dashboard surfacing IndexStats visually
- Explore/QA integration exposing gi explore --topic and gi query through the viewer
- FastAPI server in src/podcast_scraper/server/: the project's canonical server layer, with viewer routes as the first route group and platform routes (#50, #347) added later via feature flags
- Clean frontend architecture with Pinia stores, typed API client, and composables
- File-picker fallback for offline/no-backend usage (graph and metrics only)
- Playwright E2E test layer for browser-based regression testing of viewer functionality
- Platform-ready design: server, frontend, and API contracts designed to grow into the full platform vision (megasketch Part A/B) without architectural rewrites
Constraints & Assumptions
Constraints:
- Must consume existing artifacts and APIs: no changes to gi.json, kg.json, or pipeline output formats
- Backend must wrap existing Python functions, not reimplement search or explore logic
- Must work without a backend (degraded mode) for quick file inspection
- Must run with the podcast serve CLI command and the make serve Makefile target: a single command to start both frontend and backend
- No authentication required (single-user local tool; auth deferred to platform mode)
- Server module naming must be server/, not viewer/: it is the project's server layer, not just the viewer's backend
Assumptions:
- RFC-061 (Semantic Corpus Search) will be implemented before or in parallel with the search panel (M4). The viewer backend can be built first; the search endpoint activates when a vector index exists.
- The v1 viewer (web/gi-kg-viz/) remains in the repo during v2 development and is removed after v2 reaches feature parity.
- The target user is a developer or power user running the tool locally.
Design & Implementation
1. Tech Stack
Frontend:
| Layer | Technology | Rationale |
|---|---|---|
| Framework | Vue 3 (Composition API, <script setup>) | Lightweight reactivity, SFCs, no JSX overhead. Right weight for a data visualization tool. |
| Build | Vite | Fast HMR, native ESM, zero-config Vue support. Static output for production. |
| Graph | Cytoscape.js | Programmatic API for filtering/styling/selection. Compound nodes for clustering. Extensions for layouts (cola, dagre) and interaction (popper, cxtmenu). |
| Charts | Chart.js 4 (via vue-chartjs) | Familiar from v1, lightweight, covers distribution and stats charts. |
| CSS | Tailwind CSS | Utility-first, built-in dark mode, replaces 600+ line custom stylesheet. |
| State | Pinia | Vue's official store. Replaces window.GiKgViz globals with reactive typed stores. |
| HTTP | Native fetch (or ofetch) | No need for axios at this scale. Typed wrappers in api/ layer. |
| Types | TypeScript | Type safety for artifact shapes, search results, API contracts. |
Backend:
| Layer | Technology | Rationale |
|---|---|---|
| Server | FastAPI | Async, typed, auto-docs. Extends gi_kg_viz_server.py pattern. Aligns with platform megasketch (Part A). |
| API style | REST + JSON | Matches CLI --format json contracts. Same shapes frontend consumes. |
| Search | Thin wrapper over VectorStore.search() | No new search logic; exposes what RFC-061 builds. |
| Artifacts | File-system reads | gi.json, kg.json loaded from output_dir. |
| Static files | FastAPI StaticFiles mount | Production: FastAPI serves Vite build output. Dev: Vite proxy. |
Why Vue 3 over React:
- Reactivity without ceremony: filter toggles, search state, and graph selection are all naturally reactive without useCallback/useMemo patterns
- Single-file components: template + script + scoped style in one file, cleaner for a small codebase
- Lighter bundle: Vue 3 core ~16 KB gzipped vs React + ReactDOM ~42 KB; matters when main payload is Cytoscape + Chart.js
- Composition API: same composable/hooks pattern as React but without rules-of-hooks constraints
Why Cytoscape.js (dropping vis-network):
- Programmatic API: richer API for computed styles, filtering, batch operations. Critical for search-result-to-node highlighting.
- Compound nodes: native parent/child grouping for feed clusters, semantic clusters
- Extension ecosystem: cola/dagre layouts, popper tooltips, cxtmenu context menus
- WebGL escape hatch: cytoscape-canvas renderer for large corpora
- Active maintenance: stronger community; better TypeScript support
2. Frontend Architecture
```text
web/gi-kg-viewer/
├── package.json
├── vite.config.ts
├── tailwind.config.js
├── tsconfig.json
├── index.html
├── src/
│   ├── main.ts                      # Vue app bootstrap
│   ├── App.vue                      # Root component + router-view
│   ├── router/
│   │   └── index.ts                 # vue-router: explore, dashboard, detail
│   ├── stores/
│   │   ├── artifacts.ts             # loaded GI/KG artifacts, merge logic
│   │   ├── search.ts                # query, results, filters, loading state
│   │   ├── indexHealth.ts           # IndexStats, feed coverage, freshness
│   │   └── graph.ts                 # focus node, layout config, visual state
│   ├── api/
│   │   ├── client.ts                # base fetch wrapper, error handling
│   │   ├── artifacts.ts             # GET /api/artifacts, /api/artifacts/:name
│   │   ├── search.ts                # GET /api/search
│   │   └── index.ts                 # GET /api/index/stats
│   ├── composables/
│   │   ├── useSearch.ts             # search + debounce + result mapping
│   │   ├── useGraph.ts              # cytoscape instance lifecycle
│   │   ├── useFilters.ts            # filter state + apply to graph/search
│   │   └── useArtifacts.ts          # artifact loading + file-picker fallback
│   ├── components/
│   │   ├── graph/
│   │   │   ├── GraphCanvas.vue      # cytoscape wrapper
│   │   │   ├── GraphLegend.vue      # color legend with click-to-solo
│   │   │   ├── GraphControls.vue    # fit, re-layout, export PNG/SVG
│   │   │   └── NodeDetail.vue       # click-to-inspect detail panel
│   │   ├── search/
│   │   │   ├── SearchBar.vue        # query input + submit
│   │   │   ├── SearchResults.vue    # scrollable result list
│   │   │   ├── SearchFilters.vue    # type, feed, date, speaker, grounded
│   │   │   └── ResultCard.vue       # typed result card (insight/quote/chunk)
│   │   ├── dashboard/
│   │   │   ├── IndexStats.vue       # vector counts, model info
│   │   │   ├── CorpusTimeline.vue   # episodes on time axis by feed
│   │   │   └── FeedCoverage.vue     # feed × indexed status matrix
│   │   ├── common/
│   │   │   ├── MetricsPanel.vue     # reusable key-value metrics display
│   │   │   ├── FilterChips.vue      # active filter badges
│   │   │   └── LoadingState.vue     # spinner / skeleton
│   │   └── layout/
│   │       ├── AppShell.vue         # sidebar + main + optional right panel
│   │       ├── Sidebar.vue          # navigation + artifact list
│   │       └── TopBar.vue           # search bar + mode toggle
│   ├── views/
│   │   ├── ExploreView.vue          # graph + search + filters (main view)
│   │   ├── DashboardView.vue        # index health + corpus overview
│   │   └── DetailView.vue           # single episode/insight deep dive
│   ├── types/
│   │   ├── artifact.ts              # GI/KG TypeScript types
│   │   ├── search.ts                # SearchResult, IndexStats, SearchFilters
│   │   └── graph.ts                 # CyNode, CyEdge, VisualConfig
│   ├── theme/
│   │   ├── tokens.css               # CSS custom-property definitions (UXS-001 tokens)
│   │   ├── theme.ts                 # runtime helpers: resolve token, apply preset
│   │   └── presets/                 # optional alternate value sets for experimentation
│   │       ├── default.css          # production token values (= UXS-001 table)
│   │       └── compact.css          # denser spacing / smaller font experiment
│   └── utils/
│       ├── colors.ts                # node type → color mapping (from shared.js)
│       ├── parsing.ts               # artifact parsing (ported from shared.js)
│       └── formatting.ts            # dates, scores, text truncation
└── public/
    └── favicon.svg
```
Key architectural decisions:
- Stores own the data and derived state. Components read from stores reactively. API calls are made via the api/ layer and results are written to stores.
- Composables encapsulate reusable logic (graph lifecycle, debounced search, filter application) without coupling to specific components.
- Types mirror the Python Pydantic models (SearchResult, IndexStats, ParsedArtifact) and provide compile-time safety.
- API client is a thin typed wrapper over fetch. Each endpoint module exports functions like searchCorpus(query, filters) → Promise<SearchResponse>.
3. Server Architecture (Platform Seed)
The server lives in src/podcast_scraper/server/ — not viewer/. This is the
project's canonical server layer. The viewer is the first consumer; the platform
UI (#50, #347, megasketch) is the second. Route groups are pluggable via feature flags.
```text
src/podcast_scraper/server/
├── __init__.py
├── app.py                  # FastAPI app factory (feature-flagged routers)
├── dependencies.py         # shared: config, output_dir, vector_store
├── schemas.py              # Pydantic response models (shared across routers)
├── middleware/
│   ├── __init__.py
│   └── cors.py             # CORS config; future: auth, request logging
├── routes/
│   ├── __init__.py
│   ├── health.py           # GET /api/health (always mounted)
│   │
│   │   # --- Viewer routes (v2.6, this RFC) ---
│   ├── artifacts.py        # GET /api/artifacts, /api/artifacts/:name
│   ├── search.py           # GET /api/search (semantic search)
│   ├── explore.py          # GET /api/explore (gi explore / gi query)
│   ├── index_stats.py      # GET /api/index/stats
│   │
│   │   # --- Platform routes (v2.7, added later via #50, #347) ---
│   ├── feeds.py            # CRUD /api/feeds (catalog, placeholder)
│   ├── episodes.py         # GET /api/episodes (browsing, placeholder)
│   ├── jobs.py             # POST/GET /api/jobs (pipeline runs, placeholder)
│   └── status.py           # GET /api/status (pipeline status, placeholder)
```
FastAPI app factory (app.py):
```python
from pathlib import Path

from fastapi import FastAPI
from fastapi.staticfiles import StaticFiles

# The *_router objects are imported from the routes/ modules.


def create_app(
    output_dir: Path,
    *,
    enable_viewer: bool = True,
    enable_platform: bool = False,
    vector_index_path: Path | None = None,
) -> FastAPI:
    app = FastAPI(title="podcast_scraper")

    # Always available
    app.include_router(health_router)

    # Viewer routes (v2.6, this RFC)
    if enable_viewer:
        app.include_router(artifacts_router, prefix="/api")
        app.include_router(search_router, prefix="/api")
        app.include_router(explore_router, prefix="/api")
        app.include_router(index_stats_router, prefix="/api")

    # Platform routes (v2.7, future: #50, #347, megasketch)
    if enable_platform:
        app.include_router(feeds_router, prefix="/api")
        app.include_router(episodes_router, prefix="/api")
        app.include_router(jobs_router, prefix="/api")

    # Mount static files last: a mount at "/" matches every path, so any
    # router registered after it would never be reached.
    if enable_viewer:
        app.mount("/", StaticFiles(...), name="viewer")

    return app
```
- Accepts output_dir (path to corpus outputs) and optional vector_index_path
- Lazily loads FaissVectorStore on first search request
- Mounts Vite build output as static files at / (viewer mode)
- CORS enabled for dev mode (Vite dev server on different port)
- Platform routes are placeholder files until v2.7 work begins; they exist in the tree to make the growth path visible but are not mounted by default
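The lazy-loading behavior listed above can be sketched with a small cache-on-first-use holder. This is only an illustration using the standard library: LazyResource and fake_load_store are hypothetical names, and the real loader would open the FAISS index at vector_index_path instead of returning a placeholder dict.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazyResource(Generic[T]):
    """Load an expensive resource on first use and cache it (hypothetical helper)."""

    def __init__(self, loader: Callable[[], T]) -> None:
        self._loader = loader
        self._value: Optional[T] = None
        self._loaded = False
        self._lock = threading.Lock()

    def get(self) -> T:
        # Double-checked locking: once the value is cached, skip the lock.
        if not self._loaded:
            with self._lock:
                if not self._loaded:
                    self._value = self._loader()
                    self._loaded = True
        return self._value  # type: ignore[return-value]


# Stand-in loader (assumption): records each call so the caching is observable.
load_calls = []


def fake_load_store() -> dict:
    load_calls.append(1)
    return {"vectors": 0}


store = LazyResource(fake_load_store)
```

Nothing is loaded when the holder is created; the first call to store.get() runs the loader and later calls return the same object. The lock guards against two concurrent first requests both loading the index.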
CLI entry point (podcast serve):
```bash
# v2.6: viewer mode (default)
podcast serve --output-dir ./output

# v2.7: full platform mode (future)
podcast serve --output-dir ./output --platform

# Development shorthand
make serve  # Start backend + frontend dev servers
```
The podcast serve command is a new CLI subcommand that starts the FastAPI server.
It replaces the ad-hoc scripts/gi_kg_viz_server.py script with a proper entry point.
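As a rough sketch of how the subcommand's flags could be declared, the example below uses argparse from the standard library. The flag names (--output-dir, --port, --platform) and port 8100 come from this RFC; the use of argparse, the build_parser name, and making --output-dir required are assumptions, since the existing CLI wiring is not described here.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a 'serve' subcommand (illustrative, not the real CLI)."""
    parser = argparse.ArgumentParser(prog="podcast")
    sub = parser.add_subparsers(dest="command", required=True)

    serve = sub.add_parser("serve", help="Start the FastAPI server")
    # Assumption: output_dir is mandatory because create_app() requires it.
    serve.add_argument("--output-dir", type=Path, required=True)
    serve.add_argument("--port", type=int, default=8100)
    # Maps to enable_platform in the app factory; off by default.
    serve.add_argument("--platform", action="store_true")
    return parser


args = build_parser().parse_args(["serve", "--output-dir", "./output"])
```

The parsed values would then be passed to create_app(args.output_dir, enable_platform=args.platform) before handing the app to the ASGI server.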
Route contracts (viewer routes — implemented in this RFC):
```python
from fastapi import APIRouter, Query

# Each routes/ module defines its own router; they are shown together here.
# Response models (ArtifactListResponse, ...) come from schemas.py.
router = APIRouter()

# Paths below are relative because app.py includes these routers with
# prefix="/api". Only the health router is included without a prefix.


# GET /api/health
@router.get("/api/health")
async def health() -> dict:
    """Server health check. Always mounted."""
    ...


# GET /api/artifacts?path=<output_dir>
@router.get("/artifacts")
async def list_artifacts(path: str) -> ArtifactListResponse:
    """List all gi.json and kg.json files in the output directory."""
    ...


# GET /api/artifacts/{name}
@router.get("/artifacts/{name}")
async def get_artifact(name: str) -> dict:
    """Load and return a parsed artifact by filename."""
    ...


# GET /api/search?q=<query>&type=insight&type=quote&top_k=20&feed=...
@router.get("/search")
async def search_corpus(
    q: str,
    # List-valued query parameters must be declared with Query(); FastAPI
    # reads them from repeated keys (type=insight&type=quote).
    type: list[str] | None = Query(None),
    feed: str | None = None,
    since: str | None = None,
    speaker: str | None = None,
    grounded_only: bool = False,
    top_k: int = 10,
) -> SearchResponse:
    """Semantic search over the vector index."""
    ...


# GET /api/index/stats
@router.get("/index/stats")
async def index_stats() -> IndexStatsResponse:
    """Return vector index statistics."""
    ...


# GET /api/explore?topic=<topic>&speaker=<speaker>&limit=20
@router.get("/explore")
async def explore_insights(
    topic: str | None = None,
    speaker: str | None = None,
    limit: int = 20,
    grounded_only: bool = False,
) -> ExploreResponse:
    """Run gi explore and return ExploreOutput."""
    ...
```
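Before the search handler calls VectorStore.search(), its query parameters need some normalization. The sketch below shows one possible shape for that step using only the standard library. The names SearchParams and parse_search_params, the allowed type set (taken from the insight/quote/chunk result cards), the top_k cap of 100, and the YYYY-MM-DD format for since are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple

ALLOWED_TYPES = {"insight", "quote", "chunk"}  # assumed doc types
MAX_TOP_K = 100  # hypothetical upper bound


@dataclass(frozen=True)
class SearchParams:
    q: str
    types: Optional[Tuple[str, ...]]
    since: Optional[date]
    grounded_only: bool
    top_k: int


def parse_search_params(
    q: str,
    type_values: Optional[List[str]] = None,
    since: Optional[str] = None,
    grounded_only: bool = False,
    top_k: int = 10,
) -> SearchParams:
    """Validate and normalize raw query-string values."""
    query = q.strip()
    if not query:
        raise ValueError("q must not be empty")

    types = None
    if type_values:
        # Accept repeated keys (["insight", "quote"]) and comma lists (["insight,quote"]).
        flat = [part.strip() for value in type_values for part in value.split(",")]
        types = tuple(t for t in flat if t)
        unknown = set(types) - ALLOWED_TYPES
        if unknown:
            raise ValueError(f"unknown type(s): {sorted(unknown)}")

    since_date = date.fromisoformat(since) if since else None  # assumes YYYY-MM-DD
    clamped_top_k = max(1, min(top_k, MAX_TOP_K))
    return SearchParams(query, types, since_date, grounded_only, clamped_top_k)
```

In a route handler, a ValueError raised here would typically be translated into a 4xx response, which keeps validation separate from the search call itself.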
Platform route contracts (v2.7, placeholder stubs only):
```python
# Future: CRUD /api/feeds (catalog management, #50)
# Future: GET /api/episodes (episode browsing, #347)
# Future: POST/GET /api/jobs (pipeline job management, #50)
# Future: GET /api/status (pipeline status monitoring, #50)
```
Integration with existing code:
The server imports and calls existing functions — it does not reimplement logic:
- VectorStore.search() for /api/search
- run_uc5_insight_explorer() for /api/explore
- IndexStats from the vector store for /api/index/stats
- Direct file reads for /api/artifacts
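The direct file reads behind /api/artifacts can be sketched as two small helpers: one that lists artifacts and one that resolves a requested name without letting it escape output_dir. The function names and the naming convention (files named gi.json or kg.json, or ending in .gi.json or .kg.json) are assumptions; the actual layout of the output directory is not specified in this RFC.

```python
from pathlib import Path
from typing import List

ARTIFACT_NAMES = ("gi.json", "kg.json")          # assumed exact file names
ARTIFACT_SUFFIXES = (".gi.json", ".kg.json")     # assumed per-episode suffixes


def is_artifact(filename: str) -> bool:
    return filename in ARTIFACT_NAMES or filename.endswith(ARTIFACT_SUFFIXES)


def list_artifacts(output_dir: Path) -> List[str]:
    """Return artifact paths relative to output_dir, sorted for stable output."""
    found = [
        p.relative_to(output_dir).as_posix()
        for p in output_dir.rglob("*.json")
        if p.is_file() and is_artifact(p.name)
    ]
    return sorted(found)


def resolve_artifact(output_dir: Path, name: str) -> Path:
    """Map a requested name to a file inside output_dir, rejecting traversal."""
    root = output_dir.resolve()
    candidate = (root / name).resolve()
    if root not in candidate.parents:
        raise ValueError("artifact path escapes output_dir")
    if not is_artifact(candidate.name) or not candidate.is_file():
        raise FileNotFoundError(name)
    return candidate
```

Resolving both paths before comparing them is what blocks names such as ../secret/gi.json. A handler would map ValueError and FileNotFoundError to appropriate HTTP error codes.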
Relationship to existing service.py:
The existing podcast_scraper/service.py provides service.run() — a one-shot
pipeline execution entry point. The new server/ module is the long-lived server
that wraps pipeline capabilities behind HTTP. When platform mode (#50) lands, the
server will use service.run() internally for job execution. They are complementary:
service.py = "run once," server/ = "serve continuously."
4. Graph ↔ Search Integration
The key UX innovation of v2 is bidirectional linking between search results and graph nodes:
Search → Graph:
- User types a query in SearchBar
- useSearch composable calls /api/search, writes results to the search store
- SearchResults renders ranked ResultCard components
- Clicking a ResultCard dispatches graph.focusNode(docId) to the graph store
- GraphCanvas receives the focus event, highlights the target node (glow + zoom), and dims non-neighborhood nodes
Graph → Search:
- User double-clicks a node (e.g., an Insight node) in GraphCanvas
- useGraph composable reads the node's text content
- Dispatches search.searchSimilar(nodeText), which pre-fills the search bar and executes the query
- Search results show content semantically similar to the clicked node
Focus + Context:
- NodeDetail panel shows full node properties: text, grounding status, supporting quotes, episode attribution, confidence score
- For Insight nodes: expandable supporting quotes with transcript references
- For Transcript Chunk results: timestamp + "jump to transcript" link
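The focus step in the Search → Graph flow comes down to a one-hop neighborhood computation: keep the focused node and its direct neighbors, dim everything else. In the viewer this would run against the Cytoscape instance in the browser; the sketch below shows the same logic in Python for consistency with the backend examples. The function names and the node ID format are hypothetical, and edges are treated as undirected.

```python
from collections import defaultdict
from typing import Iterable, List, Set, Tuple


def ego_neighborhood(edges: Iterable[Tuple[str, str]], focus_id: str) -> Set[str]:
    """Return the focus node plus every node exactly one hop away."""
    adjacency = defaultdict(set)
    for source, target in edges:
        adjacency[source].add(target)
        adjacency[target].add(source)
    return {focus_id} | adjacency[focus_id]


def partition_nodes(
    node_ids: List[str],
    edges: List[Tuple[str, str]],
    focus_id: str,
) -> Tuple[List[str], List[str]]:
    """Split nodes into (highlighted, dimmed) for a focus event, keeping input order."""
    keep = ego_neighborhood(edges, focus_id)
    highlighted = [n for n in node_ids if n in keep]
    dimmed = [n for n in node_ids if n not in keep]
    return highlighted, dimmed


# Hypothetical IDs: one insight linked to a quote and a topic, plus a second insight.
nodes = ["insight:1", "quote:1", "topic:a", "insight:2"]
links = [("insight:1", "quote:1"), ("insight:1", "topic:a"), ("topic:a", "insight:2")]
highlighted, dimmed = partition_nodes(nodes, links, "insight:1")
```

Here insight:2 is two hops from the focus node, so it lands in the dimmed list while the quote and topic stay highlighted.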
5. Offline / No-Backend Fallback
The viewer must degrade gracefully when no backend is running:
| Feature | With backend | Without backend |
|---|---|---|
| Load artifacts | /api/artifacts | File picker / showDirectoryPicker() |
| Graph rendering | Full | Full |
| Metrics panel | Full | Full (computed from loaded artifacts) |
| Semantic search | Full | Disabled (message: "Run podcast serve to enable search") |
| Explore/QA | Full | Disabled (same message) |
| Index stats | Full | Disabled |
Detection: On mount, api/client.ts pings GET /api/health. If it fails, the
artifacts store switches to file-picker mode and search/explore components show a
disabled state with an instructional message.
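The detection step and the fallback table can be expressed as a probe plus a capability map. The real check lives in api/client.ts; the sketch below mirrors it in Python, which could also serve a script or test helper that needs to know whether the backend is up. backend_available and viewer_capabilities are hypothetical names, and the one-second timeout is an assumption.

```python
import urllib.error
import urllib.request


def backend_available(base_url: str, timeout: float = 1.0) -> bool:
    """Return True only if GET <base_url>/api/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error, or malformed URL: treat as offline.
        return False


def viewer_capabilities(backend_up: bool) -> dict:
    """Encode the fallback table: what is available with and without a backend."""
    return {
        "artifact_source": "api" if backend_up else "file-picker",
        "graph": True,      # works in both modes
        "metrics": True,    # computed from loaded artifacts when offline
        "search": backend_up,
        "explore": backend_up,
        "index_stats": backend_up,
    }
```

Treating every failure mode as "offline" keeps the degraded path simple: the viewer never needs to distinguish a refused connection from a timeout before switching to file-picker mode.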
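6. Dev Server & Build
Development:
```bash
# Terminal 1: FastAPI backend
make serve-api
# Runs: uvicorn podcast_scraper.server.app:create_app --factory --reload --port 8100
# --factory is needed because create_app is an app factory, not an app instance.
# uvicorn calls the factory with no arguments, so output_dir has to be supplied
# by a zero-argument wrapper or read from the environment.

# Terminal 2: Vite dev server
make serve-ui-dev
# Runs: cd web/gi-kg-viewer && npm run dev
# Vite proxies /api/* to localhost:8100

# Or combined:
make serve
# Runs both via a process manager or background + fg
```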
CLI entry point (production):
```bash
# Viewer mode (serves built frontend + API)
podcast serve --output-dir ./output --port 8100

# With platform routes enabled (v2.7, future)
podcast serve --output-dir ./output --platform
```
Production build:
```bash
make build-ui
# Runs: cd web/gi-kg-viewer && npm run build
# Output: web/gi-kg-viewer/dist/
# FastAPI serves dist/ as static files
```
Makefile targets:
| Target | Action |
|---|---|
| make serve | Start backend + frontend dev servers |
| make serve-api | Start FastAPI backend only |
| make serve-ui-dev | Start Vite dev server only |
| make build-ui | Production build of frontend |
| make lint-ui | ESLint + vue-tsc type check |
| make test-ui | Vitest unit tests for frontend |
| make test-ui-e2e | Playwright E2E tests for viewer |
Key Decisions
- Full rebuild vs incremental port
  - Decision: Full rebuild in web/gi-kg-viewer/ (new directory)
  - Rationale: The v1 architecture (vanilla JS, globals, multi-page, dual engines) is fundamentally incompatible with the reactive component model, typed stores, and search integration needed for v2. Porting would require rewriting every file while fighting the existing structure. A clean start with proper tooling (Vite, TypeScript, Vue) is faster and produces a better result. v1 remains as reference until v2 reaches parity.
- Cytoscape.js only (drop vis-network)
  - Decision: Consolidate to Cytoscape.js as the sole graph engine
  - Rationale: The vis-network comparison served its purpose in #445 exploration. For a production viewer, maintaining two engines doubles code (graph-vis.js + graph-cyto.js) for no user value. Cytoscape has the stronger programmatic API for search-driven highlighting, compound nodes for clustering, and extension ecosystem for advanced layouts.
- Vue 3 over React
  - Decision: Vue 3 with Composition API
  - Rationale: Lighter bundle (~16 KB vs ~42 KB gzipped), natural reactivity for filter/search state without useCallback/useMemo overhead, single-file components for a small codebase. The Composition API provides the same composable/hooks pattern. Either would work; Vue is the better fit for this scale. This decision also holds for future platform UI views (#50, #347): Vue scales cleanly from data visualization to CRUD forms with the same component model.
- src/podcast_scraper/server/: platform-first naming
  - Decision: Backend lives in server/, not viewer/, inside the Python package
  - Rationale: This is the project's canonical server layer, not just the viewer's backend. The viewer is the first consumer; platform routes (#50, #347, megasketch) are the second. Naming it server/ avoids a rename when platform work starts. Feature flags (enable_viewer, enable_platform) control which route groups are mounted. This follows megasketch constraint A.2: "One pipeline core, multiple shells." The podcast serve CLI command starts this server. The existing scripts/gi_kg_viz_server.py is replaced by this proper module.
- Tailwind CSS (no component library)
  - Decision: Tailwind utility classes + custom components
  - Rationale: The v1 styles.css (600+ lines) would need rewriting regardless. A component library (PrimeVue, Vuetify) adds weight and opinion that fights a data-visualization UI. Tailwind gives dark mode, responsive, and consistent spacing out of the box with full design control.
- Token-based theming with preset support
  - Decision: All visual tokens (colors, typography, spacing, radii) are defined as CSS custom properties in src/theme/tokens.css and consumed by Tailwind via tailwind.config.js extend.colors / extend.fontFamily. A theme.ts helper exposes runtime getters for use in Chart.js and Cytoscape (which cannot read CSS vars directly). Optional CSS preset files (src/theme/presets/) can override tunable token values (e.g. fonts, radii, spacing) for rapid experimentation.
  - Rationale: UXS-001 distinguishes frozen token names and conventions from open values (see "Tunable parameters" section). Separating token definitions from component code lets a developer swap font families, adjust spacing scales, or compare compact vs relaxed density by loading a single preset file, without touching any Vue component. This is lightweight (no runtime theming library; just CSS cascade) and aligns with the Tailwind workflow.
- Pinia state management
  - Decision: Pinia stores for artifacts, search, graph, and index state
  - Rationale: Replaces the window.GiKgViz / window.GiKgVizShell globals with typed reactive stores. Components subscribe to store state and dispatch actions. Cross-component coordination (search result → graph focus) happens through store watchers, not DOM events or callback chains. Scales cleanly when platform stores (feeds.ts, jobs.ts, episodes.ts) are added in v2.7.
- Playwright for UI E2E testing
  - Decision: Playwright as the browser E2E test framework
  - Rationale: Official Vue recommendation for E2E. Headless by default (CI friendly). Multi-browser (Chromium, Firefox, WebKit). Lighter than Cypress (no Electron). Built-in web assertions. Test structure mirrors the component structure for maintainability.
- podcast serve CLI command
  - Decision: New top-level CLI subcommand to start the server
  - Rationale: Replaces the ad-hoc scripts/gi_kg_viz_server.py with a proper entry point. Supports --output-dir, --port, and future --platform flag. Aligns with the "CLI stays first-class" constraint (megasketch A.2.1): the server is started via CLI, not a separate tool.
Alternatives Considered
- Incremental refactor of v1
  - Description: Add Vue components to existing HTML pages, gradually replace vanilla JS
  - Pros: No big-bang rewrite; preserves working code
  - Cons: Mixing vanilla JS globals with Vue reactivity creates bugs; no TypeScript; can't introduce Vite without restructuring; "half-ported" state is worse than either extreme
  - Why Rejected: The v1 architecture fundamentally conflicts with the target. A clean rebuild is faster for this codebase size (~153 KB).
- React + Next.js
  - Description: Full React SPA (or Next.js for SSR)
  - Pros: Larger ecosystem; more developer familiarity broadly
  - Cons: Heavier bundle; SSR unnecessary for a local tool; more boilerplate for state management
  - Why Rejected: Vue is a better fit at this scale. No SSR or SEO requirements. Either would work; Vue is lighter and more ergonomic for reactive data visualization.
- Svelte + SvelteKit
  - Description: Svelte for minimal bundle, SvelteKit for routing
  - Pros: Smallest bundle; no virtual DOM; compiler-based reactivity
  - Cons: Smaller ecosystem for Cytoscape/Chart.js integration; fewer community examples for the specific integration needed
  - Why Rejected: Vue has better Cytoscape integration examples and a more mature ecosystem for this specific use case. Svelte is a valid alternative.
- Keep vis-network alongside Cytoscape
  - Description: Maintain both graph engines, let user choose
  - Pros: Comparison; user preference
  - Cons: Doubles graph component code; doubles layout logic; no user has requested the comparison post-#445
  - Why Rejected: The comparison served its #445 purpose. Consolidation halves the graph code and focuses effort on search integration.
- No backend (WASM embeddings in browser)
  - Description: Run all-MiniLM-L6-v2 via ONNX Runtime Web; load FAISS index client-side
  - Pros: Fully offline; no Python server needed
  - Cons: ~80 MB model download in browser; slow inference on CPU; complex build; can't reuse Python VectorStore code; limits future API growth
  - Why Rejected: Heavy engineering for limited gain. The FastAPI backend is simpler, faster, and aligns with the platform vision.
Testing Strategy
Test Layers
| Layer | Tool | Scope | Speed |
|---|---|---|---|
| Frontend unit | Vitest | Store logic, composables, utilities, component rendering | Fast (no browser) |
| Frontend component | Vitest + Vue Test Utils | Component behavior with mocked stores/API | Fast (no browser) |
| API integration | pytest + httpx | Backend routes return correct shapes, error handling | Fast (no browser) |
| UI E2E | Playwright | Full browser: load → search → click → verify graph | Slow (~30s per test) |
| Visual regression | Playwright screenshots | Graph renders consistently, dark mode, layout | Optional, in CI |
Frontend Unit + Component Tests (Vitest)
- Store logic: artifact parsing, search result mapping, filter application, merge logic. Mock API calls via vi.mock.
- Composables: debounced search, graph instance lifecycle, filter state management.
- Utility functions: color mapping, date formatting, text truncation, parsing.
- Component rendering: key components render correctly with given props/store state. Search bar emits correct events. Result cards render all fields for each doc type. Graph canvas initializes Cytoscape instance.
Backend Unit + Integration Tests (pytest)
- Unit tests: Route handlers return correct shapes. Error cases (missing index, invalid query, nonexistent artifact) return proper HTTP status codes. Artifact listing matches file system state.
- Integration tests: Full round-trip with FastAPI's TestClient (built on httpx): load test artifacts → call /api/artifacts → verify response shape. Search with mock VectorStore → verify result mapping and filtering. Health endpoint returns expected payload.
UI E2E Tests (Playwright)
Playwright provides the browser automation layer for end-to-end regression testing.
Test structure:
```text
web/gi-kg-viewer/
├── e2e/
│   ├── playwright.config.ts         # Playwright config (headless Chromium)
│   ├── fixtures/
│   │   ├── sample-gi.json           # small test GI artifact (~5 insights)
│   │   ├── sample-kg.json           # small test KG artifact (~10 entities)
│   │   └── sample-index/            # pre-built small FAISS index for search
│   │       ├── vectors.faiss
│   │       ├── metadata.json
│   │       └── index_meta.json
│   ├── helpers/
│   │   ├── server.ts                # programmatic FastAPI start/stop for tests
│   │   └── test-data.ts             # artifact loading + expectation helpers
│   └── tests/
│       ├── graph-load.spec.ts       # load artifacts → graph renders nodes/edges
│       ├── graph-filters.spec.ts    # toggle type filters → node count changes
│       ├── graph-focus.spec.ts      # double-click node → 1-hop ego focus
│       ├── search.spec.ts           # type query → results appear → score ordering
│       ├── search-to-graph.spec.ts  # click result card → graph highlights node
│       ├── dashboard.spec.ts        # navigate to dashboard → index stats render
│       ├── offline-mode.spec.ts     # no backend → file picker fallback works
│       └── dark-mode.spec.ts        # toggle theme → verify no visual breaks
```
Key E2E scenarios for regression testing:
- Graph renders — load sample GI+KG → verify node count matches artifact → verify edges visible → verify legend populates
- Search works — type query → verify result cards appear → verify score ordering → verify doc type badges render
- Search → Graph focus — click result card → verify node highlighted with glow → verify non-neighborhood dimmed → verify detail panel opens
- Filters apply — toggle "hide ungrounded" → verify node count decreases → reset → verify original count restored
- Dashboard loads — navigate to dashboard → verify index stats display → verify Chart.js distribution renders
- Offline fallback — no backend → verify file picker appears → load files → verify graph renders → verify search shows "start server" message
- Dark mode — toggle theme → take screenshot → compare with baseline
Playwright configuration:
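import { defineConfig } from "@playwright/test";

export default defineConfig({
  // Resolved relative to this config file, which lives in e2e/.
  testDir: "./tests",
  timeout: 30_000, // per-test timeout in milliseconds
  retries: 1,
  use: {
    baseURL: "http://localhost:8100",
    headless: true,
    screenshot: "only-on-failure",
    trace: "retain-on-failure",
  },
  // Start the application through the Makefile target and wait for port 8100.
  webServer: {
    command: "make serve",
    port: 8100,
    reuseExistingServer: !process.env.CI, // reuse a running dev server outside CI
  },
});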
Test Organization¶
- web/gi-kg-viewer/src/__tests__/: frontend unit + component tests (Vitest)
- web/gi-kg-viewer/e2e/: browser E2E tests (Playwright)
- tests/unit/podcast_scraper/server/: backend unit tests (pytest)
- tests/integration/test_server_integration.py: backend API integration (pytest)
Test Execution¶
| Command | What Runs | When |
|---|---|---|
| make test-ui | Vitest (frontend unit + component) | During dev, pre-commit |
| make test-ui-e2e | Playwright (browser E2E) | Pre-merge, CI |
| make ci-fast | pytest (includes server unit + integration) | Pre-commit |
| make ci | Full suite (pytest + Vitest + Playwright) | CI pipeline |
Rollout & Monitoring¶
Rollout Plan:
- M1 - Scaffold + Server Shell (~2-3 days): Vite + Vue + Tailwind + Pinia project in web/gi-kg-viewer/. Token-based theming layer in src/theme/: tokens.css (UXS-001 semantic tokens as CSS custom properties), theme.ts runtime helper, default preset, Tailwind config wired to tokens. FastAPI server skeleton in src/podcast_scraper/server/ with /api/health and /api/artifacts endpoints. podcast serve CLI command. make serve target. Verify Vite dev proxy works. Platform route placeholder stubs (empty files, not mounted).
- M2 - Graph Port (~3-4 days): GraphCanvas.vue wrapping Cytoscape.js with all v1 capabilities: load artifacts, render nodes/edges, filter by type, grounded-only toggle, legend with click-to-solo, 1-hop ego focus on double-click, node detail panel. Parsing logic ported from shared.js to typed TypeScript (parsing.ts, colors.ts). Merge logic (same-layer, GI+KG cross-layer) ported to the artifacts store.
- M3 - Metrics + Dashboard (~2-3 days): MetricsPanel.vue (key-value display), IndexStats.vue (vector counts, model info, freshness), distribution chart (Chart.js bar). File-picker fallback for no-backend mode. DashboardView with corpus overview.
- M4 - Search Panel + Backend (~3-4 days): SearchBar, SearchResults, SearchFilters, ResultCard. /api/search wrapping VectorStore.search(). Click result → graph focus. Search filters (type, feed, date, speaker, grounded-only, top-k). Depends on RFC-061 VectorStore being available.
- M5 - Explore/QA Integration (~2-3 days): /api/explore endpoint wrapping run_uc5_insight_explorer and run_uc4_semantic_qa. Topic explorer and QA view in frontend. Semantic matching when vector index available, substring fallback when not.
- M6 - Polish (~2-3 days): Dark mode (Tailwind, driven by tokens.css prefers-color-scheme), finalize open tunable parameters in UXS-001 (typography, radii, spacing), keyboard shortcuts (/ for search, Escape to clear focus), graph export (PNG/SVG via Cytoscape), responsive layout, loading states, error handling, documentation update.
- M7 - E2E Test Layer (~2-3 days): Playwright setup and configuration. Test fixtures (sample artifacts, pre-built small FAISS index). Core E2E scenarios: graph load, search, search-to-graph focus, filters, dashboard, offline fallback, dark mode. make test-ui-e2e target. CI integration.
Total estimate: ~16-23 days of focused work across 7 milestones (the sum of the per-milestone ranges above).
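The M5 behavior (semantic matching when a vector index is available, substring fallback when not) could be sketched roughly as follows. Every name here (`Insight`, `explore_insights`, the `vector_search` callable and its return shape) is a hypothetical stand-in; none of it reflects the actual signatures of run_uc5_insight_explorer or VectorStore.search(). The sketch only illustrates the fallback decision.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Insight:
    """Hypothetical stand-in for a GI insight record."""
    id: str
    text: str


# Assumed search callable: (query, top_k) -> list of (insight_id, score).
VectorSearch = Callable[[str, int], List[Tuple[str, float]]]


def explore_insights(
    query: str,
    insights: List[Insight],
    vector_search: Optional[VectorSearch] = None,
    top_k: int = 10,
) -> Tuple[str, List[Insight]]:
    """Return (mode, matches), where mode is "semantic" or "substring"."""
    by_id = {insight.id: insight for insight in insights}
    if vector_search is not None:
        # Index available: rank hits by similarity score, highest first,
        # and drop ids that are not present in the loaded artifacts.
        hits = sorted(vector_search(query, top_k), key=lambda hit: hit[1], reverse=True)
        return "semantic", [by_id[i] for i, _ in hits[:top_k] if i in by_id]
    # No index: case-insensitive substring match, kept in artifact order.
    needle = query.casefold()
    matches = [insight for insight in insights if needle in insight.text.casefold()]
    return "substring", matches[:top_k]
```

Returning the mode alongside the matches would let the frontend tell the user which matching strategy produced the results.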
Monitoring:
- API response times logged (FastAPI middleware)
- Frontend performance: graph render time, search latency (console metrics)
- Bundle size tracked per build
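One possible shape for the API response-time logging is a small ASGI middleware. The sketch below is framework-free and uses only the standard library; the class name, logger name, and in-memory history are assumptions made for illustration, not part of this RFC. A class of this form can be registered on a FastAPI app with `app.add_middleware(...)`.

```python
import asyncio
import logging
import time
from collections import deque

logger = logging.getLogger("server.timing")  # hypothetical logger name


class TimingMiddleware:
    """Pure ASGI middleware that logs the duration of each HTTP request."""

    def __init__(self, app, history=100):
        self.app = app
        # Bounded in-memory record of (path, status, elapsed_ms) tuples.
        self.recent = deque(maxlen=history)

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            # Pass lifespan/websocket traffic through untouched.
            await self.app(scope, receive, send)
            return
        start = time.perf_counter()
        status = None

        async def timed_send(message):
            nonlocal status
            if message["type"] == "http.response.start":
                status = message["status"]
            await send(message)

        try:
            await self.app(scope, receive, timed_send)
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            self.recent.append((scope["path"], status, elapsed_ms))
            logger.info("%s %s -> %s in %.1f ms",
                        scope.get("method", "?"), scope["path"], status, elapsed_ms)


async def _health_app(scope, receive, send):
    """Tiny ASGI app used only to exercise the middleware."""
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b'{"status": "ok"}'})


async def demo():
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    middleware = TimingMiddleware(_health_app)
    await middleware({"type": "http", "method": "GET", "path": "/api/health"}, receive, send)
    return middleware.recent[0], sent
```

Running `asyncio.run(demo())` exercises the middleware against the tiny health app without starting a server.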
Success Criteria:
- All v1 graph + metrics capabilities available in v2 (feature parity)
- Semantic search returns results and highlights graph nodes
- Index health dashboard shows vector counts, feed coverage, freshness
- File-picker fallback works without backend
- podcast serve and make serve start the full application in one command
- make ci-fast passes with server backend tests included
- Dark mode works correctly, driven by UXS-001 semantic tokens in tokens.css
- Theme presets load correctly; swapping a preset changes typography/spacing/radii without touching component code
- Playwright E2E tests pass in CI (headless Chromium)
- Server architecture supports adding platform routes (#50, #347) without restructuring
- v1 web/gi-kg-viz/ can be removed after v2 validation
- All documentation deliverables complete (see Definition of Done below)
Definition of Done — Documentation Deliverables:
The following documentation must be created or updated before this RFC is considered complete. These are tracked as part of the implementation issue.
| Deliverable | Action | Description |
|---|---|---|
| docs/guides/SERVER_GUIDE.md (new) | Create | Comprehensive server guide: tech stack rationale, architecture overview, route conventions, how to add new routes, how to add platform routes, configuration, podcast serve usage, dev workflow, extension patterns. This is the go-to reference for anyone extending the server. |
| docs/architecture/ARCHITECTURE.md | Update | Add the server module (src/podcast_scraper/server/) to the module map. Document the "one pipeline core, multiple shells" pattern. Add the CLI → Server → Platform evolution diagram. |
| docs/architecture/TESTING_STRATEGY.md | Update | Add the UI testing layers (Vitest unit/component, Playwright E2E, visual regression). Document Makefile targets (make test-ui, make test-ui-e2e). Document test fixture strategy for viewer E2E. |
| docs/guides/DEVELOPMENT_GUIDE.md | Update | Add make serve, make serve-api, make serve-ui-dev, make build-ui to the developer commands section. Document the Vite dev proxy setup. |
| README.md | Update | Add podcast serve command to the CLI reference. Mention the viewer and link to SERVER_GUIDE.md. |
| web/gi-kg-viewer/README.md (new) | Create | Frontend-specific README: how to install, dev, build, test, lint. Component architecture overview. How to add new views/stores. |
| docs/guides/TESTING_GUIDE.md | Update | Add Playwright E2E section: setup, running, writing new tests, CI integration, fixture management. |
Platform Evolution Path¶
This section documents how the server and frontend architecture introduced by this RFC grows into the full platform vision described in the megasketch and tracked by #50 (simple UI + server), #347 (UI for DB output), and #46 (Docker architecture).
What This RFC Establishes (v2.6)¶
| Component | What Gets Built | Where |
|---|---|---|
| Server module | src/podcast_scraper/server/ with app factory, feature flags, viewer routes | Python package |
| CLI command | podcast serve --output-dir | CLI layer |
| Viewer routes | /api/health, /api/artifacts, /api/search, /api/explore, /api/index/stats | Server routes |
| Frontend SPA | Vue 3 + Cytoscape.js viewer with search, dashboard, explore views | web/gi-kg-viewer/ |
| E2E tests | Playwright test suite for viewer regression testing | web/gi-kg-viewer/e2e/ |
What Platform Work Adds (v2.7 — #50, #347)¶
| Component | What Gets Added | How It Fits |
|---|---|---|
| Platform routes | /api/feeds (catalog CRUD), /api/episodes (browsing), /api/jobs (pipeline runs), /api/status (monitoring) | New route files in server/routes/, mounted when enable_platform=True |
| Platform stores | feeds.ts, jobs.ts, episodes.ts in Pinia | New stores alongside viewer stores |
| Platform views | Feed management, episode browser, job dashboard, config editor | New vue-router routes: /feeds, /episodes, /jobs, /settings |
| DB integration | Postgres reads for episodes/summaries (#347, RFC-051) | New dependency in server/dependencies.py |
| podcast serve --platform | Activates platform routes + DB connection | Flag in CLI + app factory |
What Docker/Deployment Adds (v2.7+ — #46)¶
| Component | What Gets Added | How It Fits |
|---|---|---|
| Docker Compose | postgres + api + worker + caddy services | Megasketch Part B |
| Worker process | Same image, different command; consumes job queue | Uses service.run() internally |
| Static build | Vite output baked into Docker image | make build-ui in Dockerfile |
Growth Path (No Architectural Rewrites)¶
v2.6 (this RFC)
├── src/podcast_scraper/server/
│   ├── app.py          (enable_viewer=True)
│   └── routes/         (health, artifacts, search, explore, index_stats)
├── web/gi-kg-viewer/   (Vue 3 SPA)
└── podcast serve --output-dir

v2.7 (#50, #347)
├── src/podcast_scraper/server/
│   ├── app.py          (enable_viewer=True, enable_platform=True)
│   └── routes/         (+ feeds, episodes, jobs, status)
├── web/gi-kg-viewer/
│   └── src/views/      (+ FeedsView, EpisodesView, JobsView, SettingsView)
└── podcast serve --output-dir --platform

v2.7+ (#46, megasketch)
├── docker-compose.yml  (postgres, api, worker, caddy)
├── src/podcast_scraper/server/  (same module, running in container)
└── podcast serve --platform --db-url postgres://...
The key property: nothing gets restructured between v2.6 and v2.7+. The server
module, the frontend SPA, the route pattern, the Pinia stores, and the Playwright
test structure all extend additively. This is the payoff of naming it server/
instead of viewer/ and using feature flags instead of separate entry points.
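The additive, feature-flag growth path described above can be modeled in a few lines. The sketch below is deliberately framework-free: `AppSketch`, `create_app`, and the argparse wiring are hypothetical illustrations rather than the project's real app factory or CLI, and only the route paths and flag names (`--output-dir`, `--platform`, `--db-url`, `enable_viewer`, `enable_platform`) come from this RFC. In a real FastAPI app, the `mount` step would correspond to including a router per route group.

```python
import argparse
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Route groups named in this RFC; handlers are omitted in this sketch.
VIEWER_ROUTES = ["/api/health", "/api/artifacts", "/api/search",
                 "/api/explore", "/api/index/stats"]
PLATFORM_ROUTES = ["/api/feeds", "/api/episodes", "/api/jobs", "/api/status"]


@dataclass
class AppSketch:
    """Stand-in for the server app; records which route groups are mounted."""
    output_dir: Optional[str]
    db_url: Optional[str] = None
    mounted: Dict[str, List[str]] = field(default_factory=dict)

    def mount(self, group: str, routes: List[str]) -> None:
        self.mounted[group] = list(routes)


def create_app(output_dir=None, enable_viewer=True, enable_platform=False,
               db_url=None) -> AppSketch:
    """One factory, one module; flags decide which route groups are mounted."""
    app = AppSketch(output_dir=output_dir, db_url=db_url)
    if enable_viewer:
        app.mount("viewer", VIEWER_ROUTES)
    if enable_platform:
        app.mount("platform", PLATFORM_ROUTES)
    return app


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI wiring for `podcast serve`."""
    parser = argparse.ArgumentParser(prog="podcast")
    sub = parser.add_subparsers(dest="command", required=True)
    serve = sub.add_parser("serve")
    serve.add_argument("--output-dir", default=None)
    serve.add_argument("--platform", action="store_true")
    serve.add_argument("--db-url", default=None)
    return parser


def app_from_argv(argv: List[str]) -> AppSketch:
    args = build_parser().parse_args(argv)
    return create_app(args.output_dir, enable_platform=args.platform,
                      db_url=args.db_url)
```

Under these assumptions, `app_from_argv(["serve", "--output-dir", "out"])` mounts only the viewer group, and adding `--platform` mounts the platform group next to it while leaving the viewer routes unchanged.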
Relationship to Other RFCs¶
This RFC (RFC-062) is both a consumption layer for the GIL/KG depth initiative (#466) and the seed of the platform server (#50, #347, megasketch):
RFC-049 (GIL Core) → artifacts visualized in graph
RFC-050 (GIL Use Cases) → UC4/UC5 exposed via explore/search UI
RFC-055 (KG Core) → KG artifacts visualized in graph
↓
RFC-061 (Semantic Search) → search backend consumed by viewer
↓
RFC-062 (this RFC) → server + viewer SPA + E2E tests
↓
RFC-051 (DB Projection) → structured query backend (platform reads)
↓
#50 / #347 / megasketch → platform routes + UI views on same server
↓
#46 / megasketch Part B → Docker Compose deployment
Key Distinction:
- RFC-061: Defines the search engine: VectorStore, FAISS, embed-and-index pipeline, CLI commands
- RFC-062: Defines the server layer and the search UI: FastAPI server with pluggable route groups, Vue 3 frontend, graph integration, dashboard, E2E tests. The server is the first building block of the platform; the viewer is its first UI.
Together, RFC-061 provides the retrieval engine, RFC-062 provides the server and
visual interface, and the platform work (#50, #347, megasketch) extends both with
CRUD routes, job management, and Postgres integration — all on the same
src/podcast_scraper/server/ foundation.
Benefits¶
- Surfaces semantic search visually: The most powerful new capability (RFC-061) becomes accessible through a graphical interface, not just CLI
- Proper web architecture: Vue 3 + Pinia + TypeScript replaces fragile vanilla JS with a maintainable, type-safe component model
- Single graph engine: Halves the graph code, focuses maintenance on Cytoscape.js
- Platform foundation: The server/ module is designed to grow into the full platform API (megasketch Part A). Viewer routes land first; platform routes (#50, #347) extend additively without restructuring
- Graceful degradation: File-picker fallback preserves offline use; search features activate when backend + index are available
- Developer experience: Vite HMR, TypeScript, ESLint, Vitest — standard modern tooling instead of CDN script tags and browser refresh
- Regression safety: Playwright E2E test layer catches visual and functional regressions before they reach users. Test structure grows with platform views.
- One server, one CLI: podcast serve is the single entry point for all server functionality (viewer today, platform tomorrow). No fragmented scripts.
Migration Path¶
- Phase 1 - Build v2: New web/gi-kg-viewer/ directory and src/podcast_scraper/server/ module. v1 (web/gi-kg-viz/) remains untouched.
- Phase 2 - Validate parity: Confirm all v1 capabilities work in v2. Playwright E2E tests green. Gather feedback.
- Phase 3 - Switch default: make serve points to v2. podcast serve is the canonical server command. Add deprecation notice to v1.
- Phase 4 - Remove v1: Delete web/gi-kg-viz/ and scripts/gi_kg_viz_server.py after v2 is validated.
- Phase 5 - Platform extension (v2.7): Add platform routes, views, and stores to the same server and frontend. No structural migration needed.
Open Questions¶
- Graph export formats: Should the viewer support exporting the graph as data (JSON, GraphML) in addition to image (PNG, SVG)? Useful for external tools. Recommendation: defer to M6 polish; image export first.
- Platform route stubs: Should the v2.7 platform route files (feeds.py, episodes.py, jobs.py) ship as empty stubs with pass handlers in v2.6, or should they be added only when platform work starts? Recommendation: empty stubs in v2.6 to make the growth path visible and validate the feature-flag pattern.
- Frontend directory rename: When platform views are added (v2.7), should web/gi-kg-viewer/ be renamed to web/app/ to reflect the broader scope? Recommendation: defer. The rename is trivial and can happen when platform views actually land. Viewer naming is accurate for v2.6.
Resolved Questions (from earlier draft):
- Vue router vs tabs - Resolved: use vue-router for clean view separation, URL deep linking, and alignment with platform views that will be added as routes.
- E2E test tooling - Resolved: Playwright. Official Vue recommendation, headless CI, multi-browser, lighter than Cypress. See Testing Strategy section.
- Bundle vendoring - Resolved: npm install via package.json. Vite bundles everything; vendoring is a v1 concern solved by proper tooling.
- Backend naming - Resolved: src/podcast_scraper/server/, not viewer/. This is the project's server layer, not just the viewer's backend.
References¶
- Related PRD: docs/prd/PRD-021-semantic-corpus-search.md
- Related RFC: docs/rfc/RFC-061-semantic-corpus-search.md
- Related RFC: docs/rfc/RFC-049-grounded-insight-layer-core.md
- Related RFC: docs/rfc/RFC-050-grounded-insight-layer-use-cases.md
- Related RFC: docs/rfc/RFC-055-knowledge-graph-layer-core.md
- Related RFC: docs/rfc/RFC-056-knowledge-graph-layer-use-cases.md
- Related RFC: docs/rfc/RFC-051-database-projection-gil-kg.md
- Viewer v1: web/gi-kg-viz/ (current implementation from #445)
- Platform Vision: docs/architecture/PLATFORM_ARCHITECTURE_BLUEPRINT.md
- Platform Issues: #50 (UI + server), #347 (UI for DB), #46 (Docker architecture)
- Source Code: scripts/gi_kg_viz_server.py (current dev server, replaced by server/)
- Source Code: podcast_scraper/gi/explore.py (explore/query logic)
- Source Code: podcast_scraper/service.py (one-shot pipeline execution)