Requests for Comment (RFCs)

Purpose

Requests for Comment (RFCs) define how each feature in podcast_scraper is implemented. They capture:

Technical design and architecture decisions
Implementation details and module boundaries
API contracts and data structures
Testing strategies and validation approaches

RFCs translate PRD requirements into concrete technical solutions and serve as living documentation for developers.

How RFCs Work

Reference PRDs: RFCs implement requirements defined in PRDs
Define Architecture: RFCs specify module design, interfaces, and data flow
Guide Implementation: Developers use RFCs as blueprints for code changes
Document Decisions: RFCs capture design rationale and alternatives considered

Open RFCs

RFC	Title	Related PRD	Description
RFC-015	AI Experiment Pipeline	PRD-007	Technical design for configuration-driven experiment pipeline (CI integration pending)
RFC-041	Podcast ML Benchmarking Framework	PRD-007	Repeatable, objective ML benchmarking system (CI integration pending)
RFC-077	Viewer feeds + operator config + jobs & hygiene	PRD-030	Draft: structured `feeds.spec.yaml` + operator YAML API, job lifecycle + stale/reconcile (#626)
RFC-081	Pre-prod environment on GitHub Codespaces (Phase 1)	—	Draft: Codespaces deploy auto-fired on Stack-test green; cloud_thin profile via GHCR-published `pipeline-llm`; Grafana Cloud + Sentry free observability; Cloudflare R2 corpus backup; Slack notifications. Always-on host deferred to a follow-up RFC.
RFC-082	Always-on pre-prod / production hosting (Phase 2 lift-and-shift from RFC-081)	—	Draft starter: picks up where RFC-081 ended. Reuses the published GHCR image set unchanged; chooses VPS host (Hetzner CX/CCX), auth wall (Cloudflare Tunnel + Access OR Tailscale), deploy mechanism (push from GHA), corpus persistence (host bind-mount), and host-side backup cron. Includes stack contract vs environment adapters (ADR-093, #762). Cost ceiling ~$10-16/mo. Open questions: operator geography, existing Cloudflare domain, scheduled-cron feed sweep.

Completed RFCs

RFC	Title	Related PRD	Version	Description
RFC-001	Workflow Orchestration	PRD-001	v2.0.0	Central orchestrator for transcript acquisition pipeline
RFC-002	RSS Parsing & Episode Modeling	PRD-001	v2.0.0	RSS feed parsing and episode data model
RFC-003	Transcript Download Processing	PRD-001	v2.0.0	Resilient transcript download with retry logic
RFC-004	Filesystem Layout & Run Management	PRD-001	v2.0.0	Deterministic output directory structure and run scoping
RFC-005	Whisper Integration Lifecycle	PRD-002	v2.0.0	Whisper model loading, transcription, and cleanup
RFC-006	Whisper Screenplay Formatting	PRD-002	v2.0.0	Speaker-attributed transcript formatting
RFC-007	CLI Interface & Validation	PRD-003	v2.0.0	Command-line argument parsing and validation
RFC-008	Configuration Model & Validation	PRD-003	v2.0.0	Pydantic-based configuration with file loading
RFC-009	Progress Reporting Integration	PRD-001	v2.0.0	Pluggable progress reporting interface
RFC-010	Automatic Speaker Name Detection	PRD-008	v2.1.0	NER-based host and guest identification
RFC-011	Per-Episode Metadata Generation	PRD-004	v2.2.0	Structured metadata document generation
RFC-012	Episode Summarization Using Local Transformers	PRD-005	v2.3.0	Local transformer-based summarization
RFC-013	OpenAI Provider Implementation	PRD-006	v2.4.0	OpenAI API providers for transcription, NER, and summarization
RFC-016	Modularization for AI Experiments	PRD-007	v2.4.0	Provider system architecture to support AI experiment pipeline
RFC-017	Prompt Management	PRD-006	v2.4.0	Versioned, parameterized prompt management system (Jinja2)
RFC-018	Test Structure Reorganization	-	v2.4.0	Reorganized test suite into unit/integration/e2e directories
RFC-019	E2E Test Infrastructure and Coverage Improvements	PRD-001+	v2.4.0	Comprehensive E2E test infrastructure and coverage
RFC-020	Integration Test Infrastructure and Coverage Improvements	PRD-001+	v2.4.0	Integration test suite improvements (10 stages, 182 tests)
RFC-021	Modularization Refactoring Plan	PRD-006	v2.4.0	Detailed plan for modular provider architecture
RFC-022	Environment Variable Candidates Analysis	-	v2.4.0	Environment variable support for deployment flexibility
RFC-024	Test Execution Optimization	-	v2.4.0	Optimized test execution with markers, tiers, parallel execution
RFC-025	Test Metrics and Health Tracking	-	v2.4.0	Metrics collection, CI integration, flaky test detection
RFC-026	Metrics Consumption and Dashboards	-	v2.4.0	GitHub Pages metrics JSON API and job summaries
RFC-028	ML Model Preloading and Caching	-	v2.4.0	Model preloading for local dev and GitHub Actions caching
RFC-029	Provider Refactoring Consolidation	PRD-006	v2.4.0	Unified provider architecture documentation
RFC-030	Python Test Coverage Improvements	-	v2.4.0	Coverage collection in CI, threshold enforcement
RFC-031	Code Complexity Analysis Tooling	-	v2.4.0	Radon, Vulture, Interrogate, and codespell integration
RFC-032	Anthropic Provider Implementation	PRD-009	v2.4.0	Technical design for Anthropic Claude API providers
RFC-033	Mistral Provider Implementation	PRD-010	v2.5.0	Technical design for Mistral AI providers (all 3 capabilities)
RFC-034	DeepSeek Provider Implementation	PRD-011	v2.5.0	Technical design for DeepSeek AI (ultra low-cost)
RFC-035	Gemini Provider Implementation	PRD-012	v2.5.0	Technical design for Google Gemini (2M context)
RFC-036	Grok Provider Implementation (xAI)	PRD-013	v2.5.0	Technical design for Grok (xAI's AI model)
RFC-037	Ollama Provider Implementation	PRD-014	v2.5.0	Technical design for Ollama (local/offline)
RFC-039	Development Workflow	-	v2.4.0	Git worktrees, Cursor integration, CI evolution
RFC-023	README Acceptance Tests	-	v2.5.0	Script-based acceptance (`make test-acceptance`, `MAIN_ACCEPTANCE_CONFIG.yaml` fast matrix, `scripts/acceptance/`) — not pytest `tests/acceptance/`
RFC-040	Audio Preprocessing Pipeline	-	v2.5.0	FFmpeg preprocessing, opus codec, audio caching, factory pattern
RFC-042	Hybrid Podcast Summarization Pipeline	-	v2.5.0	Hybrid MAP-REDUCE with instruction-tuned LLMs
RFC-044	Model Registry for Architecture Limits	-	v2.5.0	Centralized registry for model architecture limits
RFC-045	ML Model Optimization Guide	PRD-005, PRD-007	v2.5.0	cleaning_v4 profile, preprocessing optimization, parameter tuning guide
RFC-046	Materialization Architecture	PRD-007	v2.5.0	Dataset materialization for honest evaluation comparisons
RFC-047	Lightweight Run Comparison & Diagnostics Tool	PRD-007	v2.5.0	Streamlit-based visual tool for comparing runs
RFC-048	Evaluation ↔ Application Alignment	PRD-007	v2.5.0	Fingerprinting and single-path eval-app alignment
RFC-049	Grounded Insight Layer – Core Concepts & Data Model	PRD-017	v2.6.0	Core ontology, grounding contract, storage format for GIL
RFC-050	Grounded Insight Layer – Use Cases & End-to-End Consumption	PRD-017	v2.6.0	Single-layer GIL consumption (CLI inspect, Insight Explorer, query patterns); cross-layer use cases moved to RFC-072
RFC-052	Locally Hosted LLM Models with Prompts	PRD-014	v2.5.0	Ollama provider and optimized prompt templates
RFC-055	Knowledge Graph Layer — Core Concepts & Data Model	PRD-019	v2.6.0	KG ontology, artifacts, and separation from GIL
RFC-056	Knowledge Graph Layer — Use Cases & End-to-End Consumption	PRD-019	v2.6.0	Single-layer KG consumption (`kg` CLI, entity roll-up, export); cross-layer use cases moved to RFC-072
RFC-057	AutoResearch Optimization Loop (Prompts & ML Params)	PRD-007	v2.6.0	Closed per ADR-073; Tracks A/B complete; silver refs + 72-config eval matrix
RFC-061	Semantic Corpus Search (FAISS)	PRD-021	v2.6.0	Shipped: `FaissVectorStore`, `podcast search` / `index`, embed-and-index, semantic `gi explore`, `/api/search` (ADR-060); platform backends — RFC-070 (Draft)
RFC-062	GI/KG Viewer v2 — Semantic Search UI	PRD-017, PRD-019, PRD-021	v2.6.0	FastAPI `podcast serve`, Vue 3 + Vite + Cytoscape SPA, Playwright UI E2E (ADR-064–ADR-066); platform routes remain v2.7 per ADR-064
RFC-063	Multi-Feed Corpus, Append/Resume, and Unified Discovery	#440+	v2.6.0	N feeds, layout A, opt-in append; unified index (#505); `corpus_manifest.json` / run summary (#506); extends RFC-004; see CORPUS_MULTI_FEED_ARTIFACTS.md
RFC-064	Performance Profiling and Release Freeze Framework	-	v2.6.0	Frozen profiles under `data/profiles/`, `scripts/eval/profile/freeze_profile.py`, `diff_profiles.py`, `make profile-freeze` / `profile-diff`; guide
RFC-065	Live Pipeline Monitor (macOS Developer Tooling)	#512	v2.6.0	`--monitor`, `.pipeline_status.json`, `rich` or `.monitor.log`; optional `[monitor]` memray + py-spy; tmux split deferred; guide
RFC-066	Run Comparison Tool — Performance Tab	-	v2.6.0	Streamlit Performance page (`?page=performance`) joining run metrics with frozen RFC-064 profiles
RFC-067	Corpus Library — Catalog API & Viewer	PRD-022	v2.6.0	Filesystem-first `/api/corpus/*`, Library tab, episode detail, FAISS similar episodes, handoffs to graph and `/api/search` (Phases 1–3)
RFC-068	Corpus Digest — API & Viewer	PRD-023	v2.6.0	`GET /api/corpus/digest`, Digest tab, Library 24h glance, feed diversity, semantic topic bands; `corpus_digest_api` on `/api/health`
RFC-069	GI/KG Viewer — Graph Exploration Toolkit	PRD-024	v2.6.0	Zoom controls, % readout, Shift+drag box zoom, minimap v1, degree-bucket filter, built-in layouts, edge filters; extends RFC-062
RFC-071	Corpus Intelligence Dashboard (GI/KG Viewer)	PRD-025	v2.6.0	Dashboard tab: `/api/corpus/*` aggregates + Chart.js (Pipeline / Content intelligence); manifest + capped `run.json` discovery; index/digest/GI-KG timelines; PRD-025
RFC-076	Progressive graph expansion (cross-episode)	#581	v2.6.0	`POST /api/corpus/node-episodes`, `onetap` rail / `dbltap` expand-collapse, bridge-only scan; extends RFC-069
RFC-084	Corpus snapshot backup manifest and version-aware restore	—	v2.6.0	`snapshot.manifest.json`, `scripts/ops/corpus_snapshot/`, backup/restore workflows + `make restore-corpus` / `restore-corpus-prod`; GitHub #763

Gap analysis

Counts (reconcile when moving RFCs): 84 files under docs/rfc/RFC-*.md, with IDs RFC-001–RFC-084 and no RFC-014. Status split: 3 open (in-flight, partial implementation), 60 completed, and 16 Draft (not indexed until promoted).
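
One way to reconcile these counts is to derive them from the filenames rather than by hand. The sketch below is illustrative only: the function names are hypothetical, and it assumes each RFC file starts with `RFC-` followed by a three-digit ID (the rest of the filename is not specified here). It reports how many files match and which IDs are absent from the sequence.

```python
import re
from pathlib import Path

# Assumption: filenames begin with "RFC-NNN" and the ID is not followed by another digit.
RFC_FILE_RE = re.compile(r"^RFC-(\d{3})(?!\d)")


def rfc_ids(filenames):
    """Return the sorted, de-duplicated RFC numbers found in the given filenames."""
    ids = set()
    for name in filenames:
        match = RFC_FILE_RE.match(Path(name).name)
        if match:
            ids.add(int(match.group(1)))
    return sorted(ids)


def missing_ids(ids):
    """Return the numbers absent between 1 and the highest ID present."""
    if not ids:
        return []
    present = set(ids)
    return [n for n in range(1, max(ids) + 1) if n not in present]


def summarize(rfc_dir="docs/rfc"):
    """Count RFC-*.md files in rfc_dir and list the gaps in the ID sequence."""
    names = [p.name for p in Path(rfc_dir).glob("RFC-*.md")]
    ids = rfc_ids(names)
    return {"files": len(names), "ids": len(ids), "missing": missing_ids(ids)}
```

Calling `summarize()` from the repository root and comparing its `files` and `missing` values with the figures in this section would show whether the counts have drifted.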

Open RFC clusters: AI experiment pipeline + ML benchmark CI (RFC-015, RFC-041).

Draft RFCs (not indexed): Pipeline metrics (RFC-027), continuous review (RFC-038), metrics alerts (RFC-043), Postgres projection (RFC-051), adaptive summarization routing (RFC-053), E2E mock composition (RFC-054), diarization and cleaning (RFC-058–RFC-060), semantic search platform (RFC-070), canonical identity layer (RFC-072), enrichment layer (RFC-073), process safety (RFC-074), ephemeral acceptance smoke test (RFC-078), full-stack Docker Compose (RFC-079; optional doc polish: RFC-079 §Optional follow-ups), prod failover orchestration and cutover (RFC-083). These are discoverable by filename under docs/rfc/ but excluded from the index, because Draft docs are not indexed.

Open RFCs (detail)

RFC	Theme	Notes
RFC-015	Experiments	Runner implemented; CI auto-run still pending
RFC-041	Benchmarks	Datasets/scripts exist; automated CI benchmarking not fully wired
RFC-077	Viewer feeds + operator config + `serve` jobs & hygiene	PRD-030
RFC-078	Ephemeral full-stack stack-test (CI + gates)	Implemented (Phase 1): `compose/docker-compose.stack-test.yml`, `make stack-test-*`, Playwright `tests/stack-test/`, `.github/workflows/stack-test.yml`; stack base RFC-079 / #659; follow-ups (`workflow_run`, merge policy, BuildKit cache) tracked via GitHub issues
RFC-079	Full-stack Compose (Nginx + API + pipeline)	Implemented: `compose/docker-compose.stack.yml`, `stack-*`, #659 Phase 1 + #660 Docker job factory (Option B); §Native vs Docker

Recently completed (v2.6.0+)

RFC	Delivered (high level)
RFC-050	Single-layer GIL consumption; cross-layer → RFC-072
RFC-056	Single-layer KG consumption; cross-layer → RFC-072
RFC-057	Closed per ADR-073
RFC-061	FAISS path, CLI + API + semantic `gi explore`
RFC-062	Server + Vue SPA + Playwright (ADR-064–ADR-066)
RFC-063	Multi-feed layout, manifest (ADR-074)
RFC-064	Frozen profiles, freeze/diff scripts (ADR-075)
RFC-065	`--monitor`, `.pipeline_status.json`, optional `[monitor]`
RFC-066	Streamlit Performance vs frozen profiles (ADR-076)
RFC-067	`/api/corpus/*`, Library tab, similar episodes
RFC-068	Digest API + tab, Library glance
RFC-069	Graph exploration toolkit
RFC-071	Dashboard tab, corpus intelligence panels
RFC-076	Progressive graph expansion (`/api/corpus/node-episodes`, graph `onetap`/`dbltap`)

Older draft RFC audit tables (pre-2026-04) are of historical interest only; trust this index and each RFC’s Status block.

Recommendations

Status changes — Edit RFC body + this index together.
Large deliveries without new ADRs — Often RFC + guides + API docs; see ADR gap analysis for when an ADR is still worth extracting.
Decision vs code — Use Open / Completed here plus docs/adr/index.md Code column.

Maintenance: Edit each RFC's Status line when you move its row between Open and Completed. For product gaps, see the PRD gap analysis; for decision records, see the ADR gap analysis.

Quick Links

PRDs - Product requirements documents
Architecture - System design and module responsibilities
Releases - Release notes and version history

Creating New RFCs

Use the RFC Template as a starting point for new technical design documents.

Status vocabulary: Use Draft while in flight and Completed when shipped (optionally with version or caveats in the same line). Do not use Accepted for RFCs — that label is for ADRs only.
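
The vocabulary rule above could be checked mechanically. The following is a minimal sketch, not part of the project's tooling: it assumes each RFC contains a line that starts with `Status:` (optionally wrapped in Markdown list or emphasis markers), and both that line format and the function names are hypothetical.

```python
import re

# Labels permitted for RFCs by the status vocabulary; "Accepted" is reserved for ADRs.
ALLOWED_RFC_STATUSES = ("Draft", "Completed")

# Assumed line format, e.g. "Status: Draft" or "- **Status:** Completed (v2.6.0)".
STATUS_LINE_RE = re.compile(
    r"^\W*Status\W*:\W*(?P<label>[A-Za-z]+)",
    re.MULTILINE | re.IGNORECASE,
)


def rfc_status(text):
    """Return the first word after 'Status:' in the RFC text, or None if absent."""
    match = STATUS_LINE_RE.search(text)
    return match.group("label") if match else None


def status_problem(text):
    """Return a description of the vocabulary violation, or None if the status is fine."""
    label = rfc_status(text)
    if label is None:
        return "no Status line found"
    # The comparison is case-sensitive, so "completed" is reported as well.
    if label not in ALLOWED_RFC_STATUSES:
        return f"unexpected status {label!r}; use one of {ALLOWED_RFC_STATUSES}"
    return None
```

For example, `status_problem("Status: Accepted")` returns a message, while `status_problem("Status: Completed (v2.6.0)")` returns `None`. Any version or caveat after the label is ignored, which matches the allowance for extra detail on the same line.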