Requests for Comment (RFCs)
Purpose
Requests for Comment (RFCs) define the how behind each feature implementation in podcast_scraper. They capture:
- Technical design and architecture decisions
- Implementation details and module boundaries
- API contracts and data structures
- Testing strategies and validation approaches
RFCs translate PRD requirements into concrete technical solutions and serve as living documentation for developers.
How RFCs Work
- Reference PRDs: RFCs implement requirements defined in PRDs
- Define Architecture: RFCs specify module design, interfaces, and data flow
- Guide Implementation: Developers use RFCs as blueprints for code changes
- Document Decisions: RFCs capture design rationale and alternatives considered
Open RFCs
| RFC | Title | Related PRD | Description |
|---|---|---|---|
| RFC-015 | AI Experiment Pipeline | PRD-007 | Technical design for configuration-driven experiment pipeline (CI integration pending) |
| RFC-041 | Podcast ML Benchmarking Framework | PRD-007 | Repeatable, objective ML benchmarking system (CI integration pending) |
| RFC-077 | Viewer feeds + operator config + jobs & hygiene | PRD-030 | Draft: structured feeds.spec.yaml + operator YAML API, job lifecycle + stale/reconcile (#626) |
| RFC-081 | Pre-prod environment on GitHub Codespaces (Phase 1) | — | Draft: Codespaces deploy auto-fired on Stack-test green; cloud_thin profile via GHCR-published pipeline-llm; Grafana Cloud + Sentry free observability; Cloudflare R2 corpus backup; Slack notifications. Always-on host deferred to a follow-up RFC. |
| RFC-082 | Always-on pre-prod / production hosting (Phase 2 lift-and-shift from RFC-081) | — | Draft starter: picks up where RFC-081 ended. Reuses the published GHCR image set unchanged; chooses VPS host (Hetzner CX/CCX), auth wall (Cloudflare Tunnel + Access OR Tailscale), deploy mechanism (push from GHA), corpus persistence (host bind-mount), and host-side backup cron. Includes stack contract vs environment adapters (ADR-093, #762). Cost ceiling ~$10-16/mo. Open questions: operator geography, existing Cloudflare domain, scheduled-cron feed sweep. |
Completed RFCs
| RFC | Title | Related PRD | Version | Description |
|---|---|---|---|---|
| RFC-001 | Workflow Orchestration | PRD-001 | v2.0.0 | Central orchestrator for transcript acquisition pipeline |
| RFC-002 | RSS Parsing & Episode Modeling | PRD-001 | v2.0.0 | RSS feed parsing and episode data model |
| RFC-003 | Transcript Download Processing | PRD-001 | v2.0.0 | Resilient transcript download with retry logic |
| RFC-004 | Filesystem Layout & Run Management | PRD-001 | v2.0.0 | Deterministic output directory structure and run scoping |
| RFC-005 | Whisper Integration Lifecycle | PRD-002 | v2.0.0 | Whisper model loading, transcription, and cleanup |
| RFC-006 | Whisper Screenplay Formatting | PRD-002 | v2.0.0 | Speaker-attributed transcript formatting |
| RFC-007 | CLI Interface & Validation | PRD-003 | v2.0.0 | Command-line argument parsing and validation |
| RFC-008 | Configuration Model & Validation | PRD-003 | v2.0.0 | Pydantic-based configuration with file loading |
| RFC-009 | Progress Reporting Integration | PRD-001 | v2.0.0 | Pluggable progress reporting interface |
| RFC-010 | Automatic Speaker Name Detection | PRD-008 | v2.1.0 | NER-based host and guest identification |
| RFC-011 | Per-Episode Metadata Generation | PRD-004 | v2.2.0 | Structured metadata document generation |
| RFC-012 | Episode Summarization Using Local Transformers | PRD-005 | v2.3.0 | Local transformer-based summarization |
| RFC-013 | OpenAI Provider Implementation | PRD-006 | v2.4.0 | OpenAI API providers for transcription, NER, and summarization |
| RFC-016 | Modularization for AI Experiments | PRD-007 | v2.4.0 | Provider system architecture to support AI experiment pipeline |
| RFC-017 | Prompt Management | PRD-006 | v2.4.0 | Versioned, parameterized prompt management system (Jinja2) |
| RFC-018 | Test Structure Reorganization | - | v2.4.0 | Reorganized test suite into unit/integration/e2e directories |
| RFC-019 | E2E Test Infrastructure and Coverage Improvements | PRD-001+ | v2.4.0 | Comprehensive E2E test infrastructure and coverage |
| RFC-020 | Integration Test Infrastructure and Coverage Improvements | PRD-001+ | v2.4.0 | Integration test suite improvements (10 stages, 182 tests) |
| RFC-021 | Modularization Refactoring Plan | PRD-006 | v2.4.0 | Detailed plan for modular provider architecture |
| RFC-022 | Environment Variable Candidates Analysis | - | v2.4.0 | Environment variable support for deployment flexibility |
| RFC-024 | Test Execution Optimization | - | v2.4.0 | Optimized test execution with markers, tiers, parallel execution |
| RFC-025 | Test Metrics and Health Tracking | - | v2.4.0 | Metrics collection, CI integration, flaky test detection |
| RFC-026 | Metrics Consumption and Dashboards | - | v2.4.0 | GitHub Pages metrics JSON API and job summaries |
| RFC-028 | ML Model Preloading and Caching | - | v2.4.0 | Model preloading for local dev and GitHub Actions caching |
| RFC-029 | Provider Refactoring Consolidation | PRD-006 | v2.4.0 | Unified provider architecture documentation |
| RFC-030 | Python Test Coverage Improvements | - | v2.4.0 | Coverage collection in CI, threshold enforcement |
| RFC-031 | Code Complexity Analysis Tooling | - | v2.4.0 | Radon, Vulture, Interrogate, and codespell integration |
| RFC-032 | Anthropic Provider Implementation | PRD-009 | v2.4.0 | Technical design for Anthropic Claude API providers |
| RFC-033 | Mistral Provider Implementation | PRD-010 | v2.5.0 | Technical design for Mistral AI providers (all 3 capabilities) |
| RFC-034 | DeepSeek Provider Implementation | PRD-011 | v2.5.0 | Technical design for DeepSeek AI (ultra low-cost) |
| RFC-035 | Gemini Provider Implementation | PRD-012 | v2.5.0 | Technical design for Google Gemini (2M context) |
| RFC-036 | Grok Provider Implementation (xAI) | PRD-013 | v2.5.0 | Technical design for Grok (xAI's AI model) |
| RFC-037 | Ollama Provider Implementation | PRD-014 | v2.5.0 | Technical design for Ollama (local/offline) |
| RFC-039 | Development Workflow | - | v2.4.0 | Git worktrees, Cursor integration, CI evolution |
| RFC-023 | README Acceptance Tests | - | v2.5.0 | Script-based acceptance (make test-acceptance, MAIN_ACCEPTANCE_CONFIG.yaml fast matrix, scripts/acceptance/) — not pytest tests/acceptance/ |
| RFC-040 | Audio Preprocessing Pipeline | - | v2.5.0 | FFmpeg preprocessing, opus codec, audio caching, factory pattern |
| RFC-042 | Hybrid Podcast Summarization Pipeline | - | v2.5.0 | Hybrid MAP-REDUCE with instruction-tuned LLMs |
| RFC-044 | Model Registry for Architecture Limits | - | v2.5.0 | Centralized registry for model architecture limits |
| RFC-045 | ML Model Optimization Guide | PRD-005, PRD-007 | v2.5.0 | cleaning_v4 profile, preprocessing optimization, parameter tuning guide |
| RFC-046 | Materialization Architecture | PRD-007 | v2.5.0 | Dataset materialization for honest evaluation comparisons |
| RFC-047 | Lightweight Run Comparison & Diagnostics Tool | PRD-007 | v2.5.0 | Streamlit-based visual tool for comparing runs |
| RFC-048 | Evaluation ↔ Application Alignment | PRD-007 | v2.5.0 | Fingerprinting and single-path eval-app alignment |
| RFC-049 | Grounded Insight Layer – Core Concepts & Data Model | PRD-017 | v2.6.0 | Core ontology, grounding contract, storage format for GIL |
| RFC-050 | Grounded Insight Layer – Use Cases & End-to-End Consumption | PRD-017 | v2.6.0 | Single-layer GIL consumption (CLI inspect, Insight Explorer, query patterns); cross-layer use cases moved to RFC-072 |
| RFC-052 | Locally Hosted LLM Models with Prompts | PRD-014 | v2.5.0 | Ollama provider and optimized prompt templates |
| RFC-055 | Knowledge Graph Layer — Core Concepts & Data Model | PRD-019 | v2.6.0 | KG ontology, artifacts, and separation from GIL |
| RFC-056 | Knowledge Graph Layer — Use Cases & End-to-End Consumption | PRD-019 | v2.6.0 | Single-layer KG consumption (kg CLI, entity roll-up, export); cross-layer use cases moved to RFC-072 |
| RFC-057 | AutoResearch Optimization Loop (Prompts & ML Params) | PRD-007 | v2.6.0 | Closed per ADR-073; Tracks A/B complete; silver refs + 72-config eval matrix |
| RFC-061 | Semantic Corpus Search (FAISS) | PRD-021 | v2.6.0 | Shipped: FaissVectorStore, podcast search / index, embed-and-index, semantic gi explore, /api/search (ADR-060); platform backends — RFC-070 (Draft) |
| RFC-062 | GI/KG Viewer v2 — Semantic Search UI | PRD-017, PRD-019, PRD-021 | v2.6.0 | FastAPI podcast serve, Vue 3 + Vite + Cytoscape SPA, Playwright UI E2E (ADR-064–ADR-066); platform routes remain v2.7 per ADR-064 |
| RFC-063 | Multi-Feed Corpus, Append/Resume, and Unified Discovery | #440+ | v2.6.0 | N feeds, layout A, opt-in append; unified index (#505); corpus_manifest.json / run summary (#506); extends RFC-004; see CORPUS_MULTI_FEED_ARTIFACTS.md |
| RFC-064 | Performance Profiling and Release Freeze Framework | - | v2.6.0 | Frozen profiles under data/profiles/, scripts/eval/profile/freeze_profile.py, diff_profiles.py, make profile-freeze / profile-diff; guide |
| RFC-065 | Live Pipeline Monitor (macOS Developer Tooling) | #512 | v2.6.0 | --monitor, .pipeline_status.json, rich or .monitor.log; optional [monitor] memray + py-spy; tmux split deferred; guide |
| RFC-066 | Run Comparison Tool — Performance Tab | - | v2.6.0 | Streamlit Performance page (?page=performance) joining run metrics with frozen RFC-064 profiles |
| RFC-067 | Corpus Library — Catalog API & Viewer | PRD-022 | v2.6.0 | Filesystem-first /api/corpus/*, Library tab, episode detail, FAISS similar episodes, handoffs to graph and /api/search (Phases 1–3) |
| RFC-068 | Corpus Digest — API & Viewer | PRD-023 | v2.6.0 | GET /api/corpus/digest, Digest tab, Library 24h glance, feed diversity, semantic topic bands; corpus_digest_api on /api/health |
| RFC-069 | GI/KG Viewer — Graph Exploration Toolkit | PRD-024 | v2.6.0 | Zoom controls, % readout, Shift+drag box zoom, minimap v1, degree-bucket filter, built-in layouts, edge filters; extends RFC-062 |
| RFC-071 | Corpus Intelligence Dashboard (GI/KG Viewer) | PRD-025 | v2.6.0 | Dashboard tab: /api/corpus/* aggregates + Chart.js (Pipeline / Content intelligence); manifest + capped run.json discovery; index/digest/GI-KG timelines; PRD-025 |
| RFC-076 | Progressive graph expansion (cross-episode) | #581 | v2.6.0 | POST /api/corpus/node-episodes, onetap rail / dbltap expand-collapse, bridge-only scan; extends RFC-069 |
| RFC-084 | Corpus snapshot backup manifest and version-aware restore | — | v2.6.0 | snapshot.manifest.json, scripts/ops/corpus_snapshot/, backup/restore workflows + make restore-corpus / restore-corpus-prod; GitHub #763 |
Gap analysis
Counts (reconcile when moving RFCs): 84 files under docs/rfc/RFC-*.md, IDs RFC-001–RFC-084
with no RFC-014. 3 open (in-flight, partial implementation), 60 completed, and 16 Draft
(not indexed until promoted) in the tables above.
Open RFC clusters: AI experiment pipeline + ML benchmark CI (RFC-015, RFC-041).
Draft RFCs (not indexed): Pipeline metrics (RFC-027), continuous review (RFC-038),
metrics alerts (RFC-043), Postgres projection (RFC-051), adaptive summarization routing
(RFC-053), E2E mock composition (RFC-054), diarization and cleaning
(RFC-058–RFC-060), semantic search platform (RFC-070), canonical identity layer
(RFC-072), enrichment layer (RFC-073), process safety (RFC-074),
ephemeral acceptance smoke test (RFC-078), full-stack Docker Compose (RFC-079;
optional doc polish: RFC-079 §Optional follow-ups),
prod failover orchestration and cutover (RFC-083).
These are discoverable by filename under docs/rfc/ but excluded from the index per the
index inclusion rule (Draft docs are not indexed).
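The file count and ID gaps above can be rechecked with a short script. This is a minimal sketch, not part of the project's tooling: the helper names `rfc_ids`, `missing_ids`, and `summarize` are hypothetical, and it assumes each file name starts with `RFC-` followed by a three-digit number (for example `RFC-001-some-title.md`).

```python
import re
from pathlib import Path

# Assumption: file names look like "RFC-001-some-title.md" or "RFC-001.md".
RFC_FILE = re.compile(r"^RFC-(\d{3})\b.*\.md$")


def rfc_ids(filenames):
    """Return the sorted, de-duplicated RFC numbers found in the file names."""
    ids = set()
    for name in filenames:
        match = RFC_FILE.match(name)
        if match:
            ids.add(int(match.group(1)))
    return sorted(ids)


def missing_ids(ids):
    """Return the numbers absent between the lowest and highest ID seen."""
    if not ids:
        return []
    present = set(ids)
    return [n for n in range(min(ids), max(ids) + 1) if n not in present]


def summarize(rfc_dir="docs/rfc"):
    """Count RFC files under rfc_dir and report gaps in the ID sequence."""
    names = [p.name for p in Path(rfc_dir).glob("RFC-*.md")]
    ids = rfc_ids(names)
    return {"files": len(names), "ids": len(ids), "missing": missing_ids(ids)}
```

Calling `summarize()` from the repository root returns the number of matching files, the number of distinct IDs, and any skipped numbers, which can then be compared by hand with the counts stated in this section.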
Open RFCs (detail)
| RFC | Theme | Notes |
|---|---|---|
| RFC-015 | Experiments | Runner implemented; CI auto-run still pending |
| RFC-041 | Benchmarks | Datasets/scripts exist; automated CI benchmarking not fully wired |
| RFC-077 | Viewer feeds + operator config + serve jobs & hygiene | PRD-030 |
| RFC-078 | Ephemeral full-stack stack-test (CI + gates) | Implemented (Phase 1): compose/docker-compose.stack-test.yml, make stack-test-*, Playwright tests/stack-test/, .github/workflows/stack-test.yml; stack base RFC-079 / #659; follow-ups (workflow_run, merge policy, BuildKit cache) tracked via GitHub issues |
| RFC-079 | Full-stack Compose (Nginx + API + pipeline) | Implemented: compose/docker-compose.stack.yml, stack-*, #659 Phase 1 + #660 Docker job factory (Option B); §Native vs Docker |
Recently completed (v2.6.0+)
| RFC | Delivered (high level) |
|---|---|
| RFC-050 | Single-layer GIL consumption; cross-layer → RFC-072 |
| RFC-056 | Single-layer KG consumption; cross-layer → RFC-072 |
| RFC-057 | Closed per ADR-073 |
| RFC-061 | FAISS path, CLI + API + semantic gi explore |
| RFC-062 | Server + Vue SPA + Playwright (ADR-064–ADR-066) |
| RFC-063 | Multi-feed layout, manifest (ADR-074) |
| RFC-064 | Frozen profiles, freeze/diff scripts (ADR-075) |
| RFC-065 | --monitor, .pipeline_status.json, optional [monitor] |
| RFC-066 | Streamlit Performance vs frozen profiles (ADR-076) |
| RFC-067 | /api/corpus/*, Library tab, similar episodes |
| RFC-068 | Digest API + tab, Library glance |
| RFC-069 | Graph exploration toolkit |
| RFC-071 | Dashboard tab, corpus intelligence panels |
| RFC-076 | Progressive graph expansion (/api/corpus/node-episodes, graph onetap/dbltap) |
Older draft RFC audit tables (pre-2026-04) are archeology — trust this index and each RFC’s Status block.
Recommendations
- Status changes: edit the RFC body and this index together.
- Large deliveries without new ADRs: these are often covered by the RFC, guides, and API docs; see ADR gap analysis for when an ADR is still worth extracting.
- Decision vs code: use Open / Completed here plus the Code column in docs/adr/index.md.
Maintenance: Edit each RFC Status line when you move its row between Open and Completed.
Product gaps: PRD gap analysis. Decision records: ADR gap analysis.
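When a row moves between the Open and Completed tables, one easy slip is leaving it in both. The sketch below shows one way such a check might look. It assumes the index tables are plain pipe tables with the RFC ID in the first column, as on this page. The function names `table_ids` and `listed_in_both` are hypothetical, not existing project helpers.

```python
import re

# Matches a table row whose first cell is an RFC ID such as "RFC-015".
ROW_ID = re.compile(r"^\|\s*(RFC-\d{3})\s*\|")


def table_ids(markdown_table):
    """Collect RFC IDs from the first column of a pipe table.

    Header and separator rows are skipped because they do not match ROW_ID.
    """
    ids = []
    for line in markdown_table.splitlines():
        match = ROW_ID.match(line.strip())
        if match:
            ids.append(match.group(1))
    return ids


def listed_in_both(open_table, completed_table):
    """Return the RFC IDs that appear in both tables, sorted."""
    return sorted(set(table_ids(open_table)) & set(table_ids(completed_table)))
```

An empty result means no RFC is indexed as both Open and Completed. The check says nothing about whether each RFC's own Status line agrees with the index.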
Quick Links
- PRDs - Product requirements documents
- Architecture - System design and module responsibilities
- Releases - Release notes and version history
Creating New RFCs
Use the RFC Template as a starting point for new technical design documents.
Status vocabulary: Use Draft while in flight and Completed when shipped (optionally with version or caveats in the same line). Do not use Accepted for RFCs — that label is for ADRs only.
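The vocabulary rule above could be enforced with a small lint step. The following is a sketch under stated assumptions: the exact layout of an RFC's Status line is not specified here, so the pattern accepts forms such as `Status: Completed (v2.6.0)` or `- **Status**: Draft`, and `status_problem` is a hypothetical helper name.

```python
import re

# Status labels permitted for RFCs, per the vocabulary rule above.
ALLOWED = ("Draft", "Completed")

# Assumption: the status line starts with the word "Status" (optionally
# wrapped in list or bold markup), followed by the label as the next word.
STATUS_LINE = re.compile(r"^\W*status\W+(?P<value>[A-Za-z]+)", re.IGNORECASE)


def status_problem(line):
    """Return None if the line carries an allowed status, else a message."""
    match = STATUS_LINE.match(line.strip())
    if match is None:
        return "no status line found"
    value = match.group("value")
    if value not in ALLOWED:
        return f"status {value!r} is not one of {', '.join(ALLOWED)}"
    return None
```

A trailing version or caveat on the same line, as in `Status: Completed (v2.6.0)`, is ignored by the check, while `Status: Accepted` is reported because that label is reserved for ADRs.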