Requests for Comment (RFCs)

Purpose

Requests for Comment (RFCs) define how each feature in podcast_scraper is implemented. They capture:

  • Technical design and architecture decisions
  • Implementation details and module boundaries
  • API contracts and data structures
  • Testing strategies and validation approaches

RFCs translate PRD requirements into concrete technical solutions and serve as living documentation for developers.

How RFCs Work

  1. Reference PRDs: RFCs implement requirements defined in PRDs
  2. Define Architecture: RFCs specify module design, interfaces, and data flow
  3. Guide Implementation: Developers use RFCs as blueprints for code changes
  4. Document Decisions: RFCs capture design rationale and alternatives considered

Open RFCs

| RFC | Title | Related PRD | Description |
| --- | --- | --- | --- |
| RFC-015 | AI Experiment Pipeline | PRD-007 | Technical design for configuration-driven experiment pipeline (CI integration pending) |
| RFC-041 | Podcast ML Benchmarking Framework | PRD-007 | Repeatable, objective ML benchmarking system (CI integration pending) |
| RFC-077 | Viewer feeds + operator config + jobs & hygiene | PRD-030 | Draft: structured feeds.spec.yaml + operator YAML API, job lifecycle + stale/reconcile (#626) |
| RFC-081 | Pre-prod environment on GitHub Codespaces (Phase 1) |  | Draft: Codespaces deploy auto-fired on Stack-test green; cloud_thin profile via GHCR-published pipeline-llm; Grafana Cloud + Sentry free observability; Cloudflare R2 corpus backup; Slack notifications. Always-on host deferred to a follow-up RFC. |
| RFC-082 | Always-on pre-prod / production hosting (Phase 2 lift-and-shift from RFC-081) |  | Draft starter: picks up where RFC-081 ended. Reuses the published GHCR image set unchanged; chooses VPS host (Hetzner CX/CCX), auth wall (Cloudflare Tunnel + Access OR Tailscale), deploy mechanism (push from GHA), corpus persistence (host bind-mount), and host-side backup cron. Includes stack contract vs environment adapters (ADR-093, #762). Cost ceiling ~$10-16/mo. Open questions: operator geography, existing Cloudflare domain, scheduled-cron feed sweep. |

Completed RFCs

| RFC | Title | Related PRD | Version | Description |
| --- | --- | --- | --- | --- |
| RFC-001 | Workflow Orchestration | PRD-001 | v2.0.0 | Central orchestrator for transcript acquisition pipeline |
| RFC-002 | RSS Parsing & Episode Modeling | PRD-001 | v2.0.0 | RSS feed parsing and episode data model |
| RFC-003 | Transcript Download Processing | PRD-001 | v2.0.0 | Resilient transcript download with retry logic |
| RFC-004 | Filesystem Layout & Run Management | PRD-001 | v2.0.0 | Deterministic output directory structure and run scoping |
| RFC-005 | Whisper Integration Lifecycle | PRD-002 | v2.0.0 | Whisper model loading, transcription, and cleanup |
| RFC-006 | Whisper Screenplay Formatting | PRD-002 | v2.0.0 | Speaker-attributed transcript formatting |
| RFC-007 | CLI Interface & Validation | PRD-003 | v2.0.0 | Command-line argument parsing and validation |
| RFC-008 | Configuration Model & Validation | PRD-003 | v2.0.0 | Pydantic-based configuration with file loading |
| RFC-009 | Progress Reporting Integration | PRD-001 | v2.0.0 | Pluggable progress reporting interface |
| RFC-010 | Automatic Speaker Name Detection | PRD-008 | v2.1.0 | NER-based host and guest identification |
| RFC-011 | Per-Episode Metadata Generation | PRD-004 | v2.2.0 | Structured metadata document generation |
| RFC-012 | Episode Summarization Using Local Transformers | PRD-005 | v2.3.0 | Local transformer-based summarization |
| RFC-013 | OpenAI Provider Implementation | PRD-006 | v2.4.0 | OpenAI API providers for transcription, NER, and summarization |
| RFC-016 | Modularization for AI Experiments | PRD-007 | v2.4.0 | Provider system architecture to support AI experiment pipeline |
| RFC-017 | Prompt Management | PRD-006 | v2.4.0 | Versioned, parameterized prompt management system (Jinja2) |
| RFC-018 | Test Structure Reorganization | - | v2.4.0 | Reorganized test suite into unit/integration/e2e directories |
| RFC-019 | E2E Test Infrastructure and Coverage Improvements | PRD-001+ | v2.4.0 | Comprehensive E2E test infrastructure and coverage |
| RFC-020 | Integration Test Infrastructure and Coverage Improvements | PRD-001+ | v2.4.0 | Integration test suite improvements (10 stages, 182 tests) |
| RFC-021 | Modularization Refactoring Plan | PRD-006 | v2.4.0 | Detailed plan for modular provider architecture |
| RFC-022 | Environment Variable Candidates Analysis | - | v2.4.0 | Environment variable support for deployment flexibility |
| RFC-024 | Test Execution Optimization | - | v2.4.0 | Optimized test execution with markers, tiers, parallel execution |
| RFC-025 | Test Metrics and Health Tracking | - | v2.4.0 | Metrics collection, CI integration, flaky test detection |
| RFC-026 | Metrics Consumption and Dashboards | - | v2.4.0 | GitHub Pages metrics JSON API and job summaries |
| RFC-028 | ML Model Preloading and Caching | - | v2.4.0 | Model preloading for local dev and GitHub Actions caching |
| RFC-029 | Provider Refactoring Consolidation | PRD-006 | v2.4.0 | Unified provider architecture documentation |
| RFC-030 | Python Test Coverage Improvements | - | v2.4.0 | Coverage collection in CI, threshold enforcement |
| RFC-031 | Code Complexity Analysis Tooling | - | v2.4.0 | Radon, Vulture, Interrogate, and codespell integration |
| RFC-032 | Anthropic Provider Implementation | PRD-009 | v2.4.0 | Technical design for Anthropic Claude API providers |
| RFC-033 | Mistral Provider Implementation | PRD-010 | v2.5.0 | Technical design for Mistral AI providers (all 3 capabilities) |
| RFC-034 | DeepSeek Provider Implementation | PRD-011 | v2.5.0 | Technical design for DeepSeek AI (ultra low-cost) |
| RFC-035 | Gemini Provider Implementation | PRD-012 | v2.5.0 | Technical design for Google Gemini (2M context) |
| RFC-036 | Grok Provider Implementation (xAI) | PRD-013 | v2.5.0 | Technical design for Grok (xAI's AI model) |
| RFC-037 | Ollama Provider Implementation | PRD-014 | v2.5.0 | Technical design for Ollama (local/offline) |
| RFC-039 | Development Workflow | - | v2.4.0 | Git worktrees, Cursor integration, CI evolution |
| RFC-023 | README Acceptance Tests | - | v2.5.0 | Script-based acceptance (make test-acceptance, MAIN_ACCEPTANCE_CONFIG.yaml fast matrix, scripts/acceptance/); not pytest tests/acceptance/ |
| RFC-040 | Audio Preprocessing Pipeline | - | v2.5.0 | FFmpeg preprocessing, opus codec, audio caching, factory pattern |
| RFC-042 | Hybrid Podcast Summarization Pipeline | - | v2.5.0 | Hybrid MAP-REDUCE with instruction-tuned LLMs |
| RFC-044 | Model Registry for Architecture Limits | - | v2.5.0 | Centralized registry for model architecture limits |
| RFC-045 | ML Model Optimization Guide | PRD-005, PRD-007 | v2.5.0 | cleaning_v4 profile, preprocessing optimization, parameter tuning guide |
| RFC-046 | Materialization Architecture | PRD-007 | v2.5.0 | Dataset materialization for honest evaluation comparisons |
| RFC-047 | Lightweight Run Comparison & Diagnostics Tool | PRD-007 | v2.5.0 | Streamlit-based visual tool for comparing runs |
| RFC-048 | Evaluation ↔ Application Alignment | PRD-007 | v2.5.0 | Fingerprinting and single-path eval-app alignment |
| RFC-049 | Grounded Insight Layer – Core Concepts & Data Model | PRD-017 | v2.6.0 | Core ontology, grounding contract, storage format for GIL |
| RFC-050 | Grounded Insight Layer – Use Cases & End-to-End Consumption | PRD-017 | v2.6.0 | Single-layer GIL consumption (CLI inspect, Insight Explorer, query patterns); cross-layer use cases moved to RFC-072 |
| RFC-052 | Locally Hosted LLM Models with Prompts | PRD-014 | v2.5.0 | Ollama provider and optimized prompt templates |
| RFC-055 | Knowledge Graph Layer – Core Concepts & Data Model | PRD-019 | v2.6.0 | KG ontology, artifacts, and separation from GIL |
| RFC-056 | Knowledge Graph Layer – Use Cases & End-to-End Consumption | PRD-019 | v2.6.0 | Single-layer KG consumption (kg CLI, entity roll-up, export); cross-layer use cases moved to RFC-072 |
| RFC-057 | AutoResearch Optimization Loop (Prompts & ML Params) | PRD-007 | v2.6.0 | Closed per ADR-073; Tracks A/B complete; silver refs + 72-config eval matrix |
| RFC-061 | Semantic Corpus Search (FAISS) | PRD-021 | v2.6.0 | Shipped: FaissVectorStore, podcast search / index, embed-and-index, semantic gi explore, /api/search (ADR-060); platform backends: RFC-070 (Draft) |
| RFC-062 | GI/KG Viewer v2 – Semantic Search UI | PRD-017, PRD-019, PRD-021 | v2.6.0 | FastAPI podcast serve, Vue 3 + Vite + Cytoscape SPA, Playwright UI E2E (ADR-064, ADR-066); platform routes remain v2.7 per ADR-064 |
| RFC-063 | Multi-Feed Corpus, Append/Resume, and Unified Discovery | #440+ | v2.6.0 | N feeds, layout A, opt-in append; unified index (#505); corpus_manifest.json / run summary (#506); extends RFC-004; see CORPUS_MULTI_FEED_ARTIFACTS.md |
| RFC-064 | Performance Profiling and Release Freeze Framework | - | v2.6.0 | Frozen profiles under data/profiles/, scripts/eval/profile/freeze_profile.py, diff_profiles.py, make profile-freeze / profile-diff; guide |
| RFC-065 | Live Pipeline Monitor (macOS Developer Tooling) | #512 | v2.6.0 | --monitor, .pipeline_status.json, rich or .monitor.log; optional [monitor] memray + py-spy; tmux split deferred; guide |
| RFC-066 | Run Comparison Tool – Performance Tab | - | v2.6.0 | Streamlit Performance page (?page=performance) joining run metrics with frozen RFC-064 profiles |
| RFC-067 | Corpus Library – Catalog API & Viewer | PRD-022 | v2.6.0 | Filesystem-first /api/corpus/*, Library tab, episode detail, FAISS similar episodes, handoffs to graph and /api/search (Phases 1–3) |
| RFC-068 | Corpus Digest – API & Viewer | PRD-023 | v2.6.0 | GET /api/corpus/digest, Digest tab, Library 24h glance, feed diversity, semantic topic bands; corpus_digest_api on /api/health |
| RFC-069 | GI/KG Viewer – Graph Exploration Toolkit | PRD-024 | v2.6.0 | Zoom controls, % readout, Shift+drag box zoom, minimap v1, degree-bucket filter, built-in layouts, edge filters; extends RFC-062 |
| RFC-071 | Corpus Intelligence Dashboard (GI/KG Viewer) | PRD-025 | v2.6.0 | Dashboard tab: /api/corpus/* aggregates + Chart.js (Pipeline / Content intelligence); manifest + capped run.json discovery; index/digest/GI-KG timelines; PRD-025 |
| RFC-076 | Progressive graph expansion (cross-episode) | #581 | v2.6.0 | POST /api/corpus/node-episodes, onetap rail / dbltap expand-collapse, bridge-only scan; extends RFC-069 |
| RFC-084 | Corpus snapshot backup manifest and version-aware restore |  | v2.6.0 | snapshot.manifest.json, scripts/ops/corpus_snapshot/, backup/restore workflows + make restore-corpus / restore-corpus-prod; GitHub #763 |

Gap analysis

Counts (reconcile when moving RFCs): 84 files under docs/rfc/RFC-*.md -- IDs RFC-001--RFC-084 with no RFC-014. 3 open (in-flight, partial implementation), 60 completed, and 16 Draft (not indexed until promoted) in the tables above.
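
The file count and ID gaps above can be re-derived rather than maintained by hand. The following is a minimal sketch, assuming every RFC file name begins with RFC- followed by a three-digit ID (the slug after the ID and the helper names are illustrative, not part of the project):

```python
import re
from pathlib import Path

# Assumed naming scheme: RFC-NNN followed by an optional slug, e.g. RFC-001-something.md
RFC_FILE = re.compile(r"^RFC-(\d{3})")


def rfc_ids(names):
    """Return the sorted distinct numeric IDs found in a list of file names."""
    ids = set()
    for name in names:
        match = RFC_FILE.match(name)
        if match:
            ids.add(int(match.group(1)))
    return sorted(ids)


def missing_ids(ids):
    """Return IDs absent from the contiguous range 1..max(ids)."""
    if not ids:
        return []
    present = set(ids)
    return [n for n in range(1, max(ids) + 1) if n not in present]


def scan(rfc_dir="docs/rfc"):
    """Count RFC-*.md files in rfc_dir and report distinct IDs and gaps."""
    names = [p.name for p in Path(rfc_dir).glob("RFC-*.md")]
    ids = rfc_ids(names)
    return len(names), ids, missing_ids(ids)
```

Calling scan() from the repository root returns the number of matching files, the distinct IDs, and any gaps in the sequence, which can then be compared against the counts stated in this section.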

Open RFC clusters: AI experiment pipeline + ML benchmark CI (RFC-015, RFC-041).

Draft RFCs (not indexed): Pipeline metrics (RFC-027), continuous review (RFC-038), metrics alerts (RFC-043), Postgres projection (RFC-051), adaptive summarization routing (RFC-053), E2E mock composition (RFC-054), diarization and cleaning (RFC-058--RFC-060), semantic search platform (RFC-070), canonical identity layer (RFC-072), enrichment layer (RFC-073), process safety (RFC-074), ephemeral acceptance smoke test (RFC-078), full-stack Docker Compose (RFC-079; optional doc polish: RFC-079 §Optional follow-ups), prod failover orchestration and cutover (RFC-083). These are discoverable by filename under docs/rfc/ but excluded from the index per the index inclusion rule (Draft docs are not indexed).

Open RFCs (detail)

| RFC | Theme | Notes |
| --- | --- | --- |
| RFC-015 | Experiments | Runner implemented; CI auto-run still pending |
| RFC-041 | Benchmarks | Datasets/scripts exist; automated CI benchmarking not fully wired |
| RFC-077 | Viewer feeds + operator config + serve jobs & hygiene | PRD-030 |
| RFC-078 | Ephemeral full-stack stack-test (CI + gates) | Implemented (Phase 1): compose/docker-compose.stack-test.yml, make stack-test-*, Playwright tests/stack-test/, .github/workflows/stack-test.yml; stack base RFC-079 / #659; follow-ups (workflow_run, merge policy, BuildKit cache) tracked via GitHub issues |
| RFC-079 | Full-stack Compose (Nginx + API + pipeline) | Implemented: compose/docker-compose.stack.yml, stack-*, #659 Phase 1 + #660 Docker job factory (Option B); §Native vs Docker |

Recently completed (v2.6.0+)

| RFC | Delivered (high level) |
| --- | --- |
| RFC-050 | Single-layer GIL consumption; cross-layer → RFC-072 |
| RFC-056 | Single-layer KG consumption; cross-layer → RFC-072 |
| RFC-057 | Closed per ADR-073 |
| RFC-061 | FAISS path, CLI + API + semantic gi explore |
| RFC-062 | Server + Vue SPA + Playwright (ADR-064, ADR-066) |
| RFC-063 | Multi-feed layout, manifest (ADR-074) |
| RFC-064 | Frozen profiles, freeze/diff scripts (ADR-075) |
| RFC-065 | --monitor, .pipeline_status.json, optional [monitor] |
| RFC-066 | Streamlit Performance vs frozen profiles (ADR-076) |
| RFC-067 | /api/corpus/*, Library tab, similar episodes |
| RFC-068 | Digest API + tab, Library glance |
| RFC-069 | Graph exploration toolkit |
| RFC-071 | Dashboard tab, corpus intelligence panels |
| RFC-076 | Progressive graph expansion (/api/corpus/node-episodes, graph onetap/dbltap) |

Older draft RFC audit tables (pre-2026-04) are of historical interest only; trust this index and each RFC’s Status block.

Recommendations

  1. Status changes — Edit RFC body + this index together.
  2. Large deliveries without new ADRs — Often RFC + guides + API docs; see ADR gap analysis for when an ADR is still worth extracting.
  3. Decision vs code — Use Open / Completed here plus docs/adr/index.md Code column.

Maintenance: Edit each RFC Status line when you move its row between Open and Completed. Product gaps: PRD gap analysis. Decision records: ADR gap analysis.
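
The rule that an RFC body and its index row move together can be checked mechanically. The sketch below assumes the index has already been reduced to a mapping from RFC ID to table name ("Open" or "Completed") and that each RFC's Status text has been extracted separately; both inputs, the function name, and the sample IDs are hypothetical:

```python
def find_mismatches(index_sections, body_statuses):
    """Report RFCs whose index table disagrees with their own Status text.

    index_sections: dict of RFC ID -> "Open" or "Completed" (table in the index)
    body_statuses:  dict of RFC ID -> Status text taken from the RFC document
    """
    problems = []
    for rfc_id in sorted(index_sections):
        section = index_sections[rfc_id]
        status = body_statuses.get(rfc_id)
        if status is None:
            problems.append(f"{rfc_id}: no Status line found")
            continue
        # Treat any Status text that starts with "Completed" as shipped.
        shipped = status.strip().lower().startswith("completed")
        if section == "Completed" and not shipped:
            problems.append(f"{rfc_id}: index says Completed, RFC says {status!r}")
        elif section == "Open" and shipped:
            problems.append(f"{rfc_id}: index says Open, RFC says {status!r}")
    return problems


# Illustrative data only; these IDs and statuses are made up.
sample_index = {"RFC-901": "Open", "RFC-902": "Completed", "RFC-903": "Completed"}
sample_bodies = {"RFC-901": "Draft", "RFC-902": "Completed (v2.6.0)", "RFC-903": "Draft"}
for line in find_mismatches(sample_index, sample_bodies):
    print(line)
```

With the sample data, only the made-up RFC-903 is reported, because its row sits in the Completed table while its Status text still reads Draft.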

  • PRDs - Product requirements documents
  • Architecture - System design and module responsibilities
  • Releases - Release notes and version history

Creating New RFCs

Use the RFC Template as a starting point for new technical design documents.

Status vocabulary: Use Draft while in flight and Completed when shipped (optionally with version or caveats in the same line). Do not use Accepted for RFCs — that label is for ADRs only.
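
The status vocabulary lends itself to a small lint check. This is a sketch under the assumption that the Status text has already been extracted from the RFC as a plain string; the function name and messages are illustrative:

```python
ALLOWED_STATUSES = ("Draft", "Completed")


def check_status(value):
    """Return None if the status text is valid, otherwise an error message.

    Only the leading label is checked, so trailing version or caveat text
    such as "Completed (v2.6.0)" is accepted.
    """
    words = value.strip().split(maxsplit=1)
    label = words[0].rstrip(":;,(") if words else ""
    if label in ALLOWED_STATUSES:
        return None
    if label == "Accepted":
        return "Accepted is reserved for ADRs; use Draft or Completed"
    return f"unknown status {label!r}; use Draft or Completed"
```

For example, check_status("Completed (v2.6.0)") returns None, while check_status("Accepted") returns a message pointing the author back to Draft or Completed.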