Requests for Comment (RFCs)

Purpose

Requests for Comment (RFCs) define how each feature in podcast_scraper is implemented. They capture:

Technical design and architecture decisions
Implementation details and module boundaries
API contracts and data structures
Testing strategies and validation approaches

RFCs translate PRD requirements into concrete technical solutions and serve as living documentation for developers.

How RFCs Work

Reference PRDs: RFCs implement requirements defined in PRDs
Define Architecture: RFCs specify module design, interfaces, and data flow
Guide Implementation: Developers use RFCs as blueprints for code changes
Document Decisions: RFCs capture design rationale and alternatives considered

Open RFCs

RFC	Title	Related PRD	Description
RFC-015	AI Experiment Pipeline	PRD-007	Technical design for configuration-driven experiment pipeline (CI integration pending)
RFC-027	Pipeline Metrics Improvements	-	Improvements to pipeline metrics collection and reporting
RFC-038	Continuous Review Tooling	#45	Dependabot, pydeps, pre-release checklist
RFC-041	Podcast ML Benchmarking Framework	PRD-007	Repeatable, objective ML benchmarking system (CI integration pending)
RFC-043	Automated Metrics Alerts	-	Automated regression alerts and PR comments for pipeline metrics
RFC-050	Grounded Insight Layer – Use Cases & End-to-End Consumption	PRD-017	Use cases, Insight Explorer, query patterns with insights + quotes
RFC-051	Database Projection (GIL & Knowledge Graph)	PRD-018	Relational export for GIL (`gi.json`) and KG (RFC-055) artifacts
RFC-053	Adaptive Summarization Routing Based on Episode Profiling	PRD-005	Episode profiling; routes summarization, GIL (RFC-049), and KG (RFC-055) strategies
RFC-054	Flexible E2E Mock Response Strategy	#135, #399, #401	Flexible strategy for E2E mock responses supporting normal and advanced error handling scenarios
RFC-056	Knowledge Graph Layer — Use Cases & End-to-End Consumption	PRD-019	KG query patterns, export, `kg` CLI expectations, optional DB consumption
RFC-057	AutoResearch Optimization Loop (Prompts & ML Params)	-	Agent-driven ratchet loop; immutable eval harness; aligns with RFC-017 prompts and `evaluation/`
RFC-058	Audio-Based Speaker Diarization	PRD-020	pyannote.audio integration for neural speaker diarization, replacing gap-based rotation
RFC-059	Speaker Detection Refactor & Test Audio Improvements	PRD-020	Modularize speaker detection, unique test voices, commercial segments
RFC-060	Multi-Signal Commercial Detection & Cleaning	PRD-020	Expanded patterns + positional heuristics (Phase 1, all providers); diarization-enhanced (Phase 2, future)
RFC-061	Semantic Corpus Search	PRD-021	Vector index over GIL/KG/summary/transcript content; FAISS (Phase 1) + Qdrant (Phase 2); `podcast search` CLI
RFC-062	GI/KG Viewer v2 — Semantic Search UI	PRD-021	Vue 3 + Cytoscape.js + FastAPI rebuild of viewer; semantic search panel, index dashboard, explore/QA integration

Completed RFCs

RFC	Title	Related PRD	Version	Description
RFC-001	Workflow Orchestration	PRD-001	v2.0.0	Central orchestrator for transcript acquisition pipeline
RFC-002	RSS Parsing & Episode Modeling	PRD-001	v2.0.0	RSS feed parsing and episode data model
RFC-003	Transcript Download Processing	PRD-001	v2.0.0	Resilient transcript download with retry logic
RFC-004	Filesystem Layout & Run Management	PRD-001	v2.0.0	Deterministic output directory structure and run scoping
RFC-005	Whisper Integration Lifecycle	PRD-002	v2.0.0	Whisper model loading, transcription, and cleanup
RFC-006	Whisper Screenplay Formatting	PRD-002	v2.0.0	Speaker-attributed transcript formatting
RFC-007	CLI Interface & Validation	PRD-003	v2.0.0	Command-line argument parsing and validation
RFC-008	Configuration Model & Validation	PRD-003	v2.0.0	Pydantic-based configuration with file loading
RFC-009	Progress Reporting Integration	PRD-001	v2.0.0	Pluggable progress reporting interface
RFC-010	Automatic Speaker Name Detection	PRD-008	v2.1.0	NER-based host and guest identification
RFC-011	Per-Episode Metadata Generation	PRD-004	v2.2.0	Structured metadata document generation
RFC-012	Episode Summarization Using Local Transformers	PRD-005	v2.3.0	Local transformer-based summarization
RFC-013	OpenAI Provider Implementation	PRD-006	v2.4.0	OpenAI API providers for transcription, NER, and summarization
RFC-016	Modularization for AI Experiments	PRD-007	v2.4.0	Provider system architecture to support AI experiment pipeline
RFC-017	Prompt Management	PRD-006	v2.4.0	Versioned, parameterized prompt management system (Jinja2)
RFC-018	Test Structure Reorganization	-	v2.4.0	Reorganized test suite into unit/integration/e2e directories
RFC-019	E2E Test Infrastructure and Coverage Improvements	PRD-001+	v2.4.0	Comprehensive E2E test infrastructure and coverage
RFC-020	Integration Test Infrastructure and Coverage Improvements	PRD-001+	v2.4.0	Integration test suite improvements (10 stages, 182 tests)
RFC-021	Modularization Refactoring Plan	PRD-006	v2.4.0	Detailed plan for modular provider architecture
RFC-022	Environment Variable Candidates Analysis	-	v2.4.0	Environment variable support for deployment flexibility
RFC-024	Test Execution Optimization	-	v2.4.0	Optimized test execution with markers, tiers, parallel execution
RFC-025	Test Metrics and Health Tracking	-	v2.4.0	Metrics collection, CI integration, flaky test detection
RFC-026	Metrics Consumption and Dashboards	-	v2.4.0	GitHub Pages metrics JSON API and job summaries
RFC-028	ML Model Preloading and Caching	-	v2.4.0	Model preloading for local dev and GitHub Actions caching
RFC-029	Provider Refactoring Consolidation	PRD-006	v2.4.0	Unified provider architecture documentation
RFC-030	Python Test Coverage Improvements	-	v2.4.0	Coverage collection in CI, threshold enforcement
RFC-031	Code Complexity Analysis Tooling	-	v2.4.0	Radon, Vulture, Interrogate, and codespell integration
RFC-032	Anthropic Provider Implementation	PRD-009	v2.4.0	Technical design for Anthropic Claude API providers
RFC-033	Mistral Provider Implementation	PRD-010	v2.5.0	Technical design for Mistral AI providers (all 3 capabilities)
RFC-034	DeepSeek Provider Implementation	PRD-011	v2.5.0	Technical design for DeepSeek AI (ultra low-cost)
RFC-035	Gemini Provider Implementation	PRD-012	v2.5.0	Technical design for Google Gemini (2M context)
RFC-036	Grok Provider Implementation (xAI)	PRD-013	v2.5.0	Technical design for Grok (xAI's AI model)
RFC-037	Ollama Provider Implementation	PRD-014	v2.5.0	Technical design for Ollama (local/offline)
RFC-039	Development Workflow	-	v2.4.0	Git worktrees, Cursor integration, CI evolution
RFC-023	README Acceptance Tests	-	v2.5.0	Script-based acceptance tests (`make test-acceptance`) with YAML configs
RFC-040	Audio Preprocessing Pipeline	-	v2.5.0	FFmpeg preprocessing, opus codec, audio caching, factory pattern
RFC-042	Hybrid Podcast Summarization Pipeline	-	v2.5.0	Hybrid MAP-REDUCE with instruction-tuned LLMs
RFC-044	Model Registry for Architecture Limits	-	v2.5.0	Centralized registry for model architecture limits
RFC-045	ML Model Optimization Guide	PRD-005, PRD-007	v2.5.0	cleaning_v4 profile, preprocessing optimization, parameter tuning guide
RFC-046	Materialization Architecture	PRD-007	v2.5.0	Dataset materialization for honest evaluation comparisons
RFC-047	Lightweight Run Comparison & Diagnostics Tool	PRD-007	v2.5.0	Streamlit-based visual tool for comparing runs
RFC-048	Evaluation ↔ Application Alignment	PRD-007	v2.5.0	Fingerprinting and single-path eval-app alignment
RFC-049	Grounded Insight Layer – Core Concepts & Data Model	PRD-017	v2.6.0	Core ontology, grounding contract, storage format for GIL
RFC-052	Locally Hosted LLM Models with Prompts	PRD-014	v2.5.0	Ollama provider and optimized prompt templates
RFC-055	Knowledge Graph Layer — Core Concepts & Data Model	PRD-019	v2.6.0	KG ontology, artifacts, and separation from GIL

Quick Links

PRDs - Product requirements documents
Architecture - System design and module responsibilities
Releases - Release notes and version history

Creating New RFCs

Use the RFC Template as a starting point for new technical design documents.