Requests for Comment (RFCs)
Purpose
Requests for Comment (RFCs) define the "how" behind each feature implementation in podcast_scraper. They capture:
- Technical design and architecture decisions
- Implementation details and module boundaries
- API contracts and data structures
- Testing strategies and validation approaches
RFCs translate PRD requirements into concrete technical solutions and serve as living documentation for developers.
How RFCs Work
- Reference PRDs: RFCs implement requirements defined in PRDs
- Define Architecture: RFCs specify module design, interfaces, and data flow
- Guide Implementation: Developers use RFCs as blueprints for code changes
- Document Decisions: RFCs capture design rationale and alternatives considered
Open RFCs
| RFC | Title | Related PRD / Issue | Description |
|---|---|---|---|
| RFC-015 | AI Experiment Pipeline | PRD-007 | Technical design for configuration-driven experiment pipeline (CI integration pending) |
| RFC-027 | Pipeline Metrics Improvements | - | Improvements to pipeline metrics collection and reporting |
| RFC-038 | Continuous Review Tooling | #45 | Dependabot, pydeps, pre-release checklist |
| RFC-041 | Podcast ML Benchmarking Framework | PRD-007 | Repeatable, objective ML benchmarking system (CI integration pending) |
| RFC-043 | Automated Metrics Alerts | - | Automated regression alerts and PR comments for pipeline metrics |
| RFC-050 | Grounded Insight Layer – Use Cases & End-to-End Consumption | PRD-017 | Use cases, Insight Explorer, query patterns with insights + quotes |
| RFC-051 | Database Projection (GIL & Knowledge Graph) | PRD-018 | Relational export for GIL (gi.json) and KG (RFC-055) artifacts |
| RFC-053 | Adaptive Summarization Routing Based on Episode Profiling | PRD-005 | Episode profiling; routes summarization, GIL (RFC-049), and KG (RFC-055) strategies |
| RFC-054 | Flexible E2E Mock Response Strategy | #135, #399, #401 | Flexible strategy for E2E mock responses supporting normal and advanced error handling scenarios |
| RFC-056 | Knowledge Graph Layer — Use Cases & End-to-End Consumption | PRD-019 | KG query patterns, export, kg CLI expectations, optional DB consumption |
| RFC-057 | AutoResearch Optimization Loop (Prompts & ML Params) | - | Agent-driven ratchet loop; immutable eval harness; aligns with RFC-017 prompts and evaluation/ |
| RFC-058 | Audio-Based Speaker Diarization | PRD-020 | pyannote.audio integration for neural speaker diarization, replacing gap-based rotation |
| RFC-059 | Speaker Detection Refactor & Test Audio Improvements | PRD-020 | Modularize speaker detection, unique test voices, commercial segments |
| RFC-060 | Multi-Signal Commercial Detection & Cleaning | PRD-020 | Expanded patterns + positional heuristics (Phase 1, all providers); diarization-enhanced (Phase 2, future) |
| RFC-061 | Semantic Corpus Search | PRD-021 | Vector index over GIL/KG/summary/transcript content; FAISS (Phase 1) + Qdrant (Phase 2); podcast search CLI |
| RFC-062 | GI/KG Viewer v2 — Semantic Search UI | PRD-021 | Vue 3 + Cytoscape.js + FastAPI rebuild of viewer; semantic search panel, index dashboard, explore/QA integration |
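
Several descriptions in the table above point at other RFCs (for example, RFC-053 mentions RFC-049 and RFC-055). The sketch below shows one way to pull those cross-references out of a table row. It is an illustration only: the function name `cross_references` and the three-digit ID pattern are assumptions, not part of podcast_scraper.

```python
import re

# Assumption: RFC identifiers always look like "RFC-" followed by three digits.
RFC_ID = re.compile(r"\bRFC-\d{3}\b")


def cross_references(row: str) -> list[str]:
    """Return the RFC IDs a table row mentions, excluding the row's own ID."""
    # Drop the outer pipes, then split the row into trimmed cells.
    cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
    own_id = cells[0]
    found: list[str] = []
    for cell in cells[1:]:
        for rfc_id in RFC_ID.findall(cell):
            # Keep first-seen order and skip duplicates and self-references.
            if rfc_id != own_id and rfc_id not in found:
                found.append(rfc_id)
    return found


row = (
    "| RFC-053 | Adaptive Summarization Routing Based on Episode Profiling "
    "| PRD-005 | Episode profiling; routes summarization, GIL (RFC-049), "
    "and KG (RFC-055) strategies |"
)
print(cross_references(row))  # ['RFC-049', 'RFC-055']
```

A check like this can help confirm that every RFC referenced from an open RFC also has its own row in one of the tables.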
Completed RFCs
| RFC | Title | Related PRD | Version | Description |
|---|---|---|---|---|
| RFC-001 | Workflow Orchestration | PRD-001 | v2.0.0 | Central orchestrator for transcript acquisition pipeline |
| RFC-002 | RSS Parsing & Episode Modeling | PRD-001 | v2.0.0 | RSS feed parsing and episode data model |
| RFC-003 | Transcript Download Processing | PRD-001 | v2.0.0 | Resilient transcript download with retry logic |
| RFC-004 | Filesystem Layout & Run Management | PRD-001 | v2.0.0 | Deterministic output directory structure and run scoping |
| RFC-005 | Whisper Integration Lifecycle | PRD-002 | v2.0.0 | Whisper model loading, transcription, and cleanup |
| RFC-006 | Whisper Screenplay Formatting | PRD-002 | v2.0.0 | Speaker-attributed transcript formatting |
| RFC-007 | CLI Interface & Validation | PRD-003 | v2.0.0 | Command-line argument parsing and validation |
| RFC-008 | Configuration Model & Validation | PRD-003 | v2.0.0 | Pydantic-based configuration with file loading |
| RFC-009 | Progress Reporting Integration | PRD-001 | v2.0.0 | Pluggable progress reporting interface |
| RFC-010 | Automatic Speaker Name Detection | PRD-008 | v2.1.0 | NER-based host and guest identification |
| RFC-011 | Per-Episode Metadata Generation | PRD-004 | v2.2.0 | Structured metadata document generation |
| RFC-012 | Episode Summarization Using Local Transformers | PRD-005 | v2.3.0 | Local transformer-based summarization |
| RFC-013 | OpenAI Provider Implementation | PRD-006 | v2.4.0 | OpenAI API providers for transcription, NER, and summarization |
| RFC-016 | Modularization for AI Experiments | PRD-007 | v2.4.0 | Provider system architecture to support AI experiment pipeline |
| RFC-017 | Prompt Management | PRD-006 | v2.4.0 | Versioned, parameterized prompt management system (Jinja2) |
| RFC-018 | Test Structure Reorganization | - | v2.4.0 | Reorganized test suite into unit/integration/e2e directories |
| RFC-019 | E2E Test Infrastructure and Coverage Improvements | PRD-001+ | v2.4.0 | Comprehensive E2E test infrastructure and coverage |
| RFC-020 | Integration Test Infrastructure and Coverage Improvements | PRD-001+ | v2.4.0 | Integration test suite improvements (10 stages, 182 tests) |
| RFC-021 | Modularization Refactoring Plan | PRD-006 | v2.4.0 | Detailed plan for modular provider architecture |
| RFC-022 | Environment Variable Candidates Analysis | - | v2.4.0 | Environment variable support for deployment flexibility |
| RFC-024 | Test Execution Optimization | - | v2.4.0 | Optimized test execution with markers, tiers, parallel execution |
| RFC-025 | Test Metrics and Health Tracking | - | v2.4.0 | Metrics collection, CI integration, flaky test detection |
| RFC-026 | Metrics Consumption and Dashboards | - | v2.4.0 | GitHub Pages metrics JSON API and job summaries |
| RFC-028 | ML Model Preloading and Caching | - | v2.4.0 | Model preloading for local dev and GitHub Actions caching |
| RFC-029 | Provider Refactoring Consolidation | PRD-006 | v2.4.0 | Unified provider architecture documentation |
| RFC-030 | Python Test Coverage Improvements | - | v2.4.0 | Coverage collection in CI, threshold enforcement |
| RFC-031 | Code Complexity Analysis Tooling | - | v2.4.0 | Radon, Vulture, Interrogate, and codespell integration |
| RFC-032 | Anthropic Provider Implementation | PRD-009 | v2.4.0 | Technical design for Anthropic Claude API providers |
| RFC-033 | Mistral Provider Implementation | PRD-010 | v2.5.0 | Technical design for Mistral AI providers (all 3 capabilities) |
| RFC-034 | DeepSeek Provider Implementation | PRD-011 | v2.5.0 | Technical design for DeepSeek AI (ultra low-cost) |
| RFC-035 | Gemini Provider Implementation | PRD-012 | v2.5.0 | Technical design for Google Gemini (2M context) |
| RFC-036 | Grok Provider Implementation (xAI) | PRD-013 | v2.5.0 | Technical design for Grok (xAI's AI model) |
| RFC-037 | Ollama Provider Implementation | PRD-014 | v2.5.0 | Technical design for Ollama (local/offline) |
| RFC-039 | Development Workflow | - | v2.4.0 | Git worktrees, Cursor integration, CI evolution |
| RFC-023 | README Acceptance Tests | - | v2.5.0 | Script-based acceptance tests (make test-acceptance) with YAML configs |
| RFC-040 | Audio Preprocessing Pipeline | - | v2.5.0 | FFmpeg preprocessing, opus codec, audio caching, factory pattern |
| RFC-042 | Hybrid Podcast Summarization Pipeline | - | v2.5.0 | Hybrid MAP-REDUCE with instruction-tuned LLMs |
| RFC-044 | Model Registry for Architecture Limits | - | v2.5.0 | Centralized registry for model architecture limits |
| RFC-045 | ML Model Optimization Guide | PRD-005, PRD-007 | v2.5.0 | cleaning_v4 profile, preprocessing optimization, parameter tuning guide |
| RFC-046 | Materialization Architecture | PRD-007 | v2.5.0 | Dataset materialization for honest evaluation comparisons |
| RFC-047 | Lightweight Run Comparison & Diagnostics Tool | PRD-007 | v2.5.0 | Streamlit-based visual tool for comparing runs |
| RFC-048 | Evaluation ↔ Application Alignment | PRD-007 | v2.5.0 | Fingerprinting and single-path eval-app alignment |
| RFC-049 | Grounded Insight Layer – Core Concepts & Data Model | PRD-017 | v2.6.0 | Core ontology, grounding contract, storage format for GIL |
| RFC-052 | Locally Hosted LLM Models with Prompts | PRD-014 | v2.5.0 | Ollama provider and optimized prompt templates |
| RFC-055 | Knowledge Graph Layer — Core Concepts & Data Model | PRD-019 | v2.6.0 | KG ontology, artifacts, and separation from GIL |
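
To see which RFCs shipped in each release, the Version column of the table above can be grouped programmatically. The following is a minimal sketch, assuming the rows are available as pipe-delimited text with exactly five cells and no literal `|` inside a cell; `group_by_version` is a hypothetical helper, not an existing project function.

```python
from collections import defaultdict


def group_by_version(rows: list[str]) -> dict[str, list[str]]:
    """Map each release version to the RFC IDs completed in it."""
    groups: dict[str, list[str]] = defaultdict(list)
    for row in rows:
        cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
        # Skip the header row, the separator row, and anything malformed.
        if len(cells) != 5 or not cells[0].startswith("RFC-"):
            continue
        rfc_id, _title, _prd, version, _description = cells
        groups[version].append(rfc_id)
    return dict(groups)


sample = [
    "| RFC | Title | Related PRD | Version | Description |",
    "|---|---|---|---|---|",
    "| RFC-001 | Workflow Orchestration | PRD-001 | v2.0.0 | Central orchestrator for transcript acquisition pipeline |",
    "| RFC-010 | Automatic Speaker Name Detection | PRD-008 | v2.1.0 | NER-based host and guest identification |",
    "| RFC-049 | Grounded Insight Layer – Core Concepts & Data Model | PRD-017 | v2.6.0 | Core ontology, grounding contract, storage format for GIL |",
    "| RFC-055 | Knowledge Graph Layer — Core Concepts & Data Model | PRD-019 | v2.6.0 | KG ontology, artifacts, and separation from GIL |",
]

for version, rfc_ids in group_by_version(sample).items():
    print(version, rfc_ids)
```

With the four sample rows, the v2.6.0 group holds RFC-049 and RFC-055, while v2.0.0 and v2.1.0 each hold a single RFC.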
Quick Links
- PRDs - Product requirements documents
- Architecture - System design and module responsibilities
- Releases - Release notes and version history
Creating New RFCs
Use the RFC Template as a starting point for new technical design documents.