Requests for Comment (RFCs)

Purpose

Requests for Comment (RFCs) define the "how" behind each feature implementation in podcast_scraper. They capture:

  • Technical design and architecture decisions
  • Implementation details and module boundaries
  • API contracts and data structures
  • Testing strategies and validation approaches

RFCs translate PRD requirements into concrete technical solutions and serve as living documentation for developers.

How RFCs Work

  1. Reference PRDs: RFCs implement requirements defined in PRDs
  2. Define Architecture: RFCs specify module design, interfaces, and data flow
  3. Guide Implementation: Developers use RFCs as blueprints for code changes
  4. Document Decisions: RFCs capture design rationale and alternatives considered

Open RFCs

RFC | Title | Related PRD | Description
RFC-015 | AI Experiment Pipeline | PRD-007 | Technical design for configuration-driven experiment pipeline (CI integration pending)
RFC-027 | Pipeline Metrics Improvements | - | Improvements to pipeline metrics collection and reporting
RFC-038 | Continuous Review Tooling | #45 | Dependabot, pydeps, pre-release checklist
RFC-041 | Podcast ML Benchmarking Framework | PRD-007 | Repeatable, objective ML benchmarking system (CI integration pending)
RFC-043 | Automated Metrics Alerts | - | Automated regression alerts and PR comments for pipeline metrics
RFC-050 | Grounded Insight Layer – Use Cases & End-to-End Consumption | PRD-017 | Use cases, Insight Explorer, query patterns with insights + quotes
RFC-051 | Database Projection (GIL & Knowledge Graph) | PRD-018 | Relational export for GIL (gi.json) and KG (RFC-055) artifacts
RFC-053 | Adaptive Summarization Routing Based on Episode Profiling | PRD-005 | Episode profiling; routes summarization, GIL (RFC-049), and KG (RFC-055) strategies
RFC-054 | Flexible E2E Mock Response Strategy | #135, #399, #401 | Flexible strategy for E2E mock responses supporting normal and advanced error handling scenarios
RFC-056 | Knowledge Graph Layer — Use Cases & End-to-End Consumption | PRD-019 | KG query patterns, export, kg CLI expectations, optional DB consumption
RFC-057 | AutoResearch Optimization Loop (Prompts & ML Params) | - | Agent-driven ratchet loop; immutable eval harness; aligns with RFC-017 prompts and evaluation/
RFC-058 | Audio-Based Speaker Diarization | PRD-020 | pyannote.audio integration for neural speaker diarization, replacing gap-based rotation
RFC-059 | Speaker Detection Refactor & Test Audio Improvements | PRD-020 | Modularize speaker detection, unique test voices, commercial segments
RFC-060 | Multi-Signal Commercial Detection & Cleaning | PRD-020 | Expanded patterns + positional heuristics (Phase 1, all providers); diarization-enhanced (Phase 2, future)
RFC-061 | Semantic Corpus Search | PRD-021 | Vector index over GIL/KG/summary/transcript content; FAISS (Phase 1) + Qdrant (Phase 2); podcast search CLI
RFC-062 | GI/KG Viewer v2 — Semantic Search UI | PRD-021 | Vue 3 + Cytoscape.js + FastAPI rebuild of viewer; semantic search panel, index dashboard, explore/QA integration

Completed RFCs

RFC | Title | Related PRD | Version | Description
RFC-001 | Workflow Orchestration | PRD-001 | v2.0.0 | Central orchestrator for transcript acquisition pipeline
RFC-002 | RSS Parsing & Episode Modeling | PRD-001 | v2.0.0 | RSS feed parsing and episode data model
RFC-003 | Transcript Download Processing | PRD-001 | v2.0.0 | Resilient transcript download with retry logic
RFC-004 | Filesystem Layout & Run Management | PRD-001 | v2.0.0 | Deterministic output directory structure and run scoping
RFC-005 | Whisper Integration Lifecycle | PRD-002 | v2.0.0 | Whisper model loading, transcription, and cleanup
RFC-006 | Whisper Screenplay Formatting | PRD-002 | v2.0.0 | Speaker-attributed transcript formatting
RFC-007 | CLI Interface & Validation | PRD-003 | v2.0.0 | Command-line argument parsing and validation
RFC-008 | Configuration Model & Validation | PRD-003 | v2.0.0 | Pydantic-based configuration with file loading
RFC-009 | Progress Reporting Integration | PRD-001 | v2.0.0 | Pluggable progress reporting interface
RFC-010 | Automatic Speaker Name Detection | PRD-008 | v2.1.0 | NER-based host and guest identification
RFC-011 | Per-Episode Metadata Generation | PRD-004 | v2.2.0 | Structured metadata document generation
RFC-012 | Episode Summarization Using Local Transformers | PRD-005 | v2.3.0 | Local transformer-based summarization
RFC-013 | OpenAI Provider Implementation | PRD-006 | v2.4.0 | OpenAI API providers for transcription, NER, and summarization
RFC-016 | Modularization for AI Experiments | PRD-007 | v2.4.0 | Provider system architecture to support AI experiment pipeline
RFC-017 | Prompt Management | PRD-006 | v2.4.0 | Versioned, parameterized prompt management system (Jinja2)
RFC-018 | Test Structure Reorganization | - | v2.4.0 | Reorganized test suite into unit/integration/e2e directories
RFC-019 | E2E Test Infrastructure and Coverage Improvements | PRD-001+ | v2.4.0 | Comprehensive E2E test infrastructure and coverage
RFC-020 | Integration Test Infrastructure and Coverage Improvements | PRD-001+ | v2.4.0 | Integration test suite improvements (10 stages, 182 tests)
RFC-021 | Modularization Refactoring Plan | PRD-006 | v2.4.0 | Detailed plan for modular provider architecture
RFC-022 | Environment Variable Candidates Analysis | - | v2.4.0 | Environment variable support for deployment flexibility
RFC-024 | Test Execution Optimization | - | v2.4.0 | Optimized test execution with markers, tiers, parallel execution
RFC-025 | Test Metrics and Health Tracking | - | v2.4.0 | Metrics collection, CI integration, flaky test detection
RFC-026 | Metrics Consumption and Dashboards | - | v2.4.0 | GitHub Pages metrics JSON API and job summaries
RFC-028 | ML Model Preloading and Caching | - | v2.4.0 | Model preloading for local dev and GitHub Actions caching
RFC-029 | Provider Refactoring Consolidation | PRD-006 | v2.4.0 | Unified provider architecture documentation
RFC-030 | Python Test Coverage Improvements | - | v2.4.0 | Coverage collection in CI, threshold enforcement
RFC-031 | Code Complexity Analysis Tooling | - | v2.4.0 | Radon, Vulture, Interrogate, and codespell integration
RFC-032 | Anthropic Provider Implementation | PRD-009 | v2.4.0 | Technical design for Anthropic Claude API providers
RFC-033 | Mistral Provider Implementation | PRD-010 | v2.5.0 | Technical design for Mistral AI providers (all 3 capabilities)
RFC-034 | DeepSeek Provider Implementation | PRD-011 | v2.5.0 | Technical design for DeepSeek AI (ultra low-cost)
RFC-035 | Gemini Provider Implementation | PRD-012 | v2.5.0 | Technical design for Google Gemini (2M context)
RFC-036 | Grok Provider Implementation (xAI) | PRD-013 | v2.5.0 | Technical design for Grok (xAI's AI model)
RFC-037 | Ollama Provider Implementation | PRD-014 | v2.5.0 | Technical design for Ollama (local/offline)
RFC-039 | Development Workflow | - | v2.4.0 | Git worktrees, Cursor integration, CI evolution
RFC-023 | README Acceptance Tests | - | v2.5.0 | Script-based acceptance tests (make test-acceptance) with YAML configs
RFC-040 | Audio Preprocessing Pipeline | - | v2.5.0 | FFmpeg preprocessing, opus codec, audio caching, factory pattern
RFC-042 | Hybrid Podcast Summarization Pipeline | - | v2.5.0 | Hybrid MAP-REDUCE with instruction-tuned LLMs
RFC-044 | Model Registry for Architecture Limits | - | v2.5.0 | Centralized registry for model architecture limits
RFC-045 | ML Model Optimization Guide | PRD-005, PRD-007 | v2.5.0 | cleaning_v4 profile, preprocessing optimization, parameter tuning guide
RFC-046 | Materialization Architecture | PRD-007 | v2.5.0 | Dataset materialization for honest evaluation comparisons
RFC-047 | Lightweight Run Comparison & Diagnostics Tool | PRD-007 | v2.5.0 | Streamlit-based visual tool for comparing runs
RFC-048 | Evaluation ↔ Application Alignment | PRD-007 | v2.5.0 | Fingerprinting and single-path eval-app alignment
RFC-049 | Grounded Insight Layer – Core Concepts & Data Model | PRD-017 | v2.6.0 | Core ontology, grounding contract, storage format for GIL
RFC-052 | Locally Hosted LLM Models with Prompts | PRD-014 | v2.5.0 | Ollama provider and optimized prompt templates
RFC-055 | Knowledge Graph Layer — Core Concepts & Data Model | PRD-019 | v2.6.0 | KG ontology, artifacts, and separation from GIL

  • PRDs - Product requirements documents
  • Architecture - System design and module responsibilities
  • Releases - Release notes and version history

Creating New RFCs

Use the RFC Template as a starting point for new technical design documents.