Architecture Decision Records (ADRs)

Purpose

Architecture Decision Records (ADRs) capture the what and why of significant architectural decisions in podcast_scraper. While RFCs represent the proposal and journey, ADRs serve as the final, immutable record of truth for the project's architecture.

How ADRs Work

  1. Immutable Records: Once an ADR is accepted, it remains unchanged unless superseded by a new ADR.
  2. Context Driven: They explain the trade-offs and rationale behind a decision.
  3. Reference for Developers: They provide onboarding context for why certain patterns (like the Provider Protocol) were chosen.
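
For readers unfamiliar with the Provider Protocol pattern mentioned above, here is a minimal sketch of PEP 544 structural typing in Python. The names `SummarizationProvider`, `TruncatingSummarizer`, and `run` are hypothetical, chosen only for illustration, and are not taken from the podcast_scraper codebase.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class SummarizationProvider(Protocol):
    """Hypothetical capability protocol; the project's real interface may differ."""

    def summarize(self, text: str) -> str: ...


class TruncatingSummarizer:
    """Toy provider. It does not inherit from the protocol; matching the
    method signature is enough to satisfy it (structural typing)."""

    def summarize(self, text: str) -> str:
        return text[:40]


def run(provider: SummarizationProvider, transcript: str) -> str:
    # Callers depend only on the protocol, never on a concrete class.
    return provider.summarize(transcript)


provider = TruncatingSummarizer()
summary = run(provider, "Episode transcript text goes here.")
```

Because the check is structural, a new backend can be added without importing or subclassing anything from the calling code, which is the kind of trade-off an ADR is meant to record.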

ADR Index

| ADR | Title | Status | Related RFC | Description | Impl |
| :--- | :--- | :--- | :--- | :--- | :--- |
| ADR-001 | Hybrid Concurrency Strategy | Accepted | RFC-001 | IO-bound threading, sequential CPU/GPU tasks | |
| ADR-002 | Security-First XML Processing | Accepted | RFC-002 | Mandated use of defusedxml for RSS parsing | |
| ADR-003 | Deterministic Feed Storage | Accepted | RFC-004 | Hash-based output directory derivation | |
| ADR-004 | Flat Filesystem Archive Layout | Accepted | RFC-004 | Flat directory structure per feed run | |
| ADR-005 | Lazy ML Dependency Loading | Accepted | RFC-005 | Function-level imports for heavy ML libraries | |
| ADR-006 | Context-Aware Model Selection | Accepted | RFC-010 | Automatic English model promotion (.en) | |
| ADR-007 | Universal Episode Identity | Accepted | RFC-011 | GUID-first deterministic episode ID generation | |
| ADR-008 | Database-Agnostic Metadata Schema | Accepted | RFC-011 | Unified JSON format for SQL/NoSQL | |
| ADR-009 | Privacy-First Local Summarization | Accepted | RFC-012 | Local Transformers over Cloud APIs | |
| ADR-010 | Hierarchical Summarization Pattern | Accepted | RFC-012 | Map-reduce chunking for long transcripts | |
| ADR-011 | Secure Credential Injection | Accepted | RFC-013 | Environment-based secret management | |
| ADR-012 | Provider-Agnostic Preprocessing | Accepted | RFC-013 | Shared pre-inference cleaning pipeline | |
| ADR-013 | Standalone Experiment Configuration | Accepted | RFC-015 | Separation of research params from code | |
| ADR-014 | Codified Comparison Baselines | Accepted | RFC-015, RFC-041 | Objective delta measurement vs baseline artifacts | |
| ADR-015 | Deep Provider Fingerprinting | Accepted | RFC-016 | Hardware and environment tracking for reproducibility | |
| ADR-016 | Typed Provider Parameter Models | Accepted | RFC-016 | Pydantic validation for backend parameters | |
| ADR-017 | Registered Preprocessing Profiles | Accepted | RFC-016 | Versioned cleaning logic tracking | |
| ADR-018 | Externalized Prompt Management | Accepted | RFC-017 | Versioned Jinja2 templates in prompts/ | |
| ADR-019 | Standardized Test Pyramid | Accepted | RFC-018, RFC-024 | Strict unit/integration/e2e tiering | |
| ADR-020 | Protocol-Based Provider Discovery | Accepted | RFC-021 | Decoupling via PEP 544 Protocols | |
| ADR-021 | Acceptance Test Tier as Final CI Gate | Accepted | RFC-023 | Fourth test tier for README/documentation accuracy; runs last in CI | |
| ADR-022 | Flaky Test Defense | Accepted | RFC-025 | Automated retries and health reporting | |
| ADR-023 | Public Operational Metrics | Accepted | RFC-026 | Transparency via GitHub Pages dashboards | |
| ADR-024 | Unified Provider Pattern | Accepted | RFC-029 | Type-based unified provider classes | |
| ADR-025 | Technology-Based Provider Naming | Accepted | RFC-029 | Clear library-based option naming | |
| ADR-026 | Per-Capability Provider Selection | Accepted | RFC-032, RFC-033, RFC-034, RFC-035, RFC-036, RFC-037 | Independent provider choice per capability; partial-protocol providers allowed | |
| ADR-027 | Unified Provider Metrics Contract | Accepted | - | Standardized ProviderCallMetrics pattern for all providers | |
| ADR-028 | Unified Retry Policy with Metrics | Accepted | - | Centralized retry logic with exponential backoff and metrics tracking | |
| ADR-029 | Grouped Dependency Automation | Accepted | RFC-038 | Balanced Dependabot updates via grouping | |
| ADR-030 | Periodic Module Coupling Analysis | Accepted | RFC-038 | Nightly visualization of architecture health | |
| ADR-031 | Mandatory Pre-Release Validation | Accepted | RFC-038 | Standardized checklist script for releases | 🔶 |
| ADR-032 | Git Worktree-Based Development | Accepted | RFC-039 | Parallel stable dev environments | |
| ADR-033 | Stratified CI Execution | Accepted | RFC-039 | Fast push checks vs. full PR validation | |
| ADR-034 | Isolated Runtime Environments | Accepted | RFC-039 | Independent venv per worktree | |
| ADR-035 | Linear History via Squash-Merge | Accepted | RFC-039 | Clean, revertible main branch history | |
| ADR-036 | Standardized Pre-Provider Audio Stage | Accepted | RFC-040 | Mandatory optimization before any transcription | |
| ADR-037 | Content-Hash Based Audio Caching | Accepted | RFC-040 | Shared optimized artifacts in .cache/ | |
| ADR-038 | FFmpeg-First Audio Manipulation | Accepted | RFC-040 | System-level performance for audio pipelines | |
| ADR-039 | Speech-Optimized Codec (Opus) | Accepted | RFC-040 | Opus at 24 kbps for intermediate artifacts | |
| ADR-040 | Explicit Golden Dataset Versioning | Accepted | RFC-041 | Approved, frozen ground truth data versions | |
| ADR-041 | Multi-Tiered Benchmarking Strategy | Accepted | RFC-041 | Fast PR smoke tests vs nightly full benchmarks | |
| ADR-042 | Heuristic-Based Quality Gates | Accepted | RFC-041 | Regex-based detection of common AI failure modes | |
| ADR-043 | Hybrid MAP-REDUCE Summarization | Accepted | RFC-042 | Compression (Classic) + Abstraction (Instruct LLM) | |
| ADR-044 | Local LLM Backend Abstraction | Accepted | RFC-042 | Support for llama.cpp, ollama, and transformers | |
| ADR-045 | Strict REDUCE Prompt Contract | Accepted | RFC-042 | Mandatory markdown structure for LLM outputs | |
| ADR-046 | MPS Exclusive Mode for Apple Silicon | Accepted | RFC-042 | Serialize GPU work on MPS to prevent memory contention; default on | |
| ADR-047 | Proactive Metric Regression Alerting | Accepted | RFC-043 | Automated PR comments and webhook notifications | 🔶 |
| ADR-048 | Centralized Model Registry | Accepted | RFC-044, RFC-029 | Single source of truth for model architecture limits | |
| ADR-049 | Materialization Boundary for Evaluation Inputs | Accepted | RFC-046 | Preprocessing becomes dataset definition via materialization_id; chunking stays in run config | |
| ADR-050 | Single Code Path for Evaluation and Application | Accepted | RFC-048 | Eval and app share identical execution path; scorers are read-only observers | |
| ADR-051 | Per-Episode JSON Artifacts with Logical Union | Accepted | RFC-049, RFC-055, RFC-061 | Shard by episode (gi.json, kg.json); union at query time; optional materialization | |
| ADR-052 | Separate GIL and KG Artifact Layers | Accepted | RFC-049, RFC-055 | Independent schemas, feature flags, CLI namespaces, and evolution paths | |
| ADR-053 | Grounding Contract for Evidence-Backed Insights | Accepted | RFC-049, RFC-050 | Explicit grounded boolean, verbatim quotes with spans, evidence chain | |
| ADR-054 | Relational Postgres Projection for GIL and KG | Accepted | RFC-051 | Files canonical, Postgres is derived; separate GIL/KG tables; provenance on every row | |
| ADR-055 | Adaptive Summarization Routing | Proposed | RFC-053 | Rule-based routing with episode profiling for summarization strategies | |
| ADR-056 | Composable E2E Mock Response Strategy | Proposed | RFC-054 | Separation of functional responses from non-functional behavior in tests | |
| ADR-057 | AutoResearch Thin Harness with Credential Isolation | Accepted | RFC-057 | Thin control layer reusing existing eval; immutable score.py; AUTORESEARCH_* credential vars | 🔶 |
| ADR-058 | Additive pyannote Diarization with Separate [diarize] Extra | Accepted | RFC-058 | pyannote as additive second pass; segment-level; separate [diarize] dependency group | |
| ADR-059 | Confidence-Scored Multi-Signal Commercial Detection | Accepted | RFC-060 | Confidence-scored candidates replace binary detection; pattern primary, diarization adjusts | |
| ADR-060 | VectorStore Protocol with Backend Abstraction | Accepted | RFC-061 | PEP 544 protocol decoupling FAISS (Phase 1) from Qdrant (Phase 2) | |
| ADR-061 | FAISS Phase 1 with Post-Filter Metadata Strategy | Accepted | RFC-061 | Over-fetch + post-filter for CLI-scale; auto index type selection | |
| ADR-062 | Sentence-Boundary Transcript Chunking | Accepted | RFC-061 | Regex sentence split, configurable target/overlap tokens, timestamp interpolation | |
| ADR-063 | Transparent Semantic Upgrade for gi explore | Accepted | RFC-061, RFC-050 | Auto-detect vector index; semantic if available, substring fallback if not | |
| ADR-064 | Canonical Server Layer with Feature-Flagged Route Groups | Accepted | RFC-062 | server/ module with podcast serve CLI; viewer routes v2.6, platform routes v2.7 | |
| ADR-065 | Vue 3 + Vite + Cytoscape.js Frontend Stack | Accepted | RFC-062 | Unified frontend stack for viewer and future platform UI | |
| ADR-066 | Playwright for UI End-to-End Testing | Accepted | RFC-062 | Browser regression testing; extends ADR-019 test pyramid with UI layer | |
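
To make one index entry concrete, here is a minimal sketch of what "GUID-first deterministic episode ID generation" (ADR-007) could look like. The index does not say how IDs are actually derived, so the function name `episode_id`, the fallback fields, and the 16-character SHA-256 prefix are all assumptions made for this sketch.

```python
import hashlib
from typing import Optional


def episode_id(guid: Optional[str], feed_url: str, title: str, published: str) -> str:
    """Return a stable ID, preferring the RSS GUID when one is present.

    Assumption: when no GUID exists, the ID falls back to a hash of the
    feed URL, title, and publication date. The real scheme may differ.
    """
    if guid and guid.strip():
        # GUID-first: the other fields are ignored when a GUID exists.
        basis = f"guid:{guid.strip()}"
    else:
        basis = f"fallback:{feed_url}|{title}|{published}"
    # Same input always yields the same digest, so reruns are idempotent.
    return hashlib.sha256(basis.encode("utf-8")).hexdigest()[:16]


with_guid = episode_id("abc-123", "https://example.com/feed", "Pilot", "2024-01-01")
without_guid = episode_id(None, "https://example.com/feed", "Pilot", "2024-01-01")
```

In this sketch, two runs over the same feed produce the same ID for an episode even if its title is later edited, provided the GUID is unchanged.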

Architecture Decision Candidates

These items have been identified as potential architectural decisions. Items marked Open are still under review; resolved items link to the ADR that settled them.

| Candidate Decision | Origin | Status | Description |
| :--- | :--- | :--- | :--- |
| Informational-Only Metric Gates | RFC-043 | Open | Should regressions (runtime, coverage) block PRs or just notify? |
| Excel-Based Result Aggregation | RFC-015 | Open | Should we maintain experiment_results.xlsx or move fully to web? |
| Manual vs. Automated Golden Creation | RFC-041 | Open | Should golden data creation always require manual approval? |
| ~~Diarization-Free Dialogue Formatting~~ | RFC-006 | Resolved → ADR-058 | Additive pyannote diarization accepted; gap-based rotation preserved as default fallback |
| Minimalist Parser Dependency Strategy | RFC-002 | Open | Raw ElementTree vs. external RSS libraries |
| Two-Phase Configuration Validation | RFC-007 | Open | argparse syntax + Pydantic semantic validation |


Creating New ADRs

Use the ADR Template to document new architectural decisions. Decisions typically originate from an RFC that has been accepted and implemented.