Architecture Decision Records (ADRs)

Purpose

Architecture Decision Records (ADRs) capture the what and why of significant architectural decisions in podcast_scraper. While RFCs represent the proposal and journey, ADRs serve as the final, immutable record of truth for the project's architecture.

How ADRs Work

Immutable Records: Once an ADR is accepted, it remains unchanged unless superseded by a new ADR.
Context Driven: They explain the trade-offs and rationale behind a decision.
Reference for Developers: They provide onboarding context for why certain patterns (like the Provider Protocol) were chosen.
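
The immutability rule above can be sketched in a few lines of Python. This is only an illustration: `AdrRecord`, `supersede`, and the `"Superseded"` status value are hypothetical names, not part of podcast_scraper, and the index below only shows the `Accepted` and `Proposed` statuses.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class AdrRecord:
    """Hypothetical in-memory view of one ADR (illustration only)."""

    number: int
    title: str
    status: str = "Accepted"
    superseded_by: Optional[int] = None


def supersede(old: AdrRecord, new_number: int, new_title: str) -> Tuple[AdrRecord, AdrRecord]:
    """Return (copy of the old record marked as superseded, new accepted record).

    The original object is never mutated: frozen dataclasses reject attribute
    assignment, so a change of direction is always expressed as a new record.
    The "Superseded" status string is an assumption made for this sketch.
    """
    retired = replace(old, status="Superseded", superseded_by=new_number)
    return retired, AdrRecord(number=new_number, title=new_title)
```

Calling `supersede(record, 67, "Some Replacement Decision")` leaves `record` untouched and returns two new objects, which mirrors the idea that an accepted ADR is replaced rather than edited.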

ADR Index

ADR	Title	Status	Related RFC	Description	Impl
ADR-001	Hybrid Concurrency Strategy	Accepted	RFC-001	IO-bound threading, sequential CPU/GPU tasks	✅
ADR-002	Security-First XML Processing	Accepted	RFC-002	Mandated use of defusedxml for RSS parsing	✅
ADR-003	Deterministic Feed Storage	Accepted	RFC-004	Hash-based output directory derivation	✅
ADR-004	Flat Filesystem Archive Layout	Accepted	RFC-004	Flat directory structure per feed run	✅
ADR-005	Lazy ML Dependency Loading	Accepted	RFC-005	Function-level imports for heavy ML libraries	✅
ADR-006	Context-Aware Model Selection	Accepted	RFC-010	Automatic English model promotion (.en)	✅
ADR-007	Universal Episode Identity	Accepted	RFC-011	GUID-first deterministic episode ID generation	✅
ADR-008	Database-Agnostic Metadata Schema	Accepted	RFC-011	Unified JSON format for SQL/NoSQL	✅
ADR-009	Privacy-First Local Summarization	Accepted	RFC-012	Local Transformers over Cloud APIs	✅
ADR-010	Hierarchical Summarization Pattern	Accepted	RFC-012	Map-reduce chunking for long transcripts	✅
ADR-011	Secure Credential Injection	Accepted	RFC-013	Environment-based secret management	✅
ADR-012	Provider-Agnostic Preprocessing	Accepted	RFC-013	Shared pre-inference cleaning pipeline	✅
ADR-013	Standalone Experiment Configuration	Accepted	RFC-015	Separation of research params from code	✅
ADR-014	Codified Comparison Baselines	Accepted	RFC-015, RFC-041	Objective delta measurement vs baseline artifacts	✅
ADR-015	Deep Provider Fingerprinting	Accepted	RFC-016	Hardware and environment tracking for reproducibility	✅
ADR-016	Typed Provider Parameter Models	Accepted	RFC-016	Pydantic validation for backend parameters	✅
ADR-017	Registered Preprocessing Profiles	Accepted	RFC-016	Versioned cleaning logic tracking	✅
ADR-018	Externalized Prompt Management	Accepted	RFC-017	Versioned Jinja2 templates in prompts/	✅
ADR-019	Standardized Test Pyramid	Accepted	RFC-018, RFC-024	Strict unit/integration/e2e tiering	✅
ADR-020	Protocol-Based Provider Discovery	Accepted	RFC-021	Decoupling via PEP 544 Protocols	✅
ADR-021	Acceptance Test Tier as Final CI Gate	Accepted	RFC-023	Fourth test tier for README/documentation accuracy; runs last in CI	✅
ADR-022	Flaky Test Defense	Accepted	RFC-025	Automated retries and health reporting	✅
ADR-023	Public Operational Metrics	Accepted	RFC-026	Transparency via GitHub Pages dashboards	✅
ADR-024	Unified Provider Pattern	Accepted	RFC-029	Type-based unified provider classes	✅
ADR-025	Technology-Based Provider Naming	Accepted	RFC-029	Clear library-based option naming	✅
ADR-026	Per-Capability Provider Selection	Accepted	RFC-032, RFC-033, RFC-034, RFC-035, RFC-036, RFC-037	Independent provider choice per capability; partial-protocol providers allowed	✅
ADR-027	Unified Provider Metrics Contract	Accepted	-	Standardized `ProviderCallMetrics` pattern for all providers	✅
ADR-028	Unified Retry Policy with Metrics	Accepted	-	Centralized retry logic with exponential backoff and metrics tracking	✅
ADR-029	Grouped Dependency Automation	Accepted	RFC-038	Balanced Dependabot updates via grouping	✅
ADR-030	Periodic Module Coupling Analysis	Accepted	RFC-038	Nightly visualization of architecture health	✅
ADR-031	Mandatory Pre-Release Validation	Accepted	RFC-038	Standardized checklist script for releases	🔶
ADR-032	Git Worktree-Based Development	Accepted	RFC-039	Parallel stable dev environments	✅
ADR-033	Stratified CI Execution	Accepted	RFC-039	Fast push checks vs. full PR validation	✅
ADR-034	Isolated Runtime Environments	Accepted	RFC-039	Independent venv per worktree	✅
ADR-035	Linear History via Squash-Merge	Accepted	RFC-039	Clean, revertible main branch history	✅
ADR-036	Standardized Pre-Provider Audio Stage	Accepted	RFC-040	Mandatory optimization before any transcription	✅
ADR-037	Content-Hash Based Audio Caching	Accepted	RFC-040	Shared optimized artifacts in .cache/	✅
ADR-038	FFmpeg-First Audio Manipulation	Accepted	RFC-040	System-level performance for audio pipelines	✅
ADR-039	Speech-Optimized Codec (Opus)	Accepted	RFC-040	Opus at 24kbps for intermediate artifacts	✅
ADR-040	Explicit Golden Dataset Versioning	Accepted	RFC-041	Approved, frozen ground truth data versions	✅
ADR-041	Multi-Tiered Benchmarking Strategy	Accepted	RFC-041	Fast PR smoke tests vs nightly full benchmarks	✅
ADR-042	Heuristic-Based Quality Gates	Accepted	RFC-041	Regex-based detection of common AI failure modes	✅
ADR-043	Hybrid MAP-REDUCE Summarization	Accepted	RFC-042	Compression (Classic) + Abstraction (Instruct LLM)	✅
ADR-044	Local LLM Backend Abstraction	Accepted	RFC-042	Support for llama.cpp, ollama, and transformers	✅
ADR-045	Strict REDUCE Prompt Contract	Accepted	RFC-042	Mandatory markdown structure for LLM outputs	✅
ADR-046	MPS Exclusive Mode for Apple Silicon	Accepted	RFC-042	Serialize GPU work on MPS to prevent memory contention; default on	✅
ADR-047	Proactive Metric Regression Alerting	Accepted	RFC-043	Automated PR comments and webhook notifications	🔶
ADR-048	Centralized Model Registry	Accepted	RFC-044, RFC-029	Single source of truth for model architecture limits	✅
ADR-049	Materialization Boundary for Evaluation Inputs	Accepted	RFC-046	Preprocessing becomes dataset definition via materialization_id; chunking stays in run config	✅
ADR-050	Single Code Path for Evaluation and Application	Accepted	RFC-048	Eval and app share identical execution path; scorers are read-only observers	✅
ADR-051	Per-Episode JSON Artifacts with Logical Union	Accepted	RFC-049, RFC-055, RFC-061	Shard by episode (gi.json, kg.json); union at query time; optional materialization	✅
ADR-052	Separate GIL and KG Artifact Layers	Accepted	RFC-049, RFC-055	Independent schemas, feature flags, CLI namespaces, and evolution paths	✅
ADR-053	Grounding Contract for Evidence-Backed Insights	Accepted	RFC-049, RFC-050	Explicit grounded boolean, verbatim quotes with spans, evidence chain	✅
ADR-054	Relational Postgres Projection for GIL and KG	Accepted	RFC-051	Files canonical, Postgres is derived; separate GIL/KG tables; provenance on every row	—
ADR-055	Adaptive Summarization Routing	Proposed	RFC-053	Rule-based routing with episode profiling for summarization strategies	—
ADR-056	Composable E2E Mock Response Strategy	Proposed	RFC-054	Separation of functional responses from non-functional behavior in tests	—
ADR-057	AutoResearch Thin Harness with Credential Isolation	Accepted	RFC-057	Thin control layer reusing existing eval; immutable score.py; AUTORESEARCH_* credential vars	🔶
ADR-058	Additive pyannote Diarization with Separate `[diarize]` Extra	Accepted	RFC-058	pyannote as additive second pass; segment-level; separate [diarize] dependency group	—
ADR-059	Confidence-Scored Multi-Signal Commercial Detection	Accepted	RFC-060	Confidence-scored candidates replace binary detection; pattern primary, diarization adjusts	—
ADR-060	VectorStore Protocol with Backend Abstraction	Accepted	RFC-061	PEP 544 protocol decoupling FAISS (Phase 1) from Qdrant (Phase 2)	✅
ADR-061	FAISS Phase 1 with Post-Filter Metadata Strategy	Accepted	RFC-061	Over-fetch + post-filter for CLI-scale; auto index type selection	✅
ADR-062	Sentence-Boundary Transcript Chunking	Accepted	RFC-061	Regex sentence split, configurable target/overlap tokens, timestamp interpolation	—
ADR-063	Transparent Semantic Upgrade for gi explore	Accepted	RFC-061, RFC-050	Auto-detect vector index; semantic if available, substring fallback if not	—
ADR-064	Canonical Server Layer with Feature-Flagged Route Groups	Accepted	RFC-062	`server/` module with `podcast serve` CLI; viewer routes v2.6, platform routes v2.7	—
ADR-065	Vue 3 + Vite + Cytoscape.js Frontend Stack	Accepted	RFC-062	Unified frontend stack for viewer and future platform UI	—
ADR-066	Playwright for UI End-to-End Testing	Accepted	RFC-062	Browser regression testing; extends ADR-019 test pyramid with UI layer	—
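
Several entries above (ADR-020, ADR-026, ADR-060) rely on PEP 544 Protocols. As a rough sketch of that pattern, assuming hypothetical protocol and class names (`TranscriptionProvider`, `SummarizationProvider`, `LocalTranscriber`) that are not taken from the podcast_scraper codebase, structural typing lets a provider satisfy one capability without inheriting from anything:

```python
from typing import List, Protocol, runtime_checkable


@runtime_checkable
class TranscriptionProvider(Protocol):
    """Hypothetical capability protocol; the project's real interface may differ."""

    def transcribe(self, audio_path: str) -> str: ...


@runtime_checkable
class SummarizationProvider(Protocol):
    """Hypothetical second capability, selected independently of the first."""

    def summarize(self, text: str) -> str: ...


class LocalTranscriber:
    """Implements only transcription; it does not subclass either protocol."""

    def transcribe(self, audio_path: str) -> str:
        return f"transcript of {audio_path}"


def capabilities(provider: object) -> List[str]:
    """List which of the hypothetical capability protocols a provider satisfies."""
    found = []
    if isinstance(provider, TranscriptionProvider):
        found.append("transcription")
    if isinstance(provider, SummarizationProvider):
        found.append("summarization")
    return found
```

Here `capabilities(LocalTranscriber())` reports only the transcription capability, which is one way to read the "partial-protocol providers allowed" note in ADR-026. Note that `runtime_checkable` checks only that the methods exist, not their signatures.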

Architecture Decision Candidates

These items have been identified as potential architectural decisions but are currently under review.

Creating New ADRs

Use the ADR Template to document new architectural decisions. Decisions typically originate from an RFC that has been accepted and implemented.