Architecture Decision Records (ADRs)
Purpose
Architecture Decision Records (ADRs) capture the what and why of significant architectural decisions in podcast_scraper. While RFCs represent the proposal and journey, ADRs serve as the final, immutable record of truth for the project's architecture.
How ADRs Work
- Immutable Records: Once an ADR is accepted, it remains unchanged unless superseded by a new ADR.
- Context-Driven: They explain the trade-offs and rationale behind a decision.
- Reference for Developers: They provide onboarding context for why certain patterns (like the Provider Protocol) were chosen.
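The immutability rule above can be modelled as a small data structure. This is a hypothetical illustration, not code from podcast_scraper: the `ADR` class, the `Status` enum, and the `supersede` helper are assumptions. It also assumes the common convention that superseding an ADR only marks the old record as superseded and points it at its replacement, while the recorded decision itself is never edited.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from enum import Enum


class Status(Enum):
    PROPOSED = "Proposed"
    ACCEPTED = "Accepted"
    SUPERSEDED = "Superseded"


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class ADR:
    """Hypothetical in-memory model of one ADR; field names are illustrative."""

    number: int
    title: str
    status: Status
    related_rfcs: tuple[str, ...] = ()
    superseded_by: int | None = None


def supersede(old: ADR, new_number: int, new_title: str) -> tuple[ADR, ADR]:
    """Record a superseding decision without touching the old decision's content.

    Returns (retired copy of the old ADR, new ADR). The original object is left
    as-is; dataclasses.replace builds a copy that changes only the status and
    the back-pointer, carrying every other field over unchanged.
    """
    if old.status is not Status.ACCEPTED:
        raise ValueError(f"ADR-{old.number:03d} is not accepted; nothing to supersede")
    new = ADR(number=new_number, title=new_title, status=Status.ACCEPTED)
    retired = replace(old, status=Status.SUPERSEDED, superseded_by=new_number)
    return retired, new
```

Because the dataclass is frozen, an accidental in-place edit such as `adr.title = "..."` raises an error, so the only way to change a decision is to record a new one.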
ADR Index
| ADR | Title | Status | Related RFC | Description | Implemented |
|---|---|---|---|---|---|
| ADR-001 | Hybrid Concurrency Strategy | Accepted | RFC-001 | IO-bound threading, sequential CPU/GPU tasks | ✅ |
| ADR-002 | Security-First XML Processing | Accepted | RFC-002 | Mandated use of defusedxml for RSS parsing | ✅ |
| ADR-003 | Deterministic Feed Storage | Accepted | RFC-004 | Hash-based output directory derivation | ✅ |
| ADR-004 | Flat Filesystem Archive Layout | Accepted | RFC-004 | Flat directory structure per feed run | ✅ |
| ADR-005 | Lazy ML Dependency Loading | Accepted | RFC-005 | Function-level imports for heavy ML libraries | ✅ |
| ADR-006 | Context-Aware Model Selection | Accepted | RFC-010 | Automatic English model promotion (.en) | ✅ |
| ADR-007 | Universal Episode Identity | Accepted | RFC-011 | GUID-first deterministic episode ID generation | ✅ |
| ADR-008 | Database-Agnostic Metadata Schema | Accepted | RFC-011 | Unified JSON format for SQL/NoSQL | ✅ |
| ADR-009 | Privacy-First Local Summarization | Accepted | RFC-012 | Local Transformers over Cloud APIs | ✅ |
| ADR-010 | Hierarchical Summarization Pattern | Accepted | RFC-012 | Map-reduce chunking for long transcripts | ✅ |
| ADR-011 | Secure Credential Injection | Accepted | RFC-013 | Environment-based secret management | ✅ |
| ADR-012 | Provider-Agnostic Preprocessing | Accepted | RFC-013 | Shared pre-inference cleaning pipeline | ✅ |
| ADR-013 | Standalone Experiment Configuration | Accepted | RFC-015 | Separation of research params from code | ✅ |
| ADR-014 | Codified Comparison Baselines | Accepted | RFC-015, RFC-041 | Objective delta measurement vs baseline artifacts | ✅ |
| ADR-015 | Deep Provider Fingerprinting | Accepted | RFC-016 | Hardware and environment tracking for reproducibility | ✅ |
| ADR-016 | Typed Provider Parameter Models | Accepted | RFC-016 | Pydantic validation for backend parameters | ✅ |
| ADR-017 | Registered Preprocessing Profiles | Accepted | RFC-016 | Versioned cleaning logic tracking | ✅ |
| ADR-018 | Externalized Prompt Management | Accepted | RFC-017 | Versioned Jinja2 templates in prompts/ | ✅ |
| ADR-019 | Standardized Test Pyramid | Accepted | RFC-018, RFC-024 | Strict unit/integration/e2e tiering | ✅ |
| ADR-020 | Protocol-Based Provider Discovery | Accepted | RFC-021 | Decoupling via PEP 544 Protocols | ✅ |
| ADR-021 | Acceptance Test Tier as Final CI Gate | Accepted | RFC-023 | Fourth test tier for README/documentation accuracy; runs last in CI | ✅ |
| ADR-022 | Flaky Test Defense | Accepted | RFC-025 | Automated retries and health reporting | ✅ |
| ADR-023 | Public Operational Metrics | Accepted | RFC-026 | Transparency via GitHub Pages dashboards | ✅ |
| ADR-024 | Unified Provider Pattern | Accepted | RFC-029 | Type-based unified provider classes | ✅ |
| ADR-025 | Technology-Based Provider Naming | Accepted | RFC-029 | Clear library-based option naming | ✅ |
| ADR-026 | Per-Capability Provider Selection | Accepted | RFC-032, RFC-033, RFC-034, RFC-035, RFC-036, RFC-037 | Independent provider choice per capability; partial-protocol providers allowed | ✅ |
| ADR-027 | Unified Provider Metrics Contract | Accepted | - | Standardized ProviderCallMetrics pattern for all providers | ✅ |
| ADR-028 | Unified Retry Policy with Metrics | Accepted | - | Centralized retry logic with exponential backoff and metrics tracking | ✅ |
| ADR-029 | Grouped Dependency Automation | Accepted | RFC-038 | Balanced Dependabot updates via grouping | ✅ |
| ADR-030 | Periodic Module Coupling Analysis | Accepted | RFC-038 | Nightly visualization of architecture health | ✅ |
| ADR-031 | Mandatory Pre-Release Validation | Accepted | RFC-038 | Standardized checklist script for releases | 🔶 |
| ADR-032 | Git Worktree-Based Development | Accepted | RFC-039 | Parallel stable dev environments | ✅ |
| ADR-033 | Stratified CI Execution | Accepted | RFC-039 | Fast push checks vs. full PR validation | ✅ |
| ADR-034 | Isolated Runtime Environments | Accepted | RFC-039 | Independent venv per worktree | ✅ |
| ADR-035 | Linear History via Squash-Merge | Accepted | RFC-039 | Clean, revertible main branch history | ✅ |
| ADR-036 | Standardized Pre-Provider Audio Stage | Accepted | RFC-040 | Mandatory optimization before any transcription | ✅ |
| ADR-037 | Content-Hash Based Audio Caching | Accepted | RFC-040 | Shared optimized artifacts in .cache/ | ✅ |
| ADR-038 | FFmpeg-First Audio Manipulation | Accepted | RFC-040 | System-level performance for audio pipelines | ✅ |
| ADR-039 | Speech-Optimized Codec (Opus) | Accepted | RFC-040 | Opus at 24kbps for intermediate artifacts | ✅ |
| ADR-040 | Explicit Golden Dataset Versioning | Accepted | RFC-041 | Approved, frozen ground truth data versions | ✅ |
| ADR-041 | Multi-Tiered Benchmarking Strategy | Accepted | RFC-041 | Fast PR smoke tests vs nightly full benchmarks | ✅ |
| ADR-042 | Heuristic-Based Quality Gates | Accepted | RFC-041 | Regex-based detection of common AI failure modes | ✅ |
| ADR-043 | Hybrid MAP-REDUCE Summarization | Accepted | RFC-042 | Compression (Classic) + Abstraction (Instruct LLM) | ✅ |
| ADR-044 | Local LLM Backend Abstraction | Accepted | RFC-042 | Support for llama.cpp, ollama, and transformers | ✅ |
| ADR-045 | Strict REDUCE Prompt Contract | Accepted | RFC-042 | Mandatory markdown structure for LLM outputs | ✅ |
| ADR-046 | MPS Exclusive Mode for Apple Silicon | Accepted | RFC-042 | Serialize GPU work on MPS to prevent memory contention; default on | ✅ |
| ADR-047 | Proactive Metric Regression Alerting | Accepted | RFC-043 | Automated PR comments and webhook notifications | 🔶 |
| ADR-048 | Centralized Model Registry | Accepted | RFC-044, RFC-029 | Single source of truth for model architecture limits | ✅ |
| ADR-049 | Materialization Boundary for Evaluation Inputs | Accepted | RFC-046 | Preprocessing becomes dataset definition via materialization_id; chunking stays in run config | ✅ |
| ADR-050 | Single Code Path for Evaluation and Application | Accepted | RFC-048 | Eval and app share identical execution path; scorers are read-only observers | ✅ |
| ADR-051 | Per-Episode JSON Artifacts with Logical Union | Accepted | RFC-049, RFC-055, RFC-061 | Shard by episode (gi.json, kg.json); union at query time; optional materialization | ✅ |
| ADR-052 | Separate GIL and KG Artifact Layers | Accepted | RFC-049, RFC-055 | Independent schemas, feature flags, CLI namespaces, and evolution paths | ✅ |
| ADR-053 | Grounding Contract for Evidence-Backed Insights | Accepted | RFC-049, RFC-050 | Explicit grounded boolean, verbatim quotes with spans, evidence chain | ✅ |
| ADR-054 | Relational Postgres Projection for GIL and KG | Accepted | RFC-051 | Files canonical, Postgres is derived; separate GIL/KG tables; provenance on every row | — |
| ADR-055 | Adaptive Summarization Routing | Proposed | RFC-053 | Rule-based routing with episode profiling for summarization strategies | — |
| ADR-056 | Composable E2E Mock Response Strategy | Proposed | RFC-054 | Separation of functional responses from non-functional behavior in tests | — |
| ADR-057 | AutoResearch Thin Harness with Credential Isolation | Accepted | RFC-057 | Thin control layer reusing existing eval; immutable score.py; AUTORESEARCH_* credential vars | 🔶 |
| ADR-058 | Additive pyannote Diarization with Separate [diarize] Extra | Accepted | RFC-058 | pyannote as additive second pass; segment-level; separate [diarize] dependency group | — |
| ADR-059 | Confidence-Scored Multi-Signal Commercial Detection | Accepted | RFC-060 | Confidence-scored candidates replace binary detection; pattern primary, diarization adjusts | — |
| ADR-060 | VectorStore Protocol with Backend Abstraction | Accepted | RFC-061 | PEP 544 protocol decoupling FAISS (Phase 1) from Qdrant (Phase 2) | ✅ |
| ADR-061 | FAISS Phase 1 with Post-Filter Metadata Strategy | Accepted | RFC-061 | Over-fetch + post-filter for CLI-scale; auto index type selection | ✅ |
| ADR-062 | Sentence-Boundary Transcript Chunking | Accepted | RFC-061 | Regex sentence split, configurable target/overlap tokens, timestamp interpolation | — |
| ADR-063 | Transparent Semantic Upgrade for gi explore | Accepted | RFC-061, RFC-050 | Auto-detect vector index; semantic if available, substring fallback if not | — |
| ADR-064 | Canonical Server Layer with Feature-Flagged Route Groups | Accepted | RFC-062 | server/ module with podcast serve CLI; viewer routes v2.6, platform routes v2.7 | — |
| ADR-065 | Vue 3 + Vite + Cytoscape.js Frontend Stack | Accepted | RFC-062 | Unified frontend stack for viewer and future platform UI | — |
| ADR-066 | Playwright for UI End-to-End Testing | Accepted | RFC-062 | Browser regression testing; extends ADR-019 test pyramid with UI layer | — |
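ADR-020 and ADR-026 together describe providers that are discovered through PEP 544 Protocols and selected independently per capability, with providers allowed to implement only some of the protocols. A minimal sketch of that idea follows. The protocol names `TranscriptionProvider` and `SummarizationProvider`, the toy `EchoSummarizer`, and the `resolve` helper are all hypothetical and are not the project's actual API.

```python
from __future__ import annotations

from typing import Protocol, runtime_checkable


@runtime_checkable
class TranscriptionProvider(Protocol):
    """Hypothetical capability protocol: audio file path -> transcript text."""

    def transcribe(self, audio_path: str) -> str: ...


@runtime_checkable
class SummarizationProvider(Protocol):
    """Hypothetical capability protocol: transcript text -> summary text."""

    def summarize(self, text: str) -> str: ...


class EchoSummarizer:
    """Partial-protocol provider: implements summarization only.

    It never inherits from SummarizationProvider; it matches structurally.
    """

    def summarize(self, text: str) -> str:
        # Toy behaviour: keep only the first sentence.
        return text.split(". ")[0].rstrip(".") + "."


def resolve(registry: dict[str, object], name: str, capability: type) -> object:
    """Look up the provider configured for one capability and check it fits.

    Each capability is resolved on its own, so different capabilities can be
    served by different providers.
    """
    provider = registry[name]
    if not isinstance(provider, capability):
        raise TypeError(f"{name!r} does not implement {capability.__name__}")
    return provider
```

Note that `isinstance` against a `runtime_checkable` Protocol only checks that the named methods exist, not their signatures, so a static type checker such as mypy is still what enforces the full contract.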
Architecture Decision Candidates
These items have been identified as potential architectural decisions. Most are still under review; resolved candidates are struck through and point to the ADR that settled them.
| Candidate Decision | Origin | Status | Description |
| :--- | :--- | :--- | :--- |
| Informational-Only Metric Gates | RFC-043 | Open | Should regressions (runtime, coverage) block PRs or just notify? |
| Excel-Based Result Aggregation | RFC-015 | Open | Should we maintain experiment_results.xlsx or move fully to web? |
| Manual vs. Automated Golden Creation | RFC-041 | Open | Should golden data creation always require manual approval? |
| ~~Diarization-Free Dialogue Formatting~~ | RFC-006 | Resolved → ADR-058 | Additive pyannote diarization accepted; gap-based rotation preserved as default fallback |
| Minimalist Parser Dependency Strategy | RFC-002 | Open | Raw ElementTree vs. external RSS libraries |
| Two-Phase Configuration Validation | RFC-007 | Open | argparse syntax + Pydantic semantic validation |
Creating New ADRs
Use the ADR Template to document new architectural decisions. Decisions typically originate from an RFC that has been accepted and implemented.