ADR-040: Per-Capability Provider Selection
- Status: Accepted
- Date: 2026-02-10
- Authors: Podcast Scraper Team
- Related RFCs: RFC-032, RFC-033, RFC-034, RFC-035, RFC-036, RFC-037
- Related PRDs: PRD-006, PRD-009, PRD-010–PRD-014
Context & Problem Statement
The system has three AI capabilities: transcription, speaker detection, and summarization. Not all providers support all three. For example, Anthropic, DeepSeek, Grok, and Ollama do not support audio transcription; only Whisper, OpenAI, Mistral, and Gemini do. The pipeline must allow users to pick the best provider per capability without forcing a single provider for everything, and without requiring every provider to implement every protocol.
Decision
We adopt per-capability provider selection:
- Independent config fields: `transcription_provider`, `speaker_detector_provider`, and `summary_provider` are chosen independently. Each accepts only providers that implement that capability.
- Partial-protocol providers: A provider may implement a subset of the three protocols (`TranscriptionProvider`, `SpeakerDetector`, `SummarizationProvider`). It is offered in the config only for the capabilities it supports.
- No automatic fallback across providers: Selecting a provider for a capability it does not support is a config error (e.g. selecting "anthropic" for transcription is invalid). Fallback (e.g. using Whisper when the preferred provider cannot transcribe) is achieved by the user choosing a different provider for that capability, not by the pipeline switching automatically.
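The partial-protocol and no-fallback rules can be illustrated with PEP 544 protocols (see ADR-020). This is a minimal sketch, not the project's code. The method names (`transcribe`, `detect_speakers`, `summarize`), the `AnthropicProvider` stub and the `require_capability` helper are all hypothetical. The sketch shows a provider that satisfies two protocols but not the third, and a selection check that raises a config error instead of substituting another provider.

```python
from typing import Protocol, runtime_checkable


# Hypothetical method names; the project's real protocols (ADR-020) may differ.
@runtime_checkable
class TranscriptionProvider(Protocol):
    def transcribe(self, audio_path: str) -> str: ...


@runtime_checkable
class SpeakerDetector(Protocol):
    def detect_speakers(self, transcript: str) -> list[str]: ...


@runtime_checkable
class SummarizationProvider(Protocol):
    def summarize(self, transcript: str) -> str: ...


class AnthropicProvider:
    """Hypothetical partial-protocol provider: it has no transcription method."""

    def detect_speakers(self, transcript: str) -> list[str]:
        return []  # placeholder for an LLM call

    def summarize(self, transcript: str) -> str:
        return transcript[:200]  # placeholder for an LLM call


def require_capability(provider: object, protocol: type, capability: str) -> None:
    """Reject an unsupported selection instead of silently switching providers."""
    # isinstance() against a runtime_checkable Protocol only checks that the
    # required methods exist, not their signatures.
    if not isinstance(provider, protocol):
        raise ValueError(
            f"{type(provider).__name__} does not support {capability}; "
            f"select a different {capability} provider in the config"
        )


provider = AnthropicProvider()
require_capability(provider, SummarizationProvider, "summarization")  # accepted
try:
    require_capability(provider, TranscriptionProvider, "transcription")
except ValueError as exc:
    print(exc)  # a config error; no fallback to another provider
```

A runtime-checkable protocol only inspects method presence. A real implementation might therefore rely on static type checking as well, to catch signature mismatches.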
Rationale
- Clarity: Users explicitly choose per capability; no hidden fallback behavior.
- Consistency with ADR-024: Each provider remains a unified class implementing one or more protocols; config simply restricts which providers appear per capability.
- Extensibility: A new provider (e.g. one without transcription support) is added by implementing only the protocols it supports and registering it in the corresponding config Literal types.
Alternatives Considered
- Single provider for all three: Rejected; would force users to use the same vendor for transcription and summarization despite different capability matrices.
- Automatic fallback (e.g. always using Whisper for transcription if the selected LLM provider doesn't support it): Rejected; implicit behavior would be surprising and would complicate config semantics and testing.
Consequences
- Positive: Clear config model; provider capability matrix is documented; adding new providers only requires implementing the protocols they support and updating the config Literals for those capabilities.
- Negative: Config validation must keep per-capability allowlists in sync with provider implementations.
Implementation Notes
- Config: `config.Config.transcription_provider`, `speaker_detector_provider`, and `summary_provider` use Literal types that list only the providers supporting each capability.
- Pattern: Factory functions (`create_transcription_provider`, etc.) accept only provider types that implement the relevant protocol; validation rejects invalid combinations.
- Documentation: The provider capability matrix in the PRDs and the configuration reference list which providers support which capabilities.
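Here is a minimal sketch of the config-plus-factory pattern for the transcription capability. It assumes a plain `dataclass` for `Config`; the real `config.Config` may use a validation library that enforces Literal types by itself. The `StubTranscriber` class and the `create_transcription_provider(cfg)` signature are assumptions for illustration. Only the function name and the four transcription-capable providers come from this ADR.

```python
from dataclasses import dataclass
from typing import Literal, get_args

# Transcription-capable providers as listed in this ADR's Context section.
TranscriptionProviderName = Literal["whisper", "openai", "mistral", "gemini"]


@dataclass
class Config:
    # speaker_detector_provider and summary_provider would follow the same
    # pattern, each with its own Literal allowlist.
    transcription_provider: TranscriptionProviderName

    def __post_init__(self) -> None:
        # A plain dataclass does not enforce Literal at runtime, so check here.
        allowed = get_args(TranscriptionProviderName)
        if self.transcription_provider not in allowed:
            raise ValueError(
                f"transcription_provider={self.transcription_provider!r} is not "
                f"a transcription-capable provider; expected one of {allowed}"
            )


class StubTranscriber:
    """Hypothetical stand-in for a concrete transcription provider class."""

    def __init__(self, name: str) -> None:
        self.name = name

    def transcribe(self, audio_path: str) -> str:
        return f"[{self.name}] transcript of {audio_path}"  # placeholder


def create_transcription_provider(cfg: Config) -> StubTranscriber:
    # Only transcription-capable names get this far (validated in Config),
    # and no branch substitutes a different provider.
    return StubTranscriber(cfg.transcription_provider)


provider = create_transcription_provider(Config(transcription_provider="gemini"))
print(provider.transcribe("episode-001.mp3"))
try:
    Config(transcription_provider="anthropic")  # type: ignore[arg-type]
except ValueError as exc:
    print(exc)
```

Selecting `"anthropic"` fails when the config is constructed, before any factory runs. This matches the no-fallback rule: the user must pick a transcription-capable provider.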
References
- ADR-024: Unified Provider Pattern – Type-based unified provider classes
- ADR-020: Protocol-Based Provider Discovery – PEP 544 Protocols per capability
- RFC-032: Anthropic Provider Implementation – Example: no transcription support