ADR-040: Per-Capability Provider Selection

Status: Accepted
Date: 2026-02-10
Authors: Podcast Scraper Team
Related RFCs: RFC-032, RFC-033, RFC-034, RFC-035, RFC-036, RFC-037
Related PRDs: PRD-006, PRD-009, PRD-010–PRD-014

Context & Problem Statement

The system has three AI capabilities: transcription, speaker detection, and summarization. Not every provider supports all three. For example, Anthropic, DeepSeek, Grok, and Ollama do not support audio transcription; only Whisper, OpenAI, Mistral, and Gemini do. The pipeline must let users pick the best provider for each capability, without forcing a single provider for everything and without requiring every provider to implement every protocol.

Decision

We adopt per-capability provider selection:

Independent config fields: transcription_provider, speaker_detector_provider, and summary_provider are chosen independently. Each accepts only providers that implement that capability.
Partial-protocol providers: A provider may implement a subset of the three protocols (TranscriptionProvider, SpeakerDetector, SummarizationProvider). It is only offered in the config for capabilities it supports.
No automatic fallback across providers: Selecting a provider for a capability it does not support is a config error (e.g. "anthropic" is invalid for transcription). A user who wants a fallback (e.g. Whisper when no cloud transcription provider is available) selects a different provider for that capability; the pipeline never switches providers on its own.
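
The three points above can be sketched as a small config class. This is a minimal illustration using only the standard library, not the project's actual config code: the transcription allowlist follows the providers named in this ADR, while the speaker-detection and summarization allowlists, the lowercase identifiers other than "anthropic", and the validation mechanism are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Literal, get_args

# Transcription allowlist taken from this ADR's context section.
TranscriptionProviderName = Literal["whisper", "openai", "mistral", "gemini"]
# Illustrative subsets only (assumption): not the project's real capability matrix.
SpeakerDetectorProviderName = Literal["openai", "anthropic"]
SummaryProviderName = Literal["openai", "anthropic"]

# One allowlist per capability field, derived from the Literal types.
_ALLOWED = {
    "transcription_provider": get_args(TranscriptionProviderName),
    "speaker_detector_provider": get_args(SpeakerDetectorProviderName),
    "summary_provider": get_args(SummaryProviderName),
}


@dataclass(frozen=True)
class Config:
    """Hypothetical stand-in for config.Config with three independent fields."""

    transcription_provider: TranscriptionProviderName
    speaker_detector_provider: SpeakerDetectorProviderName
    summary_provider: SummaryProviderName

    def __post_init__(self) -> None:
        # An unsupported provider is a config error; nothing is substituted.
        for field_name, allowed in _ALLOWED.items():
            value = getattr(self, field_name)
            if value not in allowed:
                raise ValueError(
                    f"{field_name}={value!r} is not supported; "
                    f"choose one of {allowed}"
                )
```

With this sketch, a user can mix vendors, for example `Config(transcription_provider="whisper", speaker_detector_provider="anthropic", summary_provider="anthropic")`, whereas passing `transcription_provider="anthropic"` raises a `ValueError` instead of silently switching to another provider.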

Rationale

Clarity: Users explicitly choose per capability; no hidden fallback behavior.
Consistency with ADR-024: Each provider remains a unified class implementing one or more protocols; config simply restricts which providers appear per capability.
Extensibility: A new provider, including one without transcription support, is added by implementing only the protocols it supports and registering its name in the config Literal for each corresponding capability.

Alternatives Considered

Single provider for all three: Rejected; would force users onto the same vendor for transcription and summarization even though providers differ in which capabilities they support.
Automatic fallback (e.g. always use Whisper for transcription if LLM provider doesn't support it): Rejected; implicit behavior would be surprising and would complicate config semantics and testing.

Consequences

Positive: Clear config model; provider capability matrix is documented; adding new providers only requires implementing the protocols they support and updating the config Literals for those capabilities.
Negative: Config validation must keep per-capability allowlists in sync with provider implementations.
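
One way to contain the sync cost noted above is a check, run in the test suite, that compares each allowlist against what the provider classes actually implement. The sketch below is an assumption about how such a check could look: the registry, the `transcribe` and `summarize` method names, and `find_allowlist_drift` are hypothetical and not taken from the codebase.

```python
from typing import Dict, Iterable, List, Literal, Protocol, get_args, runtime_checkable


@runtime_checkable
class TranscriptionProvider(Protocol):
    def transcribe(self, audio_path: str) -> str: ...  # hypothetical method name


@runtime_checkable
class SummarizationProvider(Protocol):
    def summarize(self, text: str) -> str: ...  # hypothetical method name


class WhisperProvider:
    """Toy provider implementing transcription only."""

    def transcribe(self, audio_path: str) -> str:
        return f"transcript of {audio_path}"


class AnthropicProvider:
    """Toy provider implementing summarization only."""

    def summarize(self, text: str) -> str:
        return text[:20]


# Hypothetical registry mapping config names to provider classes.
REGISTRY: Dict[str, type] = {"whisper": WhisperProvider, "anthropic": AnthropicProvider}

TranscriptionProviderName = Literal["whisper"]
SummaryProviderName = Literal["anthropic"]


def find_allowlist_drift(
    allowlist: Iterable[str], protocol: type, registry: Dict[str, type]
) -> List[str]:
    """Return names whose allowlist membership disagrees with protocol support."""
    allowed = set(allowlist)
    drift = set()
    for name, cls in registry.items():
        # Structural check: does the class define the protocol's methods?
        if (name in allowed) != issubclass(cls, protocol):
            drift.add(name)
    # Names listed in the config but missing from the registry also count.
    drift.update(allowed - registry.keys())
    return sorted(drift)
```

A test can then assert that `find_allowlist_drift(get_args(TranscriptionProviderName), TranscriptionProvider, REGISTRY)` is empty. Note that a runtime protocol check only confirms the methods exist, not that their signatures match.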

Implementation Notes

Config: config.Config exposes transcription_provider, speaker_detector_provider, and summary_provider, each typed as a Literal that lists only the providers supporting that capability.
Pattern: Factory functions (create_transcription_provider, etc.) only accept provider types that implement the relevant protocol; validation rejects invalid combinations.
Documentation: The provider capability matrix in the PRDs and the configuration reference lists which providers support which capabilities.
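
The factory pattern described in these notes might look roughly like the following. Only the name create_transcription_provider comes from this ADR; its signature, the toy provider classes, the `transcribe` method, and the lookup table are assumptions for illustration, and the real factory likely takes a config object rather than a bare name.

```python
from typing import Callable, Dict, Protocol


class TranscriptionProvider(Protocol):
    def transcribe(self, audio_path: str) -> str: ...  # hypothetical method name


class WhisperProvider:
    """Toy provider that supports transcription only."""

    def transcribe(self, audio_path: str) -> str:
        return f"[whisper transcript of {audio_path}]"


class OpenAIProvider:
    """Toy unified provider that supports more than one capability."""

    def transcribe(self, audio_path: str) -> str:
        return f"[openai transcript of {audio_path}]"

    def summarize(self, text: str) -> str:
        return text[:40]


# Only providers that implement TranscriptionProvider are registered here;
# a summarization-only provider such as "anthropic" is deliberately absent.
_TRANSCRIPTION_FACTORIES: Dict[str, Callable[[], TranscriptionProvider]] = {
    "whisper": WhisperProvider,
    "openai": OpenAIProvider,
}


def create_transcription_provider(name: str) -> TranscriptionProvider:
    """Build the named provider, or fail loudly; never fall back to another one."""
    try:
        factory = _TRANSCRIPTION_FACTORIES[name]
    except KeyError:
        supported = ", ".join(sorted(_TRANSCRIPTION_FACTORIES))
        raise ValueError(
            f"Provider {name!r} does not support transcription; "
            f"supported providers: {supported}"
        ) from None
    return factory()
```

Keeping one lookup table per capability means an unsupported combination fails at construction time with a message that names the valid choices, which matches the decision to treat it as a config error.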

References

ADR-024: Unified Provider Pattern – Type-based unified provider classes
ADR-020: Protocol-Based Provider Discovery – PEP 544 Protocols per capability
RFC-032: Anthropic Provider Implementation – Example: no transcription support