PRD-011: DeepSeek Provider Integration¶
- Status: ✅ Implemented (v2.5.0)
- Revision: 2
- Date: 2026-02-04
- Related RFCs: RFC-034 (Revised)
- Related PRDs: PRD-006 (OpenAI), PRD-009 (Anthropic), PRD-010 (Mistral)
Summary¶
Add DeepSeek AI as an optional provider for speaker detection and summarization capabilities. DeepSeek offers extremely competitive pricing (up to 95% cheaper than OpenAI) and strong reasoning capabilities via DeepSeek-R1. Like OpenAI, DeepSeek uses a unified provider pattern where a single DeepSeekProvider class implements both capabilities. Like Anthropic, DeepSeek does NOT support audio transcription via its public API. This builds on the existing modularization architecture (RFC-021) and provider patterns to provide seamless provider switching.
Background & Context¶
Currently, the podcast scraper supports the following providers:
- Local ML Providers: spaCy NER (speaker detection), Whisper (transcription), Hugging Face transformers (summarization)
- OpenAI Providers: GPT API (speaker detection, summarization), Whisper API (transcription)
- Anthropic Providers: Claude API (speaker detection, summarization) - no transcription
- Mistral Providers: Mistral chat API (speaker detection, summarization), Voxtral (transcription)
Users have requested DeepSeek as an alternative for several reasons:
- Extremely Low Cost: DeepSeek is 90-95% cheaper than OpenAI
- Strong Reasoning: DeepSeek-R1 rivals OpenAI o1 in reasoning benchmarks
- OpenAI-Compatible API: Uses same API format, easy integration
- Open Weights: Models available for self-hosting if needed
- Chinese Market Access: Strong performance in multilingual tasks
This PRD addresses adding DeepSeek as a cost-effective provider option.
Goals¶
- Add DeepSeek AI as provider option for speaker detection and summarization
- Maintain 100% backward compatibility with existing providers
- Follow unified provider pattern (like OpenAI) - single class implementing both protocols
- Provide secure API key management via environment variables and `.env` files
- Support both Config-based and experiment-based factory modes from the start
- Enable per-capability provider selection (can mix all providers)
- Handle capability gaps gracefully (transcription not supported)
- Leverage OpenAI-compatible API for simplified integration (no new SDK dependency)
- Use environment-based model defaults (test vs production)
- Create DeepSeek-specific prompt templates
DeepSeek Model Selection and Cost Analysis¶
Configuration Fields¶
Add to `config.py` when implementing DeepSeek providers (following OpenAI pattern):

```python
# DeepSeek API Configuration
deepseek_api_key: Optional[str] = Field(
    default=None,
    alias="deepseek_api_key",
    description="DeepSeek API key (prefer DEEPSEEK_API_KEY env var or .env file)",
)
deepseek_api_base: Optional[str] = Field(
    default=None,
    alias="deepseek_api_base",
    description="DeepSeek API base URL (default: https://api.deepseek.com, for E2E testing)",
)

# DeepSeek Model Selection (environment-based defaults, like OpenAI)
deepseek_speaker_model: str = Field(
    default_factory=_get_default_deepseek_speaker_model,
    alias="deepseek_speaker_model",
    description="DeepSeek model for speaker detection (default: environment-based)",
)
deepseek_summary_model: str = Field(
    default_factory=_get_default_deepseek_summary_model,
    alias="deepseek_summary_model",
    description="DeepSeek model for summarization (default: environment-based)",
)

# Shared settings (like OpenAI)
deepseek_temperature: float = Field(
    default=0.3,
    alias="deepseek_temperature",
    description="Temperature for DeepSeek generation (0.0-2.0, lower = more deterministic)",
)
deepseek_max_tokens: Optional[int] = Field(
    default=None,
    alias="deepseek_max_tokens",
    description="Max tokens for DeepSeek generation (None = model default)",
)

# DeepSeek Prompt Configuration (following OpenAI pattern)
deepseek_speaker_system_prompt: Optional[str] = Field(
    default=None,
    alias="deepseek_speaker_system_prompt",
    description="DeepSeek system prompt for speaker detection (default: deepseek/ner/system_ner_v1)",
)
deepseek_speaker_user_prompt: str = Field(
    default="deepseek/ner/guest_host_v1",
    alias="deepseek_speaker_user_prompt",
    description="DeepSeek user prompt for speaker detection",
)
deepseek_summary_system_prompt: Optional[str] = Field(
    default=None,
    alias="deepseek_summary_system_prompt",
    description="DeepSeek system prompt for summarization (default: deepseek/summarization/system_v1)",
)
deepseek_summary_user_prompt: str = Field(
    default="deepseek/summarization/long_v1",
    alias="deepseek_summary_user_prompt",
    description="DeepSeek user prompt for summarization",
)
```
Environment-based defaults:
- Test environment: `deepseek-chat` (fast, extremely cheap)
- Production environment: `deepseek-chat` (same model, still very cheap)
Model Options and Pricing¶
| Model | Input Cost (Cache Miss) | Input Cost (Cache Hit) | Output Cost | Context Window | Best For |
|---|---|---|---|---|---|
| deepseek-chat | $0.28 / 1M tokens | $0.028 / 1M tokens | $0.42 / 1M tokens | 64k tokens | General tasks |
| deepseek-reasoner (R1) | $0.28 / 1M tokens | $0.028 / 1M tokens | $0.42 / 1M tokens | 64k tokens | Complex reasoning |
Note: Prices subject to change. Check DeepSeek Pricing for current rates.
Volume Discounts¶
| Tier | Monthly Usage | Discount |
|---|---|---|
| Standard | 0-10M tokens | 0% |
| Growth | 10M-100M tokens | 10% |
| Scale | 100M-1B tokens | 20% |
| Enterprise | 1B+ tokens | 30%+ |
Dev/Test vs Production Model Selection¶
| Environment | Speaker Model | Summary Model | Rationale |
|---|---|---|---|
| Dev/Test | deepseek-chat | deepseek-chat | Fast, extremely cheap |
| Production | deepseek-chat | deepseek-chat | Same model, still very cheap |
| Complex Reasoning | deepseek-reasoner | deepseek-reasoner | For difficult analysis tasks |
Cost Comparison: All Providers (Per 100 Episodes)¶
| Component | OpenAI (gpt-4o-mini) | Anthropic (haiku) | Mistral (small) | DeepSeek (chat) |
|---|---|---|---|---|
| Transcription | $0.60 | ❌ N/A | TBD | ❌ N/A |
| Speaker Detection | $0.14 | $0.10 | $0.03 | $0.004 |
| Summarization | $0.41 | $0.30 | $0.08 | $0.012 |
| Total Text Processing | $0.55 | $0.40 | $0.11 | $0.016 |
DeepSeek is approximately 95% cheaper than OpenAI and 85% cheaper than Anthropic for text processing!
Monthly Cost Projection (1000 Episodes)¶
| Provider | Speaker Detection | Summarization | Total |
|---|---|---|---|
| OpenAI (gpt-4o-mini) | $1.40 | $4.10 | $5.50 |
| Anthropic (haiku) | $1.00 | $3.00 | $4.00 |
| Mistral (small) | $0.30 | $0.80 | $1.10 |
| DeepSeek (chat) | $0.04 | $0.12 | $0.16 |
Non-Goals¶
- Transcription support (DeepSeek doesn't have audio API)
- Changing default behavior (local providers remain default)
- Modifying existing provider implementations
- Adding new features beyond provider selection
- Self-hosted DeepSeek deployment (future consideration)
Personas¶
- Budget-Conscious Bob: Wants cheapest possible cloud processing
- Quality Seeker Quinn: Wants to compare DeepSeek vs other providers
- Startup Steve: Needs to minimize API costs while scaling
- Developer Devin: Needs to test with DeepSeek API during development
- Privacy-First Pat: Continues using local providers exclusively
User Stories¶
- As Budget-Conscious Bob, I can use DeepSeek for the cheapest cloud text processing available.
- As Quality Seeker Quinn, I can compare DeepSeek results with OpenAI/Anthropic/Mistral.
- As Startup Steve, I can minimize costs while processing thousands of episodes.
- As Developer Devin, I can set my DeepSeek API key in environment variables and test.
- As any operator, I get a clear error message if I try to use DeepSeek for transcription.
- As any operator, I can see which provider was used for each capability in logs.
Functional Requirements¶
FR1: Provider Selection¶
- FR1.1: Add `"deepseek"` as valid value for `speaker_detector_provider` config field
- FR1.2: Add `"deepseek"` as valid value for `summary_provider` config field
- FR1.3: Attempting to set `transcription_provider: deepseek` results in clear error message
- FR1.4: Provider selection is independent per capability
- FR1.5: Default values maintain current behavior (local providers)
- FR1.6: Invalid provider values result in clear error messages
- FR1.7: Support both Config-based and experiment-based factory modes from the start
FR2: Provider Capability Gap Handling¶
- FR2.1: Clear error message: "DeepSeek provider does not support transcription. Use 'whisper' (local), 'openai', or 'mistral' instead."
- FR2.2: Provider capability matrix documented in configuration reference
- FR2.3: Validation occurs at configuration load time (fail fast)
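The fail-fast check in FR2.1–FR2.3 can be sketched as follows; the function and constant names are illustrative, not the project's actual API:

```python
# Providers that actually support transcription (per the capability matrix).
SUPPORTED_TRANSCRIPTION_PROVIDERS = {"whisper", "openai", "mistral"}

def validate_transcription_provider(provider: str) -> None:
    # Runs at configuration load time so misconfiguration fails fast (FR2.3).
    if provider == "deepseek":
        raise ValueError(
            "DeepSeek provider does not support transcription. "
            "Use 'whisper' (local), 'openai', or 'mistral' instead."
        )
    if provider not in SUPPORTED_TRANSCRIPTION_PROVIDERS:
        raise ValueError(f"Invalid transcription provider: {provider!r}")
```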
FR3: API Key Management¶
- FR3.1: Support `DEEPSEEK_API_KEY` environment variable for API authentication (like `OPENAI_API_KEY`)
- FR3.2: Support `.env` file via `python-dotenv` for convenient configuration
- FR3.3: API key is never stored in source code or committed files
- FR3.4: `.env.example` template file updated with DeepSeek placeholder
- FR3.5: Missing API key results in clear error message
- FR3.6: API key validation at provider initialization (fail fast)
- FR3.7: Support `DEEPSEEK_API_BASE` environment variable for E2E testing (like `OPENAI_API_BASE`)
FR4: Speaker Detection with DeepSeek¶
- FR4.1: DeepSeek provider uses chat models for entity extraction
- FR4.2: Maintains same interface as other providers
- FR4.3: Returns results in same format as other providers
- FR4.4: Handles API rate limits gracefully (retry with backoff)
- FR4.5: Uses DeepSeek-specific prompt templates
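FR4.4's retry behavior can be sketched as generic exponential backoff with jitter; this is a minimal illustration under assumed names, not the provider's actual retry helper:

```python
import random
import time

def call_with_backoff(call, max_attempts=3, base_delay=1.0):
    # Retry transient API failures; the delay doubles each attempt, with
    # a small random jitter to avoid synchronized retries.
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception:
            if attempt == max_attempts:
                raise  # retries exhausted: surface the original error
            time.sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.1))
```

In the real provider this would catch only the SDK's rate-limit and network error types rather than bare `Exception`.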
FR5: Summarization with DeepSeek¶
- FR5.1: DeepSeek provider uses chat models for summarization
- FR5.2: Maintains same interface as other providers
- FR5.3: Leverages 64k token context window
- FR5.4: Returns results in same format as other providers
- FR5.5: Uses DeepSeek-specific prompt templates
FR6: OpenAI-Compatible API¶
- FR6.1: Use OpenAI Python SDK with custom `base_url` for DeepSeek
- FR6.2: No separate DeepSeek SDK dependency required
- FR6.3: API format is identical to OpenAI chat completions
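Because DeepSeek speaks the OpenAI chat-completions protocol, the existing SDK needs only a different `base_url`. A hedged sketch (the helper name is illustrative):

```python
import os

def deepseek_client_kwargs() -> dict:
    # Same constructor arguments the OpenAI SDK accepts; only base_url differs.
    # DEEPSEEK_API_BASE allows pointing at the E2E mock server (FR3.7, TR5.4).
    return {
        "api_key": os.environ.get("DEEPSEEK_API_KEY", ""),
        "base_url": os.environ.get("DEEPSEEK_API_BASE", "https://api.deepseek.com"),
    }

# Usage (requires the `openai` package and a real DEEPSEEK_API_KEY):
#   from openai import OpenAI
#   client = OpenAI(**deepseek_client_kwargs())
#   client.chat.completions.create(model="deepseek-chat", messages=[...])
```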
FR7: Logging and Observability¶
- FR7.1: Log which provider is used for each capability
- FR7.2: Include provider information in metadata documents
- FR7.3: Log API usage for debugging
- FR7.4: No sensitive information in logs
FR8: Error Handling¶
- FR8.1: API errors result in clear error messages
- FR8.2: Rate limit errors include retry information
- FR8.3: Network errors handled gracefully with retries
- FR8.4: Invalid API key errors are clear and actionable
Technical Requirements¶
TR1: Architecture¶
- TR1.1: Follow unified provider pattern (like OpenAI) - single class implementing both protocols
- TR1.2: Create `providers/deepseek/deepseek_provider.py` with unified `DeepSeekProvider` class
- TR1.3: `DeepSeekProvider` implements `SpeakerDetector` and `SummarizationProvider` protocols
- TR1.4: Update factories to include DeepSeek option with support for both Config-based and experiment-based modes
- TR1.5: Create `prompts/deepseek/` directory with provider-specific prompt templates
- TR1.6: Use OpenAI SDK with custom `base_url` (no new SDK dependency)
- TR1.7: Follow OpenAI provider architecture exactly for consistency
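TR1.2, TR1.3, TR2.3, and TR2.4 together suggest a skeleton like the following; the method names are assumptions (the real `SpeakerDetector`/`SummarizationProvider` protocols define the actual signatures):

```python
class DeepSeekProvider:
    """Unified provider implementing both text capabilities (TR1.2, TR1.3)."""

    def __init__(self, api_key: str, base_url: str = "https://api.deepseek.com"):
        self._api_key = api_key
        self._base_url = base_url
        self._client = None  # lazy initialization (TR2.3)

    def _get_client(self):
        if self._client is None:
            try:
                from openai import OpenAI  # reuse existing SDK (TR2.1)
            except ImportError as exc:
                # TR2.4: helpful message if the package is missing
                raise ImportError(
                    "DeepSeek provider requires the 'openai' package; "
                    "install it with: pip install openai"
                ) from exc
            self._client = OpenAI(api_key=self._api_key, base_url=self._base_url)
        return self._client

    def detect_speakers(self, transcript: str):
        raise NotImplementedError  # chat-completion NER call goes here (FR4)

    def summarize(self, transcript: str):
        raise NotImplementedError  # chat-completion summary call goes here (FR5)
```

Lazy client construction keeps imports and key validation out of module load, matching the fail-fast-at-initialization requirement (FR3.6) without paying the cost when DeepSeek is not selected.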
TR2: Dependencies¶
- TR2.1: Reuse existing `openai` package (already installed for OpenAI provider)
- TR2.2: No additional SDK dependency required
- TR2.3: Lazy initialization of OpenAI client with DeepSeek `base_url`
- TR2.4: ImportError with helpful message if `openai` package not installed
TR3: Configuration¶
- TR3.1: Add DeepSeek provider type to config Literal types
- TR3.2: Add DeepSeek-specific config fields
- TR3.3: Validate provider + API key consistency
- TR3.4: Validate DeepSeek not used for transcription
TR4: Testing¶
- TR4.1: Unit tests for DeepSeek providers (with mocked API)
- TR4.2: Integration tests with E2E server mock endpoints
- TR4.3: E2E tests for complete workflow
- TR4.4: Tests verify same interface as other providers
- TR4.5: Backward compatibility tests
TR5: E2E Server Extensions¶
- TR5.1: Add DeepSeek mock endpoints (reuse OpenAI format - same API structure)
- TR5.2: Mock `/v1/chat/completions` for DeepSeek (same as OpenAI endpoint)
- TR5.3: Add `deepseek_api_base()` helper to `E2EServerURLs` class
- TR5.4: Support `deepseek_api_base` config field for custom base URL (like `openai_api_base`)
Success Criteria¶
- ✅ Users can select DeepSeek provider for speaker detection and summarization via unified provider
- ✅ Clear error when attempting transcription with DeepSeek
- ✅ Default behavior (local providers) unchanged
- ✅ API keys managed securely via `DEEPSEEK_API_KEY` environment variable
- ✅ Environment-based model defaults (test vs production)
- ✅ Both Config-based and experiment-based factory modes supported
- ✅ DeepSeek providers implement same interfaces as other providers
- ✅ Uses existing OpenAI SDK (no new dependency)
- ✅ Error handling is clear and actionable
- ✅ E2E tests pass with DeepSeek mock endpoints
- ✅ Follows OpenAI provider pattern exactly for consistency
Out of Scope¶
- Transcription support (no DeepSeek audio API)
- Self-hosted DeepSeek deployment
- DeepSeek-specific reasoning features (thinking tokens)
- Function calling / tool use
Dependencies¶
- Prerequisite: Modularization refactoring (RFC-021) ✅ Completed
- Prerequisite: OpenAI provider implementation (RFC-013) ✅ Completed
- External: DeepSeek API access and API key
- Internal: OpenAI Python SDK (already a dependency)
Risks & Mitigations¶
- Risk: API availability in certain regions
- Mitigation: Document regional considerations, support custom base_url
- Risk: Model quality differences from OpenAI/Anthropic
- Mitigation: Optimize prompts for DeepSeek, document quality comparison
- Risk: Rate limits may differ from other providers
- Mitigation: Implement retry logic, document limits
Provider Capability Matrix (Updated)¶
| Capability | Local | OpenAI | Anthropic | Mistral | DeepSeek |
|---|---|---|---|---|---|
| Transcription | ✅ Whisper | ✅ Whisper API | ❌ | ✅ Voxtral | ❌ |
| Speaker Detection | ✅ spaCy | ✅ GPT | ✅ Claude | ✅ Mistral | ✅ DeepSeek |
| Summarization | ✅ Transformers | ✅ GPT | ✅ Claude | ✅ Mistral | ✅ DeepSeek |
Future Considerations¶
- Self-hosted DeepSeek deployment for maximum cost savings
- DeepSeek-R1 reasoning features for complex analysis
- Function calling for structured output
- Support for additional DeepSeek models as released