Skip to content

PRD-011: DeepSeek Provider Integration

  • Status: ✅ Implemented (v2.5.0)
  • Revision: 2
  • Date: 2026-02-04
  • Related RFCs: RFC-034 (Revised)
  • Related PRDs: PRD-006 (OpenAI), PRD-009 (Anthropic), PRD-010 (Mistral)

Summary

Add DeepSeek AI as an optional provider for speaker detection and summarization capabilities. DeepSeek offers extremely competitive pricing (up to 95% cheaper than OpenAI) and strong reasoning capabilities via DeepSeek-R1. Like OpenAI, DeepSeek uses a unified provider pattern where a single DeepSeekProvider class implements both capabilities. Like Anthropic, DeepSeek does NOT support audio transcription via its public API. This builds on the existing modularization architecture (RFC-021) and provider patterns to provide seamless provider switching.

Background & Context

Currently, the podcast scraper supports the following providers:

  • Local ML Providers: spaCy NER (speaker detection), Whisper (transcription), Hugging Face transformers (summarization)
  • OpenAI Providers: GPT API (speaker detection, summarization), Whisper API (transcription)
  • Anthropic Providers: Claude API (speaker detection, summarization) - no transcription
  • Mistral Providers: Mistral chat API (speaker detection, summarization), Voxtral (transcription)

Users have requested DeepSeek as an alternative for several reasons:

  • Extremely Low Cost: DeepSeek is 90-95% cheaper than OpenAI
  • Strong Reasoning: DeepSeek-R1 rivals OpenAI o1 in reasoning benchmarks
  • OpenAI-Compatible API: Uses same API format, easy integration
  • Open Weights: Models available for self-hosting if needed
  • Chinese Market Access: Strong performance in multilingual tasks

This PRD addresses adding DeepSeek as a cost-effective provider option.

Goals

  • Add DeepSeek AI as provider option for speaker detection and summarization
  • Maintain 100% backward compatibility with existing providers
  • Follow unified provider pattern (like OpenAI) - single class implementing both protocols
  • Provide secure API key management via environment variables and .env files
  • Support both Config-based and experiment-based factory modes from the start
  • Enable per-capability provider selection (can mix all providers)
  • Handle capability gaps gracefully (transcription not supported)
  • Leverage OpenAI-compatible API for simplified integration (no new SDK dependency)
  • Use environment-based model defaults (test vs production)
  • Create DeepSeek-specific prompt templates

DeepSeek Model Selection and Cost Analysis

Configuration Fields

Add to config.py when implementing DeepSeek providers (following OpenAI pattern):

# DeepSeek API Configuration
deepseek_api_key: Optional[str] = Field(
    default=None,
    alias="deepseek_api_key",
    description="DeepSeek API key (prefer DEEPSEEK_API_KEY env var or .env file)"
)

deepseek_api_base: Optional[str] = Field(
    default=None,
    alias="deepseek_api_base",
    description="DeepSeek API base URL (default: https://api.deepseek.com, for E2E testing)"
)

# DeepSeek Model Selection (environment-based defaults, like OpenAI)
deepseek_speaker_model: str = Field(
    default_factory=_get_default_deepseek_speaker_model,
    alias="deepseek_speaker_model",
    description="DeepSeek model for speaker detection (default: environment-based)"
)

deepseek_summary_model: str = Field(
    default_factory=_get_default_deepseek_summary_model,
    alias="deepseek_summary_model",
    description="DeepSeek model for summarization (default: environment-based)"
)

# Shared settings (like OpenAI)
deepseek_temperature: float = Field(
    default=0.3,
    alias="deepseek_temperature",
    description="Temperature for DeepSeek generation (0.0-2.0, lower = more deterministic)"
)

deepseek_max_tokens: Optional[int] = Field(
    default=None,
    alias="deepseek_max_tokens",
    description="Max tokens for DeepSeek generation (None = model default)"
)

# DeepSeek Prompt Configuration (following OpenAI pattern)
deepseek_speaker_system_prompt: Optional[str] = Field(
    default=None,
    alias="deepseek_speaker_system_prompt",
    description="DeepSeek system prompt for speaker detection (default: deepseek/ner/system_ner_v1)"
)

deepseek_speaker_user_prompt: str = Field(
    default="deepseek/ner/guest_host_v1",
    alias="deepseek_speaker_user_prompt",
    description="DeepSeek user prompt for speaker detection"
)

deepseek_summary_system_prompt: Optional[str] = Field(
    default=None,
    alias="deepseek_summary_system_prompt",
    description="DeepSeek system prompt for summarization (default: deepseek/summarization/system_v1)"
)

deepseek_summary_user_prompt: str = Field(
    default="deepseek/summarization/long_v1",
    alias="deepseek_summary_user_prompt",
    description="DeepSeek user prompt for summarization"
)

Environment-based defaults:

  • Test environment: deepseek-chat (fast, extremely cheap)
  • Production environment: deepseek-chat (same model, still very cheap)

Model Options and Pricing

Model Input Cost (Cache Miss) Input Cost (Cache Hit) Output Cost Context Window Best For
deepseek-chat $0.28 / 1M tokens $0.028 / 1M tokens $0.42 / 1M tokens 64k tokens General tasks
deepseek-reasoner (R1) $0.28 / 1M tokens $0.028 / 1M tokens $0.42 / 1M tokens 64k tokens Complex reasoning

Note: Prices subject to change. Check DeepSeek Pricing for current rates.

Volume Discounts

Tier Monthly Usage Discount
Standard 0-10M tokens 0%
Growth 10M-100M tokens 10%
Scale 100M-1B tokens 20%
Enterprise 1B+ tokens 30%+

Dev/Test vs Production Model Selection

Environment Speaker Model Summary Model Rationale
Dev/Test deepseek-chat deepseek-chat Fast, extremely cheap
Production deepseek-chat deepseek-chat Same model, still very cheap
Complex Reasoning deepseek-reasoner deepseek-reasoner For difficult analysis tasks

Cost Comparison: All Providers (Per 100 Episodes)

Component OpenAI (gpt-4o-mini) Anthropic (haiku) Mistral (small) DeepSeek (chat)
Transcription $0.60 ❌ N/A TBD ❌ N/A
Speaker Detection $0.14 $0.10 $0.03 $0.004
Summarization $0.41 $0.30 $0.08 $0.012
Total Text Processing $0.55 $0.40 $0.11 $0.016

DeepSeek is approximately 95% cheaper than OpenAI and 85% cheaper than Anthropic for text processing!

Monthly Cost Projection (1000 Episodes)

Provider Speaker Detection Summarization Total
OpenAI (gpt-4o-mini) $1.40 $4.10 $5.50
Anthropic (haiku) $1.00 $3.00 $4.00
Mistral (small) $0.30 $0.80 $1.10
DeepSeek (chat) $0.04 $0.12 $0.16

Non-Goals

  • Transcription support (DeepSeek doesn't have audio API)
  • Changing default behavior (local providers remain default)
  • Modifying existing provider implementations
  • Adding new features beyond provider selection
  • Self-hosted DeepSeek deployment (future consideration)

Personas

  • Budget-Conscious Bob: Wants cheapest possible cloud processing
  • Quality Seeker Quinn: Wants to compare DeepSeek vs other providers
  • Startup Steve: Needs to minimize API costs while scaling
  • Developer Devin: Needs to test with DeepSeek API during development
  • Privacy-First Pat: Continues using local providers exclusively

User Stories

  • As Budget-Conscious Bob, I can use DeepSeek for the cheapest cloud text processing available.
  • As Quality Seeker Quinn, I can compare DeepSeek results with OpenAI/Anthropic/Mistral.
  • As Startup Steve, I can minimize costs while processing thousands of episodes.
  • As Developer Devin, I can set my DeepSeek API key in environment variables and test.
  • As any operator, I get a clear error message if I try to use DeepSeek for transcription.
  • As any operator, I can see which provider was used for each capability in logs.

Functional Requirements

FR1: Provider Selection

  • FR1.1: Add "deepseek" as valid value for speaker_detector_provider config field
  • FR1.2: Add "deepseek" as valid value for summary_provider config field
  • FR1.3: Attempting to set transcription_provider: deepseek results in clear error message
  • FR1.4: Provider selection is independent per capability
  • FR1.5: Default values maintain current behavior (local providers)
  • FR1.6: Invalid provider values result in clear error messages
  • FR1.7: Support both Config-based and experiment-based factory modes from the start

FR2: Provider Capability Gap Handling

  • FR2.1: Clear error message: "DeepSeek provider does not support transcription. Use 'whisper' (local), 'openai', or 'mistral' instead."
  • FR2.2: Provider capability matrix documented in configuration reference
  • FR2.3: Validation occurs at configuration load time (fail fast)

FR3: API Key Management

  • FR3.1: Support DEEPSEEK_API_KEY environment variable for API authentication (like OPENAI_API_KEY)
  • FR3.2: Support .env file via python-dotenv for convenient configuration
  • FR3.3: API key is never stored in source code or committed files
  • FR3.4: .env.example template file updated with DeepSeek placeholder
  • FR3.5: Missing API key results in clear error message
  • FR3.6: API key validation at provider initialization (fail fast)
  • FR3.7: Support DEEPSEEK_API_BASE environment variable for E2E testing (like OPENAI_API_BASE)

FR4: Speaker Detection with DeepSeek

  • FR4.1: DeepSeek provider uses chat models for entity extraction
  • FR4.2: Maintains same interface as other providers
  • FR4.3: Returns results in same format as other providers
  • FR4.4: Handles API rate limits gracefully (retry with backoff)
  • FR4.5: Uses DeepSeek-specific prompt templates

FR5: Summarization with DeepSeek

  • FR5.1: DeepSeek provider uses chat models for summarization
  • FR5.2: Maintains same interface as other providers
  • FR5.3: Leverages 64k token context window
  • FR5.4: Returns results in same format as other providers
  • FR5.5: Uses DeepSeek-specific prompt templates

FR6: OpenAI-Compatible API

  • FR6.1: Use OpenAI Python SDK with custom base_url for DeepSeek
  • FR6.2: No separate DeepSeek SDK dependency required
  • FR6.3: API format is identical to OpenAI chat completions

FR7: Logging and Observability

  • FR7.1: Log which provider is used for each capability
  • FR7.2: Include provider information in metadata documents
  • FR7.3: Log API usage for debugging
  • FR7.4: No sensitive information in logs

FR8: Error Handling

  • FR8.1: API errors result in clear error messages
  • FR8.2: Rate limit errors include retry information
  • FR8.3: Network errors handled gracefully with retries
  • FR8.4: Invalid API key errors are clear and actionable

Technical Requirements

TR1: Architecture

  • TR1.1: Follow unified provider pattern (like OpenAI) - single class implementing both protocols
  • TR1.2: Create providers/deepseek/deepseek_provider.py with unified DeepSeekProvider class
  • TR1.3: DeepSeekProvider implements SpeakerDetector and SummarizationProvider protocols
  • TR1.4: Update factories to include DeepSeek option with support for both Config-based and experiment-based modes
  • TR1.5: Create prompts/deepseek/ directory with provider-specific prompt templates
  • TR1.6: Use OpenAI SDK with custom base_url (no new SDK dependency)
  • TR1.7: Follow OpenAI provider architecture exactly for consistency

TR2: Dependencies

  • TR2.1: Reuse existing openai package (already installed for OpenAI provider)
  • TR2.2: No additional SDK dependency required
  • TR2.3: Lazy initialization of OpenAI client with DeepSeek base_url
  • TR2.4: ImportError with helpful message if openai package not installed

TR3: Configuration

  • TR3.1: Add DeepSeek provider type to config Literal types
  • TR3.2: Add DeepSeek-specific config fields
  • TR3.3: Validate provider + API key consistency
  • TR3.4: Validate DeepSeek not used for transcription

TR4: Testing

  • TR4.1: Unit tests for DeepSeek providers (with mocked API)
  • TR4.2: Integration tests with E2E server mock endpoints
  • TR4.3: E2E tests for complete workflow
  • TR4.4: Tests verify same interface as other providers
  • TR4.5: Backward compatibility tests

TR5: E2E Server Extensions

  • TR5.1: Add DeepSeek mock endpoints (reuse OpenAI format - same API structure)
  • TR5.2: Mock /v1/chat/completions for DeepSeek (same as OpenAI endpoint)
  • TR5.3: Add deepseek_api_base() helper to E2EServerURLs class
  • TR5.4: Support deepseek_api_base config field for custom base URL (like openai_api_base)

Success Criteria

  • ✅ Users can select DeepSeek provider for speaker detection and summarization via unified provider
  • ✅ Clear error when attempting transcription with DeepSeek
  • ✅ Default behavior (local providers) unchanged
  • ✅ API keys managed securely via DEEPSEEK_API_KEY environment variable
  • ✅ Environment-based model defaults (test vs production)
  • ✅ Both Config-based and experiment-based factory modes supported
  • ✅ DeepSeek providers implement same interfaces as other providers
  • ✅ Uses existing OpenAI SDK (no new dependency)
  • ✅ Error handling is clear and actionable
  • ✅ E2E tests pass with DeepSeek mock endpoints
  • ✅ Follows OpenAI provider pattern exactly for consistency

Out of Scope

  • Transcription support (no DeepSeek audio API)
  • Self-hosted DeepSeek deployment
  • DeepSeek-specific reasoning features (thinking tokens)
  • Function calling / tool use

Dependencies

  • Prerequisite: Modularization refactoring (RFC-021) ✅ Completed
  • Prerequisite: OpenAI provider implementation (RFC-013) ✅ Completed
  • External: DeepSeek API access and API key
  • Internal: OpenAI Python SDK (already a dependency)

Risks & Mitigations

  • Risk: API availability in certain regions
  • Mitigation: Document regional considerations, support custom base_url
  • Risk: Model quality differences from OpenAI/Anthropic
  • Mitigation: Optimize prompts for DeepSeek, document quality comparison
  • Risk: Rate limits may differ from other providers
  • Mitigation: Implement retry logic, document limits

Provider Capability Matrix (Updated)

Capability Local OpenAI Anthropic Mistral DeepSeek
Transcription ✅ Whisper ✅ Whisper API ✅ Voxtral
Speaker Detection ✅ spaCy ✅ GPT ✅ Claude ✅ Mistral ✅ DeepSeek
Summarization ✅ Transformers ✅ GPT ✅ Claude ✅ Mistral ✅ DeepSeek

Future Considerations

  • Self-hosted DeepSeek deployment for maximum cost savings
  • DeepSeek-R1 reasoning features for complex analysis
  • Function calling for structured output
  • Support for additional DeepSeek models as released