PRD-010: Mistral Provider Integration¶

Status: ✅ Implemented (v2.5.0)
Revision: 2
Date: 2026-02-04
Related RFCs: RFC-033 (Revised)
Related PRDs: PRD-006 (OpenAI), PRD-009 (Anthropic)

Summary¶

Add Mistral AI as an optional provider for transcription, speaker detection, and summarization capabilities. Mistral is unique among cloud providers in offering a complete alternative to OpenAI, supporting all three capabilities through their chat models and Voxtral audio models. Like OpenAI, Mistral uses a unified provider pattern where a single MistralProvider class implements all three capabilities. This builds on the existing modularization architecture (RFC-021) and provider patterns (PRD-006, PRD-009) to provide seamless provider switching.

Background & Context¶

Currently, the podcast scraper supports the following providers:

Local ML Providers: spaCy NER (speaker detection), Whisper (transcription), Hugging Face transformers (summarization)
OpenAI Providers: OpenAI GPT API (speaker detection, summarization), OpenAI Whisper API (transcription)
Anthropic Providers: Claude API (speaker detection, summarization) - no transcription

Users have requested Mistral as an alternative for several reasons:

Full OpenAI Alternative: Unlike Anthropic, Mistral supports ALL capabilities including transcription
European Provider: Mistral is based in France, which may be preferred for data residency
Competitive Pricing: Mistral offers competitive pricing, especially for smaller models
Large Context Window: Mistral Large 3 offers 256k token context window
Model Diversity: Different models may produce better results for specific use cases

This PRD addresses adding Mistral as a complete provider option covering all three capabilities.

Goals¶

Add Mistral AI as provider option for transcription, speaker detection, and summarization
Maintain 100% backward compatibility with existing providers (local, OpenAI, Anthropic)
Follow unified provider pattern (like OpenAI) - single class implementing all three protocols
Provide secure API key management via environment variables and .env files
Support both Config-based and experiment-based factory modes from the start
Enable per-capability provider selection (can mix local, OpenAI, Anthropic, and Mistral)
Use environment-based model defaults (test vs production)
Create Mistral-specific prompt templates (prompts are provider-specific)
Leverage Voxtral models for audio transcription

Mistral Model Selection and Cost Analysis¶

Configuration Fields¶

Add to config.py when implementing Mistral providers (following OpenAI pattern):

# Mistral API Configuration
mistral_api_key: Optional[str] = Field(
    default=None,
    alias="mistral_api_key",
    description="Mistral API key (prefer MISTRAL_API_KEY env var or .env file)"
)

mistral_api_base: Optional[str] = Field(
    default=None,
    alias="mistral_api_base",
    description="Mistral API base URL (for E2E testing with mock servers)"
)

# Mistral Model Selection (environment-based defaults, like OpenAI)
mistral_transcription_model: str = Field(
    default_factory=_get_default_mistral_transcription_model,
    alias="mistral_transcription_model",
    description="Mistral Voxtral model for transcription (default: environment-based)"
)

mistral_speaker_model: str = Field(
    default_factory=_get_default_mistral_speaker_model,
    alias="mistral_speaker_model",
    description="Mistral model for speaker detection (default: environment-based)"
)

mistral_summary_model: str = Field(
    default_factory=_get_default_mistral_summary_model,
    alias="mistral_summary_model",
    description="Mistral model for summarization (default: environment-based)"
)

# Shared settings (like OpenAI)
mistral_temperature: float = Field(
    default=0.3,
    alias="mistral_temperature",
    description="Temperature for Mistral generation (0.0-1.0, lower = more deterministic)"
)

mistral_max_tokens: Optional[int] = Field(
    default=None,
    alias="mistral_max_tokens",
    description="Max tokens for Mistral generation (None = model default)"
)

# Mistral Prompt Configuration (following OpenAI pattern)
mistral_speaker_system_prompt: Optional[str] = Field(
    default=None,
    alias="mistral_speaker_system_prompt",
    description="Mistral system prompt for speaker detection (default: mistral/ner/system_ner_v1)"
)

mistral_speaker_user_prompt: str = Field(
    default="mistral/ner/guest_host_v1",
    alias="mistral_speaker_user_prompt",
    description="Mistral user prompt for speaker detection"
)

mistral_summary_system_prompt: Optional[str] = Field(
    default=None,
    alias="mistral_summary_system_prompt",
    description="Mistral system prompt for summarization (default: mistral/summarization/system_v1)"
)

mistral_summary_user_prompt: str = Field(
    default="mistral/summarization/long_v1",
    alias="mistral_summary_user_prompt",
    description="Mistral user prompt for summarization"
)

Environment-based defaults:

Test environment:
Transcription: voxtral-mini-latest (only option)
Speaker/Summary: mistral-small-latest (cheapest text)
Production environment:
Transcription: voxtral-mini-latest (only option)
Speaker/Summary: mistral-large-latest (best quality, 256k context)

Mistral Model Pricing¶

⚠️ Pricing Information: The pricing table below is provided for reference only and may be outdated. Always check Mistral Pricing for current rates before making cost decisions.

Model Pricing (per million tokens)¶

Model	Input	Output	Context	Notes
Mistral Large 3	$2.00	$6.00	256k	Production - best quality
Mistral Medium 3	$0.40	$2.00	256k	Balanced quality/cost
Mistral Small 3.1	$0.10	$0.30	128k	Dev/Test - cheapest text
Mistral NeMo	$0.15	$0.15	128k	Budget option
Codestral 2501	$0.20	$0.60	32k	Code generation
Pixtral Large	$0.15	$0.15	128k	Vision/multimodal
Voxtral Mini	~$0.01/min	-	-	Audio transcription

Mistral vs OpenAI vs Anthropic Comparison¶

Model Tier	Mistral	OpenAI	Anthropic
Premium	Large 3 ($2/$6)	GPT-5 ($1.25/$10)	Claude Opus ($15/$75)
Standard	Medium 3 ($0.40/$2)	GPT-5 Mini ($0.25/$2)	Claude Sonnet ($3/$15)
Budget	Small 3.1 ($0.10/$0.30)	GPT-5 Nano ($0.05/$0.40)	Claude Haiku ($0.25/$1.25)

Key Insight: Mistral Small 3.1 is the cheapest text model across all providers ($0.10/$0.30), making it ideal for development and testing.

Recommended Model Defaults¶

Purpose	Test Default	Production Default	Rationale
Speaker Detection	`mistral-small-latest`	`mistral-large-latest`	Names extraction needs quality for prod
Summarization	`mistral-small-latest`	`mistral-large-latest`	Large context (256k) for full transcripts
Transcription	`voxtral-mini-latest`	`voxtral-mini-latest`	Only Voxtral option

Cost Estimate Per Episode¶

Assuming ~10,000 words per episode transcript (~13,000 tokens):

Task	Test (Small)	Production (Large)
Speaker Detection	~$0.002	~$0.04
Summarization	~$0.003	~$0.05
Transcription (60 min)	~$0.60	~$0.60
Total per episode	~$0.61	~$0.69

Cost Comparison: All Providers (Per 100 Episodes)¶

Component	Local	OpenAI (GPT-5)	Anthropic (Sonnet)	Mistral (Large)
Transcription	Free	$36.00	❌ N/A	~$60.00
Speaker Detection	Free	$1.60	$4.00	$4.00
Summarization	Free	$4.00	$20.00	$5.00
Total	$0	$41.60	$24.00 (no transcription)	$69.00

Hybrid Strategy Recommendations¶

Cost-Optimized (Local Whisper + Mistral Small):

yaml transcription_provider: whisper # Free (local) speaker_detector_provider: mistral summary_provider: mistral mistral_speaker_model: mistral-small-latest mistral_summary_model: mistral-small-latestbash

Cost: ~$0.005/episode (cheapest cloud text processing)

Quality-Optimized (Mistral Full Stack):

yaml transcription_provider: mistral speaker_detector_provider: mistral summary_provider: mistral mistral_speaker_model: mistral-large-latest mistral_summary_model: mistral-large-latestbash

Cost: ~$0.69/episode (complete EU-based solution)

European Data Residency (All Mistral):

For organizations requiring EU data residency, Mistral is the only cloud provider option that keeps all data within the EU.

Mistral Advantages¶

Cheapest text processing - Small 3.1 at $0.10/$0.30 per million tokens
Full capability coverage - Unlike Anthropic, supports transcription
EU data residency - Based in France, data stays in EU
Largest context window - 256k tokens (Large 3)
Complete OpenAI alternative - All three capabilities covered

Non-Goals¶

Changing default behavior (local providers remain default)
Modifying existing OpenAI or Anthropic provider implementations
Adding new features beyond provider selection
API key management UI or interactive configuration
Provider fallback chains (use provider A, fallback to provider B)
Vision/image capabilities (future consideration)

Personas¶

Quality Seeker Quinn: Wants to compare Mistral vs GPT vs Claude for quality
Cost-Conscious Carol: Uses Mistral Small for development (cheapest text option)
European Enterprise Eric: Prefers EU-based provider for data residency
Full-Stack Fiona: Wants one provider for all capabilities (transcription + text)
Developer Devin: Needs to test with Mistral API during development
Privacy-First Pat: Continues using local providers exclusively

User Stories¶

As Quality Seeker Quinn, I can configure Mistral for all capabilities to compare with OpenAI/Anthropic.
As Cost-Conscious Carol, I can use Mistral Small for the cheapest cloud processing.
As European Enterprise Eric, I can use Mistral knowing data stays with EU provider.
As Full-Stack Fiona, I can use Mistral for transcription AND summarization (unlike Anthropic).
As Developer Devin, I can set my Mistral API key in environment variables and test.
As any operator, I can see which provider was used for each capability in logs.

Functional Requirements¶

FR1: Provider Selection¶

FR1.1: Add "mistral" as valid value for transcription_provider config field
FR1.2: Add "mistral" as valid value for speaker_detector_provider config field
FR1.3: Add "mistral" as valid value for summary_provider config field
FR1.4: Provider selection is independent per capability (can mix all providers)
FR1.5: Default values maintain current behavior (local providers)
FR1.6: Invalid provider values result in clear error messages with valid options listed
FR1.7: Support both Config-based and experiment-based factory modes from the start

FR2: API Key Management¶

FR2.1: Support MISTRAL_API_KEY environment variable for API authentication (like OPENAI_API_KEY)
FR2.2: Support .env file via python-dotenv for convenient configuration
FR2.3: API key is never stored in source code, config files, or committed files
FR2.4: .env file automatically loaded when config.py module is imported
FR2.5: examples/.env.example template file updated with Mistral placeholder
FR2.6: Missing API key when Mistral provider is selected results in clear error
FR2.7: API key validation occurs at provider initialization (fail fast)
FR2.8: Support MISTRAL_API_BASE environment variable for E2E testing (like OPENAI_API_BASE)

FR3: Transcription with Mistral¶

FR3.1: Mistral provider uses Voxtral models for audio transcription
FR3.2: Maintains same interface as other providers (initialize, transcribe, transcribe_with_segments, cleanup)
FR3.3: Supports timestamp granularity from Voxtral API
FR3.4: Returns results in same format as other providers (no workflow changes)
FR3.5: Handles API rate limits gracefully (retry with backoff)
FR3.6: Logs provider type used in transcription logs
FR3.7: Audio file handling compatible with Voxtral API requirements

FR4: Speaker Detection with Mistral¶

FR4.1: Mistral provider uses chat models for entity extraction
FR4.2: Maintains same interface as other providers (detect_hosts, detect_speakers, analyze_patterns)
FR4.3: Returns results in same format as other providers
FR4.4: Handles API rate limits gracefully (retry with backoff)
FR4.5: Logs provider type used in detection logs
FR4.6: Uses Mistral-specific prompt templates

FR5: Summarization with Mistral¶

FR5.1: Mistral provider uses chat models for summarization
FR5.2: Maintains same interface as other providers (initialize, summarize, cleanup)
FR5.3: Leverages large context window (256k tokens) - can process full transcripts
FR5.4: Returns results in same format as other providers
FR5.5: Uses Mistral-specific prompt templates
FR5.6: Logs provider type used in summarization logs

FR6: Prompt Management¶

FR6.1: Create Mistral-specific prompt templates in prompts/mistral/ folder
FR6.2: Prompts follow same structure as OpenAI/Anthropic but optimized for Mistral
FR6.3: Support for prompt versioning (v1, v2, etc.)
FR6.4: Configurable prompt selection via config fields

FR7: Logging and Observability¶

FR7.1: Log which provider is used for each capability at initialization
FR7.2: Log provider type in episode processing logs
FR7.3: Include provider information in metadata documents
FR7.4: Log API usage (calls, errors, rate limits) for debugging
FR7.5: No sensitive information (API keys) in logs

FR8: Error Handling¶

FR8.1: API errors result in clear error messages with provider context
FR8.2: Rate limit errors include retry information
FR8.3: Network errors are handled gracefully with retries
FR8.4: Invalid API key errors are clear and actionable
FR8.5: Audio format errors provide guidance on supported formats

Technical Requirements¶

TR1: Architecture¶

TR1.1: Follow unified provider pattern (like OpenAI) - single class implementing all three protocols
TR1.2: Create providers/mistral/mistral_provider.py with unified MistralProvider class
TR1.3: MistralProvider implements TranscriptionProvider, SpeakerDetector, and SummarizationProvider protocols
TR1.4: Update all factories to include Mistral option with support for both Config-based and experiment-based modes
TR1.5: Create prompts/mistral/ directory with provider-specific prompt templates
TR1.6: Follow OpenAI provider architecture exactly for consistency

TR2: Dependencies¶

TR2.1: Add mistralai Python package as optional dependency
TR2.2: Mistral dependency only required when Mistral provider is used
TR2.3: Lazy import Mistral client (only when provider is selected)
TR2.4: Version pinning for Mistral package

TR3: Configuration¶

TR3.1: Add Mistral provider type to all config Literal types
TR3.2: Add Mistral-specific config fields (api_key, models, temperature, etc.)
TR3.3: Config validation ensures provider + API key consistency
TR3.4: Backward compatible defaults (local providers)

TR4: Testing¶

TR4.1: Unit tests for all Mistral providers (with mocked API)
TR4.2: Integration tests with E2E server mock endpoints
TR4.3: E2E tests for complete workflow with Mistral providers
TR4.4: Tests verify same interface as other providers
TR4.5: Tests verify error handling and rate limiting
TR4.6: Backward compatibility tests (default behavior unchanged)
TR4.7: Audio transcription tests with Voxtral mock endpoint

TR5: E2E Server Extensions¶

TR5.1: Add Mistral mock endpoints to E2E HTTP server
TR5.2: Mock /v1/chat/completions endpoint for chat (speaker detection, summarization)
TR5.3: Mock /v1/audio/transcriptions endpoint for Voxtral transcription
TR5.4: Add mistral_api_base() helper to E2EServerURLs class

Success Criteria¶

✅ Users can select Mistral provider for transcription, speaker detection, and summarization via unified provider
✅ Mistral is a complete OpenAI alternative (all three capabilities)
✅ Default behavior (local providers) remains unchanged
✅ API keys are managed securely via MISTRAL_API_KEY environment variable
✅ Environment-based model defaults (test vs production)
✅ Both Config-based and experiment-based factory modes supported
✅ Mistral providers implement same interfaces as other providers
✅ No changes required to workflow.py or end-user code
✅ Error handling is clear and actionable
✅ E2E tests pass with Mistral mock endpoints
✅ Follows OpenAI provider pattern exactly for consistency

Out of Scope¶

Vision/image capabilities (Mistral supports this, future PRD)
Agent framework integration (Mistral offers this, future consideration)
Other cloud providers (AWS, Google, etc.) - future PRDs
API key management UI or interactive setup
Cost tracking or usage monitoring
Provider fallback chains

Dependencies¶

Prerequisite: Modularization refactoring (RFC-021) ✅ Completed
Prerequisite: OpenAI provider implementation (PRD-006, RFC-013) ✅ Completed
Prerequisite: Anthropic provider implementation (PRD-009, RFC-032) ✅ Completed
External: Mistral API access and API key
Internal: Provider abstraction interfaces (from refactoring)

Risks & Mitigations¶

Risk: API costs can accumulate for large batches
Mitigation: Document costs, use small model for dev/test, large for production
Risk: API rate limits may slow processing
Mitigation: Implement retry logic with backoff, document limits
Risk: API availability issues
Mitigation: Clear error messages, retry logic, fallback documentation
Risk: Voxtral audio API differences from OpenAI Whisper
Mitigation: Abstract differences in provider implementation, consistent interface
Risk: Breaking changes if Mistral API changes
Mitigation: Version pinning, comprehensive tests, monitor changelog

Provider Capability Matrix (Updated)¶

Capability	Local	OpenAI	Anthropic	Mistral
Transcription	✅ Whisper	✅ Whisper API	❌ Not supported	✅ Voxtral
Speaker Detection	✅ spaCy NER	✅ GPT API	✅ Claude API	✅ Mistral API
Summarization	✅ Transformers	✅ GPT API	✅ Claude API	✅ Mistral API

Future Considerations¶

Vision/image capabilities for show notes with images
Agent framework for automated podcast analysis workflows
Support for other providers (AWS Bedrock, Google Gemini, local LLMs)
Provider fallback chains
Cost tracking and usage monitoring
Provider performance comparison tools