RFC-033: Mistral Provider Implementation (Revised)
- Status: ✅ Completed (v2.5.0)
- Revision: 2
- Date: 2026-02-04
- Authors:
- Stakeholders: Maintainers, users wanting Mistral API integration, developers implementing providers
- Related PRDs:
  - docs/prd/PRD-010-mistral-provider-integration.md
- Related RFCs:
  - docs/rfc/RFC-013-openai-provider-implementation.md (reference - unified provider pattern)
  - docs/rfc/RFC-032-anthropic-provider-implementation.md (similar pattern - no transcription)
  - docs/rfc/RFC-021-modularization-refactoring-plan.md (architecture foundation)
  - docs/rfc/RFC-017-prompt-management.md (prompt system)
Abstract
Design and implement Mistral AI as a unified provider for transcription, speaker detection, and summarization. Of the supported cloud providers, Mistral is the only one besides OpenAI that supports all three capabilities, which makes it a complete OpenAI alternative. This RFC builds on the existing modularization architecture (RFC-021) and follows the unified provider pattern established by OpenAI (RFC-013), where a single provider class implements multiple protocols.
Architecture Alignment: The Mistral provider follows the same unified provider pattern as OpenAIProvider: it implements three protocols (TranscriptionProvider, SpeakerDetector, SummarizationProvider) in a single class and integrates through the existing factory pattern, supporting both Config-based and experiment-based modes.
Problem Statement
Users want the option to use Mistral AI as a complete alternative to OpenAI for:
- Transcription: Audio-to-text using Voxtral models
- Speaker Detection: Entity extraction using Mistral chat models
- Summarization: High-quality summaries using Mistral chat models
Unlike Anthropic, Mistral supports ALL three capabilities, making it a true OpenAI alternative.
Requirements:
- No changes to end-user experience or workflow when using defaults
- Secure API key management (environment variables, never in source code)
- Per-capability provider selection (can mix local, OpenAI, Anthropic, and Mistral)
- Build on existing modularization and provider architecture
- Use Mistral-specific prompts (prompts are provider-specific)
- Handle Voxtral API differences from OpenAI Whisper API
- Support both Config-based and experiment-based factory modes
Constraints & Assumptions
Constraints:
- Prerequisite: Modularization refactoring (RFC-021) ✅ Completed
- Prerequisite: OpenAI provider implementation (RFC-013) ✅ Completed
- Backward Compatibility: Default providers (local) must remain unchanged
- API Key Security: API keys must never be in source code or committed files
- Rate Limits: Must respect Mistral API rate limits and implement retry logic
- Must follow unified provider pattern (like OpenAI)
Assumptions:
- Mistral API is stable and well-documented
- Mistral Python SDK follows similar patterns to OpenAI/Anthropic SDKs
- Voxtral transcription API follows similar patterns to OpenAI Whisper API
- Prompts need to be optimized for Mistral (may differ from GPT/Claude)
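The rate-limit constraint above asks for retry logic without naming a strategy. The following is a minimal sketch assuming exponential backoff; `call_with_retry` and its parameters are hypothetical names, and a real implementation would likely catch only the SDK's rate-limit errors rather than every exception.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying with exponential backoff (hypothetical helper)."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            # Out of attempts: surface the last error to the caller.
            if attempt == max_attempts:
                raise
            # Delays grow as base_delay * 1, 2, 4, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Injecting `sleep` keeps the helper testable without real delays; a provider method could wrap its API call as `call_with_retry(lambda: client.chat.complete(...))`.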
Design & Implementation
0. Mistral API Overview
Mistral's API is similar to OpenAI but with some differences:
| Feature | OpenAI | Mistral |
|---|---|---|
| Chat Endpoint | /v1/chat/completions | /v1/chat/completions |
| Transcription Endpoint | /v1/audio/transcriptions | /v1/audio/transcriptions |
| Audio Models | whisper-1 | voxtral-mini-latest |
| Context Window | 128k tokens | 256k tokens (large) |
| Temperature Range | 0.0 - 2.0 | 0.0 - 1.0 |
| Python SDK | openai | mistralai |
| Provider Pattern | Unified (OpenAIProvider) | Unified (MistralProvider) |
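One practical consequence of the table is that a temperature that is valid for OpenAI (for example 1.5) falls outside the Mistral range listed. As a rough sketch, a caller could clamp the value before sending it; `clamp_mistral_temperature` is a hypothetical helper, not part of this RFC, and the bounds are simply the ones stated in the table.

```python
# Range taken from the comparison table above (assumed, not queried from the API).
MISTRAL_TEMPERATURE_RANGE = (0.0, 1.0)


def clamp_mistral_temperature(value: float) -> float:
    """Clamp a temperature into the Mistral range (hypothetical helper)."""
    low, high = MISTRAL_TEMPERATURE_RANGE
    return max(low, min(high, value))
```

Rejecting out-of-range values in config validation would be an equally reasonable choice; clamping just keeps settings shared with OpenAI usable.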
1. Architecture Overview
Unified Provider Pattern (following OpenAI):
src/podcast_scraper/
├── providers/
│ └── mistral/ # NEW: Unified Mistral provider
│ ├── __init__.py
│ └── mistral_provider.py # Single class implementing 3 protocols
├── prompts/
│ └── mistral/ # NEW: Mistral-specific prompts
│ ├── ner/
│ │ ├── system_ner_v1.j2
│ │ └── guest_host_v1.j2
│ └── summarization/
│ ├── system_v1.j2
│ └── long_v1.j2
├── transcription/
│ └── factory.py # Updated: Add "mistral" option
├── speaker_detectors/
│ └── factory.py # Updated: Add "mistral" option
├── summarization/
│ └── factory.py # Updated: Add "mistral" option
└── config.py # Updated: Add Mistral fields
Key Architectural Decision: Use unified provider pattern (single MistralProvider class) matching OpenAIProvider, not separate files per capability.
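To illustrate what "a single class implementing three protocols" means in Python terms, here is a small sketch using `typing.Protocol`. The protocol names come from this RFC, but the method signatures are simplified assumptions and `UnifiedProviderSketch` is a hypothetical stand-in for MistralProvider.

```python
from __future__ import annotations

from typing import Protocol, runtime_checkable


@runtime_checkable
class TranscriptionProvider(Protocol):
    def transcribe(self, audio_path: str, language: str | None = None) -> str: ...


@runtime_checkable
class SpeakerDetector(Protocol):
    def detect_hosts(self, feed_title: str | None) -> set[str]: ...


@runtime_checkable
class SummarizationProvider(Protocol):
    def summarize(self, text: str) -> dict: ...


class UnifiedProviderSketch:
    """One class that structurally satisfies all three protocols."""

    def transcribe(self, audio_path: str, language: str | None = None) -> str:
        return f"transcript of {audio_path}"

    def detect_hosts(self, feed_title: str | None) -> set[str]:
        return set()

    def summarize(self, text: str) -> dict:
        return {"summary": text[:20]}
```

Because the protocols are structural, each factory can return the same class while callers only depend on the capability they asked for.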
2. Configuration
Add to config.py, following the OpenAI pattern exactly:
from typing import Literal, Optional

from pydantic import Field

# Provider Selection (updated to include mistral)
transcription_provider: Literal["whisper", "openai", "mistral"] = Field(
    default="whisper",
    description="Transcription provider"
)
speaker_detector_provider: Literal["spacy", "openai", "anthropic", "mistral"] = Field(
    default="spacy",
    description="Speaker detection provider"
)
summary_provider: Literal["transformers", "openai", "anthropic", "mistral"] = Field(
    default="transformers",
    description="Summarization provider"
)

# Mistral API Configuration (following OpenAI pattern)
mistral_api_key: Optional[str] = Field(
    default=None,
    alias="mistral_api_key",
    description="Mistral API key (prefer MISTRAL_API_KEY env var or .env file)"
)
mistral_api_base: Optional[str] = Field(
    default=None,
    alias="mistral_api_base",
    description="Mistral API base URL (for E2E testing with mock servers)"
)

# Mistral Model Selection (environment-based defaults, like OpenAI)
mistral_transcription_model: str = Field(
    default_factory=_get_default_mistral_transcription_model,
    alias="mistral_transcription_model",
    description="Mistral Voxtral model for transcription (default: environment-based)"
)
mistral_speaker_model: str = Field(
    default_factory=_get_default_mistral_speaker_model,
    alias="mistral_speaker_model",
    description="Mistral model for speaker detection (default: environment-based)"
)
mistral_summary_model: str = Field(
    default_factory=_get_default_mistral_summary_model,
    alias="mistral_summary_model",
    description="Mistral model for summarization (default: environment-based)"
)

# Shared settings (like OpenAI)
mistral_temperature: float = Field(
    default=0.3,
    alias="mistral_temperature",
    description="Temperature for Mistral generation (0.0-1.0, lower = more deterministic)"
)
mistral_max_tokens: Optional[int] = Field(
    default=None,
    alias="mistral_max_tokens",
    description="Max tokens for Mistral generation (None = model default)"
)

# Mistral Prompt Configuration (following OpenAI pattern)
mistral_speaker_system_prompt: Optional[str] = Field(
    default=None,
    alias="mistral_speaker_system_prompt",
    description="Mistral system prompt for speaker detection (default: mistral/ner/system_ner_v1)"
)
mistral_speaker_user_prompt: str = Field(
    default="mistral/ner/guest_host_v1",
    alias="mistral_speaker_user_prompt",
    description="Mistral user prompt for speaker detection"
)
mistral_summary_system_prompt: Optional[str] = Field(
    default=None,
    alias="mistral_summary_system_prompt",
    description="Mistral system prompt for summarization (default: mistral/summarization/system_v1)"
)
mistral_summary_user_prompt: str = Field(
    default="mistral/summarization/long_v1",
    alias="mistral_summary_user_prompt",
    description="Mistral user prompt for summarization"
)
Environment-based defaults (like OpenAI):
# In config_constants.py
TEST_DEFAULT_MISTRAL_TRANSCRIPTION_MODEL = "voxtral-mini-latest"
PROD_DEFAULT_MISTRAL_TRANSCRIPTION_MODEL = "voxtral-mini-latest"  # Only option
TEST_DEFAULT_MISTRAL_SPEAKER_MODEL = "mistral-small-latest"  # Cheapest text
PROD_DEFAULT_MISTRAL_SPEAKER_MODEL = "mistral-large-latest"  # Best quality
TEST_DEFAULT_MISTRAL_SUMMARY_MODEL = "mistral-small-latest"  # Cheapest text
PROD_DEFAULT_MISTRAL_SUMMARY_MODEL = "mistral-large-latest"  # Best quality, 256k context

# In config.py
def _get_default_mistral_transcription_model() -> str:
    """Get default Mistral transcription model based on environment."""
    if _is_test_environment():
        return TEST_DEFAULT_MISTRAL_TRANSCRIPTION_MODEL
    return PROD_DEFAULT_MISTRAL_TRANSCRIPTION_MODEL


def _get_default_mistral_speaker_model() -> str:
    """Get default Mistral speaker detection model based on environment."""
    if _is_test_environment():
        return TEST_DEFAULT_MISTRAL_SPEAKER_MODEL
    return PROD_DEFAULT_MISTRAL_SPEAKER_MODEL


def _get_default_mistral_summary_model() -> str:
    """Get default Mistral summarization model based on environment."""
    if _is_test_environment():
        return TEST_DEFAULT_MISTRAL_SUMMARY_MODEL
    return PROD_DEFAULT_MISTRAL_SUMMARY_MODEL
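The helpers above depend on `_is_test_environment()`, which this RFC does not define. The sketch below shows one way such a check could drive the default; detecting tests through pytest's `PYTEST_CURRENT_TEST` variable is an assumption made for illustration, and `is_test_environment` / `default_summary_model` are hypothetical names rather than the project's actual functions.

```python
import os
from typing import Mapping, Optional

TEST_DEFAULT_MISTRAL_SUMMARY_MODEL = "mistral-small-latest"
PROD_DEFAULT_MISTRAL_SUMMARY_MODEL = "mistral-large-latest"


def is_test_environment(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical stand-in: pytest sets PYTEST_CURRENT_TEST while a test runs."""
    env = os.environ if environ is None else environ
    return bool(env.get("PYTEST_CURRENT_TEST"))


def default_summary_model(environ: Optional[Mapping[str, str]] = None) -> str:
    """Pick the cheap model under test and the high-quality model otherwise."""
    if is_test_environment(environ):
        return TEST_DEFAULT_MISTRAL_SUMMARY_MODEL
    return PROD_DEFAULT_MISTRAL_SUMMARY_MODEL
```

Passing the environment as a mapping keeps the selection logic easy to exercise without mutating `os.environ`.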
3. API Key Management
Follow OpenAI pattern exactly:
# In config.py
@field_validator("mistral_api_key", mode="before")
@classmethod
def _load_mistral_api_key_from_env(cls, value: Any) -> Optional[str]:
    """Load Mistral API key from environment variable if not provided."""
    if value is not None:
        return value
    env_key = os.getenv("MISTRAL_API_KEY")
    if env_key:
        return env_key
    return None


@field_validator("mistral_api_base", mode="before")
@classmethod
def _load_mistral_api_base_from_env(cls, value: Any) -> Optional[str]:
    """Load Mistral API base URL from environment variable if not provided."""
    if value is not None:
        return value
    env_base = os.getenv("MISTRAL_API_BASE")
    if env_base:
        return env_base
    return None


@model_validator(mode="after")
def _validate_mistral_provider_requirements(self) -> "Config":
    """Validate that Mistral API key is provided when Mistral providers are selected."""
    mistral_providers_used = []
    if self.transcription_provider == "mistral":
        mistral_providers_used.append("transcription")
    if self.speaker_detector_provider == "mistral":
        mistral_providers_used.append("speaker_detection")
    if self.summary_provider == "mistral":
        mistral_providers_used.append("summarization")

    if mistral_providers_used and not self.mistral_api_key:
        providers_str = ", ".join(mistral_providers_used)
        raise ValueError(
            f"Mistral API key required for Mistral providers: {providers_str}. "
            "Set MISTRAL_API_KEY environment variable or mistral_api_key in config."
        )
    return self
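The validators above encode a precedence rule (explicit config value, then `MISTRAL_API_KEY`, then nothing) plus a requirement check. The same logic can be sketched without pydantic; `resolve_mistral_api_key` and `require_key_if_selected` are hypothetical names used only for this illustration.

```python
from typing import List, Mapping, Optional


def resolve_mistral_api_key(
    explicit: Optional[str], environ: Mapping[str, str]
) -> Optional[str]:
    """Explicit value wins; otherwise fall back to MISTRAL_API_KEY, else None."""
    if explicit is not None:
        return explicit
    return environ.get("MISTRAL_API_KEY") or None


def require_key_if_selected(
    providers: Mapping[str, str], api_key: Optional[str]
) -> List[str]:
    """Return capabilities routed to Mistral; raise if any exist but no key is set."""
    used = [capability for capability, name in providers.items() if name == "mistral"]
    if used and not api_key:
        raise ValueError(
            f"Mistral API key required for Mistral providers: {', '.join(used)}."
        )
    return used
```

A mixed setup such as `{"transcription": "whisper", "summarization": "mistral"}` therefore only fails when the key is missing, which matches the per-capability selection requirement.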
4. Unified Provider Implementation
File: src/podcast_scraper/providers/mistral/mistral_provider.py
Follow OpenAIProvider pattern exactly, implementing all three protocols:
"""Unified Mistral provider for transcription, speaker detection, and summarization.

This module provides a single MistralProvider class that implements three protocols:
- TranscriptionProvider (using Mistral Voxtral API)
- SpeakerDetector (using Mistral chat API)
- SummarizationProvider (using Mistral chat API)

This unified approach matches the pattern of OpenAI providers, where a single
provider type handles multiple capabilities using shared API client.

Key advantage: Mistral is the only cloud provider (besides OpenAI) that supports
ALL three capabilities, making it a complete OpenAI alternative.

Note: Uses mistralai Python SDK (not OpenAI SDK).
"""

from __future__ import annotations

import json
import logging
import os
from pathlib import Path
from typing import Any, Dict, List, Optional, Set, Tuple

try:
    from mistralai import Mistral
except ImportError:
    Mistral = None  # type: ignore

from ... import config, models
from ...workflow import metrics

logger = logging.getLogger(__name__)

# Default speaker names when detection fails
DEFAULT_SPEAKER_NAMES = ["Host", "Guest"]
class MistralProvider:
    """Unified Mistral provider implementing TranscriptionProvider, SpeakerDetector, and SummarizationProvider.

    This provider initializes and manages:
    - Mistral Voxtral API for transcription
    - Mistral chat API for speaker detection
    - Mistral chat API for summarization

    All capabilities share the same Mistral client, similar to how OpenAI providers
    share the same OpenAI client.

    Key advantage: Mistral is a complete OpenAI alternative (all three capabilities).
    """

    def __init__(self, cfg: config.Config):
        """Initialize unified Mistral provider.

        Args:
            cfg: Configuration object with settings for all capabilities

        Raises:
            ValueError: If Mistral API key is not provided
            ImportError: If mistralai package is not installed
        """
        if Mistral is None:
            raise ImportError(
                "mistralai package required for Mistral provider. "
                "Install with: pip install 'podcast-scraper[mistral]'"
            )

        if not cfg.mistral_api_key:
            raise ValueError(
                "Mistral API key required for Mistral provider. "
                "Set MISTRAL_API_KEY environment variable or mistral_api_key in config."
            )

        self.cfg = cfg

        # Support a custom server URL for E2E testing with mock servers.
        # The mistralai v1 client takes this as `server_url` (not `base_url`,
        # which is the OpenAI SDK's name for the same setting).
        client_kwargs: dict[str, Any] = {"api_key": cfg.mistral_api_key}
        if cfg.mistral_api_base:
            client_kwargs["server_url"] = cfg.mistral_api_base
        self.client = Mistral(**client_kwargs)

        # Transcription settings
        self.transcription_model = getattr(
            cfg, "mistral_transcription_model", "voxtral-mini-latest"
        )

        # Speaker detection settings
        self.speaker_model = getattr(cfg, "mistral_speaker_model", "mistral-small-latest")
        self.speaker_temperature = getattr(cfg, "mistral_temperature", 0.3)

        # Summarization settings
        self.summary_model = getattr(cfg, "mistral_summary_model", "mistral-small-latest")
        self.summary_temperature = getattr(cfg, "mistral_temperature", 0.3)
        # Mistral Large supports 256k context window
        self.max_context_tokens = 256000  # Conservative estimate

        # Initialization state
        self._transcription_initialized = False
        self._speaker_detection_initialized = False
        self._summarization_initialized = False

        # Mark provider as thread-safe (API clients can be shared across threads)
        self._requires_separate_instances = False
    def initialize(self) -> None:
        """Initialize all Mistral capabilities.

        For Mistral API, initialization is a no-op but we track it for consistency.
        This method is idempotent and can be called multiple times safely.
        """
        # Initialize transcription if enabled
        if self.cfg.transcription_provider == "mistral" and not self._transcription_initialized:
            self._initialize_transcription()

        # Initialize speaker detection if enabled
        if self.cfg.auto_speakers and not self._speaker_detection_initialized:
            self._initialize_speaker_detection()

        # Initialize summarization if enabled
        if self.cfg.generate_summaries and not self._summarization_initialized:
            self._initialize_summarization()

    def _initialize_transcription(self) -> None:
        """Initialize transcription capability."""
        logger.debug(
            "Initializing Mistral transcription (model: %s)", self.transcription_model
        )
        self._transcription_initialized = True

    def _initialize_speaker_detection(self) -> None:
        """Initialize speaker detection capability."""
        logger.debug("Initializing Mistral speaker detection (model: %s)", self.speaker_model)
        self._speaker_detection_initialized = True

    def _initialize_summarization(self) -> None:
        """Initialize summarization capability."""
        logger.debug("Initializing Mistral summarization (model: %s)", self.summary_model)
        self._summarization_initialized = True
    # ============================================================================
    # TranscriptionProvider Protocol Implementation
    # ============================================================================
    def transcribe(
        self, audio_path: Path, language: str | None = None
    ) -> str:
        """Transcribe audio file using Mistral Voxtral API.

        Args:
            audio_path: Path to audio file
            language: Optional language code (hint for transcription)

        Returns:
            Transcribed text

        Raises:
            ValueError: If transcription fails
            RuntimeError: If provider is not initialized
        """
        if not self._transcription_initialized:
            raise RuntimeError(
                "MistralProvider transcription not initialized. Call initialize() first."
            )

        logger.debug("Transcribing audio via Mistral Voxtral API: %s", audio_path)

        try:
            # The endpoint mirrors OpenAI Whisper, but the SDK call differs:
            # the mistralai v1 SDK is expected to expose transcription as
            # audio.transcriptions.complete() and to take the upload as a
            # dict with "content" and "file_name". Verify against the
            # installed SDK version.
            with open(audio_path, "rb") as audio_file:
                transcription = self.client.audio.transcriptions.complete(
                    model=self.transcription_model,
                    file={"content": audio_file, "file_name": Path(audio_path).name},
                    language=language,
                )

            text = transcription.text if hasattr(transcription, "text") else ""
            if not text:
                logger.warning("Mistral Voxtral API returned empty transcription")
                return ""

            logger.debug("Mistral transcription completed: %d characters", len(text))
            return text

        except Exception as exc:
            logger.error("Mistral API error in transcription: %s", exc)
            raise ValueError(f"Mistral transcription failed: {exc}") from exc
    def transcribe_with_segments(
        self, audio_path: Path, language: str | None = None
    ) -> tuple[str, list[dict[str, object]]]:
        """Transcribe audio file with timestamp segments using Mistral Voxtral API.

        Args:
            audio_path: Path to audio file
            language: Optional language code (hint for transcription)

        Returns:
            Tuple of (transcribed text, list of segment dictionaries with start/end/text)

        Raises:
            ValueError: If transcription fails
            RuntimeError: If provider is not initialized
        """
        if not self._transcription_initialized:
            raise RuntimeError(
                "MistralProvider transcription not initialized. Call initialize() first."
            )

        logger.debug("Transcribing audio with segments via Mistral Voxtral API: %s", audio_path)

        try:
            # Segments are requested through timestamp_granularities. The
            # OpenAI-style response_format="verbose_json" argument is omitted
            # because the mistralai SDK call is not expected to accept it;
            # verify against the installed SDK version.
            with open(audio_path, "rb") as audio_file:
                transcription = self.client.audio.transcriptions.complete(
                    model=self.transcription_model,
                    file={"content": audio_file, "file_name": Path(audio_path).name},
                    language=language,
                    timestamp_granularities=["segment"],
                )

            text = transcription.text if hasattr(transcription, "text") else ""

            # Segments are assumed to be SDK objects rather than dicts, so
            # read attributes instead of calling .get() on them.
            segments = []
            if getattr(transcription, "segments", None):
                segments = [
                    {
                        "start": getattr(seg, "start", 0.0),
                        "end": getattr(seg, "end", 0.0),
                        "text": getattr(seg, "text", ""),
                    }
                    for seg in transcription.segments
                ]

            logger.debug(
                "Mistral transcription with segments completed: %d characters, %d segments",
                len(text),
                len(segments),
            )
            return text, segments

        except Exception as exc:
            logger.error("Mistral API error in transcription with segments: %s", exc)
            raise ValueError(f"Mistral transcription failed: {exc}") from exc
    # ============================================================================
    # SpeakerDetector Protocol Implementation
    # ============================================================================
    def detect_hosts(
        self,
        feed_title: str | None,
        feed_description: str | None,
        feed_authors: list[str] | None = None,
    ) -> Set[str]:
        """Detect host names from feed-level metadata using Mistral API.

        Args:
            feed_title: Feed title (can be None)
            feed_description: Optional feed description
            feed_authors: Optional list of author names from RSS feed (preferred source)

        Returns:
            Set of detected host names
        """
        if not self._speaker_detection_initialized:
            raise RuntimeError(
                "MistralProvider speaker detection not initialized. Call initialize() first."
            )

        # Prefer RSS author tags if available (like OpenAI)
        if feed_authors:
            return set(feed_authors)

        # Otherwise, use Mistral API to detect hosts from feed metadata
        if not feed_title:
            return set()

        try:
            # Use detect_speakers with empty known_hosts to detect hosts
            speakers, detected_hosts, _ = self.detect_speakers(
                episode_title=feed_title,
                episode_description=feed_description,
                known_hosts=set(),
            )
            return detected_hosts
        except Exception as exc:
            logger.warning("Failed to detect hosts from feed metadata: %s", exc)
            return set()
    def detect_speakers(
        self,
        episode_title: str,
        episode_description: str | None,
        known_hosts: Set[str],
        pipeline_metrics: metrics.Metrics | None = None,
    ) -> Tuple[list[str], Set[str], bool]:
        """Detect speaker names from episode metadata using Mistral API.

        Args:
            episode_title: Episode title
            episode_description: Optional episode description
            known_hosts: Set of known host names (for context)
            pipeline_metrics: Optional metrics tracker

        Returns:
            Tuple of:
            - List of detected speaker names (hosts + guests)
            - Set of detected host names (subset of known_hosts)
            - Success flag (True if detection succeeded)

        Raises:
            ValueError: If detection fails or API key is invalid
            RuntimeError: If provider is not initialized
        """
        # If auto_speakers is disabled, return defaults without requiring initialization
        if not self.cfg.auto_speakers:
            logger.debug("Auto-speakers disabled, skipping detection and returning defaults")
            return DEFAULT_SPEAKER_NAMES.copy(), set(), False

        if not self._speaker_detection_initialized:
            raise RuntimeError(
                "MistralProvider speaker detection not initialized. Call initialize() first."
            )

        logger.debug("Detecting speakers via Mistral API for episode: %s", episode_title[:50])

        try:
            # Build prompt using prompt_store (RFC-017)
            user_prompt = self._build_speaker_detection_prompt(
                episode_title, episode_description, known_hosts
            )

            # Get system prompt from prompt_store
            from ...prompts.store import render_prompt

            system_prompt_name = (
                self.cfg.mistral_speaker_system_prompt or "mistral/ner/system_ner_v1"
            )
            system_prompt = render_prompt(system_prompt_name)

            # Call Mistral API (similar to OpenAI format)
            response = self.client.chat.complete(
                model=self.speaker_model,
                messages=[
                    {"role": "system", "content": system_prompt},
                    {"role": "user", "content": user_prompt},
                ],
                temperature=self.speaker_temperature,
                max_tokens=300,
            )

            response_text = response.choices[0].message.content
            if not response_text:
                logger.warning("Mistral API returned empty response")
                return DEFAULT_SPEAKER_NAMES.copy(), set(), False

            # Parse JSON response (falls back to plain-text parsing internally)
            speakers, detected_hosts, success = self._parse_speakers_from_response(
                response_text, known_hosts
            )

            logger.debug(
                "Mistral speaker detection completed: %d speakers, %d hosts, success=%s",
                len(speakers),
                len(detected_hosts),
                success,
            )

            # Track LLM call metrics if available
            if pipeline_metrics is not None and hasattr(response, "usage"):
                input_tokens = response.usage.prompt_tokens if response.usage else 0
                output_tokens = response.usage.completion_tokens if response.usage else 0
                pipeline_metrics.record_llm_speaker_detection_call(input_tokens, output_tokens)

            return speakers, detected_hosts, success

        except json.JSONDecodeError as exc:
            logger.error("Failed to parse Mistral API JSON response: %s", exc)
            return DEFAULT_SPEAKER_NAMES.copy(), set(), False
        except Exception as exc:
            logger.error("Mistral API error in speaker detection: %s", exc)
            raise ValueError(f"Mistral speaker detection failed: {exc}") from exc
    def analyze_patterns(
        self,
        episodes: list[models.Episode],
        known_hosts: Set[str],
    ) -> dict[str, object] | None:
        """Analyze patterns across multiple episodes (optional).

        For Mistral provider, pattern analysis is not implemented.
        Returns None to use local pattern analysis logic.
        """
        return None

    def _build_speaker_detection_prompt(
        self, episode_title: str, episode_description: str | None, known_hosts: Set[str]
    ) -> str:
        """Build user prompt for speaker detection using prompt_store."""
        from ...prompts.store import render_prompt

        user_prompt_name = self.cfg.mistral_speaker_user_prompt
        user_prompt = render_prompt(
            user_prompt_name,
            episode_title=episode_title,
            episode_description=episode_description or "",
            known_hosts=", ".join(known_hosts) if known_hosts else "",
        )
        return user_prompt

    def _parse_speakers_from_response(
        self, response_text: str, known_hosts: Set[str]
    ) -> Tuple[list[str], Set[str], bool]:
        """Parse speaker names from Mistral API response."""
        try:
            data = json.loads(response_text)
            if isinstance(data, dict):
                speakers = data.get("speakers", [])
                hosts = set(data.get("hosts", []))
                guests = data.get("guests", [])
                all_speakers = list(hosts) + guests if not speakers else speakers
                return all_speakers, hosts, True
        except json.JSONDecodeError:
            pass

        # Fallback: parse from plain text
        speakers = []
        for line in response_text.strip().split("\n"):
            for name in line.split(","):
                name = name.strip().strip("-").strip("*").strip()
                if name and len(name) > 1:
                    speakers.append(name)

        detected_hosts = set(s for s in speakers if s in known_hosts)
        return speakers, detected_hosts, len(speakers) > 0
    # ============================================================================
    # SummarizationProvider Protocol Implementation
    # ============================================================================
    def summarize(
        self,
        text: str,
        episode_title: Optional[str] = None,
        episode_description: Optional[str] = None,
        params: Optional[Dict[str, Any]] = None,
        pipeline_metrics: metrics.Metrics | None = None,
    ) -> Dict[str, Any]:
        """Summarize text using Mistral chat API.

        Can handle full transcripts directly due to large context window (256k tokens).
        No chunking needed for most podcast transcripts.

        Args:
            text: Transcript text to summarize
            episode_title: Optional episode title
            episode_description: Optional episode description
            params: Optional parameters dict with max_length, min_length, etc.
            pipeline_metrics: Optional metrics tracker

        Returns:
            Dictionary with summary results:
            {
                "summary": str,
                "summary_short": Optional[str],
                "metadata": {...}
            }

        Raises:
            ValueError: If summarization fails
            RuntimeError: If provider is not initialized
        """
        if not self._summarization_initialized:
            raise RuntimeError(
                "MistralProvider summarization not initialized. Call initialize() first."
            )

        # Extract parameters with defaults from config
        max_length = (
            (params.get("max_length") if params else None)
            or self.cfg.summary_reduce_params.get("max_new_tokens")
            or 800
        )
        min_length = (
            (params.get("min_length") if params else None)
            or self.cfg.summary_reduce_params.get("min_new_tokens")
            or 100
        )
        custom_prompt = params.get("prompt") if params else None

        logger.debug(
            "Summarizing text via Mistral API (model: %s, max_tokens: %d)",
            self.summary_model,
            max_length,
        )

        try:
            # Build prompts using prompt_store (RFC-017)
            (
                system_prompt,
                user_prompt,
                system_prompt_name,
                user_prompt_name,
                paragraphs_min,
                paragraphs_max,
            ) = self._build_summarization_prompts(
                text, episode_title, episode_description, max_length, min_length, custom_prompt
            )

            # Call Mistral API (similar to OpenAI format)
            response = self.client.chat.complete(
                model=self.summary_model,
                messages=[
                    {"role": "system", "content": system_prompt},
                    {"role": "user", "content": user_prompt},
                ],
                temperature=self.summary_temperature,
                max_tokens=max_length,
            )

            summary = response.choices[0].message.content
            if not summary:
                logger.warning("Mistral API returned empty summary")
                summary = ""

            logger.debug("Mistral summarization completed: %d characters", len(summary))

            # Track LLM call metrics if available
            if pipeline_metrics is not None and hasattr(response, "usage"):
                input_tokens = response.usage.prompt_tokens if response.usage else 0
                output_tokens = response.usage.completion_tokens if response.usage else 0
                pipeline_metrics.record_llm_summarization_call(input_tokens, output_tokens)

            # Get prompt metadata for tracking (RFC-017)
            from ...prompts.store import get_prompt_metadata

            prompt_metadata = {}
            if system_prompt_name:
                prompt_metadata["system"] = get_prompt_metadata(system_prompt_name)
            user_params = {
                "transcript": text[:100] + "..." if len(text) > 100 else text,
                "title": episode_title or "",
                "paragraphs_min": paragraphs_min,
                "paragraphs_max": paragraphs_max,
            }
            user_params.update(self.cfg.summary_prompt_params)
            prompt_metadata["user"] = get_prompt_metadata(user_prompt_name, params=user_params)

            return {
                "summary": summary,
                "summary_short": None,  # Mistral provider doesn't generate short summaries separately
                "metadata": {
                    "model": self.summary_model,
                    "provider": "mistral",
                    "max_length": max_length,
                    "min_length": min_length,
                    "prompts": prompt_metadata,
                },
            }

        except Exception as exc:
            logger.error("Mistral API error in summarization: %s", exc)
            raise ValueError(f"Mistral summarization failed: {exc}") from exc
    def _build_summarization_prompts(
        self,
        text: str,
        episode_title: Optional[str],
        episode_description: Optional[str],
        max_length: int,
        min_length: int,
        custom_prompt: Optional[str],
    ) -> tuple[str, str, Optional[str], str, int, int]:
        """Build system and user prompts for summarization using prompt_store (RFC-017)."""
        from ...prompts.store import render_prompt

        system_prompt_name = (
            self.cfg.mistral_summary_system_prompt or "mistral/summarization/system_v1"
        )
        user_prompt_name = self.cfg.mistral_summary_user_prompt

        system_prompt = render_prompt(system_prompt_name)

        paragraphs_min = max(1, min_length // 100)
        paragraphs_max = max(paragraphs_min, max_length // 100)

        if custom_prompt:
            user_prompt = custom_prompt.replace("{{ transcript }}", text)
            if episode_title:
                user_prompt = user_prompt.replace("{{ title }}", episode_title)
            user_prompt_name = "custom"
        else:
            template_params = {
                "transcript": text,
                "title": episode_title or "",
                "paragraphs_min": paragraphs_min,
                "paragraphs_max": paragraphs_max,
            }
            template_params.update(self.cfg.summary_prompt_params)
            user_prompt = render_prompt(user_prompt_name, **template_params)

        return (
            system_prompt,
            user_prompt,
            system_prompt_name,
            user_prompt_name,
            paragraphs_min,
            paragraphs_max,
        )
    # ============================================================================
    # Cleanup Methods
    # ============================================================================
    def cleanup(self) -> None:
        """Cleanup all provider resources (no-op for API provider)."""
        self._transcription_initialized = False
        self._speaker_detection_initialized = False
        self._summarization_initialized = False

    def clear_cache(self) -> None:
        """Clear cache (no-op for API provider)."""
        pass

    @property
    def is_initialized(self) -> bool:
        """Check if provider is initialized (any component)."""
        return (
            self._transcription_initialized
            or self._speaker_detection_initialized
            or self._summarization_initialized
        )
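The response parsing in `_parse_speakers_from_response` above tries JSON first and then falls back to splitting plain text. A standalone sketch of that two-step strategy follows; `parse_speakers` is a hypothetical free function, and unlike the method above it sorts hosts so the output order is deterministic.

```python
from __future__ import annotations

import json


def parse_speakers(
    response_text: str, known_hosts: set[str]
) -> tuple[list[str], set[str], bool]:
    """Parse speakers from a model reply: JSON object first, plain text second."""
    try:
        data = json.loads(response_text)
        if isinstance(data, dict):
            hosts = set(data.get("hosts", []))
            # Prefer an explicit "speakers" list; otherwise combine hosts and guests.
            speakers = data.get("speakers") or sorted(hosts) + list(data.get("guests", []))
            return speakers, hosts, True
    except json.JSONDecodeError:
        pass

    # Fallback: treat the reply as comma- or newline-separated names,
    # stripping list markers such as "-" and "*".
    speakers = []
    for line in response_text.strip().split("\n"):
        for name in line.split(","):
            name = name.strip().strip("-").strip("*").strip()
            if len(name) > 1:
                speakers.append(name)
    return speakers, {s for s in speakers if s in known_hosts}, bool(speakers)
```

In the fallback path only names already present in `known_hosts` are reported as hosts, since plain text carries no host/guest distinction.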
5. Factory Updates
Update all three factories to support both Config-based and experiment-based modes (like OpenAI):
File: src/podcast_scraper/transcription/factory.py
def create_transcription_provider(
    cfg_or_provider_type: Union[config.Config, str],
    params: Optional[Union[TranscriptionParams, Dict[str, Any]]] = None,
) -> TranscriptionProvider:
    # ... existing code ...

    elif provider_type == "mistral":
        from ..providers.mistral.mistral_provider import MistralProvider

        if experiment_mode:
            from ..config import Config

            assert isinstance(params, TranscriptionParams)
            cfg = Config(
                rss="",
                transcription_provider="mistral",
                mistral_transcription_model=(
                    params.model_name if params.model_name else "voxtral-mini-latest"
                ),
                mistral_api_key=os.getenv("MISTRAL_API_KEY"),
            )
            return MistralProvider(cfg)
        else:
            return MistralProvider(cfg)
    else:
        raise ValueError(
            f"Unsupported transcription provider type: {provider_type}. "
            "Supported types: 'whisper', 'openai', 'mistral'"
        )
Similar updates for speaker_detectors/factory.py and summarization/factory.py.
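The factory excerpt elides how `provider_type` and `experiment_mode` are derived. One plausible reading, sketched here under that assumption, is that a string argument selects experiment mode and a Config object selects Config-based mode; `SketchConfig` and `resolve_provider_request` are hypothetical names, not the project's actual code.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class SketchConfig:
    """Hypothetical stand-in for the project's Config model."""

    transcription_provider: str = "whisper"
    mistral_transcription_model: str = "voxtral-mini-latest"


def resolve_provider_request(
    cfg_or_type: Union[SketchConfig, str], model_name: Optional[str] = None
) -> tuple[str, SketchConfig, bool]:
    """Normalize both calling conventions to (provider_type, cfg, experiment_mode)."""
    experiment_mode = isinstance(cfg_or_type, str)
    if experiment_mode:
        # Experiment mode: build a minimal config from the provider name.
        cfg = SketchConfig(
            transcription_provider=cfg_or_type,
            mistral_transcription_model=model_name or "voxtral-mini-latest",
        )
    else:
        # Config-based mode: use the caller's config unchanged.
        cfg = cfg_or_type
    return cfg.transcription_provider, cfg, experiment_mode
```

Normalizing up front means the provider constructor only ever sees a config object, which is what `MistralProvider(cfg)` expects in both branches.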
6. Dependencies
Add to pyproject.toml:
[project.optional-dependencies]
mistral = [
    "mistralai>=1.0.0,<2.0.0",
]
7. Prompt Templates
Create Mistral-specific prompts in src/podcast_scraper/prompts/mistral/:
- ner/system_ner_v1.j2 - System prompt for speaker detection
- ner/guest_host_v1.j2 - User prompt for speaker detection
- summarization/system_v1.j2 - System prompt for summarization
- summarization/long_v1.j2 - User prompt for summarization
Follow OpenAI prompt patterns but optimize for Mistral models.
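Prompt names used in the config (for example `mistral/ner/system_ner_v1`) correspond to the `.j2` files listed here. How the prompt store resolves them is defined in RFC-017, not in this document, so the following is only a sketch of one plausible mapping; `prompt_path` and its default root are assumptions.

```python
from pathlib import PurePosixPath


def prompt_path(name: str, root: str = "src/podcast_scraper/prompts") -> PurePosixPath:
    """Map a prompt name such as 'mistral/ner/system_ner_v1' to its .j2 file."""
    parts = PurePosixPath(name).parts
    # Reject empty names, absolute paths, and parent-directory traversal.
    if not parts or PurePosixPath(name).is_absolute() or ".." in parts:
        raise ValueError(f"Invalid prompt name: {name!r}")
    return PurePosixPath(root, name + ".j2")
```

Keeping the provider name as the first path segment is what makes prompts provider-specific: swapping `mistral/` for another provider directory selects a different template set.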
Testing Strategy
Same pattern as OpenAI provider:
- Unit tests: Mock Mistral API responses
- Integration tests: Use E2E mock server with Mistral endpoints
- E2E tests: Full workflow with Mistral provider
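For the unit-test tier, the Mistral client can be replaced by a mock that returns a response shaped like the one the provider reads (`choices[0].message.content`). The sketch below uses only the standard library; `summarize_with_client` and `make_fake_client` are hypothetical helpers standing in for the provider method and a test fixture.

```python
from types import SimpleNamespace
from unittest.mock import MagicMock


def summarize_with_client(client, model: str, text: str) -> str:
    """Tiny function under test: one chat call, returning the message content."""
    response = client.chat.complete(
        model=model,
        messages=[{"role": "user", "content": text}],
    )
    return response.choices[0].message.content or ""


def make_fake_client(content: str) -> MagicMock:
    """Build a mock client whose chat.complete() returns a canned response."""
    client = MagicMock()
    client.chat.complete.return_value = SimpleNamespace(
        choices=[SimpleNamespace(message=SimpleNamespace(content=content))]
    )
    return client
```

A test can then assert on both the returned text and the arguments passed to `client.chat.complete`, without any network access or API key.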
Success Criteria
- ✅ Mistral supports transcription, speaker detection, and summarization via unified provider
- ✅ Mistral is a complete OpenAI alternative (all three capabilities)
- ✅ Free tier works for development (Small model)
- ✅ E2E tests pass
- ✅ Experiment mode supported from start
- ✅ Environment-based model defaults (test vs prod)
- ✅ Follows OpenAI provider pattern exactly
Migration Notes
- Breaking Changes: None (new provider, backward compatible)
- Configuration: Add MISTRAL_API_KEY to .env file
- Dependencies: Install with pip install 'podcast-scraper[mistral]'
References
- Related PRD: docs/prd/PRD-010-mistral-provider-integration.md
- Reference Implementation: src/podcast_scraper/providers/openai/openai_provider.py
- Mistral API Documentation: https://docs.mistral.ai/
- Mistral Python SDK: https://github.com/mistralai/mistral-python
- Voxtral Documentation: https://docs.mistral.ai/capabilities/audio_transcription