Provider Configuration Quick Reference¶
This guide shows how to configure providers (transcription, speaker detection, summarization) via CLI, config files, and programmatically.
Provider Options¶
Unified Provider Architecture¶
The podcast scraper uses a Unified Provider pattern, in which a single provider class handles multiple capabilities.
- `MLProvider` (local): handles `whisper` (transcription), `spacy` (speaker detection), and `transformers` (summarization).
- `HybridMLProvider` (local): combines a local ML MAP stage with an LLM REDUCE stage for summarization (RFC-042).
- `OpenAIProvider` (API): OpenAI-based transcription and summarization (no speaker detection).
- `GeminiProvider` (API): Google Gemini-based transcription and summarization (no speaker detection).
- `AnthropicProvider` (API): Anthropic Claude-based summarization only (no transcription or speaker detection).
- `MistralProvider` (API): Mistral-based summarization only (EU data residency).
- `DeepSeekProvider` (API): DeepSeek-based summarization only (ultra low-cost).
- `GrokProvider` (API): Grok-based summarization only (real-time information access).
- `OllamaProvider` (local): Ollama-based transcription, speaker detection, and summarization (self-hosted LLMs).
Transcription Providers¶
- `whisper` (default): local Whisper models (via `MLProvider`)
- `openai`: OpenAI Whisper API (via `OpenAIProvider`)
- `gemini`: Google Gemini API (via `GeminiProvider`)
- `ollama`: local Ollama LLMs (via `OllamaProvider`)
Speaker Detection Providers¶
- `spacy` (default): local spaCy NER models (via `MLProvider`)
- `ollama`: local Ollama LLMs (via `OllamaProvider`)
Summarization Providers¶
- `transformers` (default): local HuggingFace Transformers models (via `MLProvider`)
- `hybrid_ml`: local MAP-REDUCE with LLM REDUCE (via `HybridMLProvider`)
- `openai`: OpenAI GPT API (via `OpenAIProvider`)
- `gemini`: Google Gemini API (via `GeminiProvider`)
- `anthropic`: Anthropic Claude API (via `AnthropicProvider`): high quality, 200k context
- `mistral`: Mistral API (via `MistralProvider`): EU data residency
- `deepseek`: DeepSeek Chat API (via `DeepSeekProvider`): 95% cheaper than OpenAI
- `grok`: Grok API (via `GrokProvider`): real-time information access
- `ollama`: local Ollama LLMs (via `OllamaProvider`): zero cost, complete privacy
Configuration Methods¶
1. Command Line Interface (CLI)¶
Use the `--transcription-provider`, `--speaker-detector-provider`, and `--summary-provider` flags:
# Use all local ML providers (default)
podcast-scraper --rss https://example.com/feed.xml
# Use OpenAI for transcription
podcast-scraper --rss https://example.com/feed.xml \
--transcription-provider openai \
--openai-api-key sk-your-key-here
# Use OpenAI for summarization
podcast-scraper --rss https://example.com/feed.xml \
--summary-provider openai \
--openai-api-key sk-your-key-here
# Mixed configuration: Whisper transcription + Ollama speaker detection + Local summarization
podcast-scraper --rss https://example.com/feed.xml \
--transcription-provider whisper \
--speaker-detector-provider ollama \
--summary-provider transformers
# OpenAI transcription + summarization (speaker detection not supported)
podcast-scraper --rss https://example.com/feed.xml \
--transcription-provider openai \
--summary-provider openai \
--openai-api-key sk-your-key-here
# Use Gemini for transcription
podcast-scraper --rss https://example.com/feed.xml \
--transcription-provider gemini \
--gemini-api-key your-key-here
# Use Gemini for summarization
podcast-scraper --rss https://example.com/feed.xml \
--summary-provider gemini \
--gemini-api-key your-key-here
# Gemini transcription + summarization (speaker detection not supported)
podcast-scraper --rss https://example.com/feed.xml \
--transcription-provider gemini \
--summary-provider gemini \
--gemini-api-key your-key-here
OpenAI API Key Options:
- Set via the `--openai-api-key` flag
- Set via the `OPENAI_API_KEY` environment variable
- Set via a `.env` file in the project root
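The three key sources above can be sketched as a simple precedence chain. This is an illustrative sketch only; the function name and the naive `.env` parsing are assumptions for the example, not the library's actual implementation.

```python
import os

def resolve_openai_key(cli_flag=None, env=None, dotenv_path=".env"):
    """Resolve the OpenAI API key: CLI flag, then environment, then .env file."""
    env = os.environ if env is None else env
    if cli_flag:
        return cli_flag
    if env.get("OPENAI_API_KEY"):
        return env["OPENAI_API_KEY"]
    try:
        with open(dotenv_path) as f:
            for line in f:
                line = line.strip()
                if line.startswith("OPENAI_API_KEY="):
                    return line.split("=", 1)[1]
    except FileNotFoundError:
        pass
    return None
```

A CLI flag always wins, so `resolve_openai_key("sk-cli")` returns `sk-cli` even when `OPENAI_API_KEY` is also set.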
Custom OpenAI Base URL (for E2E testing):
podcast-scraper --rss https://example.com/feed.xml \
--transcription-provider openai \
--openai-api-base http://localhost:8000/v1 \
--openai-api-key sk-test123
Gemini API Key Options:
- Set via the `--gemini-api-key` flag
- Set via the `GEMINI_API_KEY` environment variable
- Set via a `.env` file in the project root
Custom Gemini Base URL (for E2E testing):
podcast-scraper --rss https://example.com/feed.xml \
--transcription-provider gemini \
--gemini-api-base http://localhost:8000/v1beta \
--gemini-api-key test123
Ollama Setup (No API Key Required):
Ollama is a local, self-hosted solution. No API key needed, but requires setup:
# 1. Install Ollama
brew install ollama # macOS
# Or download from https://ollama.ai
# 2. Start Ollama server (keep running)
ollama serve
# 3. Pull required models
ollama pull llama3.3:latest # Production
ollama pull llama3.2:latest # Testing (faster)
# 4. Verify setup
ollama list # Should show your models
Ollama Configuration Options:
- Set via the `--ollama-api-base` flag (default: `http://localhost:11434/v1`)
- Set via the `OLLAMA_API_BASE` environment variable
- Set via a `.env` file in the project root
- No API key required (local service)
Custom Ollama Base URL (for remote Ollama server):
podcast-scraper --rss https://example.com/feed.xml \
--speaker-detector-provider ollama \
--ollama-api-base http://192.168.1.100:11434/v1
Troubleshooting Ollama:
- If `ollama list` hangs: the server is not running; start it with `ollama serve`
- If "model not available": pull the model with `ollama pull <model-name>`
- If the connection is refused: check that the server is running: `curl http://localhost:11434/api/tags`
See Ollama Provider Guide for detailed troubleshooting.
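The `curl` check above can also be done from Python, which is handy in scripts that fall back to another provider when Ollama is down. The function name is an assumption for this sketch; only the `/api/tags` endpoint comes from the troubleshooting steps above.

```python
import json
import urllib.error
import urllib.request

def ollama_models(base="http://localhost:11434", timeout=2.0):
    """Return the list of installed model names if the Ollama server answers,
    or None if it is unreachable (equivalent to the curl check above)."""
    try:
        with urllib.request.urlopen(f"{base}/api/tags", timeout=timeout) as resp:
            tags = json.load(resp)
        return [m["name"] for m in tags.get("models", [])]
    except (urllib.error.URLError, OSError):
        return None
```

A `None` result means the server is not reachable; an empty list means it is running but no models have been pulled yet.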
2. Configuration File (YAML/JSON)¶
Create a config file (e.g., config.yaml) with provider settings:
# config.yaml
rss: https://example.com/feed.xml
output_dir: ./transcripts
# Provider configuration
transcription_provider: whisper # or "openai", "gemini", "ollama"
speaker_detector_provider: spacy # or "ollama"
summary_provider: transformers # or "hybrid_ml", "openai", "gemini", "anthropic", "mistral", "deepseek", "grok", "ollama"
# OpenAI configuration (required if using OpenAI providers)
openai_api_key: sk-your-key-here # Optional: can use OPENAI_API_KEY env var instead
openai_api_base: null # Optional: custom base URL (e.g., "http://localhost:8000/v1" for E2E testing)
# Gemini configuration (required if using Gemini providers)
gemini_api_key: your-key-here # Optional: can use GEMINI_API_KEY env var instead
gemini_api_base: null # Optional: custom base URL (e.g., "http://localhost:8000/v1beta" for E2E testing)
# Mistral configuration (required if using Mistral provider)
# Mistral supports summarization only, with EU data residency
mistral_api_key: your-key-here # Optional: can use MISTRAL_API_KEY env var instead
mistral_api_base: null # Optional: custom base URL (e.g., "http://localhost:8000/v1" for E2E testing)
# Transcription settings (for whisper provider)
transcribe_missing: true
whisper_model: base # or "tiny", "small", "medium", "large", etc.
# Speaker detection settings (for spacy provider)
auto_speakers: true
ner_model: en_core_web_trf # spaCy model name. Options: "en_core_web_trf" (default/prod, higher quality), "en_core_web_sm" (dev, fast). Defaults based on environment
# Summarization settings (for local provider)
generate_summaries: true
summary_mode_id: ml_prod_authority_v1 # Optional (RFC-044). Uses promoted baseline defaults from registry.
summary_model: pegasus-cnn # Transformers model alias. Options: "pegasus-cnn" (default/prod), "bart-small" (dev), "bart-large", "fast", "pegasus", "long", "long-fast"
summary_device: cpu # or "cuda", "mps"
mps_exclusive: true # Serialize GPU work on MPS to prevent memory contention (default: true)
# Hybrid MAP-REDUCE (RFC-042) — MAP (LongT5) + REDUCE (transformers / Ollama / llama_cpp)
# summary_provider: hybrid_ml
# hybrid_map_model: longt5-base # MAP model (chunk summarization)
# hybrid_reduce_model: google/flan-t5-base # REDUCE: HF ID (transformers), Ollama tag (ollama), or .gguf path (llama_cpp)
# hybrid_reduce_backend: transformers # Options: transformers | ollama | llama_cpp
# hybrid_reduce_device: mps # For transformers backend (mps | cuda | cpu)
# Grounded Insights (GIL) — optional; writes gi.json per episode (see [Grounded Insights Guide](GROUNDED_INSIGHTS_GUIDE.md))
# generate_gi: false
# embedding_model: sentence-transformers/all-MiniLM-L6-v2 # Evidence stack (lazy load when GIL enabled)
# extractive_qa_model: deepset/roberta-base-squad2
# nli_model: cross-encoder/nli-deberta-v3-base
JSON format:
{
"rss": "https://example.com/feed.xml",
"output_dir": "./transcripts",
"transcription_provider": "whisper",
"speaker_detector_provider": "spacy",
"summary_provider": "transformers",
"summary_mode_id": "ml_prod_authority_v1",
"openai_api_key": "sk-your-key-here",
"gemini_api_key": "your-key-here",
"transcribe_missing": true,
"whisper_model": "base",
"auto_speakers": true,
"generate_summaries": true
}
Use config file:
podcast-scraper --config config.yaml
Config file with CLI overrides:
# Config file sets defaults, CLI flags override
podcast-scraper --config config.yaml --transcription-provider openai
3. Programmatic (Library API)¶
Create a Config object and pass it to run_pipeline():
from podcast_scraper import Config, run_pipeline
# All local ML providers (default)
cfg = Config(
rss_url="https://example.com/feed.xml",
output_dir="./transcripts",
transcription_provider="whisper", # default
speaker_detector_provider="spacy", # default
summary_provider="transformers", # default
)
# OpenAI transcription
cfg = Config(
rss_url="https://example.com/feed.xml",
transcription_provider="openai",
openai_api_key="sk-your-key-here", # or set OPENAI_API_KEY env var
)
# OpenAI summarization
cfg = Config(
rss_url="https://example.com/feed.xml",
summary_provider="openai",
openai_api_key="sk-your-key-here",
generate_summaries=True,
)
# Gemini transcription
cfg = Config(
rss_url="https://example.com/feed.xml",
transcription_provider="gemini",
gemini_api_key="your-key-here", # or set GEMINI_API_KEY env var
)
# Gemini summarization
cfg = Config(
rss_url="https://example.com/feed.xml",
summary_provider="gemini",
gemini_api_key="your-key-here",
generate_summaries=True,
)
# Mixed configuration
cfg = Config(
rss_url="https://example.com/feed.xml",
transcription_provider="whisper", # MLProvider
speaker_detector_provider="ollama", # OllamaProvider
summary_provider="transformers", # MLProvider
)
# OpenAI transcription + summarization (speaker detection not supported)
cfg = Config(
rss_url="https://example.com/feed.xml",
transcription_provider="openai",
summary_provider="openai",
openai_api_key="sk-your-key-here",
transcribe_missing=True,
generate_summaries=True,
)
# Gemini transcription + summarization (speaker detection not supported)
cfg = Config(
rss_url="https://example.com/feed.xml",
transcription_provider="gemini",
summary_provider="gemini",
gemini_api_key="your-key-here",
transcribe_missing=True,
generate_summaries=True,
)
# Run pipeline
count, summary = run_pipeline(cfg)
print(f"Processed {count} episodes: {summary}")
Load from config file programmatically:
from podcast_scraper import Config, load_config_file
# Load config file
config_dict = load_config_file("config.yaml")
cfg = Config(**config_dict)
# Override provider settings
cfg = Config(
**config_dict,
transcription_provider="openai", # Override from file
openai_api_key="sk-your-key-here",
)
# Run pipeline
count, summary = run_pipeline(cfg)
Custom OpenAI base URL (for E2E testing):
cfg = Config(
rss_url="https://example.com/feed.xml",
transcription_provider="openai",
openai_api_base="http://localhost:8000/v1", # E2E server
openai_api_key="sk-test123",
)
Configuration Priority¶
When using multiple methods, priority is:
- CLI arguments (highest priority)
- Config file
- Environment variables
- Defaults (lowest priority)
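The priority order above amounts to a layered dictionary merge where higher-priority layers shadow lower ones. A minimal sketch (the function name is illustrative, not the library's API; unset values are modeled as `None` so they never shadow lower layers):

```python
def effective_config(defaults, env, config_file, cli):
    """Merge settings in ascending priority: defaults < env < config file < CLI."""
    merged = dict(defaults)
    for layer in (env, config_file, cli):
        # Only explicitly provided values (non-None) override lower layers.
        merged.update({k: v for k, v in layer.items() if v is not None})
    return merged
```

For example, with `transcription_provider: whisper` in the config file and `--transcription-provider openai` on the CLI, the merged result is `openai`.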
Example:
# config.yaml has: transcription_provider: whisper
# CLI has: --transcription-provider openai
# Result: openai (CLI overrides config file)
podcast-scraper --config config.yaml --transcription-provider openai
Environment Variables¶
You can also set provider-related settings via environment variables:
# OpenAI API key
export OPENAI_API_KEY=sk-your-key-here
# OpenAI API base URL (for E2E testing)
export OPENAI_API_BASE=http://localhost:8000/v1
# Gemini API key
export GEMINI_API_KEY=your-key-here
# Gemini API base URL (for E2E testing)
export GEMINI_API_BASE=http://localhost:8000/v1beta
# Mistral API key
export MISTRAL_API_KEY=your-key-here
# Mistral API base URL (for E2E testing)
export MISTRAL_API_BASE=http://localhost:8000/v1
# Then use in CLI or config
podcast-scraper --rss https://example.com/feed.xml \
--transcription-provider openai
# Or with Gemini
podcast-scraper --rss https://example.com/feed.xml \
--transcription-provider gemini
# Or with Mistral (summarization only)
podcast-scraper --rss https://example.com/feed.xml \
--summary-provider mistral
Common Configuration Patterns¶
Pattern 1: All Local (Default)¶
transcription_provider: whisper
speaker_detector_provider: spacy
summary_provider: transformers
- Fast, no API costs
- Requires ML models to be installed/cached
- Works offline
Pattern 2: OpenAI (Transcription + Summarization)¶
transcription_provider: openai
speaker_detector_provider: spacy # OpenAI does not support speaker detection
summary_provider: openai
openai_api_key: sk-your-key-here
- No local ML models needed for transcription/summarization
- API costs per request
- Requires internet connection
Pattern 2b: Gemini (Transcription + Summarization)¶
transcription_provider: gemini
speaker_detector_provider: spacy # Gemini does not support speaker detection
summary_provider: gemini
gemini_api_key: your-key-here
Pattern 2c: Mistral Summarization (+ Local for Other Capabilities)¶
transcription_provider: whisper # Mistral supports summarization only
speaker_detector_provider: spacy # Mistral supports summarization only
summary_provider: mistral
mistral_api_key: your-key-here
- EU data residency (compliance-friendly)
- Competitive pricing for summarization
- Requires internet connection for Mistral; local ML for transcription/speaker detection
Pattern 3: Hybrid (Local + API)¶
transcription_provider: whisper # Local (fast, free)
speaker_detector_provider: ollama # Local Ollama (accurate, free)
summary_provider: openai # API (highest quality)
openai_api_key: sk-your-key-here
- Balance between cost and performance
- Use local for heavy operations, API for quality
Pattern 4: All Ollama (Local Self-Hosted)¶
transcription_provider: ollama
speaker_detector_provider: ollama
summary_provider: ollama
ollama_api_base: http://localhost:11434/v1 # Default, can be omitted
ollama_speaker_model: llama3.1:8b # or mistral:7b for speed
ollama_summary_model: qwen2.5:7b # or gemma2:9b for quality
# Note: Model-specific prompts are automatically selected based on model name
- Zero API costs (all processing on local hardware)
- Complete privacy (data never leaves your machine)
- Works offline/air-gapped
- Requires Ollama installed and models pulled
Prerequisites:
- Install Ollama: `brew install ollama` (macOS) or download from https://ollama.ai
- Start the server: `ollama serve` (keep running)
- Pull models:
    - Dev/test (4GB+): `ollama pull phi3:mini`
    - Fast speaker detection (8GB+): `ollama pull mistral:7b`
    - General purpose (8GB+): `ollama pull llama3.1:8b` (default)
    - Best JSON/GIL (8GB+): `ollama pull qwen2.5:7b` (recommended)
    - Qwen 3.5 (9B / 27B / 35B): see the Ollama Provider Guide, Qwen 3.5 checklist
    - Balanced quality (12GB+): `ollama pull gemma2:9b`
- Verify: `ollama list` should show your models
See Ollama Provider Guide for detailed installation and troubleshooting.
Pattern 5: Transcription Only¶
transcription_provider: whisper
transcribe_missing: true
# No speaker detection or summarization
auto_speakers: false
generate_summaries: false
Validation¶
Invalid provider types raise a `ValueError`:
# ❌ Invalid - will raise ValueError
cfg = Config(
rss_url="https://example.com/feed.xml",
transcription_provider="invalid", # Not "whisper", "openai", "gemini", or "ollama"
)
# ✅ Valid
cfg = Config(
rss_url="https://example.com/feed.xml",
transcription_provider="whisper", # Valid option
)
A missing API key when using an API provider also raises a `ValueError`:
# ❌ Invalid - will raise ValueError
cfg = Config(
rss_url="https://example.com/feed.xml",
transcription_provider="openai",
# Missing openai_api_key
)
# ✅ Valid
cfg = Config(
rss_url="https://example.com/feed.xml",
transcription_provider="openai",
openai_api_key="sk-your-key-here", # Required
)
# ❌ Invalid - will raise ValueError
cfg = Config(
rss_url="https://example.com/feed.xml",
transcription_provider="gemini",
# Missing gemini_api_key
)
# ✅ Valid
cfg = Config(
rss_url="https://example.com/feed.xml",
transcription_provider="gemini",
gemini_api_key="your-key-here", # Required
)
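The checks above can be sketched as a standalone validator. This is an illustrative sketch of the behavior shown in the examples; `Config` performs these checks internally, and the names `validate_transcription`, `TRANSCRIPTION_PROVIDERS`, and `API_KEY_FIELDS` are assumptions for the example, not part of the library.

```python
TRANSCRIPTION_PROVIDERS = {"whisper", "openai", "gemini", "ollama"}
# API providers require a key; local providers (whisper, ollama) do not.
API_KEY_FIELDS = {"openai": "openai_api_key", "gemini": "gemini_api_key"}

def validate_transcription(provider, settings):
    """Reject unknown providers and API providers missing their key."""
    if provider not in TRANSCRIPTION_PROVIDERS:
        raise ValueError(f"unknown transcription provider: {provider!r}")
    key_field = API_KEY_FIELDS.get(provider)
    if key_field and not settings.get(key_field):
        raise ValueError(f"{provider} requires {key_field} to be set")
```

So `validate_transcription("openai", {})` fails, while `validate_transcription("whisper", {})` passes because local providers need no key.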
Quick Examples¶
Example 1: Basic Local Setup¶
podcast-scraper --rss https://example.com/feed.xml \
--transcribe-missing \
--auto-speakers \
--generate-summaries
Example 2: OpenAI Transcription Only¶
export OPENAI_API_KEY=sk-your-key-here
podcast-scraper --rss https://example.com/feed.xml \
--transcription-provider openai \
--transcribe-missing
Example 2b: Gemini Transcription Only¶
export GEMINI_API_KEY=your-key-here
podcast-scraper --rss https://example.com/feed.xml \
--transcription-provider gemini \
--transcribe-missing
Example 3: Mixed Providers (Config File)¶
# config.yaml
rss: https://example.com/feed.xml
transcription_provider: whisper
speaker_detector_provider: ollama
summary_provider: transformers
podcast-scraper --config config.yaml
Example 4: Programmatic Mixed Providers¶
from podcast_scraper import Config, run_pipeline
cfg = Config(
rss_url="https://example.com/feed.xml",
transcription_provider="whisper",
speaker_detector_provider="ollama",
summary_provider="transformers",
transcribe_missing=True,
auto_speakers=True,
generate_summaries=True,
)
count, summary = run_pipeline(cfg)
See Also¶
- Provider Implementation Guide - How providers work internally
- Configuration API Reference - Full configuration options
- Development Guide - Development setup