AI Provider Comparison Guide¶
Your decision-making resource for choosing the right AI provider.
A focused analysis of summarization and capability providers supported by podcast_scraper: local ML, hybrid MAP-REDUCE (hybrid_ml), and 7 LLM providers. This guide answers "which provider should I pick?" with decision matrices, cost analysis, and empirical conclusions.
Companion pages:
- Provider Deep Dives — per-provider reference cards, magic quadrant, visual comparisons
- Evaluation Reports — methodology, metric definitions, and the full library of measured comparison reports
Implementation Status¶
All providers below are implemented and acceptance-tested (v2.4.0+).
| Provider | Status | RFC | Notes |
|---|---|---|---|
| Local ML | ✅ Implemented | - | Default provider (Whisper + spaCy + Transformers): transcription, speaker detection, summarization |
| Hybrid ML | ✅ Implemented | RFC-042 | Summarization only: MAP (LongT5) + REDUCE (transformers / Ollama / llama_cpp) |
| OpenAI | ✅ Implemented | RFC-013 | Transcription + summarization (Whisper API + GPT API) |
| Gemini | ✅ Implemented | RFC-035 | Transcription + summarization (no speaker detection) |
| Mistral | ✅ Implemented | RFC-033 | Summarization only (EU data residency) |
| Anthropic | ✅ Implemented | RFC-032 | Summarization only (no transcription or speaker detection) |
| DeepSeek | ✅ Implemented | RFC-034 | Summarization only; ultra low-cost |
| Grok | ✅ Implemented | RFC-036 | Summarization only; real-time information access |
| Ollama | ✅ Implemented | RFC-037 | Transcription, speaker detection, summarization; local self-hosted LLMs, zero cost, complete privacy |
For hybrid_ml (MAP-REDUCE) configuration and REDUCE backends (Ollama, llama_cpp, transformers), see ML Provider Reference and Configuration API.
Key Statistics at a Glance¶
┌─────────────────────────────────────────────────────────────────────────────┐
│ PROVIDER LANDSCAPE OVERVIEW │
├─────────────────────────────────────────────────────────────────────────────┤
│ 9 Summarization Options │ (Hybrid = MAP+REDUCE) │ 3 Full-Stack Ready │
│ ════════════════════ │ ═══════════════════════ │ ═══════════════ │
│ ✅ Local ML │ ✅ Hybrid ML (RFC-042) │ ✅ Local ML │
│ ✅ Hybrid ML │ MAP + Ollama/llama_cpp │ ✅ OpenAI (tx+sum) │
│ ✅ OpenAI │ or transformers REDUCE │ ✅ Ollama │
│ ✅ Gemini │ │ │
│ ✅ Mistral │ │ │
│ ✅ Anthropic / DeepSeek │ │ │
│ ✅ Grok / Ollama │ │ │
├─────────────────────────────────────────────────────────────────────────────┤
│ COST SPECTRUM (per 100 episodes) │
│ │
│ $0 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ $37 │
│ │ │ │
│ ▼ ▼ │
│ Local/Ollama OpenAI (full) │
│ ($0) ($37) │
│ │
│ DeepSeek ─── Grok ─── Anthropic ─── Gemini ─── OpenAI (text) ─── OpenAI │
│ ($0.02) ($0.03) ($0.40) ($0.95) ($0.55) ($37) │
└─────────────────────────────────────────────────────────────────────────────┘
Quick Decision Matrix¶
| If you need... | Choose | Why |
|---|---|---|
| Complete Privacy | Local ML / Hybrid ML / Ollama | Data never leaves your device |
| Lowest Cost | Local ML / Hybrid ML / Ollama | $0 (just electricity) |
| Highest Quality | OpenAI | Industry leader (measured) |
| Full Capabilities | Local ML / Ollama | All 3 capabilities (transcription + speaker detection + summarization) |
| Local MAP + LLM REDUCE | Hybrid ML (Ollama/llama_cpp) | LongT5 MAP + local LLM synthesis (RFC-042) |
| Real-Time Info | Grok | Real-time information access (RFC-036) |
| Lowest Cloud Cost | DeepSeek | 95% cheaper than OpenAI (RFC-034) |
| EU Data Residency | Mistral | European servers (RFC-033) |
| Huge Context | Gemini | 2 million token window (RFC-035) |
| Free Development | Gemini / Grok | Generous free tiers (RFC-035, RFC-036) |
| Self-Hosted | Ollama | Offline/air-gapped (RFC-037) |
Empirical Highlights¶
All claims below are backed by measured data. For the full metrics tables, methodology, and metric definitions, see the Evaluation Reports.
Cloud providers (vs silver GPT-4o reference)¶
Best non-OpenAI cloud: Gemini (gemini-2.0-flash) — highest ROUGE-L (33.3%)
and embedding similarity (87.3%) among non-OpenAI providers, with the fastest latency
(2.7s/ep). Mistral (mistral-small-latest) is a close second (32.5% ROUGE-L,
2.8s/ep).
| Provider | ROUGE-L | Embed | Latency |
|---|---|---|---|
| OpenAI (GPT-4o) | 58.8% | 92.7% | 15.4s |
| Gemini | 33.3% | 87.3% | 2.7s |
| Mistral | 32.5% | 84.8% | 2.8s |
| Grok | 29.5% | 85.4% | 13.2s |
| Anthropic | 29.4% | 81.8% | 4.8s |
| DeepSeek | 26.3% | 85.0% | 14.2s |
OpenAI scores highest because the silver reference is GPT-4o. Compare non-OpenAI providers against each other for a fairer picture.
Full table: Smoke v1 report — Cloud LLMs
Local Ollama (vs silver GPT-4o reference)¶
Best local models: Mistral Small 3.2 and Qwen 2.5:32b tie at 38.4%
ROUGE-L — both outperform every cloud provider except OpenAI. Mistral Small 3.2
leads on ROUGE-1, BLEU, and embedding similarity. Qwen 3.5:9b (with
reasoning_effort: none) is the best smaller model (~30% ROUGE-L, 85.2% embed).
| Model | ROUGE-L | Embed | Latency |
|---|---|---|---|
| mistral-small3.2:latest | 38.4% | 85.8% | 48.6s |
| qwen2.5:32b | 38.4% | 85.2% | 54.8s |
| mistral:7b | 32.8% | 80.4% | 17.4s |
| qwen3.5:9b | 30.3% | 85.2% | 21.9s |
| qwen2.5:7b | 28.3% | 84.9% | 12.1s |
Ollama latencies are hardware-dependent. Re-run on your machine before making decisions.
Full table: Smoke v1 report — Local Ollama
Detailed Cost Analysis¶
Per 100 Episodes — Complete Breakdown¶
| Provider | Transcription | Speaker | Summary | Total | vs OpenAI |
|---|---|---|---|---|---|
| Local ML | $0 | $0 | $0 | $0 | -100% |
| Ollama | N/A | $0 | $0 | $0 | -100% |
| DeepSeek | N/A | N/A | $0.016 | $0.016 | -97% |
| Grok (beta) | N/A | N/A | $0.00 | $0.00 | -100% |
| Mistral (Small) | N/A | N/A | $0.11 | $0.11 | -80% |
| Anthropic (Haiku) | N/A | N/A | $0.40 | $0.40 | -27% |
| Gemini (Flash) | $0.90 | N/A | $0.05 | $0.95 | +73% |
| OpenAI (Nano) | $36.00 | N/A | $0.28 | $36.28 | baseline |
| OpenAI (Mini) | $36.00 | N/A | $1.40 | $37.40 | +3% |
| Mistral (Large) | N/A | N/A | $9.00 | $9.00 | -75% |
Monthly Cost Projections¶
Monthly costs at different scales
═══════════════════════════════════════════════════════════════════════════
100 ep/month 1,000 ep/month 10,000 ep/month
──────────── ────────────── ───────────────
Local ML $0 $0 $0
DeepSeek $0.02 $0.16 $1.60
Grok $0.03 $0.26 $2.60
Anthropic $0.40 $4.00 $40.00
OpenAI (text only) $0.55 $5.50 $55.00
OpenAI (full) $37.40 $374.00 $3,740.00
⚠️ At 10,000 episodes/month, OpenAI full stack costs $3,740!
Using local transcription + DeepSeek: $1.60 (99.96% savings)
Key insight: Transcription dominates cloud costs (90%+). Use local Whisper + cloud text processing to save massively.
Decision Flowchart¶
START
│
▼
┌─────────────────┐
│ What's your │
│ TOP priority? │
└────────┬────────┘
│
┌────────────────────┼────────────────────┐
│ │ │
▼ ▼ ▼
┌─────────┐ ┌─────────┐ ┌─────────┐
│ PRIVACY │ │ COST │ │ QUALITY │
└────┬────┘ └────┬────┘ └────┬────┘
│ │ │
▼ ▼ ▼
Need transcription? Need transcription? Budget matters?
│ │ │
┌────┴────┐ ┌────┴────┐ ┌────┴────┐
│Yes │No │ │Yes │No │ │Yes │No │
▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼
┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐
│LOCAL │ │OLLAMA│ │LOCAL │ │DEEP │ │GPT-5 │ │GPT-5 │
│ ML │ │ │ │Whisper│ │SEEK │ │ Mini │ │ │
│ │ │ │ │ + │ │ │ │ │ │ │
│ │ │ │ │DeepSk │ │ │ │ │ │ │
└──────┘ └──────┘ └──────┘ └──────┘ └──────┘ └──────┘
┌────────────────────┼────────────────────┐
│ │ │
▼ ▼ ▼
┌─────────┐ ┌─────────┐ ┌─────────┐
│ SPEED │ │ CONTEXT │ │ EU │
└────┬────┘ └────┬────┘ └────┬────┘
│ │ │
▼ ▼ ▼
┌─────────┐ ┌─────────┐ ┌─────────┐
│ GROK │ │ GEMINI │ │ MISTRAL │
│ │ │ Pro │ │ │
│ Real-Time│ │ 2M │ │ Full │
│ faster │ │ tokens │ │ Stack │
└─────────┘ └─────────┘ └─────────┘
Recommended Configurations¶
Configuration 1: Ultra-Budget ($0.016/100 episodes)¶
# 97% cheaper than OpenAI
transcription_provider: whisper # Free (local)
speaker_detector_provider: spacy # Free (local; DeepSeek: summarization only)
summary_provider: deepseek # $0.016/100
deepseek_api_key: ${DEEPSEEK_API_KEY}
Configuration 2: Quality-First (~$42/100 episodes)¶
# Maximum quality
transcription_provider: openai
speaker_detector_provider: spacy # OpenAI: summarization only (no speaker detection)
summary_provider: openai
openai_summary_model: gpt-5
openai_api_key: ${OPENAI_API_KEY}
Configuration 3: Privacy-First ($0)¶
# Data never leaves your device
transcription_provider: whisper # Local
speaker_detector_provider: ollama # Local Ollama
summary_provider: ollama # Local Ollama
ollama_speaker_model: llama3.1:8b
ollama_summary_model: llama3.1:8b
# For better quality (12-16 GB RAM):
# ollama_speaker_model: llama3.3:latest
# ollama_summary_model: llama3.3:latest
Configuration 4: Speed-First (~$0.25/100 episodes)¶
# Fast cloud summarization
transcription_provider: whisper # Local
speaker_detector_provider: spacy # Local (Grok: summarization only)
summary_provider: grok
grok_summary_model: grok-2
grok_api_key: ${GROK_API_KEY}
Configuration 5: EU Compliant (Mistral Summarization)¶
# European data residency for summarization; local for other capabilities
transcription_provider: whisper # Local (Mistral: summarization only)
speaker_detector_provider: spacy # Local (Mistral: summarization only)
summary_provider: mistral
mistral_summary_model: mistral-large-latest
mistral_api_key: ${MISTRAL_API_KEY}
Configuration 6: Free Development (~$0)¶
# Maximize free tiers
transcription_provider: whisper # Local
speaker_detector_provider: spacy # Local (Gemini/Grok don't support speaker detection)
summary_provider: grok # Free tier
grok_summary_model: grok-beta
Summary¶
┌─────────────────────────────────────────────────────────────────────────────┐
│ KEY TAKEAWAYS │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ 🥇 CHEAPEST CLOUD: DeepSeek $0.016/100 episodes (97% off) │
│ 🥇 HIGHEST QUALITY: OpenAI GPT-4o Industry benchmark │
│ 🥇 LARGEST CONTEXT: Gemini Pro 2,000,000 tokens │
│ 🥇 BEST FREE TIER: Gemini/Grok Generous limits │
│ 🥇 REAL-TIME INFO: Grok X/Twitter integration │
│ 🥇 EU COMPLIANT: Mistral European summarization provider │
│ 🥇 COMPLETE PRIVACY: Local/Ollama Data never leaves device │
│ 🥇 BEST LOCAL MODEL: Mistral Small 3.2 / Qwen 2.5:32b (38.4% ROUGE-L)│
│ │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ 📈 COST INSIGHT: │
│ Transcription = 90%+ of cloud costs │
│ → Use local Whisper + cloud text = massive savings │
│ │
│ 📊 EVAL INSIGHT: │
│ Gemini is best non-OpenAI cloud (33.3% ROUGE-L, 87.3% embed) │
│ Ollama top models beat all cloud except OpenAI (38.4% ROUGE-L) │
│ → See eval reports for full data │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Related Documentation¶
- Provider Deep Dives — per-provider cards, magic quadrant, visual comparisons
- Evaluation Reports — methodology, metrics, and full comparison data
- Provider Configuration Quick Reference
- Ollama Provider Guide — complete Ollama setup and troubleshooting
- Provider Implementation Guide
- ML Provider Reference
- PRD-006: OpenAI Provider
- PRD-009: Anthropic Provider
- PRD-010: Mistral Provider
- PRD-011: DeepSeek Provider
- PRD-012: Gemini Provider
- PRD-013: Grok Provider (xAI)
- PRD-014: Ollama Provider