API Migration Guide¶
Documentation for migrating between major and minor versions of the podcast_scraper API.
v2.3.2 to v2.4.0 (Current)¶
v2.4.0 introduces a multi-provider ecosystem and changes several defaults.
⚠️ Breaking Behavior Changes¶
These are not code-breaking but change the default behavior of the pipeline:
- Automatic Transcription:
transcribe_missingnow defaults totrue. - Migration: If you want to only download existing transcripts, explicitly set
transcribe_missing: falsein your config. - Whisper Model: The default
whisper_modelchanged frombasetobase.en. - Migration: For non-English podcasts, you must now explicitly set
whisper_model: base(or another multilingual model). - Output Structure: Transcripts and metadata are now placed in subdirectories.
- Migration: Update any scripts that assume all files are in the root run directory.
🚀 Multi-Provider Configuration¶
v2.4.0 replaces specific provider flags with a unified provider system:
- New fields:
transcription_provider,speaker_detector_provider,summary_provider. - Supported providers:
whisper,spacy,transformers(local), and cloud providers likeopenai,anthropic,mistral, etc.
v1.0 to v2.0¶
Version 2.0 refactored the monolithic v1.0 into a clean modular architecture.
Modular Architecture¶
Before (v1.0): Monolithic podcast_scraper.py file with no formal public API.
After (v2.0): focused modules with 4 primary public exports:
from podcast_scraper import Config, load_config_file, run_pipeline, cli
New Usage Pattern¶
import podcast_scraper
# Configuration
config = podcast_scraper.Config(
rss_url="https://example.com/feed.xml",
output_dir="./transcripts",
max_episodes=10,
)
# Run pipeline
count, summary = podcast_scraper.run_pipeline(config)
Version History¶
| Version | Date | Highlights |
|---|---|---|
| v2.4.0 | 2026-01 | Multi-provider ecosystem, production defaults, cache CLI. |
| v2.3.0 | 2025-11 | Added service API and episode summarization. |
| v2.2.0 | 2025-11 | Metadata generation (JSON/YAML). |
| v2.1.0 | 2025-11 | Automatic speaker detection (NER). |
| v2.0.0 | 2025-11 | Modular architecture foundation. |
| v1.0.0 | 2025-11 | Initial monolithic release. |
Checking API Version¶
import podcast_scraper
# Both will return the same string, e.g., "2.4.0"
print(podcast_scraper.__version__)
print(podcast_scraper.__api_version__)