Version 2.0.0

Release Date: November 13, 2025
Type: Major Release
Last Updated: November 18, 2025

🎉 Major Release: Refactored Architecture & Comprehensive Documentation

Version 2.0.0 represents a significant milestone with a complete codebase refactoring, comprehensive documentation, and new features.

The entire codebase has been refactored from a single-file implementation into a well-organized, modular architecture.

This refactoring improves:

Maintainability: Clear separation of concerns
Testability: Isolated modules with focused responsibilities
Extensibility: Easy to add new features without touching core logic
API Stability: Clean public API surface (Config, run_pipeline, load_config_file)

Added extensive documentation infrastructure:

Architecture Documentation (docs/architecture/ARCHITECTURE.md): Complete system architecture overview
Product Requirements Documents (PRDs):
    PRD-001: Transcript Acquisition Pipeline
    PRD-002: Whisper Fallback Transcription
    PRD-003: User Interfaces & Configuration
Request for Comments (RFCs): 10+ RFCs documenting design decisions
    RFC-001 through RFC-010 covering all major features
Testing Strategy (docs/architecture/TESTING_STRATEGY.md): Comprehensive testing approach
API Documentation: Migration guides and API comparisons
MkDocs Site: Live documentation at https://chipi.github.io/podcast_scraper/

Named Entity Recognition (NER): Automatically extracts host and guest names from episode metadata using spaCy
Language-Aware Processing: Single language configuration drives both Whisper model selection and NER
Smart Model Selection: Automatically prefers English-only Whisper models (.en variants) when the configured language is English, for better performance
Host/Guest Distinction: Intelligently identifies recurring hosts vs. episode-specific guests
Caching Support: Optional host detection caching across episodes for performance
Graceful Fallback: Works seamlessly when spaCy is unavailable
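
To make the language-aware behavior listed above concrete, here is a minimal sketch of how a single language setting could drive both Whisper model selection and NER, with a fallback when spaCy is missing. It is an illustration under stated assumptions, not the project's actual implementation: the function names `select_whisper_model` and `extract_person_names` and the `_SPACY_MODELS` mapping are hypothetical. The one external fact it relies on is that Whisper publishes `.en` checkpoints for the tiny, base, small, and medium sizes but not for the large models.

```python
from __future__ import annotations

# Whisper sizes that have an English-only ".en" checkpoint (the large models do not).
_ENGLISH_ONLY_SIZES = {"tiny", "base", "small", "medium"}

# Hypothetical mapping from language code to a spaCy pipeline name.
_SPACY_MODELS = {"en": "en_core_web_sm", "de": "de_core_news_sm"}

# English pipelines label people as PERSON; several other pipelines use PER.
_PERSON_LABELS = {"PERSON", "PER"}


def _language_code(language: str) -> str:
    """Normalize values such as "en-US" or "EN_us" to a bare code like "en"."""
    return language.strip().lower().replace("_", "-").split("-")[0]


def select_whisper_model(size: str, language: str) -> str:
    """Prefer the ".en" variant for English when one exists for this size."""
    if _language_code(language) == "en" and size in _ENGLISH_ONLY_SIZES:
        return f"{size}.en"
    return size


def extract_person_names(text: str, language: str = "en") -> list[str]:
    """Return person names found in text, or [] when NER is unavailable."""
    model_name = _SPACY_MODELS.get(_language_code(language))
    if model_name is None:
        return []  # no pipeline configured for this language
    try:
        import spacy  # imported lazily so spaCy stays an optional dependency

        nlp = spacy.load(model_name)
    except (ImportError, OSError):
        # ImportError: spaCy is not installed; OSError: the model is not downloaded.
        return []
    return [ent.text for ent in nlp(text).ents if ent.label_ in _PERSON_LABELS]
```

With this sketch, `select_whisper_model("base", "en")` yields `"base.en"`, while `select_whisper_model("large", "en")` and `select_whisper_model("base", "fr")` return the size unchanged. Importing spaCy inside the function keeps the rest of the pipeline usable on installations without it; callers simply receive an empty list of names.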

For users upgrading from v1.0.0:

Full Changelog: https://github.com/chipi/podcast_scraper/compare/v1.0.0...v2.0.0