Version 2.0.0
Release Date: November 13, 2025
Type: Major Release
Last Updated: November 18, 2025
🎉 Major Release: Refactored Architecture & Comprehensive Documentation
Version 2.0.0 is a major release: the codebase has been refactored from a single file into a modular architecture, the project gains comprehensive documentation, and automatic speaker name detection is added as a new feature.
🏗️ Major Architectural Changes
Codebase Refactoring into Modular Structure (#3)
The entire codebase has been refactored from a single-file implementation into a well-organized, modular architecture:
- `cli.py`: Command-line interface and argument parsing
- `config.py`: Configuration models and validation (Pydantic-based)
- `workflow.py`: High-level pipeline orchestration
- `rss_parser.py`: RSS feed parsing and episode extraction
- `downloader.py`: Resilient HTTP download utilities
- `episode_processor.py`: Episode-level processing logic
- `filesystem.py`: Filesystem utilities and path management
- `whisper_integration.py`: Whisper transcription integration
- `progress.py`: Pluggable progress reporting interface
- `models.py`: Shared data models (dataclasses)
This refactoring improves:
- Maintainability: Clear separation of concerns
- Testability: Isolated modules with focused responsibilities
- Extensibility: Easy to add new features without touching core logic
- API Stability: Clean public API surface (`Config`, `run_pipeline`, `load_config_file`)
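To illustrate the pluggable progress reporting listed for progress.py, the sketch below shows one way such an interface could look. Every name in it (`ProgressReporter`, `NullProgress`, `RecordingProgress`, `process_episodes`) is hypothetical and not taken from the project's code. The point is only that pipeline code depends on a small interface, so reporters can be swapped without touching core logic.

```python
from typing import Protocol


class ProgressReporter(Protocol):
    """Hypothetical interface; the real progress.py may look different."""

    def start(self, total: int, description: str) -> None: ...

    def advance(self, steps: int = 1) -> None: ...

    def finish(self) -> None: ...


class NullProgress:
    """Reporter that does nothing, e.g. for library use or quiet runs."""

    def start(self, total: int, description: str) -> None:
        pass

    def advance(self, steps: int = 1) -> None:
        pass

    def finish(self) -> None:
        pass


class RecordingProgress:
    """Reporter that records events in memory, handy in tests."""

    def __init__(self) -> None:
        self.events: list[str] = []

    def start(self, total: int, description: str) -> None:
        self.events.append(f"start:{description}:{total}")

    def advance(self, steps: int = 1) -> None:
        self.events.append(f"advance:{steps}")

    def finish(self) -> None:
        self.events.append("finish")


def process_episodes(titles: list[str], progress: ProgressReporter) -> list[str]:
    """Stand-in for pipeline work; it only talks to the reporter interface."""
    progress.start(len(titles), "episodes")
    processed = []
    for title in titles:
        processed.append(title.strip().lower())
        progress.advance()
    progress.finish()
    return processed
```

Swapping `RecordingProgress` for `NullProgress` changes what is reported but not what is processed, which is the kind of isolation the modular layout aims for.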
Comprehensive Documentation (#9)
Added extensive documentation infrastructure:
- Architecture Documentation (`docs/architecture/ARCHITECTURE.md`): Complete system architecture overview
- Product Requirements Documents (PRDs):
    - PRD-001: Transcript Acquisition Pipeline
    - PRD-002: Whisper Fallback Transcription
    - PRD-003: User Interfaces & Configuration
- Request for Comments (RFCs): 10+ RFCs documenting design decisions
    - RFC-001 through RFC-010 covering all major features
- Testing Strategy (`docs/architecture/TESTING_STRATEGY.md`): Comprehensive testing approach
- API Documentation: Migration guides and API comparisons
- MkDocs Site: Live documentation at https://chipi.github.io/podcast_scraper/
✨ New Features
RFC-010: Automatic Speaker Name Detection (#11)
- Named Entity Recognition (NER): Automatically extracts host and guest names from episode metadata using spaCy
- Language-Aware Processing: Single `language` configuration drives both Whisper model selection and NER
- Smart Model Selection: Automatically prefers English-only Whisper models (`.en` variants) for better performance
- Host/Guest Distinction: Intelligently identifies recurring hosts vs. episode-specific guests
- Caching Support: Optional host detection caching across episodes for performance
- Graceful Fallback: Works seamlessly when spaCy is unavailable
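The behaviour described in this list can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the function names, the `en_core_web_sm` model mapping, and the recurrence threshold used to separate hosts from guests are all assumptions. It relies on two facts from outside these notes: Whisper publishes `.en` variants for the tiny, base, small, and medium sizes, and spaCy's English models label people as `PERSON`.

```python
from collections import Counter

# Whisper sizes that have an English-only ".en" variant.
SIZES_WITH_EN_VARIANT = {"tiny", "base", "small", "medium"}

# Assumed mapping from language code to spaCy model name (illustrative only).
SPACY_MODELS = {"en": "en_core_web_sm"}


def select_whisper_model(model: str, language: str) -> str:
    """Prefer the English-only variant when the configured language is English."""
    if language.lower() in {"en", "english"} and model in SIZES_WITH_EN_VARIANT:
        return f"{model}.en"
    return model


def extract_person_names(text: str, language: str = "en") -> list[str]:
    """Return PERSON entities from text, or [] when spaCy cannot be used."""
    try:
        import spacy  # optional dependency

        nlp = spacy.load(SPACY_MODELS[language])
    except (ImportError, OSError, KeyError):
        # spaCy not installed, model not downloaded, or language not mapped.
        return []
    return [ent.text for ent in nlp(text).ents if ent.label_ == "PERSON"]


def split_hosts_and_guests(
    names_per_episode: list[list[str]], min_share: float = 0.5
) -> tuple[set[str], list[list[str]]]:
    """Hypothetical heuristic: names in more than min_share of episodes are hosts."""
    counts = Counter(name for names in names_per_episode for name in set(names))
    threshold = min_share * len(names_per_episode)
    hosts = {name for name, seen in counts.items() if seen > threshold}
    guests = [[n for n in names if n not in hosts] for names in names_per_episode]
    return hosts, guests
```

Here the same `language` value feeds both `select_whisper_model` and `extract_person_names`, mirroring the single-setting design. When spaCy or its model is missing, name extraction returns an empty list instead of raising.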
🔧 Improvements
- MIT License: Added proper license file and documentation
- CI/CD Enhancements: Improved workflows and tooling alignment (#10)
- Code Quality: Various code review improvements (#8)
- Security: CodeQL analysis workflow configuration
- Type Safety: Full type hints and Pydantic validation
📚 Documentation Highlights
- Complete architecture documentation
- 10+ RFCs covering design decisions
- Testing strategy and requirements
- API migration guides (v1 → v2)
- Product requirements documents
- Live MkDocs documentation site
🔄 Migration Notes
For users upgrading from v1.0.0:
- The public API remains stable (`Config`, `run_pipeline`, `load_config_file`)
- Internal module structure has changed but is transparent to users
- New optional features (speaker detection) are opt-in via `--auto-speakers`
- All existing functionality preserved and enhanced
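As a small illustration of what opt-in means for a flag like `--auto-speakers`, the sketch below uses a stand-alone argparse parser. Only the flag name comes from these notes. The parser, the `example-cli` program name, and the positional `rss_url` argument are assumptions made for the example and may not match the real CLI.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: only the --auto-speakers flag name is from the notes.
    parser = argparse.ArgumentParser(prog="example-cli")
    parser.add_argument("rss_url", help="Feed URL (assumed positional argument)")
    parser.add_argument(
        "--auto-speakers",
        action="store_true",  # defaults to False, so existing invocations are unchanged
        help="Enable automatic speaker name detection",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print("speaker detection enabled:", args.auto_speakers)
```

Because the flag is a `store_true` switch, a v1-style command line that never mentions it parses with the feature off.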
Full Changelog: https://github.com/chipi/podcast_scraper/compare/v1.0.0...v2.0.0