API Boundaries and Architecture Assessment
This document describes the public API boundaries of podcast_scraper and identifies any API creep (internal details that are exposed but should not be).
Public API Surface
Core Public API (__init__.py)
The main package exposes a minimal, clean API:
from podcast_scraper import Config, load_config_file, run_pipeline, service
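# Configuration: build a Config directly (other options omitted here)
config = Config(rss_url="...")
# Or load the options from a JSON/YAML file and pass them to Config
config_dict = load_config_file("config.yaml")
config = Config(**config_dict)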
# Direct pipeline execution
count, summary = run_pipeline(config)
# Service API (for daemon/service use)
result = service.run(config)
result = service.run_from_config_file("config.yaml")
Key Components
- Config - Immutable Pydantic model for all runtime options.
- load_config_file(path: str) -> Dict[str, Any] - Loader for JSON/YAML configs.
- run_pipeline(cfg: Config) -> Tuple[int, str] - Orchestrates the entire job.
- service.run(cfg: Config) -> ServiceResult - High-level service wrapper.
- service.run_from_config_file(path: str) -> ServiceResult - Convenience wrapper for services.
ServiceResult
Provides structured outcomes for automated processing:
- episodes_processed: int - Number of episodes processed.
- summary: str - Human-readable summary.
- success: bool - Whether the run completed successfully.
- error: Optional[str] - Error message if failed.
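To illustrate how these fields fit together, the sketch below models ServiceResult as a frozen dataclass and wraps a pipeline callable so that an exception becomes a structured result instead of a traceback. The dataclass form, the `run_service` helper, and both stand-in pipelines are assumptions made for this example; the package's actual implementation may differ.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class ServiceResult:
    """Illustrative stand-in mirroring the four documented fields."""

    episodes_processed: int
    summary: str
    success: bool = True
    error: Optional[str] = None


def run_service(pipeline: Callable[[], Tuple[int, str]]) -> ServiceResult:
    """Hypothetical wrapper: run a pipeline callable and never raise."""
    try:
        count, summary = pipeline()
    except Exception as exc:  # a daemon wants a result, not a traceback
        return ServiceResult(0, "", success=False, error=str(exc))
    return ServiceResult(count, summary)


def ok_pipeline() -> Tuple[int, str]:
    """Stand-in for a successful pipeline run."""
    return 3, "Processed 3 episodes"


def failing_pipeline() -> Tuple[int, str]:
    """Stand-in for a pipeline run that fails."""
    raise RuntimeError("feed unreachable")


if __name__ == "__main__":
    print(run_service(ok_pipeline))
    print(run_service(failing_pipeline))
```

A caller such as a scheduler can then branch on `success` alone and log `error` without parsing free-form output.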
CLI API (cli.py)
The CLI module is for interactive command-line use:
- cli.main() - Main CLI entry point.
- cli.parse_args() - Argument parsing (internal, but used by tests).
Internal Modules (Not Public API)
The following modules are internal implementation details and should not be imported directly:
- workflow.py - Core pipeline logic (use run_pipeline from __init__.py).
- episode_processor.py - Episode processing internals.
- rss_parser.py - RSS parsing internals.
- downloader.py - HTTP download internals.
- filesystem.py - File system utilities.
- metadata.py - Metadata generation internals.
- speaker_detection.py - Speaker detection internals.
- summarizer.py - Summarization internals.
- whisper_integration.py - Whisper integration internals.
- progress.py - Progress reporting internals.
- models/ - Internal data models (RssFeed, Episode, TranscriptionJob).
- providers/ - Multi-provider implementations (accessed via factories).
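One way to check that client code respects this boundary is to scan its source for imports of the internal modules. The following is a minimal sketch using the standard `ast` module; the `find_internal_imports` helper is hypothetical and not part of podcast_scraper, and the module set is simply copied from the list above.

```python
import ast
from typing import List

# Internal module names, taken from the list above.
INTERNAL_MODULES = {
    "workflow", "episode_processor", "rss_parser", "downloader",
    "filesystem", "metadata", "speaker_detection", "summarizer",
    "whisper_integration", "progress", "models", "providers",
}


def _is_internal(dotted_name: str) -> bool:
    """True for names such as 'podcast_scraper.workflow' or deeper."""
    parts = dotted_name.split(".")
    return (
        len(parts) > 1
        and parts[0] == "podcast_scraper"
        and parts[1] in INTERNAL_MODULES
    )


def find_internal_imports(source: str) -> List[str]:
    """Return the internal podcast_scraper modules imported by `source`."""
    offenders = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if _is_internal(alias.name):
                    offenders.add(alias.name)
        elif isinstance(node, ast.ImportFrom) and node.module:
            if _is_internal(node.module):
                offenders.add(node.module)
            elif node.module == "podcast_scraper":
                # Catches "from podcast_scraper import workflow".
                for alias in node.names:
                    if alias.name in INTERNAL_MODULES:
                        offenders.add(f"podcast_scraper.{alias.name}")
    return sorted(offenders)


if __name__ == "__main__":
    sample = (
        "from podcast_scraper import Config, service\n"
        "from podcast_scraper.workflow import run_pipeline\n"
        "import podcast_scraper.downloader\n"
    )
    print(find_internal_imports(sample))
```

The scan only parses the source text, so it can run in CI without importing or executing the code under review.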
API Creeps Assessment
✅ Good: Clean Separation
- workflow.run_pipeline - Properly exposed as public API.
- config.Config - Public API for configuration.
- config.load_config_file - Public API for config loading.
- Internal functions prefixed with _ - Good convention followed throughout.
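The underscore convention, together with `__all__`, makes it possible to list a module's public surface mechanically. The sketch below shows the general Python rule: `__all__` wins when it is declared, otherwise names without a leading underscore count as public. The `public_names` helper and the demo module are hypothetical and only stand in for a real package module.

```python
import types
from typing import List


def public_names(module: types.ModuleType) -> List[str]:
    """List a module's public names, preferring an explicit __all__."""
    declared = getattr(module, "__all__", None)
    if declared is not None:
        return sorted(declared)
    # Fallback: the leading-underscore convention.
    return sorted(name for name in vars(module) if not name.startswith("_"))


if __name__ == "__main__":
    # Build a throwaway module to demonstrate both rules.
    demo = types.ModuleType("demo")
    demo.run_pipeline = lambda cfg: (0, "")
    demo._download = lambda url: b""
    print(public_names(demo))  # underscore rule applies

    demo.__all__ = ["run_pipeline"]
    print(public_names(demo))  # explicit __all__ applies
```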
⚠️ Potential Issues
- progress module - Currently not in __all__, but may be used by advanced external integrations. Keep as internal unless a public hook is requested.
- Provider Protocols - While internal, they define the contract for extensibility. If we allow user-defined providers, these would need to move to public API.
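To make the extensibility concern concrete, here is a minimal sketch of what a provider contract could look like if it were made public. The `TranscriptionProvider` protocol, its `transcribe` method, and the `EchoProvider` class are all hypothetical names chosen for illustration; the project's actual protocols may be shaped differently.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class TranscriptionProvider(Protocol):
    """Hypothetical contract; the real provider protocols may differ."""

    def transcribe(self, audio_path: str) -> str:
        ...


class EchoProvider:
    """Toy user-defined provider; satisfies the protocol structurally."""

    def transcribe(self, audio_path: str) -> str:
        return f"transcript of {audio_path}"


def transcribe_with(provider: TranscriptionProvider, audio_path: str) -> str:
    """Reject objects that do not offer the expected method."""
    if not isinstance(provider, TranscriptionProvider):
        raise TypeError("provider does not implement transcribe()")
    return provider.transcribe(audio_path)


if __name__ == "__main__":
    print(transcribe_with(EchoProvider(), "episode-001.mp3"))
```

Because protocols are structural, a user-defined provider would not need to import or subclass anything from the package. Only the method signature would become part of the public contract, which is why such protocols would need to be versioned with the public API.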
Architecture Isolation
Separation of Concerns
- CLI (cli.py): Handles argument parsing, formats terminal output, and manages interactive progress bars.
- Service (service.py): Optimized for daemons; works with config files only, returns structured results, and has no user interaction.
- Workflow (workflow.py): Core pipeline orchestration; contains business logic but is agnostic of CLI or Service interfaces.
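The layering above can be sketched as a thin CLI adapter that parses arguments, hands plain options to a pipeline callable, and handles only the terminal output. Everything in this sketch is an assumption for illustration: the options dictionary stands in for Config, `fake_pipeline` stands in for the workflow, and the only flag used is the `--max-episodes` option that appears in this document.

```python
import argparse
from typing import Callable, Dict, List, Optional, Tuple

Options = Dict[str, object]
PipelineFn = Callable[[Options], Tuple[int, str]]


def parse_args(argv: List[str]) -> argparse.Namespace:
    """CLI concern: turn command-line text into typed values."""
    parser = argparse.ArgumentParser(prog="podcast_scraper")
    parser.add_argument("rss_url")
    parser.add_argument("--max-episodes", type=int, default=None)
    return parser.parse_args(argv)


def fake_pipeline(options: Options) -> Tuple[int, str]:
    """Stand-in for the workflow layer; knows nothing about the CLI."""
    limit = options["max_episodes"] or 0
    return limit, f"Processed {limit} episodes from {options['rss_url']}"


def cli_main(argv: List[str], pipeline: Optional[PipelineFn] = None) -> int:
    """CLI concern: parse, delegate, print, and return an exit code."""
    run = pipeline or fake_pipeline
    args = parse_args(argv)
    options: Options = {
        "rss_url": args.rss_url,
        "max_episodes": args.max_episodes,
    }
    _count, summary = run(options)
    print(summary)
    return 0


if __name__ == "__main__":
    cli_main(["https://example.com/feed.xml", "--max-episodes", "10"])
```

Passing the pipeline in as a parameter keeps the adapter testable without network access, and a service adapter could reuse the same callable without touching argument parsing.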
Recommendations
- Keep workflow.run_pipeline as the single entry point for all interfaces.
- Maintain strict module boundaries to prevent circular imports and reduce coupling.
- Version internal APIs separately if they are exposed for plugin development in the future.
Usage Patterns
Interactive CLI Use
python -m podcast_scraper.cli https://example.com/feed.xml --max-episodes 10
Automated Service Use
python -m podcast_scraper.service --config config.yaml
Programmatic Python Use
from podcast_scraper import service
result = service.run_from_config_file("config.yaml")
if not result.success:
    # Handle error
    print(f"Failed: {result.error}")
Versioning Policy
- Major (X.y.z): Breaking API changes (e.g., signature changes, removal of classes).
- Minor (x.Y.z): New features, backward compatible (e.g., new optional fields, new providers).
- Patch (x.y.Z): Bug fixes, backward compatible.
See Versioning for full details.
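As a small worked example of the policy, the sketch below classifies the bump between two X.Y.Z version strings. The `bump_kind` helper is hypothetical and assumes plain three-part numeric versions with no pre-release or build suffixes.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split 'X.Y.Z' into integers; raises ValueError on other shapes."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def bump_kind(old: str, new: str) -> str:
    """Name the component that changed between two releases."""
    before, after = parse_version(old), parse_version(new)
    if after <= before:
        raise ValueError(f"{new} is not newer than {old}")
    if after[0] != before[0]:
        return "major"  # breaking API changes
    if after[1] != before[1]:
        return "minor"  # new, backward-compatible features
    return "patch"      # backward-compatible bug fixes


if __name__ == "__main__":
    for old, new in [("1.2.3", "2.0.0"), ("1.2.3", "1.3.0"), ("1.2.3", "1.2.4")]:
        print(old, "->", new, bump_kind(old, new))
```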