
RFC-007: CLI Interface & Validation

  • Status: Completed
  • Authors: GPT-5 Codex (initial documentation)
  • Stakeholders: Maintainers, operators, documentation writers
  • Related PRD: docs/prd/PRD-003-user-interface-config.md

Abstract

Specify the structure and behavior of the command-line interface, including argument parsing, validation, configuration merging, and integration with the pipeline.

Problem Statement

The CLI is the primary user entry point. It must expose all critical functionality while preventing invalid runs through proactive validation. Additionally, it needs to support configuration files without surprising precedence rules.

Constraints & Assumptions

  • The CLI is invoked with python -m podcast_scraper.cli or, from Python, by calling podcast_scraper.cli.main().
  • Arguments are parsed with argparse; the CLI must remain compatible with the Python 3.10+ standard library.
  • Validation should surface actionable errors without stack traces (exit code 1).
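A minimal sketch of an entry point that meets these constraints, assuming main(argv) returns an integer exit code rather than raising. The positional rss_url argument, the logger name, and this one-check validate_args are hypothetical stand-ins, not the actual podcast_scraper.cli code.

```python
import argparse
import logging
from typing import Optional, Sequence

logger = logging.getLogger("podcast_scraper.cli")  # hypothetical logger name


def validate_args(args: argparse.Namespace) -> None:
    # Hypothetical stand-in for the real validate_args: one scheme check only.
    if not args.rss_url.startswith(("http://", "https://")):
        raise ValueError(f"RSS URL must use http or https: {args.rss_url!r}")


def main(argv: Optional[Sequence[str]] = None) -> int:
    """Return a process exit code instead of raising, so callers can test it."""
    parser = argparse.ArgumentParser(prog="podcast_scraper")
    parser.add_argument("rss_url")  # hypothetical positional argument
    args = parser.parse_args(argv)  # argv=None falls back to sys.argv[1:]
    try:
        validate_args(args)
    except ValueError as exc:
        # Log only the message: the user sees an actionable error, no traceback.
        logger.error("%s", exc)
        return 1
    return 0
```

When the module is run with python -m, an if __name__ == "__main__": sys.exit(main()) guard turns the return value into the process status. Note that argparse's own usage errors (missing or malformed arguments) go through parser.error, which exits with status 2 by default; in this sketch only the post-parse validation path maps to exit code 1.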

Design & Implementation

  1. Argument parsing
      • parse_args defines all flags documented in the README (RSS URL, output, max episodes, Whisper flags, etc.).
      • Supports --config for JSON/YAML files; validated values are merged into the parser defaults.
      • --version prints the version string and exits.
  2. Validation
      • validate_args enforces URL schemes, numeric ranges, Whisper model choices, speaker name counts, and output directory validity.
      • Raises ValueError with aggregated error messages for user-friendly output.
  3. Config merging
      • Config files are loaded via config.load_config_file, then parsed through config.Config for schema enforcement.
      • CLI arguments override config defaults; unspecified CLI flags inherit config values.
  4. Config construction
      • _build_config transforms the CLI namespace into config.Config, populating derived fields such as the output directory and speaker list.
  5. Integration hooks
      • main accepts injectable apply_log_level_fn, run_pipeline_fn, and logger for testing.
      • Registers the toolkit progress factory (progress.set_progress_factory) with a CLI-specific tqdm wrapper.
  6. Exit semantics
      • Validation or configuration errors return exit code 1 without stack traces.
      • Pipeline exceptions are caught, logged, and return exit code 1.
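The config-merging precedence described above can be sketched with a common two-pass argparse pattern: read --config first, push its values into the parser defaults, then parse the full command line. This is a minimal sketch, assuming JSON config files and two hypothetical flags (--output-dir, --max-episodes); load_config_file here is a simplified stand-in for config.load_config_file and skips the config.Config schema check the design calls for.

```python
import argparse
import json
from pathlib import Path
from typing import Any, Dict, Optional, Sequence


def load_config_file(path: str) -> Dict[str, Any]:
    # Hypothetical JSON-only stand-in for config.load_config_file; the real
    # loader also accepts YAML and validates through config.Config, which
    # would reject unknown keys (this sketch silently accepts them).
    return json.loads(Path(path).read_text(encoding="utf-8"))


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="podcast_scraper")
    parser.add_argument("--config", help="JSON/YAML config file")
    parser.add_argument("--output-dir", default="output")  # hypothetical flag
    parser.add_argument("--max-episodes", type=int, default=None)  # hypothetical flag
    return parser


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    # A fresh parser per call keeps config defaults from leaking between calls.
    parser = build_parser()
    # Pass 1: pick out --config and ignore any arguments the parser does not know.
    pre_args, _ = parser.parse_known_args(argv)
    if pre_args.config:
        # Config values replace the built-in argument defaults...
        parser.set_defaults(**load_config_file(pre_args.config))
    # Pass 2: ...while flags given explicitly on the command line still win.
    return parser.parse_args(argv)
```

Because parser-level defaults set with set_defaults take precedence over argument-level defaults, config values beat the built-in defaults, and any flag passed explicitly beats both, which matches the "CLI overrides config, config fills the gaps" rule.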

Key Decisions

  • Two-phase validation (argparse + Pydantic) catches both syntactic and semantic errors before the pipeline runs.
  • Config precedence ensures reproducible defaults while allowing on-the-fly overrides.
  • Injectable dependencies improve testability (e.g., verifying CLI surfaces errors correctly).
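The aggregated-error behavior of validate_args can be sketched as "collect every problem, then raise once", so a user fixes all issues in one pass instead of one per run. This is a minimal sketch of that pattern only; the Namespace field names and the Whisper model subset are hypothetical, and the Pydantic phase (config.Config) is not shown.

```python
import argparse
from typing import List

# Hypothetical subset of accepted Whisper model names.
WHISPER_MODELS = {"tiny", "base", "small", "medium", "large"}


def validate_args(args: argparse.Namespace) -> None:
    """Collect every semantic problem, then raise a single ValueError."""
    errors: List[str] = []
    if not args.rss.startswith(("http://", "https://")):
        errors.append(f"RSS URL must use http or https: {args.rss!r}")
    if args.max_episodes is not None and args.max_episodes < 1:
        errors.append("--max-episodes must be a positive integer")
    if args.whisper_model not in WHISPER_MODELS:
        errors.append(f"Unknown Whisper model: {args.whisper_model!r}")
    if errors:
        # One exception carries all messages so the CLI can print them together.
        raise ValueError("Invalid arguments:\n  - " + "\n  - ".join(errors))
```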

Alternatives Considered

  • Click/Typer frameworks: Rejected to minimize dependencies and maintain explicit control over parsing/validation flow.
  • Silent failure on validation errors: Rejected; clear logging and exit status are vital for automation.

Testing Strategy

  • CLI tests in tests/test_podcast_scraper.py cover success cases, invalid arguments, config loading precedence, and version flag behavior.
  • Unit tests simulate argument lists to hit edge cases (e.g., invalid speaker counts, unknown config keys).
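The injectable hooks let such tests exercise main without network access or global logging setup. A minimal sketch, assuming a simplified main whose keyword-only parameters reuse the names apply_log_level_fn, run_pipeline_fn, and logger from the design; the positional RSS argument, the --log-level flag, and the plain dict passed to the pipeline are hypothetical (the design passes a config.Config).

```python
import argparse
import logging
from typing import Any, Callable, Dict, List, Optional, Sequence

PipelineFn = Callable[[Dict[str, Any]], None]


def _apply_log_level(level: str) -> None:
    logging.getLogger().setLevel(level)


def _run_pipeline(cfg: Dict[str, Any]) -> None:
    raise NotImplementedError("the real pipeline is out of scope for this sketch")


def main(
    argv: Optional[Sequence[str]] = None,
    *,
    apply_log_level_fn: Callable[[str], None] = _apply_log_level,
    run_pipeline_fn: PipelineFn = _run_pipeline,
    logger: Optional[logging.Logger] = None,
) -> int:
    log = logger or logging.getLogger("podcast_scraper.cli")
    parser = argparse.ArgumentParser(prog="podcast_scraper")
    parser.add_argument("rss")  # hypothetical positional argument
    parser.add_argument("--log-level", default="INFO")  # hypothetical flag
    args = parser.parse_args(argv)
    apply_log_level_fn(args.log_level)
    try:
        run_pipeline_fn({"rss": args.rss})
    except Exception as exc:
        # Pipeline failures become one logged message and exit code 1.
        log.error("Pipeline failed: %s", exc)
        return 1
    return 0


def test_main_reports_pipeline_failure() -> None:
    # pytest-style test: fakes replace the log-level hook and the pipeline.
    levels: List[str] = []
    seen: List[Dict[str, Any]] = []

    def failing_pipeline(cfg: Dict[str, Any]) -> None:
        seen.append(cfg)
        raise RuntimeError("network down")

    code = main(
        ["https://example.com/feed.xml", "--log-level", "DEBUG"],
        apply_log_level_fn=levels.append,
        run_pipeline_fn=failing_pipeline,
    )
    assert code == 1
    assert levels == ["DEBUG"]
    assert seen == [{"rss": "https://example.com/feed.xml"}]
```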

Rollout & Monitoring

  • Help text (--help) reviewed for accuracy each release.
  • Version string maintained in cli.__version__ and exported via __init__ for tooling.
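A minimal sketch of wiring --version to a module-level version string with argparse's built-in action="version", which expands %(prog)s, prints to standard output, and exits with status 0. The placeholder version value and the re-export mentioned in the comment are assumptions; the actual podcast_scraper layout may differ.

```python
import argparse

# Placeholder value; the real string is maintained in cli.__version__.
# A package __init__ could re-export it, e.g. `from .cli import __version__`.
__version__ = "0.0.0"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="podcast_scraper")
    # Prints "podcast_scraper <version>" to stdout, then raises SystemExit(0).
    parser.add_argument("--version", action="version", version=f"%(prog)s {__version__}")
    return parser
```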

References

  • Source: podcast_scraper/cli.py
  • Config schema: docs/rfc/RFC-008-config-model.md
  • Progress integration: docs/rfc/RFC-009-progress-integration.md