PRD-003: User Interfaces & Configuration¶

Status: ✅ Implemented (v2.0.0)
Related RFCs: RFC-007, RFC-008, RFC-009
Related UX specs:
UXS-001: GI / KG viewer (local visualization served with the tool)

Summary¶

Define how operators interact with the podcast scraper via CLI flags and configuration files. Ensure a consistent experience across modes while exposing progress feedback and logging controls.

Background & Context¶

Users frequently run the tool from terminals or automation scripts, requiring a predictable CLI surface.
Many production runs rely on reusable configuration files for reproducibility.
The CLI is also the public API showcase; Python consumers call into the same Config and run_pipeline primitives.

Goals¶

Provide a single configuration model (Config) that powers both CLI and Python integration.
Ensure CLI validation guards against common mistakes before work begins.
Allow reusable JSON/YAML config files that slot into automation pipelines.
Offer progress and logging visibility tuned for terminal usage while remaining embeddable.

Non-Goals¶

Building a GUI or web interface.
Secret management or credential storage (handled externally if needed).
Rich analytics dashboards (out of scope).

Personas¶

Operator Owen: Runs the CLI manually, tweaking flags to experiment with output runs.
Automation Alex: Integrates the scraper into nightly jobs using configuration files.
Integrator Iris: Imports the Python API into another application and needs a stable programmatic interface.

User Stories¶

As Operator Owen, I can run python -m podcast_scraper.cli <rss_url> with sensible defaults and see progress bars and status logs.
As Automation Alex, I can maintain a JSON/YAML config file checked into version control and use --config to load it.
As a user, I can request version info (--version) and set log level verbosity per run.
As Integrator Iris, I can call podcast_scraper.Config + podcast_scraper.run_pipeline directly in Python with the same semantics.
As any user, I can enable automatic speaker name detection (--auto-speakers) without manually specifying names for each episode (RFC-010).
As any user, I can configure the podcast language (--language) to optimize both Whisper transcription and speaker name detection.
As any user, I can provide manual speaker names (--speaker-names) as fallback when automatic detection fails.

Functional Requirements¶

FR1: CLI must validate inputs (RSS URL, numeric ranges, Whisper model choices) and surface actionable error messages.
FR2: CLI flags map to Config fields; precedence is CLI > config file defaults (with validation).
FR3: Support both JSON and YAML configuration files loaded via --config with schema validation.
FR4: Expose logging controls (--log-level) and default to INFO.
FR5: Default progress reporter uses tqdm; expose abstraction (progress.set_progress_factory) to override in embedded contexts.
FR6: Provide --dry-run, --skip-existing, --clean-output, --workers, and other operational flags documented in README.
FR7: Ensure exit codes communicate success (0) vs. validation or runtime failures (1).
FR8: Export Python API surface (Config, load_config_file, run_pipeline, cli.main) from podcast_scraper.__init__.
FR9: Support --language flag (default "en") that configures both Whisper transcription language and NER model selection (RFC-010).
FR10: Support --auto-speakers flag (default true) to enable/disable automatic speaker name detection via NER (RFC-010).
FR11: Support --ner-model flag for advanced users to override default spaCy model selection (RFC-010).
FR12: Support --cache-detected-hosts flag (default true) to control host detection memoization (RFC-010).
FR13: Maintain fallback chain: automatic detection > manual --speaker-names fallback (when detection fails) > default ["Host", "Guest"].

Success Metrics¶

CLI onboarding: a new user can run the default command with only an RSS URL and receive useful output/logging.
Config file onboarding: loading an invalid config produces a clear validation error (no partial runs).
Python API: integration tests confirm parity with CLI semantics.

Dependencies¶

Validation and configuration logic described in docs/rfc/RFC-007-cli-interface.md and docs/rfc/RFC-008-config-model.md.
Progress abstraction detailed in docs/rfc/RFC-009-progress-integration.md.
Automatic speaker name detection and language configuration in docs/rfc/RFC-010-speaker-name-detection.md.

Release Checklist¶

[ ] CLI help text audited and examples verified in README.
[ ] Integration tests cover CLI happy path, invalid args, config file precedence, programmatic usage.
[ ] Version string maintained in sync (__version__).

Viewer v2 — Theme & Appearance (planned)¶

The GI/KG viewer v2 (RFC-062) extends this PRD with a token-based theming system that separates visual decisions from component code. This is specified in UXS-001 and implemented via CSS custom properties + optional preset files.

Key capabilities:

Semantic tokens: All colors, typography, spacing, and radii are defined as named tokens (e.g. canvas, primary, gi, series-1). Components consume tokens, never hard-coded values.
Light/dark: Driven by prefers-color-scheme; dark mode is the design baseline.
Preset experimentation: Alternate value files (compact.css, relaxed.css) can override tunable parameters (fonts, spacing, border-radius) without touching component code. Once finalized, the chosen values are frozen in UXS-001.
Frozen vs open: Token names and the pairing/split conventions are architectural (frozen). Token values (exact hex, font family, spacing unit) are open for tuning during early development.

See UXS-001 § Tunable parameters for the full frozen/open matrix, and RFC-062 decision #6 for the implementation approach.

Open Questions¶

Should we support environment variable substitution in config files? Not currently planned.
Do we need subcommands for future expansion (e.g., inspect, clean)? Monitor user feedback.

RFC-010 Integration¶

This PRD integrates with RFC-010 (Automatic Speaker Name Detection) to provide new configuration options:

Language Configuration: The --language flag (default "en") controls both Whisper model selection and NER model selection. Config file supports language field.
Automatic Speaker Detection: The --auto-speakers flag (default true) enables/disables automatic extraction of speaker names from episode metadata. Config file supports auto_speakers boolean field.
NER Model Override: Advanced users can specify --ner-model to override default spaCy model selection (e.g., en_core_web_sm). Config file supports ner_model field.
Caching Control: The --cache-detected-hosts flag (default true) controls whether host detection is memoized across episodes. Config file supports cache_detected_hosts boolean field.
Precedence Rules:
Automatic detection runs first when --auto-speakers is enabled.
Manual --speaker-names are ONLY used as fallback when automatic detection fails (not as override).
Manual names format: first item = host, second item = guest (e.g., ["Lenny", "Guest"]).
When guest detection fails: keep detected hosts (if any) + use manual guest name as fallback.
If detection succeeds, manual names are ignored; if detection fails, manual names are used as fallback.
Validation: CLI validates language codes, NER model names, and ensures speaker name lists meet minimum requirements.