Service API
The Service API provides a clean, programmatic interface optimized for non-interactive use, such as running as a daemon or service (e.g., with supervisor, systemd).
Overview
The service API is designed to:
- Work exclusively with configuration files (no CLI arguments)
- Provide structured return values and error handling
- Be suitable for process management tools
- Maintain clean separation from CLI concerns
- Use the same validated Config model as the CLI: service.run builds Config(**config_dict) from the merged configuration. There is no separate allowlist of keys in the service layer, so documented fields such as preprocessing_mp3_bitrate_kbps are accepted whenever they are valid on Config (GitHub #561).
Quick Start
from podcast_scraper import service, Config

# Option 1: From Config object
cfg = Config(
    rss="https://example.com/feed.xml",
    output_dir="./transcripts",
)
result = service.run(cfg)
if result.success:
    print(f"Processed {result.episodes_processed} episodes")
    print(f"Summary: {result.summary}")
else:
    print(f"Error: {result.error}")

# Option 2: From config file
result = service.run_from_config_file("config.yaml")
Multi-feed (GitHub #440): If the loaded config has two or more feed entries in rss_urls (from YAML feeds / rss_urls, a promoted rss list, or objects with url plus optional per-feed overrides), service.run and run_from_config_file run one pipeline per feed under <output_dir>/feeds/<stable_feed_id>/, matching the CLI. output_dir must be set in that case. After the batch, the run writes corpus_manifest.json and corpus_run_summary.json at the corpus parent (GitHub #506); with vector_search and FAISS, it also builds one <output_dir>/search index (GitHub #505). The return value’s multi_feed_summary field holds the same JSON-shaped dict as corpus_run_summary.json (or None on single-feed runs), including batch_incidents and per-feed episode_incidents_unique (schema 1.1.0). Field tables are in CORPUS_MULTI_FEED_ARTIFACTS.md. See also the RSS and multi-feed section of CONFIGURATION.md.
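The on-disk layout described above can be checked after a multi-feed run with a few lines of standard-library Python. This is a minimal sketch, not part of podcast_scraper: describe_corpus is a hypothetical helper name, and it relies only on the paths named in this section (feeds/ sub-directories, corpus_manifest.json, corpus_run_summary.json).

```python
from pathlib import Path
from typing import Union


def describe_corpus(output_dir: Union[str, Path]) -> dict:
    """Report which multi-feed artifacts exist under one corpus parent.

    Hypothetical helper: it only inspects the directory layout documented
    above and does not import or call podcast_scraper.
    """
    root = Path(output_dir)
    feeds_dir = root / "feeds"
    # One sub-directory per feed: <output_dir>/feeds/<stable_feed_id>/
    if feeds_dir.is_dir():
        feed_ids = sorted(p.name for p in feeds_dir.iterdir() if p.is_dir())
    else:
        feed_ids = []
    return {
        "feed_ids": feed_ids,
        "has_manifest": (root / "corpus_manifest.json").is_file(),
        "has_run_summary": (root / "corpus_run_summary.json").is_file(),
    }
```

Calling describe_corpus("./transcripts") after a batch gives a quick view of which feeds produced a directory and whether the corpus-level files were written.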
Soft-only multi-feed success (GitHub #559): multi_feed_strict defaults to false (lenient). A multi-feed run can then return success=True with error=None if every failed feed is classified as soft (same rules as in CONFIGURATION.md — RSS and multi-feed). In that case the aggregated per-feed messages are on ServiceResult.soft_failures (non-empty string). If success is false because of a hard failure or strict mode (multi_feed_strict: true), soft_failures stays None. multi_feed_summary / corpus_run_summary.json still report overall_ok: false when any feed failed. In Python, pass multi_feed_strict= into Config; deprecated YAML-only keys are documented in the same CONFIGURATION section.
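For monitoring, the three outcomes described above (hard failure, success with soft-only feed failures, clean success) can be folded into one small report. The sketch below is an illustration under assumptions: health_report is a hypothetical name, it accepts any object with the documented ServiceResult attributes, and it assumes overall_ok is a top-level key of multi_feed_summary.

```python
from typing import Any, Dict


def health_report(result) -> Dict[str, Any]:
    """Summarise a ServiceResult-like object for a monitoring system.

    Hypothetical helper; `result` only needs the documented attributes
    success, soft_failures, and multi_feed_summary.
    """
    if not result.success:
        # Hard failure or strict mode: details are in result.error.
        state = "failed"
    elif result.soft_failures:
        # Lenient run where every failed feed was classified as soft.
        state = "ok_with_soft_failures"
    else:
        state = "ok"

    summary = result.multi_feed_summary
    if summary is None:
        # Single-feed run: there is no per-feed breakdown to consult.
        all_feeds_ok = bool(result.success)
    else:
        # Assumption: overall_ok sits at the top level of the summary dict.
        all_feeds_ok = bool(summary.get("overall_ok", False))
    return {"state": state, "all_feeds_ok": all_feeds_ok}
```

Keeping state and all_feeds_ok separate reflects the point above: a lenient run can report success while the summary still records that a feed failed.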
Episode selection (GitHub #521): The same episode_order, episode_since, episode_until, episode_offset, and max_episodes fields in YAML/JSON apply to each inner single-feed run. See CONFIGURATION.md — Episode selection.
Append / resume (GitHub #444): If Config.append is true, each inner run uses a stable run_append_* directory and skips episodes that are already complete on disk (metadata episode_id + required artifacts). Incompatible with clean_output. See CONFIGURATION.md — Append / resume.
Corpus lock (multi-feed): While two or more feeds are processed, service.run acquires an advisory exclusive lock file .podcast_scraper.lock under the corpus parent (output_dir) using filelock. If another process already holds the lock, the call returns immediately with success=False, episodes_processed=0, and error describing the lock conflict. Disable locking with environment variable PODCAST_SCRAPER_CORPUS_LOCK=0 (tests, advanced workflows). Single-feed service.run does not use this lock.
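Because a lock conflict returns immediately rather than blocking, a caller that wants to wait for the other process has to retry on its own. The following is a rough sketch of one way to do that. run_with_retry and its parameters are hypothetical names, and since the exact wording of the lock-conflict error is not specified here, the caller supplies the predicate that recognises it.

```python
import time
from typing import Any, Callable


def run_with_retry(
    run_once: Callable[[], Any],
    is_lock_conflict: Callable[[Any], bool],
    attempts: int = 3,
    delay_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call run_once() up to `attempts` times while it reports a lock conflict.

    Hypothetical wrapper: run_once would typically be
    `lambda: service.run(cfg)`, and is_lock_conflict is supplied by the
    caller because the error text is an assumption, not a documented contract.
    """
    result = run_once()
    for _ in range(attempts - 1):
        if result.success or not is_lock_conflict(result):
            break  # Success, or a failure that retrying will not fix.
        sleep(delay_seconds)
        result = run_once()
    return result
```

Passing sleep as a parameter keeps the wrapper easy to exercise in tests without real delays.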
API Reference
run
run(cfg: Config) -> ServiceResult
Run the podcast scraping pipeline with the given configuration.
This is the main entry point for programmatic use. It executes the full pipeline and returns a structured result suitable for service/daemon use.
When cfg.rss_urls contains two or more URLs (e.g. from YAML feeds:), runs one
pipeline per feed under output_dir/feeds/<stable_name>/ (GitHub #440), same layout as
the multi-feed CLI.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cfg | Config | Configuration object (can be created from Config() or Config(**load_config_file())) | required |
Returns:
| Type | Description |
|---|---|
| ServiceResult | ServiceResult with processing results |
Example
from podcast_scraper import service, config

cfg = config.Config(rss_url="https://example.com/feed.xml")
result = service.run(cfg)
if result.success:
    print(f"Success: {result.summary}")
else:
    print(f"Error: {result.error}")
Source code in src/podcast_scraper/service.py
run_from_config_file
run_from_config_file(config_path: str | Path) -> ServiceResult
Run the pipeline from a configuration file.
Convenience function that loads a config file and runs the pipeline. This is the recommended entry point for service/daemon usage.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config_path | str \| Path | Path to configuration file (JSON or YAML) | required |
Returns:
| Type | Description |
|---|---|
| ServiceResult | ServiceResult with processing results |
Raises:
| Type | Description |
|---|---|
| FileNotFoundError | If config file doesn't exist |
| ValueError | If config file is invalid |
Example
import sys
from podcast_scraper import service

result = service.run_from_config_file("config.yaml")
if not result.success:
    sys.exit(1)
Source code in src/podcast_scraper/service.py
main
main() -> int
Main entry point for service mode (CLI-like but config-file only).
This function is designed to be called as a script entry point:
python -m podcast_scraper.service --config config.yaml
It accepts a --config argument (optional if the PODCAST_SCRAPER_CONFIG env var is set) and is optimized for non-interactive use.
Config file resolution order:
1. --config argument (if provided)
2. PODCAST_SCRAPER_CONFIG environment variable
3. Default: /app/config.yaml (for Docker/service usage)
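The resolution order above amounts to a three-step fallback. The sketch below restates it as a standalone function. It is an illustration and not the code in service.py: resolve_config_path is a hypothetical name, and treating an empty string as "not set" is an assumption.

```python
import os
from typing import Mapping, Optional

DEFAULT_CONFIG_PATH = "/app/config.yaml"  # Documented default for Docker/service usage.


def resolve_config_path(
    cli_config: Optional[str],
    environ: Mapping[str, str] = os.environ,
) -> str:
    """Pick the config file path using the documented precedence.

    Hypothetical helper mirroring the list above; empty values are
    treated as unset, which is an assumption of this sketch.
    """
    # 1. Explicit --config argument wins.
    if cli_config:
        return cli_config
    # 2. Fall back to the PODCAST_SCRAPER_CONFIG environment variable.
    env_path = environ.get("PODCAST_SCRAPER_CONFIG")
    if env_path:
        return env_path
    # 3. Otherwise use the built-in default.
    return DEFAULT_CONFIG_PATH
```

Accepting the environment as a parameter makes the precedence easy to verify without changing the real process environment.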
Returns:
| Type | Description |
|---|---|
| int | Exit code (0 for success, 1 for failure) |
Source code in src/podcast_scraper/service.py
ServiceResult Class
ServiceResult (dataclass)
ServiceResult(episodes_processed: int, summary: str, success: bool = True, error: Optional[str] = None, multi_feed_summary: Optional[Dict[str, Any]] = None, soft_failures: Optional[str] = None)
Result of a service run.
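Attributes:
| Name | Type | Description |
|---|---|---|
| episodes_processed | int | Number of episodes processed (transcripts saved/planned) |
| summary | str | Human-readable summary message |
| success | bool | Whether the run completed successfully |
| error | Optional[str] | Error message if success is False, None otherwise |
| multi_feed_summary | Optional[Dict[str, Any]] | When the run is multi-feed, the same JSON-shaped dict as corpus_run_summary.json; None on single-feed runs |
| soft_failures | Optional[str] | When a multi-feed run succeeds with only soft-classified feed failures, the aggregated per-feed messages; None otherwise |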
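Since ServiceResult is a dataclass, a daemon can turn it into a single structured log line with the standard library. The sketch below uses a local stand-in class that mirrors the documented signature so it runs without the package installed. to_log_line is a hypothetical helper, and replacing the summary dict with a boolean is a design choice of this example, not library behaviour.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class ServiceResult:
    """Local stand-in mirroring the documented fields (not the real class)."""

    episodes_processed: int
    summary: str
    success: bool = True
    error: Optional[str] = None
    multi_feed_summary: Optional[Dict[str, Any]] = None
    soft_failures: Optional[str] = None


def to_log_line(result: ServiceResult) -> str:
    """Serialise a result as one JSON object per line for log collectors."""
    payload = asdict(result)
    # The multi-feed summary can be large; record only whether it is present.
    payload["multi_feed_summary"] = result.multi_feed_summary is not None
    return json.dumps(payload, sort_keys=True)
```

With the real package, the same function should accept the object returned by service.run, assuming it is a plain dataclass as documented.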
Daemon Usage
Systemd Service
[Unit]
Description=Podcast Scraper Service
After=network.target
[Service]
Type=simple
User=podcast
WorkingDirectory=/opt/podcast-scraper
ExecStart=/usr/bin/python3 -m podcast_scraper.service --config /etc/podcast-scraper/config.yaml
Restart=on-failure
RestartSec=30
[Install]
WantedBy=multi-user.target
Supervisor Configuration
[program:podcast_scraper]
command=/usr/bin/python3 -m podcast_scraper.service --config /etc/podcast-scraper/config.yaml
directory=/opt/podcast-scraper
user=podcast
autostart=true
autorestart=true
redirect_stderr=true
stdout_logfile=/var/log/podcast-scraper.log
Programmatic Error Handling
import sys
from podcast_scraper import service

result = service.run_from_config_file("config.yaml")
if not result.success:
    # Log error and exit with appropriate code
    print(f"Service failed: {result.error}", file=sys.stderr)
    sys.exit(1)

# success is True; multi-feed may still have soft-classified feed failures
if result.soft_failures:
    print(f"Warning (soft-only feed failures): {result.soft_failures}", file=sys.stderr)
print(f"Success: {result.summary}")
sys.exit(0)
Docker Usage
For Docker-based deployments, see the Docker Service Guide which covers:
- Service-oriented Docker execution
- Environment variables and volume mounts
- Supervisor integration
- Docker Compose examples
- Troubleshooting
See Also
- Configuration - Configuration options
- API Reference - Complete API reference
- Docker Service Guide - Docker service deployment