Service API

The Service API provides a programmatic interface for non-interactive use, such as running the scraper as a daemon or managed service (e.g., under supervisor or systemd).

Overview

The service API is designed to:

Work exclusively with configuration files (no CLI arguments)
Provide structured return values and error handling
Be suitable for process management tools
Maintain clean separation from CLI concerns

Quick Start

from podcast_scraper import service, Config

# Option 1: From Config object
cfg = Config(
    rss="https://example.com/feed.xml",
    output_dir="./transcripts"
)
result = service.run(cfg)

if result.success:
    print(f"Processed {result.episodes_processed} episodes")
    print(f"Summary: {result.summary}")
else:
    print(f"Error: {result.error}")

# Option 2: From config file
result = service.run_from_config_file("config.yaml")
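Option 2 expects the file's top-level keys to match Config's keyword arguments, because run_from_config_file builds the object with Config(**config_dict). The following is a minimal sketch that writes such a file as JSON, which is one of the two documented formats, from the Option 1 values. The key names are assumptions taken from this page. The Quick Start passes rss= while the docstring examples pass rss_url=, so use whichever name your installed Config accepts. The log_level key comes from the attribute that run() reads to configure logging.

```python
import json
from pathlib import Path

# Keys are assumed to mirror Config's keyword arguments (Config(**config_dict)).
config_data = {
    "rss": "https://example.com/feed.xml",  # docstring examples use "rss_url" instead
    "output_dir": "./transcripts",
    "log_level": "INFO",  # optional; read by run() to configure logging
}

config_path = Path("config.json")
config_path.write_text(json.dumps(config_data, indent=2), encoding="utf-8")

# The file can then be passed to the service, e.g.:
# result = service.run_from_config_file(config_path)
```

JSON keeps the sketch within the standard library. A YAML file with the same keys should work equally well, since the loader accepts both formats.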

API Reference

run

run(cfg: Config) -> ServiceResult

Run the podcast scraping pipeline with the given configuration.

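This is the main entry point for programmatic use. It executes the full pipeline and returns a structured result suitable for service/daemon use. Exceptions raised by the pipeline are caught, logged, and reported through the result (success=False, with a redacted message in error) rather than propagated to the caller.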

Parameters:

Name	Type	Description	Default
`cfg`	`Config`	Configuration object (can be created from Config() or Config(**load_config_file()))	required

Returns:

Type	Description
`ServiceResult`	ServiceResult with processing results

Example

>>> from podcast_scraper import service, config
>>> cfg = config.Config(rss_url="https://example.com/feed.xml")
>>> result = service.run(cfg)
>>> if result.success:
...     print(f"Success: {result.summary}")
... else:
...     print(f"Error: {result.error}")

Source code in src/podcast_scraper/service.py

def run(cfg: config.Config) -> ServiceResult:
    """Run the podcast scraping pipeline with the given configuration.

    This is the main entry point for programmatic use. It executes the full pipeline
    and returns a structured result suitable for service/daemon use.

    Args:
        cfg: Configuration object (can be created from Config() or Config(**load_config_file()))

    Returns:
        ServiceResult with processing results

    Example:
        >>> from podcast_scraper import service, config
        >>> cfg = config.Config(rss_url="https://example.com/feed.xml")
        >>> result = service.run(cfg)
        >>> if result.success:
        ...     print(f"Success: {result.summary}")
        ... else:
        ...     print(f"Error: {result.error}")
    """
    try:
        # Apply logging configuration if specified
        if cfg.log_file or cfg.log_level:
            workflow.apply_log_level(
                level=cfg.log_level or "INFO",
                log_file=cfg.log_file,
            )

        # Run the pipeline
        count, summary = workflow.run_pipeline(cfg)

        return ServiceResult(
            episodes_processed=count,
            summary=summary,
            success=True,
            error=None,
        )
    except Exception as e:
        error_safe = redact_for_log(str(e))
        logger.error("Pipeline execution failed: %s", error_safe, exc_info=True)
        return ServiceResult(
            episodes_processed=0,
            summary="",
            success=False,
            error=error_safe,
        )

run_from_config_file

run_from_config_file(config_path: str | Path) -> ServiceResult

Run the pipeline from a configuration file.

Convenience function that loads a config file and runs the pipeline. This is the recommended entry point for service/daemon usage.

Parameters:

Name	Type	Description	Default
`config_path`	`str \| Path`	Path to configuration file (JSON or YAML)	required

Returns:

Type	Description
`ServiceResult`	ServiceResult with processing results

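Error handling:

The docstring lists FileNotFoundError (config file doesn't exist) and ValueError (config file is invalid), but the implementation does not raise them. It catches both, together with any other exception raised while loading the file, logs a redacted message, and returns a ServiceResult with success=False. Check result.success rather than wrapping the call in try/except.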

Example

>>> import sys
>>> from podcast_scraper import service
>>> result = service.run_from_config_file("config.yaml")
>>> if not result.success:
...     sys.exit(1)

Source code in src/podcast_scraper/service.py

def run_from_config_file(config_path: str | Path) -> ServiceResult:
    """Run the pipeline from a configuration file.

    Convenience function that loads a config file and runs the pipeline.
    This is the recommended entry point for service/daemon usage.

    Args:
        config_path: Path to configuration file (JSON or YAML)

    Returns:
        ServiceResult with processing results

    Raises:
        FileNotFoundError: If config file doesn't exist
        ValueError: If config file is invalid

    Example:
        >>> from podcast_scraper import service
        >>> result = service.run_from_config_file("config.yaml")
        >>> if not result.success:
        ...     sys.exit(1)
    """
    try:
        config_dict = config.load_config_file(str(config_path))
        cfg = config.Config(**config_dict)
    except FileNotFoundError:
        error_msg = f"Configuration file not found: {config_path}"
        error_safe = redact_for_log(error_msg)
        logger.error("%s", error_safe)
        return ServiceResult(
            episodes_processed=0,
            summary="",
            success=False,
            error=error_safe,
        )
    except Exception as exc:
        error_safe = redact_for_log(f"Failed to load configuration file: {exc}")
        logger.error("%s", error_safe)
        return ServiceResult(
            episodes_processed=0,
            summary="",
            success=False,
            error=error_safe,
        )

    return run(cfg)

main

main() -> int

Main entry point for service mode (CLI-like but config-file only).

This function is designed to be called as a script entry point:

python -m podcast_scraper.service --config config.yaml

It accepts an optional --config argument (plus --version) and is optimized for non-interactive use. When --config is omitted, the path falls back to the environment variable and then to a built-in default.

Config file resolution order:
1. --config argument (if provided)
2. PODCAST_SCRAPER_CONFIG environment variable
3. Default: /app/config.yaml (for Docker/service usage)

Returns:

Type	Description
`int`	Exit code (0 for success, 1 for failure)
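The resolution order described above comes down to a chain of fallbacks. The following sketch restates it as a standalone function. resolve_config_path is a hypothetical name; main() does the same thing inline with `args.config or os.getenv("PODCAST_SCRAPER_CONFIG", "/app/config.yaml")`. The /etc/ps.yaml path in the usage lines is a placeholder.

```python
import os
from typing import Mapping, Optional

DEFAULT_CONFIG = "/app/config.yaml"  # default main() uses for Docker/service runs
ENV_VAR = "PODCAST_SCRAPER_CONFIG"


def resolve_config_path(
    cli_config: Optional[str], environ: Mapping[str, str] = os.environ
) -> str:
    """Hypothetical helper mirroring main(): --config, then env var, then default."""
    # A missing or empty --config value is falsy, so it falls through,
    # just like `args.config or default_config` in main().
    return cli_config or environ.get(ENV_VAR, DEFAULT_CONFIG)


# 1. An explicit --config wins over everything else.
print(resolve_config_path("config.yaml", {ENV_VAR: "/etc/ps.yaml"}))  # config.yaml
# 2. Without --config, the environment variable is used.
print(resolve_config_path(None, {ENV_VAR: "/etc/ps.yaml"}))  # /etc/ps.yaml
# 3. With neither, the Docker/service default applies.
print(resolve_config_path(None, {}))  # /app/config.yaml
```

Passing the environment as a mapping keeps the logic testable without mutating os.environ.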

Source code in src/podcast_scraper/service.py

def main() -> int:
    """Main entry point for service mode (CLI-like but config-file only).

    This function is designed to be called as a script entry point:
    python -m podcast_scraper.service --config config.yaml

    It accepts a --config argument (optional if PODCAST_SCRAPER_CONFIG env var is set)
    and is optimized for non-interactive use.

    Config file resolution order:
    1. --config argument (if provided)
    2. PODCAST_SCRAPER_CONFIG environment variable
    3. Default: /app/config.yaml (for Docker/service usage)

    Returns:
        Exit code (0 for success, 1 for failure)
    """
    import argparse
    import os

    # Initialize ML environment variables early (before any ML imports)
    setup.initialize_ml_environment()

    # Default config path (for Docker/service usage)
    default_config = os.getenv("PODCAST_SCRAPER_CONFIG", "/app/config.yaml")

    parser = argparse.ArgumentParser(
        description="Podcast Scraper Service - Run pipeline from configuration file",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  # Run with config file
  python -m podcast_scraper.service --config config.yaml

  # Run with environment variable
  PODCAST_SCRAPER_CONFIG=/path/to/config.yaml python -m podcast_scraper.service

  # Run with default path (Docker/service mode)
  python -m podcast_scraper.service

  # For supervisor/systemd usage
  [program:podcast_scraper]
  command=python -m podcast_scraper.service --config /path/to/config.yaml
  autostart=true
  autorestart=true
        """,
    )
    parser.add_argument(
        "--config",
        default=None,
        help=(
            "Path to configuration file (JSON or YAML). "
            "If not provided, uses PODCAST_SCRAPER_CONFIG environment variable "
            f"or default: {default_config}"
        ),
    )
    parser.add_argument(
        "--version",
        action="version",
        version=f"podcast_scraper {__version__}",
    )

    args = parser.parse_args()

    # Resolve config file path
    config_path = args.config or default_config

    # Run the service
    result = run_from_config_file(config_path)

    # Print results
    if result.success:
        print(result.summary)
        return 0
    else:
        print(f"Error: {result.error}", file=sys.stderr)
        return 1

ServiceResult Class

ServiceResult `dataclass`

ServiceResult(episodes_processed: int, summary: str, success: bool = True, error: Optional[str] = None)

Result of a service run.

Attributes:

Name	Type	Description
`episodes_processed`	`int`	Number of episodes processed (transcripts saved/planned)
`summary`	`str`	Human-readable summary message
`success`	`bool`	Whether the run completed successfully
`error`	`Optional[str]`	Error message if success is False, None otherwise
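If you want to build or test code against this shape without installing the package, the documented signature corresponds to a dataclass along the following lines. This is a reconstruction from the signature and attribute table, not the package's source. The exit_code helper is hypothetical and simply encodes the 0/1 convention that main() uses. The summary value is placeholder data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServiceResult:
    """Reconstructed from the documented signature; field order and defaults match it."""

    episodes_processed: int  # transcripts saved (or planned)
    summary: str  # human-readable summary message
    success: bool = True
    error: Optional[str] = None  # set only when success is False


def exit_code(result: ServiceResult) -> int:
    """Hypothetical helper: 0 on success, 1 on failure, matching main()."""
    return 0 if result.success else 1


ok = ServiceResult(episodes_processed=3, summary="(example summary)")
failed = ServiceResult(
    episodes_processed=0,
    summary="",
    success=False,
    error="Configuration file not found: config.yaml",
)
print(exit_code(ok), exit_code(failed))  # 0 1
```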

Daemon Usage

Systemd Service

[Unit]
Description=Podcast Scraper Service
After=network.target

[Service]
Type=simple
User=podcast
WorkingDirectory=/opt/podcast-scraper
ExecStart=/usr/bin/python3 -m podcast_scraper.service --config /etc/podcast-scraper/config.yaml
Restart=on-failure
RestartSec=30

[Install]
WantedBy=multi-user.target

Supervisor Configuration

[program:podcast_scraper]
command=/usr/bin/python3 -m podcast_scraper.service --config /etc/podcast-scraper/config.yaml
directory=/opt/podcast-scraper
user=podcast
autostart=true
autorestart=true
redirect_stderr=true
stdout_logfile=/var/log/podcast-scraper.log

Programmatic Error Handling

import sys
from podcast_scraper import service

result = service.run_from_config_file("config.yaml")

if not result.success:
    # Log error and exit with appropriate code
    print(f"Service failed: {result.error}", file=sys.stderr)
    sys.exit(1)

# Continue with success
print(f"Success: {result.summary}")
sys.exit(0)

Docker Usage

For Docker-based deployments, see the Docker Service Guide, which covers:

Service-oriented Docker execution
Environment variables and volume mounts
Supervisor integration
Docker Compose examples
Troubleshooting
