Service API

The Service API provides a clean, programmatic interface optimized for non-interactive use, such as running as a daemon or service under a process manager (e.g., supervisor or systemd).

Overview

The service API is designed to:

  • Take all pipeline settings from a configuration file or Config object (the only CLI arguments are --config and --version)
  • Provide structured return values and error handling
  • Be suitable for process management tools
  • Maintain clean separation from CLI concerns

Quick Start

from podcast_scraper import service, Config

# Option 1: From Config object
cfg = Config(
    rss="https://example.com/feed.xml",
    output_dir="./transcripts"
)
result = service.run(cfg)

if result.success:
    print(f"Processed {result.episodes_processed} episodes")
    print(f"Summary: {result.summary}")
else:
    print(f"Error: {result.error}")

# Option 2: From config file
result = service.run_from_config_file("config.yaml")

API Reference

run

run(cfg: Config) -> ServiceResult

Run the podcast scraping pipeline with the given configuration.

This is the main entry point for programmatic use. It executes the full pipeline and returns a structured result suitable for service/daemon use.

Parameters:

  • cfg (Config, required): Configuration object (can be created from Config() or Config(**load_config_file()))

Returns:

  • ServiceResult: ServiceResult with processing results

Example

from podcast_scraper import service, config

cfg = config.Config(rss_url="https://example.com/feed.xml")
result = service.run(cfg)
if result.success:
    print(f"Success: {result.summary}")
else:
    print(f"Error: {result.error}")

Source code in src/podcast_scraper/service.py
def run(cfg: config.Config) -> ServiceResult:
    """Run the podcast scraping pipeline with the given configuration.

    This is the main entry point for programmatic use. It executes the full pipeline
    and returns a structured result suitable for service/daemon use.

    Args:
        cfg: Configuration object (can be created from Config() or Config(**load_config_file()))

    Returns:
        ServiceResult with processing results

    Example:
        >>> from podcast_scraper import service, config
        >>> cfg = config.Config(rss_url="https://example.com/feed.xml")
        >>> result = service.run(cfg)
        >>> if result.success:
        ...     print(f"Success: {result.summary}")
        ... else:
        ...     print(f"Error: {result.error}")
    """
    try:
        # Apply logging configuration if specified
        if cfg.log_file or cfg.log_level:
            workflow.apply_log_level(
                level=cfg.log_level or "INFO",
                log_file=cfg.log_file,
            )

        # Run the pipeline
        count, summary = workflow.run_pipeline(cfg)

        return ServiceResult(
            episodes_processed=count,
            summary=summary,
            success=True,
            error=None,
        )
    except Exception as e:
        error_safe = redact_for_log(str(e))
        logger.error("Pipeline execution failed: %s", error_safe, exc_info=True)
        return ServiceResult(
            episodes_processed=0,
            summary="",
            success=False,
            error=error_safe,
        )

run_from_config_file

run_from_config_file(config_path: str | Path) -> ServiceResult

Run the pipeline from a configuration file.

Convenience function that loads a config file and runs the pipeline. This is the recommended entry point for service/daemon usage.

Parameters:

  • config_path (str | Path, required): Path to configuration file (JSON or YAML)

Returns:

  • ServiceResult: ServiceResult with processing results

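Raises:

  • FileNotFoundError: If config file doesn't exist
  • ValueError: If config file is invalid

Note: in the source listing for this function, both conditions are caught internally and reported as a ServiceResult with success=False and a redacted error message, so callers should check result.success rather than rely on catching these exceptions.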

Example

import sys

from podcast_scraper import service

result = service.run_from_config_file("config.yaml")
if not result.success:
    sys.exit(1)
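Because run_from_config_file accepts JSON as well as YAML, a config file can be generated with the standard library alone. The following is a minimal sketch, assuming the file's keys mirror the Config keyword arguments used in the Quick Start (rss, output_dir). The helper name write_config is hypothetical and not part of podcast_scraper.

```python
import json
import tempfile
from pathlib import Path


def write_config(path: Path, rss: str, output_dir: str) -> Path:
    """Write a minimal JSON config file and return its path.

    Assumption: the keys match the Config keyword arguments shown in the
    Quick Start (rss, output_dir). Adjust them to your actual Config fields.
    """
    settings = {"rss": rss, "output_dir": output_dir}
    path.write_text(json.dumps(settings, indent=2), encoding="utf-8")
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cfg_path = write_config(
            Path(tmp) / "config.json",
            rss="https://example.com/feed.xml",
            output_dir="./transcripts",
        )
        # Show the generated file; this path could then be passed to
        # service.run_from_config_file(cfg_path).
        print(cfg_path.read_text(encoding="utf-8"))
```

Writing the file to a fixed location such as /etc/podcast-scraper/config.json would let a process manager pass it through --config.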

Source code in src/podcast_scraper/service.py
def run_from_config_file(config_path: str | Path) -> ServiceResult:
    """Run the pipeline from a configuration file.

    Convenience function that loads a config file and runs the pipeline.
    This is the recommended entry point for service/daemon usage.

    Args:
        config_path: Path to configuration file (JSON or YAML)

    Returns:
        ServiceResult with processing results

    Raises:
        FileNotFoundError: If config file doesn't exist
        ValueError: If config file is invalid

    Example:
        >>> from podcast_scraper import service
        >>> result = service.run_from_config_file("config.yaml")
        >>> if not result.success:
        ...     sys.exit(1)
    """
    try:
        config_dict = config.load_config_file(str(config_path))
        cfg = config.Config(**config_dict)
    except FileNotFoundError:
        error_msg = f"Configuration file not found: {config_path}"
        error_safe = redact_for_log(error_msg)
        logger.error("%s", error_safe)
        return ServiceResult(
            episodes_processed=0,
            summary="",
            success=False,
            error=error_safe,
        )
    except Exception as exc:
        error_safe = redact_for_log(f"Failed to load configuration file: {exc}")
        logger.error("%s", error_safe)
        return ServiceResult(
            episodes_processed=0,
            summary="",
            success=False,
            error=error_safe,
        )

    return run(cfg)

main

main() -> int

Main entry point for service mode (CLI-like but config-file only).

This function is designed to be called as a script entry point:

python -m podcast_scraper.service --config config.yaml

It accepts an optional --config argument and is optimized for non-interactive use. When --config is omitted, the path falls back to the PODCAST_SCRAPER_CONFIG environment variable and then to a built-in default.

Config file resolution order:

  1. --config argument (if provided)
  2. PODCAST_SCRAPER_CONFIG environment variable
  3. Default: /app/config.yaml (for Docker/service usage)
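The resolution order above can be sketched as a small standalone function. This is an illustration that mirrors the precedence described here, not the library's own code. The name resolve_config_path and the injectable environ parameter are assumptions added to make the logic easy to test.

```python
import os
from typing import Mapping, Optional

DEFAULT_CONFIG_PATH = "/app/config.yaml"
ENV_VAR = "PODCAST_SCRAPER_CONFIG"


def resolve_config_path(
    cli_value: Optional[str],
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the config path using the documented precedence.

    resolve_config_path is a hypothetical helper; environ defaults to
    os.environ and can be replaced with a plain dict in tests.
    """
    env = os.environ if environ is None else environ
    # 1. An explicit --config value wins.
    if cli_value:
        return cli_value
    # 2. Otherwise use the environment variable, 3. else the default path.
    return env.get(ENV_VAR, DEFAULT_CONFIG_PATH)


print(resolve_config_path("local.yaml", {ENV_VAR: "/etc/ps/config.yaml"}))
print(resolve_config_path(None, {ENV_VAR: "/etc/ps/config.yaml"}))
print(resolve_config_path(None, {}))
```

The three calls exercise each branch in turn: the CLI value, the environment variable, and the default.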

Returns:

  • int: Exit code (0 for success, 1 for failure)

Source code in src/podcast_scraper/service.py
def main() -> int:
    """Main entry point for service mode (CLI-like but config-file only).

    This function is designed to be called as a script entry point:
    python -m podcast_scraper.service --config config.yaml

    It accepts a --config argument (optional if PODCAST_SCRAPER_CONFIG env var is set)
    and is optimized for non-interactive use.

    Config file resolution order:
    1. --config argument (if provided)
    2. PODCAST_SCRAPER_CONFIG environment variable
    3. Default: /app/config.yaml (for Docker/service usage)

    Returns:
        Exit code (0 for success, 1 for failure)
    """
    import argparse
    import os

    # Initialize ML environment variables early (before any ML imports)
    setup.initialize_ml_environment()

    # Default config path (for Docker/service usage)
    default_config = os.getenv("PODCAST_SCRAPER_CONFIG", "/app/config.yaml")

    parser = argparse.ArgumentParser(
        description="Podcast Scraper Service - Run pipeline from configuration file",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  # Run with config file
  python -m podcast_scraper.service --config config.yaml

  # Run with environment variable
  PODCAST_SCRAPER_CONFIG=/path/to/config.yaml python -m podcast_scraper.service

  # Run with default path (Docker/service mode)
  python -m podcast_scraper.service

  # For supervisor/systemd usage
  [program:podcast_scraper]
  command=python -m podcast_scraper.service --config /path/to/config.yaml
  autostart=true
  autorestart=true
        """,
    )
    parser.add_argument(
        "--config",
        default=None,
        help=(
            "Path to configuration file (JSON or YAML). "
            "If not provided, uses PODCAST_SCRAPER_CONFIG environment variable "
            f"or default: {default_config}"
        ),
    )
    parser.add_argument(
        "--version",
        action="version",
        version=f"podcast_scraper {__version__}",
    )

    args = parser.parse_args()

    # Resolve config file path
    config_path = args.config or default_config

    # Run the service
    result = run_from_config_file(config_path)

    # Print results
    if result.success:
        print(result.summary)
        return 0
    else:
        print(f"Error: {result.error}", file=sys.stderr)
        return 1

ServiceResult Class

ServiceResult dataclass

ServiceResult(episodes_processed: int, summary: str, success: bool = True, error: Optional[str] = None)

Result of a service run.

Attributes:

  • episodes_processed (int): Number of episodes processed (transcripts saved/planned)
  • summary (str): Human-readable summary message
  • success (bool): Whether the run completed successfully
  • error (Optional[str]): Error message if success is False, None otherwise
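To make the field defaults concrete, here is a local stand-in that mirrors the documented signature, plus a hypothetical exit_code helper that applies the same 0/1 mapping main() uses. In real code the class would be imported from the package rather than redefined; this copy exists only so the sketch runs without podcast_scraper installed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServiceResult:
    """Local stand-in mirroring the documented fields and defaults."""

    episodes_processed: int
    summary: str
    success: bool = True
    error: Optional[str] = None


def exit_code(result: ServiceResult) -> int:
    """Map a result to a process exit code (0 for success, 1 for failure).

    exit_code is a hypothetical helper, not part of podcast_scraper.
    """
    return 0 if result.success else 1


# A successful run only needs the two required fields.
ok = ServiceResult(episodes_processed=3, summary="Processed 3 episodes")

# A failed run carries the error message and zeroed counters.
failed = ServiceResult(
    episodes_processed=0,
    summary="",
    success=False,
    error="Configuration file not found: config.yaml",
)

print(exit_code(ok), exit_code(failed))
```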

Daemon Usage

Systemd Service

[Unit]
Description=Podcast Scraper Service
After=network.target

[Service]
Type=simple
User=podcast
WorkingDirectory=/opt/podcast-scraper
ExecStart=/usr/bin/python3 -m podcast_scraper.service --config /etc/podcast-scraper/config.yaml
Restart=on-failure
RestartSec=30

[Install]
WantedBy=multi-user.target

Supervisor Configuration

[program:podcast_scraper]
command=/usr/bin/python3 -m podcast_scraper.service --config /etc/podcast-scraper/config.yaml
directory=/opt/podcast-scraper
user=podcast
autostart=true
autorestart=true
redirect_stderr=true
stdout_logfile=/var/log/podcast-scraper.log

Programmatic Error Handling

import sys
from podcast_scraper import service

result = service.run_from_config_file("config.yaml")

if not result.success:
    # Log error and exit with appropriate code
    print(f"Service failed: {result.error}", file=sys.stderr)
    sys.exit(1)

# Continue with success
print(f"Success: {result.summary}")
sys.exit(0)

Docker Usage

For Docker-based deployments, see the Docker Service Guide, which covers:

  • Service-oriented Docker execution
  • Environment variables and volume mounts
  • Supervisor integration
  • Docker Compose examples
  • Troubleshooting

See Also