
Quick Reference

One-page cheat sheet for common podcast_scraper commands.


Setup

# Initial setup

git clone https://github.com/chipi/podcast_scraper.git && cd podcast_scraper
python3 -m venv .venv && source .venv/bin/activate
make init                    # Install all dependencies (auto-uses wheels/spacy if *.whl present)
# Tip: run make download-spacy-wheels once to avoid repeated slow spaCy downloads; make init then picks up the wheels

# ML model preloading (for tests)

make preload-ml-models       # Download Whisper, spaCy, transformers models

# Cache management

make backup-cache            # Backup .cache directory (saves to ~/podcast_scraper_cache_backups/)
make restore-cache           # Restore cache from backup (interactive)
make backup-cache-list       # List available backups

Diagnostic Commands (Issue #379)

# Run diagnostic checks
python -m podcast_scraper.cli doctor

# Include network connectivity check
python -m podcast_scraper.cli doctor --verbose

# LLM pricing YAML: show path, metadata, staleness (see docs/api/CLI.md)
python -m podcast_scraper.cli pricing-assumptions
make check-pricing-assumptions

Daily Development

# Before coding

source .venv/bin/activate

# After making changes

make format                  # Auto-format code (black + isort)
make lint                    # Check style (flake8)
make type                    # Type check (mypy)
make quality                 # Code quality (complexity, docstrings, dead code, spelling)

# Before committing

make ci                      # Run full CI suite locally

Testing

| Command | What it does | Time |
| --- | --- | --- |
| make test-unit | Unit tests (parallel, network blocked) | ~30s |
| make test-integration | Integration tests (serial) | ~2min |
| make test-e2e | E2E tests (serial, real ML) | ~5min |
| make test-fast | Critical path only | ~1min |
| make test | All tests | ~8min |
| make test-nightly | Nightly tests (production models) | ~4hrs |

# Run specific test file

pytest tests/unit/test_config.py -v

# Run specific test

pytest tests/unit/test_config.py::test_config_validation -v

# Run with output visible

pytest tests/ -v -s --no-header   # -s disables output capture so print/log output is shown

# Debug failing test

pytest tests/path/to/test.py -x -v --tb=short

Documentation

make docs                    # Build docs site
make fix-md                  # Auto-fix markdown issues
make lint-markdown           # Check markdown style

# Local preview

mkdocs serve                 # http://localhost:8000

CLI Usage

# Basic transcript download

python3 -m podcast_scraper.cli https://example.com/feed.xml

# With Whisper transcription

python3 -m podcast_scraper.cli https://example.com/feed.xml \
  --transcribe-missing --whisper-model base

# Full processing

python3 -m podcast_scraper.cli https://example.com/feed.xml \
  --transcribe-missing \
  --generate-metadata \
  --generate-summaries \
  --output-dir ./output

# From config file

python3 -m podcast_scraper.cli --config config.yaml
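
For scripted runs, the invocations above can also be assembled in Python. The following is a minimal sketch, not part of podcast_scraper: build_scraper_command is a hypothetical helper, and it uses only the flags shown in this section. Whether a feed URL and --config can be combined is an assumption to check against the CLI documentation.

```python
import shlex
import sys
from typing import List, Optional


def build_scraper_command(
    feed_url: Optional[str] = None,
    *,
    config: Optional[str] = None,
    transcribe_missing: bool = False,
    whisper_model: Optional[str] = None,
    generate_metadata: bool = False,
    generate_summaries: bool = False,
    output_dir: Optional[str] = None,
) -> List[str]:
    """Assemble an argv list for the CLI (hypothetical helper, flags from this page)."""
    if feed_url is None and config is None:
        raise ValueError("pass a feed URL or a config file")

    # sys.executable keeps the call inside the currently active virtualenv.
    cmd = [sys.executable, "-m", "podcast_scraper.cli"]
    if feed_url:
        cmd.append(feed_url)
    if config:
        # Assumption: combining a feed URL with --config is not verified here.
        cmd += ["--config", config]
    if transcribe_missing:
        cmd.append("--transcribe-missing")
    if whisper_model:
        cmd += ["--whisper-model", whisper_model]
    if generate_metadata:
        cmd.append("--generate-metadata")
    if generate_summaries:
        cmd.append("--generate-summaries")
    if output_dir:
        cmd += ["--output-dir", output_dir]
    return cmd


if __name__ == "__main__":
    cmd = build_scraper_command(
        "https://example.com/feed.xml",
        transcribe_missing=True,
        whisper_model="base",
    )
    # Print a copy-pasteable shell command instead of running it.
    print(shlex.join(cmd))
```

Pass the returned list to subprocess.run(cmd, check=True) to execute it. Keeping the command as a list avoids shell-quoting problems with feed URLs.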

Git Workflow

# Create feature branch

git checkout -b feature/my-feature

# Make changes, then:

make format && make ci       # Format and verify

# Commit

git add -A
git commit -m "feat: add my feature"

# Push and create PR

git push -u origin feature/my-feature

Debugging

# Enable debug logging

export LOG_LEVEL=DEBUG
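
How podcast_scraper consumes LOG_LEVEL is not shown on this page. As a hedged illustration only, the sketch below shows one common way a Python program maps such a variable onto the standard logging module; level_from_env is a hypothetical helper, and the INFO fallback is an assumption.

```python
import logging
import os


def level_from_env(var: str = "LOG_LEVEL", default: str = "INFO") -> int:
    """Translate an environment variable such as LOG_LEVEL into a logging level."""
    name = os.environ.get(var, default).strip().upper()
    level = getattr(logging, name, None)
    # Unknown names (or attributes that are not levels) fall back to INFO.
    return level if isinstance(level, int) else logging.INFO


if __name__ == "__main__":
    logging.basicConfig(level=level_from_env())
    logging.getLogger("demo").debug("visible only when LOG_LEVEL=DEBUG")
```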

# Check test coverage

make test-unit
open htmlcov/index.html      # macOS; use xdg-open on Linux

# Check for linting issues

make lint 2>&1 | head -50

# Validate config file (checks YAML syntax only)

python -c "import yaml; yaml.safe_load(open('config.yaml'))"

Grounded Insights

Grounded insights are key takeaways linked to verbatim quotes (evidence). When the GIL pipeline is enabled:

  • Config: generate_gi: true; optional evidence stack: embedding_model, extractive_qa_model, nli_model (and *_device).
  • Output: One gi.json per episode (co-located with transcript/summary).
  • CLI: gi inspect, gi show-insight, gi explore (see Grounded Insights Guide).
  • Browser viewer: make serve-gi-kg-viz — load *.gi.json / *.kg.json in a local UI (Development Guide).

See the Grounded Insights Guide and GIL Ontology.
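
To get a quick look at what a run produced without the gi CLI, the output files can be scanned from Python. This is a rough sketch under stated assumptions: summarize_gi_files is a hypothetical helper, the *gi.json filename pattern is inferred from the bullets above, and nothing about the schema is assumed beyond each file being valid JSON.

```python
import json
from pathlib import Path
from typing import Dict, List


def summarize_gi_files(output_dir: str) -> Dict[str, List[str]]:
    """Map each gi.json path under output_dir to its sorted top-level keys."""
    summary: Dict[str, List[str]] = {}
    # Assumption: files are named gi.json or <something>.gi.json.
    for path in sorted(Path(output_dir).rglob("*gi.json")):
        with path.open(encoding="utf-8") as handle:
            data = json.load(handle)
        # Only JSON objects have keys; anything else is reported as empty.
        summary[str(path)] = sorted(data) if isinstance(data, dict) else []
    return summary


if __name__ == "__main__":
    for file_path, keys in summarize_gi_files("./output").items():
        print(file_path, keys)
```

For anything beyond a listing, the gi inspect and gi explore commands mentioned above are the better starting point.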


Key Files

| File | Purpose |
| --- | --- |
| pyproject.toml | Dependencies, pytest config |
| Makefile | Development commands |
| .pre-commit-config.yaml | Pre-commit hooks |
| tests/conftest.py | Shared test fixtures |
| .markdownlint.json | Markdown linting rules |

Key Directories

| Directory | Purpose |
| --- | --- |
| src/podcast_scraper/ | Main source code |
| tests/unit/ | Unit tests |
| tests/integration/ | Integration tests |
| tests/e2e/ | End-to-end tests |
| docs/ | Documentation |
| config/examples/ | Config examples |

Common Issues

| Issue | Fix |
| --- | --- |
| spaCy wheels re-download often | make download-spacy-wheels then export PIP_FIND_LINKS="$(pwd)/wheels/spacy" (see Dependencies Guide) |
| Tests skip "model not cached" | make preload-ml-models |
| Import errors | pip install -e ".[dev,ml,llm]" |
| Whisper fails | brew install ffmpeg |
| make visualize / dependency graphs fail | brew install graphviz (macOS) or apt install graphviz (Linux) |
| CI fails locally | make ci |

See: Troubleshooting Guide


Docker

# Build variants
make docker-build-llm      # LLM-only variant (~200MB)
make docker-build          # ML-enabled variant (~1-3GB)
make docker-build-fast     # ML variant, no model preloading

# Test both variants
make docker-test

# Run container
docker run -v "$(pwd)/config.yaml:/app/config.yaml" \
           -v "$(pwd)/output:/app/output" \
           podcast-scraper:latest

# Docker Compose
docker-compose up -d       # ML-enabled variant
docker-compose -f docker-compose.llm-only.yml up -d  # LLM-only variant

See: Docker Service Guide, Docker Variants Guide