
Troubleshooting Guide

Common issues and solutions for podcast_scraper development and usage.


Quick Diagnosis

| Symptom | Likely Cause | Solution |
| --- | --- | --- |
| Tests skip with "model not cached" | ML models not preloaded | make preload-ml-models |
| ModuleNotFoundError: transformers | Missing ML dependencies | pip install -e ".[ml]" |
| Whisper fails silently | ffmpeg not installed | Install ffmpeg (see below) |
| make visualize fails / "cannot find dot" | Graphviz not installed | Install Graphviz (see below) |
| CI fails but local passes | Different Python/dependency versions | make ci locally |
| Memory errors in tests | ML models loading repeatedly | Use @pytest.mark.serial |
| Import errors after pull | Dependencies changed | pip install -e ".[dev,ml]" |
| Tests hang with -s flag | tqdm + parallel execution deadlock | Use -v instead, or -n 0 |
| OpenAI episodes skipped | Audio file size > 25 MB | Use local Whisper or compress audio |
| Pipeline hangs on transcription | No timeout configured | Set transcription_timeout in config |
| Pipeline hangs on summarization | No timeout configured | Set summarization_timeout in config |
| Need structured logs | Default logging format | Use --json-logs flag |
| Want to stop on first failure | Default continues on errors | Use --fail-fast flag |
| Want to limit failures | Default has no limit | Use --max-failures N flag |
| make test-fast or make ci-fast hangs at ~87% | pytest-xdist stall near end of run | TEST_FAST_WORKERS=2 make test-fast (or make ci-fast) |
| Unsure if environment is ready | Python, ffmpeg, cache, or models missing | Run podcast-scraper doctor (see below) |

test-fast / ci-fast hangs at ~87%

Symptom: make test-fast or make ci-fast sometimes hangs around 80–90% and never finishes (or takes a very long time). Other times the same run completes.

Cause: This is a known pytest-xdist behavior: with parallel workers, the run can stall near completion during end-of-suite worker coordination. Higher worker counts make the stall more likely.

Workaround: Run with fewer workers so the stall is less likely:

# Use 2 workers for the fast test suite (slower but avoids hang)
TEST_FAST_WORKERS=2 make test-fast

# Same for full fast CI
TEST_FAST_WORKERS=2 make ci-fast

Alternative: Run the test phase without parallelism (slow but reliable):

make format-check lint type security
E2E_TEST_MODE=fast $(PYTHON) -m pytest -m 'not nightly and ((not integration and not e2e) or (integration and critical_path) or (e2e and critical_path))' -n 0 --cov=podcast_scraper --cov-report=term-missing --disable-socket --allow-hosts=127.0.0.1,localhost --durations=20

Can we delay shutdown so workers have time to finish?
pytest-xdist does not expose a "grace period" or "wait N seconds before teardown" option. The stall happens inside xdist's master–worker coordination, so a delay cannot be injected from the outside.

Does changing the scheduler (e.g. --dist loadfile) fix it?
No. Using a different distribution (e.g. by file instead of by test) is not a fundamental solution. The run can still get stuck, often at a different point (e.g. later in the run). The only reliable workarounds are fewer workers (e.g. 2), no parallelism (-n 0), or a timeout cap.

Bounded run – If you need parallelism and it still hangs, cap the run so it exits instead of hanging forever (Linux/macOS with timeout or gtimeout):

timeout 900 make test-fast   # Linux: exit after 15 min
gtimeout 900 make test-fast # macOS (brew install coreutils)

Doctor command (Issue #379, #429)

The doctor subcommand runs environment and dependency checks so you can fix issues before running the pipeline. Use it after a fresh install, when switching machines, or when you see errors about ffmpeg, Python, or ML models.

How to run

# Standard checks (Python, ffmpeg, permissions, cache, ML imports)
python -m podcast_scraper.cli doctor
# or, if installed as a script:
podcast-scraper doctor

# Also check network connectivity
podcast-scraper doctor --check-network

# Also try loading default Whisper and summarizer models once (slow; validates "can load each model")
podcast-scraper doctor --check-models

What it checks

| Check | Purpose |
| --- | --- |
| Python version | Must be 3.10 or higher (same as requires-python in pyproject.toml). |
| ffmpeg | Required for audio processing and Whisper. Must be on PATH and runnable. |
| Write permissions | Creates a test file under ~/.podcast_scraper_test to ensure the process can write. |
| Model cache directory | Verifies the Whisper/Transformers cache dir exists and is writable (e.g. .cache/whisper or ~/.cache/huggingface/hub). |
| ML dependencies | Imports PyTorch, Transformers, Whisper, spaCy and prints versions (does not load models by default). |
| Network (optional, --check-network) | Opens a connection to confirm outbound connectivity. |
| Model load (optional, --check-models) | Loads the default Whisper model and default summarizer model once. Slow; use to confirm models download and load correctly. |

Exit codes

  • 0 – All checks passed.
  • 1 – One or more checks failed. Fix the reported issues and run doctor again.

When to use

  • After install – Confirm Python 3.10+, ffmpeg, and (if using ML) cache and dependencies.
  • Before a long run – Catch missing ffmpeg or a read-only cache early.
  • After "model not cached" or import errors – Run doctor then doctor --check-models to verify models load.
  • When debugging CI or another machine – Run doctor in that environment and share the output.

Exit codes and partial failures (Issue #429)

The pipeline exits 0 when the run completes, even if some episodes failed (e.g. 404 audio, timeout). It exits 1 only for run-level failures (bad config, missing ffmpeg, unhandled exception). Checking echo $? after a run therefore tells you only that the run finished, not whether every episode succeeded.

To see per-episode results:

  • Open output_dir/run_<suffix>/index.json: each episode has status (ok / failed / skipped) and on failure error_type, error_message, error_stage.
  • Or use run.json in the same directory; it links to index.json via index_file.
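A few lines of Python can surface the failed episodes from index.json. The field names below (status, error_type, error_message, error_stage) follow the description above, but treat the exact JSON layout as an assumption and verify it against a real run's output.

```python
import json
from pathlib import Path


def failed_episodes(index_path):
    """Return (title, error_type, error_stage) for each failed episode.

    Sketch only: assumes index.json is a list of episode records (or a
    mapping with an "episodes" list), each carrying a "status" field as
    described above. Adjust the field names to match your run's output.
    """
    data = json.loads(Path(index_path).read_text())
    episodes = data if isinstance(data, list) else data.get("episodes", [])
    return [
        (ep.get("title", "?"), ep.get("error_type"), ep.get("error_stage"))
        for ep in episodes
        if ep.get("status") == "failed"
    ]
```

Run it against output_dir/run_<suffix>/index.json after a pipeline run to get a quick failure summary without opening the file by hand.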

Flags:

  • --fail-fast: Stop after the first episode failure (run still exits 0 when it finishes).
  • --max-failures N: Stop after N episode failures (run still exits 0 when it finishes).

See Development Guide - CLI exit codes for the full policy.

ML Dependencies

"Model not cached" Test Skips

Symptom: Tests skip with messages like "Whisper model not cached" or "spaCy model not available".

Solution:

# Preload all ML models (requires network)
make preload-ml-models

# Verify models are cached (project-local cache)
ls -la .cache/whisper/          # Whisper models (tiny.en.pt, etc.)
ls -la .cache/huggingface/hub/  # Transformers models (bart, led)
python -c "import spacy; spacy.load('en_core_web_sm')"

Note: Models are cached in the project-local .cache/ directory, not ~/.cache/. See .cache/README.md for cache structure details.

Backup/Restore: If you need to backup or restore your cache (e.g., when switching machines or after cleanup):

# Backup cache
make backup-cache

# Restore cache (interactive)
make restore-cache

See .cache/README.md for detailed backup/restore instructions.

Whisper Model Download Fails

Symptom: Network errors when loading Whisper models.

Solution:

# Preload models using make target (recommended)
make preload-ml-models

# Or download model manually
python -c "import whisper; whisper.load_model('tiny.en')"

# Check project-local cache
ls .cache/whisper/

# Use smaller model for testing
python3 -m podcast_scraper.cli feed.xml --whisper-model tiny

transformers/torch Import Errors

Symptom: ModuleNotFoundError: No module named 'transformers'

Solution:

# Install ML dependencies

pip install -e ".[ml]"

# Or for development

pip install -e ".[dev,ml]"

Memory Issues with ML Models

Symptom: Tests crash with memory errors, or system becomes unresponsive.

Causes:

  • Multiple tests loading same models in parallel
  • Large models (LED, BART) consuming GPU/CPU memory
  • GPU memory contention when both Whisper and summarization use MPS simultaneously
  • Too many parallel workers for available RAM

MPS Memory Contention (Apple Silicon):

If you're experiencing crashes or memory errors on Apple Silicon when both Whisper and summarization use MPS:

# Enable MPS exclusive mode (default, but verify it's enabled)
export MPS_EXCLUSIVE=1

# Or in config file
# mps_exclusive: true

This serializes GPU work so transcription completes before summarization starts, preventing both models from competing for GPU memory. I/O operations (downloads, parsing) remain parallel.

Memory estimates per test type:

| Test Type | Per Worker | 8 Workers | Recommended RAM |
| --- | --- | --- | --- |
| Unit | ~100 MB | ~1 GB | 4 GB |
| Integration | ~1-2 GB | ~8-16 GB | 16 GB |
| E2E | ~1.5-3 GB | ~12-24 GB | 32 GB |

Solutions:

# Reduce parallel workers (default is 8)
PYTEST_WORKERS=4 make test-integration

# Set smaller batch sizes
export PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0

Parallelism configuration:

The Makefile uses a memory-aware calculation that considers both available RAM and CPU:

  • Memory-aware: calculates workers from available memory and the memory each worker needs:
      • Unit tests: ~100 MB per worker
      • Integration tests: ~1.5 GB per worker (ML models)
      • E2E tests: ~2 GB per worker (full pipeline)
  • CPU-based limit: reserves 2 cores for the system and caps at 8 workers
  • Platform-aware: more conservative on macOS (reduces workers by 1)
  • System reserve: keeps 4 GB free for system operations
  • Override with the PYTEST_WORKERS=N environment variable

The calculation uses scripts/tools/calculate_test_workers.py which automatically selects the optimal number of workers based on your system's available resources.
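The heuristics above can be sketched in a few lines. This mirrors the documented rules (4 GB system reserve, 2 reserved cores, 8-worker cap, PYTEST_WORKERS override), not the actual implementation in scripts/tools/calculate_test_workers.py.

```python
import os


def calculate_workers(available_gb, per_worker_gb, cpu_count=None, is_macos=False):
    """Memory- and CPU-aware worker count, per the rules documented above.

    Sketch under stated assumptions; the real script may differ in detail.
    """
    cpu_count = cpu_count or os.cpu_count() or 1
    # Reserve 4 GB for the system, then divide the rest by per-worker memory
    memory_workers = int((available_gb - 4) / per_worker_gb)
    # Reserve 2 cores for the system and cap at 8 workers
    cpu_workers = min(cpu_count - 2, 8)
    workers = max(1, min(memory_workers, cpu_workers))
    if is_macos and workers > 1:
        workers -= 1  # more conservative on macOS
    # The PYTEST_WORKERS environment variable always wins
    return int(os.environ.get("PYTEST_WORKERS", workers))
```

For example, with 32 GB available, 10 cores, and integration tests at ~1.5 GB per worker, this yields the 8-worker cap; with only 8 GB available it drops to 2 workers.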

Memory analysis script:

# Analyze memory usage during tests
python scripts/tools/analyze_test_memory.py --test-target test-unit

# With limited workers
python scripts/tools/analyze_test_memory.py --test-target test-integration --max-workers 4

Whisper / Transcription

ffmpeg Not Found

Symptom: Whisper transcription fails silently or with "ffmpeg not found".

Solution:

# macOS

brew install ffmpeg

# Ubuntu/Debian

sudo apt install ffmpeg

# Verify installation

ffmpeg -version

Graphviz (dot) Not Found

Symptom: make visualize or make deps-graph fails with "cannot find 'dot'" or similar.

Cause: Architecture dependency graphs are generated by pydeps, which requires the Graphviz dot binary.

Solution:

# macOS
brew install graphviz

# Ubuntu/Debian
sudo apt install graphviz

# Verify installation
dot -V

Note: CI does not regenerate diagrams for you; you must run make visualize locally and commit docs/architecture/diagrams/*.svg. CI then re-runs make visualize only as a check and fails if the committed diagrams are stale. See Architecture visualizations and the Release checklist.

Episodes Skipped with OpenAI Provider

Symptom: Some episodes are skipped when using the OpenAI transcription provider, with a log message about "exceeds OpenAI API limit (25 MB)".

Cause: The OpenAI Whisper API has a hard limit of 25MB per audio file. To avoid API errors and wasting bandwidth, the system checks the file size before downloading.

Solutions:

  1. Use Local Whisper: The local Whisper provider does not have this file size limit (it's only limited by your system's RAM).
  2. Compress Audio: If you must use the OpenAI API, use a tool like ffmpeg to reduce the bitrate or downmix to mono before processing.
  3. Wait for a Future Feature: Automatic audio preprocessing (downsampling/mono conversion) is on the roadmap.
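The size pre-check is easy to reproduce before handing files to the API. The 25 MB limit comes from the OpenAI API; the helper names and the specific ffmpeg flags below are illustrative, not the project's actual code.

```python
import os
import subprocess

OPENAI_AUDIO_LIMIT = 25 * 1024 * 1024  # 25 MB hard limit per audio file


def fits_openai_limit(path):
    """Return True if the audio file is under the 25 MB API limit."""
    return os.path.getsize(path) <= OPENAI_AUDIO_LIMIT


def compress_for_openai(src, dst):
    """Illustrative ffmpeg call: downmix to mono and lower the bitrate.

    Requires ffmpeg on PATH; tune -b:a to balance size against quality.
    """
    subprocess.run(
        ["ffmpeg", "-y", "-i", src, "-ac", "1", "-b:a", "48k", dst],
        check=True,
    )
```

Checking the size before download or upload avoids wasting bandwidth on files the API would reject anyway, which is the same reasoning the pipeline uses.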

Ollama Provider Issues

"Ollama server is not running"

Symptom: Error message: "Ollama server is not running. Please start it with: ollama serve"

Solution:

# Start Ollama server (keep terminal open)
ollama serve

# Verify server is running
curl http://localhost:11434/api/tags

# In another terminal, test
ollama list

ollama list Hangs

Symptom: Command hangs with no output.

Cause: Ollama server is not running.

Solution:

# 1. Start Ollama server in separate terminal
ollama serve

# 2. Keep that terminal open, then in another terminal:
ollama list

# 3. If it still hangs, check whether the server is responding
curl http://localhost:11434/api/tags

"Model 'llama3.3:latest' is not available"

Symptom: Error: "Model 'X' is not available in Ollama. Install it with: ollama pull X"

Solution:

# Pull the required model
ollama pull llama3.3:latest

# Verify it's available
ollama list

# Test the model
ollama run llama3.3:latest "Test"

Ollama Process Won't Die

Symptom: Can't kill Ollama process.

Solution:

# Kill by name
pkill ollama

# Or force kill
killall ollama

# Or find and kill manually
ps aux | grep ollama
kill <PID>

# If running as service (macOS)
brew services stop ollama

Slow Performance with Ollama

Symptom: Ollama inference is very slow.

Solutions:

  1. Use a smaller model:
ollama pull llama3.1:8b       # Smallest, fastest (6GB+ RAM)
# OR
ollama pull llama3.2:latest   # Medium size (8GB+ RAM)
  2. Increase the timeout:
ollama_timeout: 600  # 10 minutes for slow models
  3. Check hardware:
      • CPU-only inference is slow
      • Consider GPU acceleration if available
      • Use a model size appropriate for your RAM

See Ollama Provider Guide for detailed troubleshooting.


Test Failures

Tests Pass Locally but Fail in CI

Common causes:

  1. Different Python version - CI uses Python 3.10+
  2. Missing dependencies - CI installs fresh each time
  3. Network calls - CI blocks external network in unit tests
  4. File paths - Hardcoded paths that don't exist in CI

Debug steps:

# Run full CI suite locally

make ci

# Run with same isolation as CI

make test-unit  # Network blocked for unit tests

# Check Python version

python --version

Flaky Tests

Symptom: Tests pass sometimes, fail other times.

Common causes:

  • Race conditions in parallel execution
  • Shared state between tests
  • Network timeouts

Solutions:

# Run serially to identify race conditions

pytest tests/integration/ -x -v --no-header

# Check for shared fixtures

grep -r "scope=" tests/conftest.py

# Add serial marker for problematic tests
# @pytest.mark.serial

Test Hangs with -s Flag

Symptom: Tests hang indefinitely when using -s (no capture) with parallel execution.

Root cause: The -s flag disables pytest's output capturing. When combined with pytest-xdist parallel execution (-n auto), this causes deadlocks because:

  1. Multiple worker processes write to stdout/stderr simultaneously
  2. No buffering means writes can interleave and block
  3. tqdm progress bars (used by Whisper) compete for terminal control
  4. Terminal locking causes processes to wait indefinitely

Files using tqdm:

| File | Usage |
| --- | --- |
| src/podcast_scraper/providers/ml/whisper_utils.py | InterceptedTqdm class |
| src/podcast_scraper/transcription/whisper_provider.py | InterceptedTqdm class |
| src/podcast_scraper/providers/ml/ml_provider.py | InterceptedTqdm class |
| src/podcast_scraper/cli.py | _TqdmProgress class |

Structural Fix: Tests set TQDM_DISABLE=1 environment variable in tests/conftest.py to disable all tqdm progress bars during test execution.
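The structural fix amounts to a couple of lines run early during test collection. This is a sketch of the mechanism, not the exact contents of tests/conftest.py; recent tqdm versions honor TQDM_DISABLE globally.

```python
import os

# Disable all tqdm progress bars before any test imports Whisper or
# transformers, so worker processes never compete for terminal control.
# setdefault lets a developer still override the value from the shell.
os.environ.setdefault("TQDM_DISABLE", "1")
```

Because the variable is set before the ML libraries are imported, every InterceptedTqdm instance in the files listed above starts disabled.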

Workarounds:

# Use -v instead of -s (provides verbose output without hang)
pytest tests/unit/ -v

# Disable parallelism when using -s
pytest tests/unit/ -s -n 0

# Use sequential Makefile targets
make test-unit-sequential

# Use --tb=short for better error output
pytest tests/unit/ --tb=short

# For debugging, use --pdb instead
pytest tests/unit/ --pdb

CI/CD Issues

Pre-commit Hooks Failing

Symptom: Commits rejected by pre-commit hooks.

Solution:

# Run formatters

make format

# Fix markdown

make fix-md

# Run all checks

make lint

Documentation Build Fails

Symptom: mkdocs build fails with import errors.

Solution:

# Install docs dependencies

pip install mkdocs mkdocs-material pymdown-extensions mkdocstrings mkdocstrings-python

# For API docs that import the package

pip install -e ".[ml]"

Coverage Below Threshold

Symptom: CI fails with "Coverage below 70%".

Solution:

# Check current coverage

make test-unit
open htmlcov/index.html

# Identify uncovered code

coverage report --show-missing

Development Environment

Virtual Environment Issues

Symptom: Wrong Python version or packages not found.

Solution:

# Create fresh venv

rm -rf .venv
python3.10 -m venv .venv
source .venv/bin/activate

# Reinstall everything

make init

Import Errors After Git Pull

Symptom: ImportError or ModuleNotFoundError after pulling changes.

Solution:

# Reinstall package in editable mode

pip install -e ".[dev,ml]"

# Or use make target

make init

mypy Type Errors

Symptom: make type fails with type errors.

Common fixes:

# Update type stubs

pip install --upgrade types-requests types-PyYAML

# Check specific file

mypy src/podcast_scraper/your_file.py --show-error-codes

Runtime Issues

Configuration Not Loading

Symptom: CLI ignores config file settings.

Debug steps:

# Validate config file

python -c "import yaml; yaml.safe_load(open('config.yaml'))"

# Check for typos in keys

cat config.yaml

# Use verbose mode

python3 -m podcast_scraper.cli --config config.yaml -v

Output Directory Errors

Symptom: "Permission denied" or "Directory not found".

Solution:

# Check directory exists and is writable

ls -la /path/to/output/
mkdir -p /path/to/output/

# Use absolute path

python3 -m podcast_scraper.cli feed.xml --output-dir /absolute/path/

RSS Feed Parsing Errors

Symptom: "Invalid feed" or no episodes found.

Debug steps:

# Check feed is accessible

curl -I "https://example.com/feed.xml"

# Validate RSS format

python -c "import feedparser; print(feedparser.parse('https://example.com/feed.xml'))"

Speaker Detection Issues

Organization Names in RSS Feeds (Issue #393)

Symptom: Speaker detection returns organization names (e.g., "NPR", "BBC") instead of actual host names, or no hosts are detected.

Cause: RSS feed author tags may contain organization/publisher names rather than actual host names. The system automatically filters out organization names that match common patterns (all caps, short, no spaces).

How it works:

  • The system checks RSS author tags for organization patterns:
      • All uppercase (e.g., "NPR", "BBC", "CNN")
      • Short length (≤10 characters)
      • No spaces
  • Organization names are logged as "publisher metadata" and excluded from host detection
  • The system falls back to NER extraction from the feed title/description if author tags only contain organizations
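The filter described above can be sketched as a single predicate. This mirrors the documented patterns (all caps, ≤10 characters, no spaces) combined as one heuristic; the project's actual detection code may differ.

```python
def looks_like_organization(author):
    """Heuristic from the patterns above: all-uppercase, short, single token.

    Sketch only; treat the combination of conditions as an assumption.
    """
    name = author.strip()
    return (
        name.isupper()          # "NPR", "BBC", "CNN"
        and len(name) <= 10     # short publisher acronyms
        and " " not in name     # single token, no spaces
    )
```

A name like "Terry Gross" fails the all-uppercase check and is kept as a candidate host, while "NPR" matches all three conditions and is logged as publisher metadata.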

Solutions:

  1. Use manual speaker names (if automatic detection fails):
python3 -m podcast_scraper.cli feed.xml --speaker-names "Host Name" "Guest"
  2. Check debug logs to see what was detected:
export LOG_LEVEL=DEBUG
python3 -m podcast_scraper.cli feed.xml

Look for messages like: "RSS author 'NPR' appears to be an organization name"

  3. Verify RSS feed metadata - Some feeds have proper author tags with actual host names, while others only have publisher information.

Note: This is expected behavior - organization names are intentionally filtered out because they represent publishers, not actual speakers. The system prioritizes person names over organization names for speaker detection.



Getting Help

If your issue isn't covered here:

  1. Run doctor to capture environment state:
podcast-scraper doctor --check-network > doctor_output.txt 2>&1
# Optionally include model load check (slow):
podcast-scraper doctor --check-models >> doctor_output.txt 2>&1

Attach doctor_output.txt when opening an issue.

  2. Search existing issues: GitHub Issues

  3. Check logs:

# Enable debug logging
export LOG_LEVEL=DEBUG
python3 -m podcast_scraper.cli ...

  4. Open a new issue with:
      • Python version (python --version)
      • OS and version
      • Full error message/traceback
      • Steps to reproduce
      • Doctor output if relevant (see step 1)