
Troubleshooting Guide

Common issues and solutions for podcast_scraper development and usage.


Quick Diagnosis

| Symptom | Likely Cause | Solution |
| --- | --- | --- |
| Tests skip with "model not cached" | ML models not preloaded | make preload-ml-models |
| ModuleNotFoundError: transformers | Missing ML dependencies | pip install -e ".[ml]" |
| Whisper fails silently | ffmpeg not installed | Install ffmpeg (see below) |
| make visualize fails / "cannot find dot" | Graphviz not installed | Install Graphviz (see below) |
| CI fails but local passes | Different Python/dependency versions | make ci locally |
| Memory errors in tests | ML models loading repeatedly | Use @pytest.mark.serial |
| Import errors after pull | Dependencies changed | pip install -e ".[dev,ml]" |
| Tests hang with -s flag | tqdm + parallel execution deadlock | Use -v instead, or -n 0 |
| OpenAI episodes skipped | Audio file size > 25 MB | Use local Whisper or compress audio |
| Pipeline hangs on transcription | No timeout configured | Set transcription_timeout in config |
| Pipeline hangs on summarization | No timeout configured | Set summarization_timeout in config |
| Need structured logs | Default logging format | Use --json-logs flag |
| Want to stop on first failure | Default continues on errors | Use --fail-fast flag |
| Want to limit failures | Default has no limit | Use --max-failures N flag |
| make test-fast or make ci-fast hangs at ~87% | pytest-xdist stall near end of run | TEST_FAST_WORKERS=2 make test-fast (or make ci-fast) |
| Unsure if environment is ready | Python, ffmpeg, cache, or models missing | Run podcast-scraper doctor (see below) |

test-fast / ci-fast hangs at ~87%

Symptom: make test-fast or make ci-fast sometimes hangs around 80–90% and never finishes (or takes a very long time). Other times the same run completes.

Cause: This is a known pytest-xdist behavior: with parallel workers, the run can stall near completion during end-of-suite worker coordination. Higher worker counts make the stall more likely.

Workaround: Run with fewer workers so the stall is less likely:

# Use 2 workers for the fast test suite (slower but avoids hang)
TEST_FAST_WORKERS=2 make test-fast

# Same for full fast CI
TEST_FAST_WORKERS=2 make ci-fast

Alternative: Run the test phase without parallelism (slow but reliable):

make format-check lint type security
E2E_TEST_MODE=fast $(PYTHON) -m pytest -m 'not nightly and ((not integration and not e2e) or (integration and critical_path) or (e2e and critical_path))' -n 0 --cov=podcast_scraper --cov-report=term-missing --disable-socket --allow-hosts=127.0.0.1,localhost --durations=20

Can we delay shutdown so workers have time to finish?
pytest-xdist does not expose a "grace period" or "wait N seconds before teardown" option. The stall happens inside xdist's master–worker coordination, so a delay cannot be injected from the outside.

Does changing the scheduler (e.g. --dist loadfile) fix it?
No. Using a different distribution (e.g. by file instead of by test) is not a fundamental solution. The run can still get stuck, often at a different point (e.g. later in the run). The only reliable workarounds are fewer workers (e.g. 2), no parallelism (-n 0), or a timeout cap.

Bounded run – If you need parallelism and it still hangs, cap the run so it exits instead of hanging forever (Linux/macOS with timeout or gtimeout):

timeout 900 make test-fast   # Linux: exit after 15 min
gtimeout 900 make test-fast # macOS (brew install coreutils)

Doctor command (Issue #379, #429)

The doctor subcommand runs environment and dependency checks so you can fix issues before running the pipeline. Use it after a fresh install, when switching machines, or when you see errors about ffmpeg, Python, or ML models.

How to run

# Standard checks (Python, ffmpeg, permissions, cache, ML imports)
python -m podcast_scraper.cli doctor
# or, if installed as a script:
podcast-scraper doctor

# Also check network connectivity
podcast-scraper doctor --check-network

# Also try loading default Whisper and summarizer models once (slow; validates "can load each model")
podcast-scraper doctor --check-models

What it checks

| Check | Purpose |
| --- | --- |
| Python version | Must be 3.10 or higher (same as requires-python in pyproject.toml). |
| ffmpeg | Required for audio processing and Whisper. Must be on PATH and runnable. |
| Write permissions | Creates a test file under ~/.podcast_scraper_test to ensure the process can write. |
| Model cache directory | Verifies the Whisper/Transformers cache dir exists and is writable (e.g. .cache/whisper or ~/.cache/huggingface/hub). |
| ML dependencies | Imports PyTorch, Transformers, Whisper, spaCy and prints versions (does not load models by default). |
| Network (optional, --check-network) | Opens a connection to confirm outbound connectivity. |
| Model load (optional, --check-models) | Loads the default Whisper model and default summarizer model once. Slow; use to confirm models download and load correctly. |

Exit codes

  • 0 – All checks passed.
  • 1 – One or more checks failed. Fix the reported issues and run doctor again.

When to use

  • After install – Confirm Python 3.10+, ffmpeg, and (if using ML) cache and dependencies.
  • Before a long run – Catch missing ffmpeg or a read-only cache early.
  • After "model not cached" or import errors – Run doctor then doctor --check-models to verify models load.
  • When debugging CI or another machine – Run doctor in that environment and share the output.

Exit codes and partial failures (Issue #429)

The pipeline exits 0 when the run completes, even if some episodes failed (e.g. 404 audio, timeout). It exits 1 only for run-level failures (bad config, missing ffmpeg, unhandled exception). Checking echo $? after a run therefore tells you only that the run finished, not whether every episode succeeded.

To see per-episode results:

  • Open output_dir/run_<suffix>/index.json: each episode has status (ok / failed / skipped) and on failure error_type, error_message, error_stage.
  • Or use run.json in the same directory; it links to index.json via index_file.
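A few lines of Python can surface the failed episodes from index.json. The field names below (status, error_type, error_message, error_stage) follow the description above, but treat the exact JSON layout as an assumption and verify it against a real run's output.

```python
import json
from pathlib import Path


def failed_episodes(index_path):
    """Return (title, error_type, error_stage) for each failed episode.

    Sketch only: assumes index.json is a list of episode records (or a
    mapping with an "episodes" list), each carrying a "status" field as
    described above. Adjust the field names to match your run's output.
    """
    data = json.loads(Path(index_path).read_text())
    episodes = data if isinstance(data, list) else data.get("episodes", [])
    return [
        (ep.get("title", "?"), ep.get("error_type"), ep.get("error_stage"))
        for ep in episodes
        if ep.get("status") == "failed"
    ]
```

Run it against output_dir/run_<suffix>/index.json after a pipeline run to get a quick failure summary without opening the file by hand.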

Flags:

  • --fail-fast: Stop after the first episode failure (run still exits 0 when it finishes).
  • --max-failures N: Stop after N episode failures (run still exits 0 when it finishes).

See Development Guide - CLI exit codes for the full policy.

ML Dependencies

"Model not cached" Test Skips

Symptom: Tests skip with messages like "Whisper model not cached" or "spaCy model not available".

Solution:

# Preload all ML models (requires network)
make preload-ml-models

# Verify models are cached (project-local cache)
ls -la .cache/whisper/          # Whisper models (tiny.en.pt, etc.)
ls -la .cache/huggingface/hub/  # Transformers models (bart, led)
python -c "import spacy; spacy.load('en_core_web_sm')"

Note: Models are cached in the project-local .cache/ directory, not ~/.cache/. See .cache/README.md for cache structure details.

Backup/Restore: If you need to backup or restore your cache (e.g., when switching machines or after cleanup):

# Backup cache
make backup-cache

# Restore cache (interactive)
make restore-cache

See .cache/README.md for detailed backup/restore instructions.

Whisper Model Download Fails

Symptom: Network errors when loading Whisper models.

Solution:

# Preload models using make target (recommended)
make preload-ml-models

# Or download model manually
python -c "import whisper; whisper.load_model('tiny.en')"

# Check project-local cache
ls .cache/whisper/

# Use smaller model for testing
python3 -m podcast_scraper.cli feed.xml --whisper-model tiny

transformers/torch Import Errors

Symptom: ModuleNotFoundError: No module named 'transformers'

Solution:

# Install ML dependencies

pip install -e ".[ml]"

# Or for development

pip install -e ".[dev,ml]"

Memory Issues with ML Models

Symptom: Tests crash with memory errors, or system becomes unresponsive.

Causes:

  • Multiple tests loading same models in parallel
  • Large models (LED, BART) consuming GPU/CPU memory
  • GPU memory contention when both Whisper and summarization use MPS simultaneously
  • Too many parallel workers for available RAM

MPS Memory Contention (Apple Silicon):

If you're experiencing crashes or memory errors on Apple Silicon when both Whisper and summarization use MPS:

# Enable MPS exclusive mode (default, but verify it's enabled)
export MPS_EXCLUSIVE=1

# Or in config file
# mps_exclusive: true

This serializes GPU work so transcription completes before summarization starts, preventing both models from competing for GPU memory. I/O operations (downloads, parsing) remain parallel.

Memory estimates per test type:

| Test Type | Per Worker | 8 Workers | Recommended RAM |
| --- | --- | --- | --- |
| Unit | ~100 MB | ~1 GB | 4 GB |
| Integration | ~1-2 GB | ~8-16 GB | 16 GB |
| E2E | ~1.5-3 GB | ~12-24 GB | 32 GB |

Solutions:

# Reduce parallel workers (default is 8)
PYTEST_WORKERS=4 make test-integration

# Set smaller batch sizes
export PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0

Parallelism configuration:

The Makefile uses a memory-aware calculation that considers both available RAM and CPU:

  • Memory-aware: calculates workers from available memory and the memory each worker needs:
      • Unit tests: ~100 MB per worker
      • Integration tests: ~1.5 GB per worker (ML models)
      • E2E tests: ~2 GB per worker (full pipeline)
  • CPU-based limit: reserves 2 cores for the system and caps at 8 workers
  • Platform-aware: more conservative on macOS (reduces workers by 1)
  • System reserve: keeps 4 GB free for system operations
  • Override with the PYTEST_WORKERS=N environment variable

The calculation uses scripts/tools/calculate_test_workers.py which automatically selects the optimal number of workers based on your system's available resources.
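The heuristics above can be sketched in a few lines. This mirrors the documented rules (4 GB system reserve, 2 reserved cores, 8-worker cap, PYTEST_WORKERS override), not the actual implementation in scripts/tools/calculate_test_workers.py.

```python
import os


def calculate_workers(available_gb, per_worker_gb, cpu_count=None, is_macos=False):
    """Memory- and CPU-aware worker count, per the rules documented above.

    Sketch under stated assumptions; the real script may differ in detail.
    """
    cpu_count = cpu_count or os.cpu_count() or 1
    # Reserve 4 GB for the system, then divide the rest by per-worker memory
    memory_workers = int((available_gb - 4) / per_worker_gb)
    # Reserve 2 cores for the system and cap at 8 workers
    cpu_workers = min(cpu_count - 2, 8)
    workers = max(1, min(memory_workers, cpu_workers))
    if is_macos and workers > 1:
        workers -= 1  # more conservative on macOS
    # The PYTEST_WORKERS environment variable always wins
    return int(os.environ.get("PYTEST_WORKERS", workers))
```

For example, with 32 GB available, 10 cores, and integration tests at ~1.5 GB per worker, this yields the 8-worker cap; with only 8 GB available it drops to 2 workers.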

Memory analysis script:

# Analyze memory usage during tests
python scripts/tools/analyze_test_memory.py --test-target test-unit

# With limited workers
python scripts/tools/analyze_test_memory.py --test-target test-integration --max-workers 4

Whisper / Transcription

ffmpeg Not Found

Symptom: Whisper transcription fails silently or with "ffmpeg not found".

Solution:

# macOS

brew install ffmpeg

# Ubuntu/Debian

sudo apt install ffmpeg

# Verify installation

ffmpeg -version

Graphviz (dot) Not Found

Symptom: make visualize or make deps-graph fails with "cannot find 'dot'" or similar.

Cause: Architecture dependency graphs are generated by pydeps, which requires the Graphviz dot binary.

Solution:

# macOS
brew install graphviz

# Ubuntu/Debian
sudo apt install graphviz

# Verify installation
dot -V

Note: CI does not regenerate diagrams for you; you must run make visualize locally and commit docs/architecture/diagrams/*.svg. CI then re-runs make visualize only as a check and fails if the committed diagrams are stale. See Architecture visualizations and the Release checklist.

Episodes Skipped with OpenAI Provider

Symptom: Some episodes are skipped when using the OpenAI transcription provider, with a log message about "exceeds OpenAI API limit (25 MB)".

Cause: The OpenAI Whisper API has a hard limit of 25MB per audio file. To avoid API errors and wasting bandwidth, the system checks the file size before downloading.

Solutions:

  1. Use Local Whisper: The local Whisper provider does not have this file size limit (it's only limited by your system's RAM).
  2. Compress Audio: If you must use the OpenAI API, use a tool like ffmpeg to reduce the bitrate or downmix to mono before processing.
  3. Wait for a Future Feature: Automatic audio preprocessing (downsampling/mono conversion) is on the roadmap.
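The size pre-check is easy to reproduce before handing files to the API. The 25 MB limit comes from the OpenAI API; the helper names and the specific ffmpeg flags below are illustrative, not the project's actual code.

```python
import os
import subprocess

OPENAI_AUDIO_LIMIT = 25 * 1024 * 1024  # 25 MB hard limit per audio file


def fits_openai_limit(path):
    """Return True if the audio file is under the 25 MB API limit."""
    return os.path.getsize(path) <= OPENAI_AUDIO_LIMIT


def compress_for_openai(src, dst):
    """Illustrative ffmpeg call: downmix to mono and lower the bitrate.

    Requires ffmpeg on PATH; tune -b:a to balance size against quality.
    """
    subprocess.run(
        ["ffmpeg", "-y", "-i", src, "-ac", "1", "-b:a", "48k", dst],
        check=True,
    )
```

Checking the size before download or upload avoids wasting bandwidth on files the API would reject anyway, which is the same reasoning the pipeline uses.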

Ollama Provider Issues

"Ollama server is not running"

Symptom: Error message: "Ollama server is not running. Please start it with: ollama serve"

Solution:

# Start Ollama server (keep terminal open)
ollama serve

# Verify server is running
curl http://localhost:11434/api/tags

# In another terminal, test
ollama list

ollama list Hangs

Symptom: Command hangs with no output.

Cause: Ollama server is not running.

Solution:

# 1. Start Ollama server in separate terminal
ollama serve

# 2. Keep that terminal open, then in another terminal:
ollama list

# 3. If it still hangs, check whether the server is responding
curl http://localhost:11434/api/tags

"Model 'llama3.3:latest' is not available"

Symptom: Error: "Model 'X' is not available in Ollama. Install it with: ollama pull X"

Solution:

# Pull the required model
ollama pull llama3.3:latest

# Verify it's available
ollama list

# Test the model
ollama run llama3.3:latest "Test"

Ollama Process Won't Die

Symptom: Can't kill Ollama process.

Solution:

# Kill by name
pkill ollama

# Or force kill
killall ollama

# Or find and kill manually
ps aux | grep ollama
kill <PID>

# If running as service (macOS)
brew services stop ollama

Slow Performance with Ollama

Symptom: Ollama inference is very slow.

Solutions:

  1. Use a smaller model:
ollama pull llama3.1:8b       # Smallest, fastest (6GB+ RAM)
# OR
ollama pull llama3.2:latest   # Medium size (8GB+ RAM)
  2. Increase the timeout:
ollama_timeout: 600  # 10 minutes for slow models
  3. Check hardware:
      • CPU-only inference is slow
      • Consider GPU acceleration if available
      • Use a model size appropriate for your RAM

See Ollama Provider Guide for detailed troubleshooting.


Test Failures

Tests Pass Locally but Fail in CI

Common causes:

  1. Different Python version - CI uses Python 3.10+
  2. Missing dependencies - CI installs fresh each time
  3. Network calls - CI blocks external network in unit tests
  4. File paths - Hardcoded paths that don't exist in CI

Debug steps:

# Run full CI suite locally

make ci

# Run with same isolation as CI

make test-unit  # Network blocked for unit tests

# Check Python version

python --version

Flaky Tests

Symptom: Tests pass sometimes, fail other times.

Common causes:

  • Race conditions in parallel execution
  • Shared state between tests
  • Network timeouts

Solutions:

# Run serially to identify race conditions

pytest tests/integration/ -x -v --no-header

# Check for shared fixtures

grep -r "scope=" tests/conftest.py

# Add serial marker for problematic tests
# @pytest.mark.serial

Test Hangs with -s Flag

Symptom: Tests hang indefinitely when using -s (no capture) with parallel execution.

Root cause: The -s flag disables pytest's output capturing. When combined with pytest-xdist parallel execution (-n auto), this causes deadlocks because:

  1. Multiple worker processes write to stdout/stderr simultaneously
  2. No buffering means writes can interleave and block
  3. tqdm progress bars (used by Whisper) compete for terminal control
  4. Terminal locking causes processes to wait indefinitely

Files using tqdm:

| File | Usage |
| --- | --- |
| src/podcast_scraper/providers/ml/whisper_utils.py | InterceptedTqdm class |
| src/podcast_scraper/transcription/whisper_provider.py | InterceptedTqdm class |
| src/podcast_scraper/providers/ml/ml_provider.py | InterceptedTqdm class |
| src/podcast_scraper/cli.py | _TqdmProgress class |

Structural Fix: Tests set TQDM_DISABLE=1 environment variable in tests/conftest.py to disable all tqdm progress bars during test execution.
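The structural fix amounts to a couple of lines run early during test collection. This is a sketch of the mechanism, not the exact contents of tests/conftest.py; recent tqdm versions honor TQDM_DISABLE globally.

```python
import os

# Disable all tqdm progress bars before any test imports Whisper or
# transformers, so worker processes never compete for terminal control.
# setdefault lets a developer still override the value from the shell.
os.environ.setdefault("TQDM_DISABLE", "1")
```

Because the variable is set before the ML libraries are imported, every InterceptedTqdm instance in the files listed above starts disabled.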

Workarounds:

# Use -v instead of -s (provides verbose output without hang)
pytest tests/unit/ -v

# Disable parallelism when using -s
pytest tests/unit/ -s -n 0

# Use sequential Makefile targets
make test-unit-sequential

# Use --tb=short for better error output
pytest tests/unit/ --tb=short

# For debugging, use --pdb instead
pytest tests/unit/ --pdb

CI/CD Issues

Pre-commit Hooks Failing

Symptom: Commits rejected by pre-commit hooks.

Solution:

# Run formatters

make format

# Fix markdown

make fix-md

# Run all checks

make lint

Documentation Build Fails

Symptom: mkdocs build fails with import errors.

Solution:

# Install docs dependencies

pip install mkdocs mkdocs-material pymdown-extensions mkdocstrings mkdocstrings-python

# For API docs that import the package

pip install -e ".[ml]"

Coverage Below Threshold

Symptom: CI fails with "Coverage below 70%".

Solution:

# Check current coverage

make test-unit
open htmlcov/index.html

# Identify uncovered code

coverage report --show-missing

Development Environment

Virtual Environment Issues

Symptom: Wrong Python version or packages not found.

Solution:

# Create fresh venv

rm -rf .venv
python3.10 -m venv .venv
source .venv/bin/activate

# Reinstall everything

make init

Import Errors After Git Pull

Symptom: ImportError or ModuleNotFoundError after pulling changes.

Solution:

# Reinstall package in editable mode

pip install -e ".[dev,ml]"

# Or use make target

make init

mypy Type Errors

Symptom: make type fails with type errors.

Common fixes:

# Update type stubs

pip install --upgrade types-requests types-PyYAML

# Check specific file

mypy src/podcast_scraper/your_file.py --show-error-codes

Runtime Issues

Configuration Not Loading

Symptom: CLI ignores config file settings.

Debug steps:

# Validate config file

python -c "import yaml; yaml.safe_load(open('config.yaml'))"

# Check for typos in keys

cat config.yaml

# Use verbose mode

python3 -m podcast_scraper.cli --config config.yaml -v

Output Directory Errors

Symptom: "Permission denied" or "Directory not found".

Solution:

# Check directory exists and is writable

ls -la /path/to/output/
mkdir -p /path/to/output/

# Use absolute path

python3 -m podcast_scraper.cli feed.xml --output-dir /absolute/path/

RSS Feed Parsing Errors

Symptom: "Invalid feed" or no episodes found.

Debug steps:

# Check feed is accessible

curl -I "https://example.com/feed.xml"

# Validate RSS format

python -c "import feedparser; print(feedparser.parse('https://example.com/feed.xml'))"

Speaker Detection Issues

Organization Names in RSS Feeds (Issue #393)

Symptom: Speaker detection returns organization names (e.g., "NPR", "BBC") instead of actual host names, or no hosts are detected.

Cause: RSS feed author tags may contain organization/publisher names rather than actual host names. The system automatically filters out organization names that match common patterns (all caps, short, no spaces).

How it works:

  • The system checks RSS author tags for organization patterns:
      • All uppercase (e.g., "NPR", "BBC", "CNN")
      • Short length (≤10 characters)
      • No spaces
  • Organization names are logged as "publisher metadata" and excluded from host detection
  • The system falls back to NER extraction from the feed title/description if author tags only contain organizations
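The filter described above can be sketched as a single predicate. This mirrors the documented patterns (all caps, ≤10 characters, no spaces) combined as one heuristic; the project's actual detection code may differ.

```python
def looks_like_organization(author):
    """Heuristic from the patterns above: all-uppercase, short, single token.

    Sketch only; treat the combination of conditions as an assumption.
    """
    name = author.strip()
    return (
        name.isupper()          # "NPR", "BBC", "CNN"
        and len(name) <= 10     # short publisher acronyms
        and " " not in name     # single token, no spaces
    )
```

A name like "Terry Gross" fails the all-uppercase check and is kept as a candidate host, while "NPR" matches all three conditions and is logged as publisher metadata.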

Solutions:

  1. Use manual speaker names (if automatic detection fails):
python3 -m podcast_scraper.cli feed.xml --speaker-names "Host Name" "Guest"
  2. Check debug logs to see what was detected:
export LOG_LEVEL=DEBUG
python3 -m podcast_scraper.cli feed.xml

Look for messages like: "RSS author 'NPR' appears to be an organization name"

  3. Verify RSS feed metadata - Some feeds have proper author tags with actual host names, while others only have publisher information.

Note: This is expected behavior - organization names are intentionally filtered out because they represent publishers, not actual speakers. The system prioritizes person names over organization names for speaker detection.



Getting Help

If your issue isn't covered here:

  1. Run doctor to capture environment state:
podcast-scraper doctor --check-network > doctor_output.txt 2>&1
# Optionally include model load check (slow):
podcast-scraper doctor --check-models >> doctor_output.txt 2>&1

Attach doctor_output.txt when opening an issue.

  2. Search existing issues: GitHub Issues

  3. Check logs:

# Enable debug logging
export LOG_LEVEL=DEBUG
python3 -m podcast_scraper.cli ...

  4. Open a new issue with:
      • Python version (python --version)
      • OS and version
      • Full error message/traceback
      • Steps to reproduce
      • Doctor output if relevant (see step 1)