Segfault Mitigation Guide¶
This guide provides strategies for diagnosing and mitigating segmentation faults that occur during pipeline execution, particularly at process shutdown.
Common Causes¶
Segfaults at the end of successful pipeline runs are typically caused by:
- PyTorch MPS (Metal Performance Shaders) teardown issues
    - MPS backend cleanup can trigger segfaults during interpreter shutdown
    - Especially common with Transformers models on Apple Silicon
- Native extension cleanup order
    - PyTorch, Transformers, spaCy, and Thinc all have native extensions
    - Destructor ordering during shutdown can cause double-free or use-after-free errors
- Threading + native library interactions
    - Worker threads holding references to native objects
    - Cleanup happening in the wrong order across threads
Diagnostic Tools¶
Enable Faulthandler (Automatic)¶
Faulthandler is automatically enabled when running the CLI. It provides native backtraces when crashes occur.
To enable manually:
```bash
export PYTHONFAULTHANDLER=1
python -m podcast_scraper.cli ...
```
Or in code:
```python
import faulthandler

faulthandler.enable(all_threads=True)
```
Check Crash Dump¶
If a crash occurs, check for `crash_dump_<pid>.log` in the current directory for a backtrace.
Mitigation Strategies (Try in Order)¶
Option 0: Enable MPS Exclusive Mode (Prevent Memory Contention)¶
If both Whisper and summarization use MPS, enable exclusive mode to serialize GPU work and prevent memory contention:
```yaml
# config.yaml
mps_exclusive: true  # Default: true
```
Or via environment variable:
```bash
export MPS_EXCLUSIVE=1
```
This ensures transcription completes before summarization starts on MPS, so the two models never compete for GPU memory; I/O operations (downloads, parsing) remain parallel.
When to use: Enabled by default, because it helps prevent crashes caused by memory pressure. Disable it (`mps_exclusive: false`) only if you have sufficient GPU memory and want maximum throughput from concurrent GPU operations.
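Conceptually, exclusive mode behaves like a single lock around the GPU-bound stages. Here is a minimal sketch of the idea, using hypothetical stand-in functions rather than the pipeline's real internals:

```python
import threading

# One lock serializing all MPS work; I/O stages never acquire it.
_mps_lock = threading.Lock()

def run_whisper(audio_path: str) -> str:
    # Stand-in for the real Whisper transcription call.
    return f"transcript of {audio_path}"

def run_summarizer(transcript: str) -> str:
    # Stand-in for the real Transformers summarization call.
    return transcript[:60]

def transcribe(audio_path: str) -> str:
    with _mps_lock:  # Whisper holds the GPU exclusively here
        return run_whisper(audio_path)

def summarize(transcript: str) -> str:
    with _mps_lock:  # waits until transcription has released MPS
        return run_summarizer(transcript)
```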
Option 1: Disable MPS for Summarization (Most Common Fix)¶
Keep Whisper on MPS if stable, but move Transformers summarization to CPU:
```yaml
# config.yaml
summary_device: cpu
```
If the segfault disappears, it's almost certainly MPS + Transformers teardown.
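If you construct the summarizer yourself, the same effect comes from pinning the Transformers pipeline to CPU. A sketch, with an illustrative model name (not necessarily what the pipeline uses):

```python
from transformers import pipeline

# device=-1 pins the pipeline to CPU even when MPS/CUDA is available.
# The model name below is illustrative, not the project's default.
summarizer = pipeline(
    "summarization",
    model="sshleifer/distilbart-cnn-12-6",
    device=-1,
)
print(summarizer("Long transcript text ..."))
```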
Option 2: Run Everything on CPU (Sanity Check)¶
This is the "is it MPS?" test:
```yaml
# config.yaml
whisper_device: cpu
summary_device: cpu
```
If the CPU-only run exits cleanly, MPS is involved.
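To double-check whether MPS was even in play, you can query PyTorch directly; a quick check, assuming a recent PyTorch build:

```python
import torch

# True on Apple Silicon builds with MPS support; if this prints False,
# a segfaulting run could not have been using the MPS backend.
print("MPS available:", torch.backends.mps.is_available())
print("MPS built:", torch.backends.mps.is_built())
```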
Option 3: Force Safer Threading Settings¶
Set environment variables before running:
```bash
export OMP_NUM_THREADS=1
export MKL_NUM_THREADS=1
export TOKENIZERS_PARALLELISM=false
python -m podcast_scraper.cli ...
```
This often helps when native libraries and worker threads crash during interpreter exit.
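The same limits can be set from Python, provided they are applied before torch and the tokenizers are imported; a sketch:

```python
import os

# Must be set before torch / transformers are imported,
# or the limits are silently ignored.
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"
os.environ["TOKENIZERS_PARALLELISM"] = "false"

import torch

torch.set_num_threads(1)  # also cap PyTorch's own intra-op thread pool
```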
Option 4: Don't Cleanup Models Explicitly¶
Sometimes an explicit `del model; gc.collect()` plus `torch.mps.empty_cache()` at the very end of the run triggers unstable finalizers.
Try:
- Cleanup after each episode (or after each stage) rather than at process shutdown (see the sketch after this list)
- Skip cleanup entirely and let the process exit (counterintuitive, but avoids double-free / destructor ordering bugs)
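A sketch of the per-episode variant; the inference call and argument names are illustrative:

```python
import gc

import torch

def process_episode(episode, model):
    result = model(episode)  # illustrative inference call
    # Reclaim memory between episodes, while the interpreter is
    # still healthy, instead of in finalizers at shutdown.
    gc.collect()
    if torch.backends.mps.is_available():
        torch.mps.empty_cache()
    return result
```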
Option 5: Isolate Summarization into a Subprocess¶
If stability matters more than performance:
- Run the summarization step in a separate Python process
- Return only the summary text to the main process
If the subprocess segfaults, it won't take down the main run (and you can retry the episode).
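A minimal version of this isolation using only the standard library; `summarize_text` here is a hypothetical stand-in for the real summarization step:

```python
from typing import Optional
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool

def summarize_text(transcript: str) -> str:
    # Hypothetical stand-in: in real use, load the model and
    # run summarization entirely inside the child process.
    return transcript[:200]

def summarize_isolated(transcript: str) -> Optional[str]:
    # max_workers=1 gives one disposable child per summarization run.
    with ProcessPoolExecutor(max_workers=1) as pool:
        try:
            return pool.submit(summarize_text, transcript).result()
        except BrokenProcessPool:
            # The child segfaulted; the main process survives and can retry.
            return None
```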
Getting Actionable Crash Information¶
Enable Faulthandler with File Output¶
```bash
PYTHONFAULTHANDLER=1 python -m podcast_scraper.cli ... 2>&1 | tee run.log
```
Check `crash_dump_<pid>.log` for the native backtrace.
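The same dump can be produced without shell redirection by pointing faulthandler at a dedicated file; a sketch of how such a dump file can be set up manually:

```python
import faulthandler
import os

# Keep a reference so the file outlives the call; faulthandler
# writes to the underlying file descriptor when a crash occurs.
crash_log = open(f"crash_dump_{os.getpid()}.log", "w")
faulthandler.enable(file=crash_log, all_threads=True)
```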
Check Last Log Lines¶
Paste the very last few log lines before the segfault (right after cleanup starts) to identify the most likely culprit:
- Whisper teardown: Look for Whisper model cleanup logs
- spaCy/thinc: Look for NER model cleanup logs
- Transformers/MPS: Look for summary model cleanup logs
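To make those last lines unambiguous, you can bracket each teardown step with explicit log markers; a hedged sketch using the standard logging module (the stage names are illustrative):

```python
import gc
import logging

logger = logging.getLogger(__name__)

def teardown(models: dict) -> None:
    # Bracket each suspect with markers so the last log line before
    # a segfault names the culprit.
    for name in list(models):
        logger.info("cleanup start: %s", name)
        models.pop(name)  # drop the reference
        gc.collect()      # run finalizers now, while logging still works
        logger.info("cleanup done: %s", name)
```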
Environment Variables Summary¶
```bash
# Threading limits (reduces teardown instability)
export OMP_NUM_THREADS=1
export MKL_NUM_THREADS=1
export TOKENIZERS_PARALLELISM=false

# Faulthandler (crash diagnostics)
export PYTHONFAULTHANDLER=1

# Run pipeline
python -m podcast_scraper.cli ...
```