E2E Testing Guide¶
See also:
- Testing Strategy - High-level testing philosophy and test pyramid
- Testing Guide - Quick reference and test execution commands
This guide covers E2E test implementation: real HTTP client, E2E server, ML model usage, and OpenAI mock endpoints.
Overview¶
E2E tests test complete user workflows with real implementations. No mocking allowed (except network isolation).
| Aspect | Requirement |
|---|---|
| Speed | < 60 seconds per test |
| Scope | Complete user workflow |
| Entry points | CLI commands, run_pipeline(), service.run() |
| HTTP | Real client with local E2E server |
| Filesystem | Real file operations |
| ML Models | Real (Whisper, spaCy, Transformers) - NO mocks |
Core Principle: No Mocking¶
E2E tests use real implementations throughout:
- ✅ Real HTTP client (with local server)
- ✅ Real filesystem I/O
- ✅ Real ML models (Whisper, spaCy, Transformers)
- ✅ Real providers (MLProvider, OpenAIProvider)
- ❌ No external network (blocked by network guard)
- ❌ No Whisper mocks
- ❌ No ML model mocks
E2E Server¶
The e2e_server fixture provides a local HTTP server serving test fixtures:
def test_basic_workflow(e2e_server):
# Get URLs for test resources
rss_url = e2e_server.urls.feed("podcast1")
audio_url = e2e_server.urls.audio("p01_e01")
transcript_url = e2e_server.urls.transcript("p01_e01")
# Run complete workflow
result = run_pipeline(rss_url, output_dir)
assert result.success
Available URLs¶
| Method | Returns |
|---|---|
e2e_server.urls.feed(podcast_name) |
RSS feed URL (e.g. /feeds/podcast1/feed.xml) |
e2e_server.urls.audio(episode_id) |
Audio file URL (e.g. /audio/p01_e01.mp3) |
e2e_server.urls.transcript(episode_id) |
Transcript URL (e.g. /transcripts/p01_e01.txt) |
e2e_server.urls.base() |
Server base URL |
e2e_server.urls.openai_api_base() |
OpenAI mock API base (/v1) |
e2e_server.urls.gemini_api_base() |
Gemini mock API base (/v1beta) |
e2e_server.urls.mistral_api_base() |
Mistral mock API base (/v1) |
e2e_server.urls.grok_api_base() |
Grok mock API base (/v1) |
e2e_server.urls.deepseek_api_base() |
DeepSeek mock API base (/v1) |
e2e_server.urls.ollama_api_base() |
Ollama mock API base (/v1) |
e2e_server.urls.anthropic_api_base() |
Anthropic mock API base (base URL, no /v1) |
E2E Feeds (RSS)¶
Feed names and RSS file mapping. Which feed name you can use depends on test mode (see Test Modes).
Full fixtures (used in data_quality and nightly; mapping from PODCAST_RSS_MAP):
| Feed name | RSS file | Description |
|---|---|---|
podcast1 |
p01_mtb.xml |
Main podcast (MTB) |
podcast2 |
p02_software.xml |
Software podcast |
podcast3 |
p03_scuba.xml |
Scuba podcast |
podcast4 |
p04_photo.xml |
Photo podcast |
podcast5 |
p05_investing.xml |
Investing podcast |
edgecases |
p06_edge_cases.xml |
Edge-case episodes |
podcast1_multi_episode |
p01_multi.xml |
5 short episodes (multi-episode tests) |
podcast9_solo |
p09_biohacking.xml |
Solo speaker (host only) |
podcast7_sustainability |
p07_sustainability.xml |
Long-form (~15k words; Issue #283) |
podcast8_solar |
p08_solar.xml |
Long-form (~20k words; Issue #283) |
Fast fixtures (used in fast and multi_episode when set_use_fast_fixtures(True); mapping from PODCAST_RSS_MAP_FAST):
| Feed name | RSS file | Description |
|---|---|---|
podcast1 |
p01_fast.xml |
1 short episode (Path 2: transcription) |
podcast1_with_transcript |
p01_fast_with_transcript.xml |
1 episode with transcript URL (Path 1: download) |
podcast1_multi_episode |
p01_multi.xml |
Same 5-episode feed |
podcast9_solo |
p09_biohacking.xml |
Solo speaker |
podcast7_sustainability |
p07_sustainability.xml |
Long-form |
podcast8_solar |
p08_solar.xml |
Long-form |
Allowed feeds per test mode¶
Set automatically by conftest from E2E_TEST_MODE.
| Mode | Allowed feed names |
|---|---|
fast |
podcast1, podcast1_with_transcript, podcast1_multi_episode, podcast9_solo, podcast7_sustainability, podcast8_solar |
multi_episode |
podcast1_multi_episode, podcast1_with_transcript, edgecases, podcast7_sustainability, podcast8_solar |
nightly |
podcast1, podcast2, podcast3, podcast4, podcast5 (full fixtures) |
data_quality |
All feeds (None = allow all) |
Use e2e_server.urls.feed("podcast1_multi_episode") etc. Only feeds in the allowed set for the current mode are served; others return 404.
E2E Server Options¶
The e2e_server fixture (and the handler class) support these options for controlling behavior:
Error injection (chaos / failure testing):
| Method | Description |
|---|---|
e2e_server.set_error_behavior(url_path, status, delay=0.0) |
For a given path (e.g. "/audio/p01_multi_e03.mp3"), return HTTP status (e.g. 404, 500). Optional delay in seconds. |
e2e_server.clear_error_behavior(url_path) |
Remove error behavior for that path. |
e2e_server.reset() |
Clear all error behaviors and set allowed podcasts to None. |
Example: simulate 404 on audio so the run index records a failed episode:
e2e_server.set_error_behavior("/audio/p01_multi_e03.mp3", 404)
# ... run pipeline ...
# assert index.json has one failed episode with error_type, error_message, error_stage
e2e_server.clear_error_behavior("/audio/p01_multi_e03.mp3")
Allowed podcasts (advanced):
| Method | Description |
|---|---|
e2e_server.set_allowed_podcasts(podcasts) |
Restrict which feed names are served. podcasts: set of names or None for all. Normally set by conftest from E2E_TEST_MODE. |
Fixture mode:
- When fast fixtures are on, feeds resolve via
PODCAST_RSS_MAP_FAST(e.g.podcast1→p01_fast.xml). - When off (e.g. nightly/data_quality), feeds use
PODCAST_RSS_MAP(e.g.podcast1→p01_mtb.xml). - Conftest sets this from
E2E_TEST_MODE; teardown clears error behaviors and resets fast-fixtures mode.
Served Content¶
Content is served from tests/fixtures/:
- RSS feeds:
tests/fixtures/rss/*.xml - Audio files:
tests/fixtures/audio/*.mp3 - Transcripts:
tests/fixtures/transcripts/*.txt
OpenAI Mock Endpoints¶
For API providers (OpenAI), the E2E server provides mock endpoints:
def test_openai_provider(e2e_server):
cfg = Config(
rss_url=e2e_server.urls.feed("podcast1"),
transcription_provider="openai",
openai_api_key="sk-test123",
openai_api_base=e2e_server.urls.openai_api_base(), # Use mock
)
result = run_pipeline(cfg)
assert result.success
Mock Endpoints¶
| Endpoint | Purpose |
|---|---|
/v1/chat/completions |
Summarization, speaker detection, GIL evidence (extract_quotes, score_entailment) |
/v1/audio/transcriptions |
Transcription |
/v1/messages (Anthropic) |
Summarization, speaker detection, GIL evidence (extract_quotes, score_entailment) |
/v1beta/models/{model}:generateContent (Gemini) |
Summarization, speaker detection, GIL evidence (extract_quotes, score_entailment) |
See tests/e2e/fixtures/e2e_http_server.py for implementation.
ML Model Usage¶
E2E tests use real ML models - no mocking allowed.
Test Model Defaults¶
Tests use smaller, faster models for speed:
| Component | Test Model | Production Model |
|---|---|---|
| Whisper | tiny.en |
base.en |
| spaCy | en_core_web_sm |
en_core_web_sm |
| Transformers MAP | facebook/bart-base |
facebook/bart-large-cnn |
| Transformers REDUCE | allenai/led-base-16384 |
allenai/led-large-16384 |
Model Cache Requirements¶
Tests require models to be pre-cached:
# Preload all required models
make preload-ml-models
Use cache helpers to skip gracefully if not cached:
from tests.integration.ml_model_cache_helpers import (
require_whisper_model_cached,
require_transformers_model_cached,
)
def test_with_real_models(e2e_server):
require_whisper_model_cached(config.TEST_DEFAULT_WHISPER_MODEL)
require_transformers_model_cached(config.TEST_DEFAULT_SUMMARY_MODEL, None)
# Test with real models...
Network Guard¶
E2E tests use network isolation to prevent external calls:
pytest tests/e2e/ --disable-socket --allow-hosts=127.0.0.1,localhost
If a test attempts external network access:
SocketBlockedError: A]socket.socket call was blocked
Test Patterns¶
CLI E2E Test¶
@pytest.mark.e2e
def test_cli_transcript_download(e2e_server, tmp_path):
"""Test CLI transcript download command."""
rss_url = e2e_server.urls.feed("podcast1_with_transcript")
result = subprocess.run([
"podcast-scraper", rss_url,
"--output-dir", str(tmp_path),
], capture_output=True)
assert result.returncode == 0
assert (tmp_path / "0001 - Episode 1.txt").exists()
Library API E2E Test¶
@pytest.mark.e2e
def test_run_pipeline(e2e_server, tmp_path):
"""Test run_pipeline() library API."""
cfg = Config(
rss_url=e2e_server.urls.feed("podcast1"),
output_dir=str(tmp_path),
)
result = run_pipeline(cfg)
assert result.success
Service API E2E Test¶
@pytest.mark.e2e
def test_service_run(e2e_server, tmp_path):
"""Test service.run() API."""
cfg = Config(
rss_url=e2e_server.urls.feed("podcast1"),
output_dir=str(tmp_path),
)
result = service.run(cfg)
assert result.success
Full Pipeline with ML¶
@pytest.mark.e2e
@pytest.mark.ml_models
def test_full_pipeline_with_summarization(e2e_server, tmp_path):
"""Test complete pipeline with real ML models."""
require_whisper_model_cached(config.TEST_DEFAULT_WHISPER_MODEL)
require_transformers_model_cached(config.TEST_DEFAULT_SUMMARY_MODEL, None)
cfg = Config(
rss_url=e2e_server.urls.feed("podcast1"),
output_dir=str(tmp_path),
generate_summaries=True,
summary_model=config.TEST_DEFAULT_SUMMARY_MODEL,
)
result = run_pipeline(cfg)
assert result.success
# Verify summary was generated
Test Modes¶
E2E tests support different modes via the E2E_TEST_MODE environment variable (set by the Makefile). Mode controls which feeds are allowed and whether fast or full fixtures are used; see E2E Feeds (RSS) and Allowed feeds per test mode.
| Mode | Episodes | Fixtures | Use Case |
|---|---|---|---|
fast |
1 per test (via monkeypatch) | Fast | Quick feedback, critical path |
multi_episode |
No limit (e.g. 5) | Fast | Full validation |
nightly |
No limit (e.g. 15 across p01–p05) | Full | Nightly suite |
data_quality |
Multiple, all mock data | Full | Data quality / nightly |
Markers can override effective mode: tests marked @pytest.mark.nightly use nightly when E2E_TEST_MODE is unset; tests marked @pytest.mark.critical_path use fast when unset.
# Run with multi-episode mode
E2E_TEST_MODE=multi_episode make test-e2e
# Run fast E2E (critical path only, 1 episode per test)
make test-e2e-fast
Test Files¶
| Purpose | Test File |
|---|---|
| Network guard | test_network_guard.py |
| OpenAI mocking | test_openai_mock.py |
| E2E server | test_e2e_server.py |
| Fixture mapping | test_fixture_mapping.py |
| Basic workflows | test_basic_e2e.py |
| CLI commands | test_cli_e2e.py |
| Library API | test_library_api_e2e.py |
| Service API | test_service_api_e2e.py |
| Whisper | test_whisper_e2e.py |
| ML models | test_ml_models_e2e.py |
| Error handling | test_error_handling_e2e.py |
| Edge cases | test_edge_cases_e2e.py |
| HTTP behaviors | test_http_behaviors_e2e.py |
| Ollama providers | test_ollama_provider_integration_e2e.py |
Running E2E Tests¶
# All E2E tests
make test-e2e
# Fast (excludes ml_models)
make test-e2e-fast
# Sequential (for debugging)
pytest tests/e2e/ -n 0
# Specific test file
pytest tests/e2e/test_basic_e2e.py -v -m e2e --disable-socket --allow-hosts=127.0.0.1,localhost
Test Markers¶
@pytest.mark.e2e- Required for all E2E tests@pytest.mark.ml_models- Tests requiring real ML models-
@pytest.mark.critical_path- Critical path tests (run in fast suite). See Critical Path Testing Guide -
@pytest.mark.multi_episode- Multi-episode tests @pytest.mark.data_quality- Data quality tests (nightly)
Provider Testing¶
For provider-specific E2E testing (E2E server endpoints, full pipeline with providers):
→ Provider Implementation Guide - Testing Your Provider
Covers:
- E2E server mock endpoint implementation
- Provider works in full pipeline
- Multiple providers work together
- E2E test checklist for new providers
Real API Testing (Manual Mode)¶
Some providers support real API testing for manual validation:
Ollama (Local Server):
# Prerequisites: Ollama installed and running
ollama serve # Start server
ollama pull llama3.3:latest # Pull models
# Run tests with real Ollama
USE_REAL_OLLAMA_API=1 \
pytest tests/e2e/test_ollama_provider_integration_e2e.py -v
OpenAI/Gemini (Cloud APIs):
# Set environment variable to use real APIs
USE_REAL_OPENAI_API=1 pytest tests/e2e/test_openai_provider_integration_e2e.py
USE_REAL_GEMINI_API=1 pytest tests/e2e/test_gemini_provider_integration_e2e.py
Note: Real API mode preserves test output for inspection and will incur costs for cloud APIs. See Ollama Provider Guide for detailed Ollama setup and troubleshooting.
Coverage Targets¶
- Total tests: ~230
- Focus: Complete user workflows, production-like scenarios