E2E Testing Guide¶
See also:
- Testing Strategy - High-level testing philosophy and test pyramid
- Testing Guide - Quick reference and test execution commands
- RSS and feed ingestion - Production RSS path (HTTP, parsing, episode selection); contrasts with local
e2e_serverfixture feeds below
This guide covers pytest E2E test implementation: real HTTP client, E2E server, ML model usage, and OpenAI mock endpoints.
For where Playwright fits in the overall strategy (pyramid, CI jobs, pytest vs browser), see Testing Strategy — Browser UI E2E (Playwright).
Browser E2E (Playwright)¶
The GI/KG Vue viewer (web/gi-kg-viewer) uses Playwright (TypeScript, Firefox), not
pytest. This section summarizes the browser stack only; everything below Overview in this
file remains pytest E2E.
| Topic | Detail |
|---|---|
| Run from repo root | make test-ui-e2e (npm install, playwright install firefox, npm run test:e2e) |
| Run in package | cd web/gi-kg-viewer && npm run test:e2e |
| Config | web/gi-kg-viewer/playwright.config.ts — testDir: ./e2e, webServer runs Vite on 127.0.0.1:5174 with reuseExistingServer: true so a dev server already bound to that port is reused (helps when CI=true is set locally and would otherwise force a second strictPort bind) |
| Specs | web/gi-kg-viewer/e2e/*.spec.ts (+ fixtures.ts, helpers.ts) |
| Surface map | E2E_SURFACE_MAP.md — surfaces, fixtures, stable Playwright selectors (update with UI/E2E changes) |
| CI | Workflow job viewer-e2e (same commands as make test-ui-e2e) |
| vs pytest E2E | pytest proves CLI/pipeline + e2e_server; Playwright proves browser UX (graph shell, search UI, a11y paths) |
| vs FastAPI unit tests | tests/unit/podcast_scraper/server/test_viewer_*.py cover /api/* JSON contracts; use Playwright when behavior depends on the SPA |
| vs Vitest | web/gi-kg-viewer/src/utils/*.test.ts cover pure TS logic (parsing, merge, metrics); make test-ui (~150 ms, no browser). Use Playwright for rendered UI behavior |
Debugging UI issues and interpreting failures¶
The surface map is the shared contract for accessible names, regions, and user entry paths. When a Playwright assertion fails, when you reproduce a bug manually, or when an agent drives the app via Chrome DevTools MCP or Playwright MCP (a11y snapshots), use E2E_SURFACE_MAP.md to see what label or region should appear, which spec owns the surface, and how to disambiguate controls that share a visible name. It does not replace UXS for visual design. For the full agent-browser workflow (symmetry between reproduction and fix validation), see Agent-Browser Closed Loop Guide.
When you change viewer UX (required workflow)¶
Applies to humans and AI agents editing web/gi-kg-viewer/ (Vue UI: copy, layout, routes,
theme tokens, accessible names, or flows that Playwright exercises). Do not ship UI-only PRs
without walking this list in order:
e2e/E2E_SURFACE_MAP.md— Update if anything E2E-visible or selector-related changed (includinggetByRolestrings,#search-q,.graph-canvas, file-picker vs list flows).- Playwright — Update
e2e/*.spec.ts,helpers.ts, and/orfixtures.ts; runmake test-ui-e2e. docs/uxs/— Update VIEWER_IA.md when shell information architecture changes (regions, navigation axes, persistence, clearing, first-run). Update UXS-001 when shared tokens, typography, or shell-wide visual rules change; update the relevant feature UXS (Digest, Library, Graph, Search, Dashboard, …) when a surface-specific visual contract changes, even if tests still pass. After merge, Active UXS should describe the shipped viewer for that release (see UX specifications index — Living documents and ship boundary).
Also documented in DEVELOPMENT_GUIDE.md (GI / KG browser viewer),
TESTING_GUIDE.md (Browser E2E), UX specifications index,
.cursorrules (GI/KG viewer UX), and .ai-coding-guidelines.md (GI/KG browser viewer).
Further reading: Polyglot repository guide (root vs web/gi-kg-viewer/),
Testing Guide — Browser E2E,
ADR-066,
web/gi-kg-viewer/README.md.
Overview¶
E2E tests test complete user workflows with real implementations. No mocking allowed (except network isolation).
| Aspect | Requirement |
|---|---|
| Speed | < 60 seconds per test |
| Scope | Complete user workflow |
| Entry points | CLI commands, run_pipeline(), service.run() |
| HTTP | Real client with local E2E server |
| Filesystem | Real file operations |
| ML Models | Real (Whisper, spaCy, Transformers) - NO mocks |
Manual CLI runs against the fixture server¶
For human multi-feed checks without real RSS, use the same HTTP handler as pytest’s e2e_server:
- From repo root (venv on
PYTHONPATHincludes repo root sotests.e2eresolves):make serve-e2e-mock(default port 18765; override withE2E_MOCK_PORT). - In another terminal:
python -m podcast_scraper.cli --profile <preset> --config your_operator.yaml --feeds-spec path/to/your_fixture_feeds.yaml(add--output-dirif not already in the operator YAML). Same three-way split as production runs; see CLI.md — Quick Start.
That feeds document should list the five primary mock feeds (podcast1–podcast5) plus long-form
fixtures podcast7_sustainability, podcast8_solar, and podcast9_solo (p07–p09;
p06 edge-case feed is intentionally omitted), each at http://127.0.0.1:<port>/feeds/.../feed.xml
(E2E_MOCK_PORT, default 18765). This is not the same contract as CI pytest E2E (no network guard, you
choose ML cost); it reuses fixture XML/audio only.
Core Principle: No Mocking¶
E2E tests use real implementations throughout:
- Real HTTP client (with local server)
- Real filesystem I/O
- Real ML models (Whisper, spaCy, Transformers)
- Real providers (MLProvider, OpenAIProvider)
- No external network (blocked by network guard)
- No Whisper mocks
- No ML model mocks
E2E Server¶
Ports: Pytest’s e2e_server binds an ephemeral port (not fixed). The
FastAPI app from make serve-api defaults to 8000. For manual runs, the
same fixture HTTP handler is exposed on a fixed port via
make serve-e2e-mock, default 18765 (E2E_MOCK_PORT in the Makefile),
so the RSS/mock API server can run alongside serve-api without colliding.
The e2e_server fixture provides a local HTTP server serving test fixtures:
def test_basic_workflow(e2e_server):
# Get URLs for test resources
rss_url = e2e_server.urls.feed("podcast1")
audio_url = e2e_server.urls.audio("p01_e01")
transcript_url = e2e_server.urls.transcript("p01_e01")
# Run complete workflow
result = run_pipeline(rss_url, output_dir)
assert result.success
Available URLs¶
| Method | Returns |
|---|---|
e2e_server.urls.feed(podcast_name) |
RSS feed URL (e.g. /feeds/podcast1/feed.xml) |
e2e_server.urls.audio(episode_id) |
Audio file URL (e.g. /audio/p01_e01.mp3) |
e2e_server.urls.transcript(episode_id) |
Transcript URL (e.g. /transcripts/p01_e01.txt) |
e2e_server.urls.base() |
Server base URL |
e2e_server.urls.openai_api_base() |
OpenAI mock API base (/v1) |
e2e_server.urls.gemini_api_base() |
Gemini mock API base (/v1beta) |
e2e_server.urls.mistral_api_base() |
Mistral mock API base (/v1) |
e2e_server.urls.grok_api_base() |
Grok mock API base (/v1) |
e2e_server.urls.deepseek_api_base() |
DeepSeek mock API base (/v1) |
e2e_server.urls.ollama_api_base() |
Ollama mock API base (/v1) |
e2e_server.urls.anthropic_api_base() |
Anthropic mock API base (base URL, no /v1) |
Download resilience E2E¶
tests/e2e/test_download_resilience_e2e.py
- Transient HTTP on transcript URLs (
set_transient_errorwithfail_count), plus permanentset_error_behavior. fetch_url/ downloader retry totals (configure_downloader,http_retry_totalonConfig).- Single-feed pipeline:
run.jsonmay includefailure_summarywhen some episodes fail (seetest_partial_failure_produces_summary). - Multi-feed isolation when one RSS feed is broken: second feed still runs; with
multi_feed_strict=Truethe batch reports failure (test_one_feed_down_others_continue).
tests/e2e/test_multi_feed_resilience_e2e.py (GitHub #560, offline only)
corpus_run_summary.jsonat the corpus parent: per-feedok,error,failure_kind(soft vs hard, #559),overall_ok, schema1.1.0withbatch_incidents(rollup ofcorpus_incidents.jsonlfor that batch) and per-feedepisode_incidents_uniquesoepisodes_processed: 0withok: trueis not read as “no issues.”- Lenient default vs
multi_feed_strict/--multi-feed-strict: service and CLI exit semantics when all failures are soft-classified (RSS HTTP errors, unknown slug 404, wrong path under/feeds/...). - Unknown slug and wrong filename under a known feed (both 404 on the mock server, no DNS).
- Transient RSS 503 on one feed’s
feed.xmlwith RSS retries; batchoverall_oktrue when retries succeed. - Corpus lock: pre-acquire
LOCK_BASENAME, assert a blockedservice.run, then success after release. - Multi_episode mode (
E2E_TEST_MODE=multi_episode, not fast): two feeds,max_episodesgreater than 1, transcript 404 on a shared fixture path; asserts per-feedmetrics.jsonskipped counts and matchingrun.jsonmetrics.episodes_skipped_total(skipped transcript is not always a run-index failure, sofailure_summarymay be absent).
Handler API: E2EHTTPRequestHandler.set_transient_error(path, status=..., fail_count=...) and set_error_behavior(path, status=...). See CONFIGURATION.md — Download resilience.
Fast vs multi_episode: tests marked critical_path run under make test-e2e-fast (E2E_TEST_MODE=fast). The multi-episode partial-failure case above skips when E2E_TEST_MODE=fast; run make test-e2e (multi_episode) for full coverage.
E2E Feeds (RSS)¶
Feed names and RSS file mapping. Which feed name you can use depends on test mode (see Test Modes). For how the real pipeline fetches and parses RSS (retries, conditional GET, circuit breaker, multi-feed), see RSS and feed ingestion.
Full fixtures (used in nightly mode; mapping from PODCAST_RSS_MAP):
| Feed name | RSS file | Description |
|---|---|---|
podcast1 |
p01_mtb.xml |
Main podcast (MTB) |
podcast2 |
p02_software.xml |
Software podcast |
podcast3 |
p03_scuba.xml |
Scuba podcast |
podcast4 |
p04_photo.xml |
Photo podcast |
podcast5 |
p05_investing.xml |
Investing podcast |
edgecases |
p06_edge_cases.xml |
Edge-case episodes |
podcast1_multi_episode |
p01_multi.xml |
5 short episodes (multi-episode tests) |
podcast1_episode_selection |
p01_episode_selection.xml |
3 items, newest-first, all Path 1 transcripts (#521) |
podcast9_solo |
p09_biohacking.xml |
Solo speaker (host only) |
podcast7_sustainability |
p07_sustainability.xml |
Long-form (~15k words; Issue #283) |
podcast8_solar |
p08_solar.xml |
Long-form (~20k words; Issue #283) |
Fast fixtures (used in fast and multi_episode when set_use_fast_fixtures(True); mapping from PODCAST_RSS_MAP_FAST):
| Feed name | RSS file | Description |
|---|---|---|
podcast1 |
p01_fast.xml |
1 short episode (Path 2: transcription) |
podcast1_with_transcript |
p01_fast_with_transcript.xml |
1 episode with transcript URL (Path 1: download) |
podcast1_multi_episode |
p01_multi.xml |
Same 5-episode feed |
podcast1_episode_selection |
p01_episode_selection.xml |
Same as full map (episode selection E2E) |
podcast9_solo |
p09_biohacking.xml |
Solo speaker |
podcast7_sustainability |
p07_sustainability.xml |
Long-form |
podcast8_solar |
p08_solar.xml |
Long-form |
Allowed feeds per test mode¶
Set automatically by conftest from E2E_TEST_MODE.
| Mode | Allowed feed names |
|---|---|
fast |
podcast1, podcast1_with_transcript, podcast1_multi_episode, podcast1_episode_selection, podcast9_solo, podcast7_sustainability, podcast8_solar |
multi_episode |
podcast1_multi_episode, podcast1_episode_selection, podcast1_with_transcript, edgecases, podcast7_sustainability, podcast8_solar |
nightly |
podcast1, podcast2, podcast3, podcast4, podcast5, podcast1_episode_selection (full fixtures) |
Use e2e_server.urls.feed("podcast1_multi_episode") or e2e_server.urls.feed("podcast1_episode_selection") etc. Only feeds in the allowed set for the current mode are served; others return 404.
E2E Server Options¶
The e2e_server fixture (and the handler class) support these options for controlling behavior:
Error injection (chaos / failure testing):
| Method | Description |
|---|---|
e2e_server.set_error_behavior(url_path, status, delay=0.0) |
For a given path (e.g. "/audio/p01_multi_e03.mp3"), return HTTP status (e.g. 404, 500). Optional delay in seconds. |
e2e_server.clear_error_behavior(url_path) |
Remove error behavior for that path. |
e2e_server.reset() |
Clear all error behaviors and set allowed podcasts to None. |
Example: simulate 404 on audio so the run index records a failed episode:
e2e_server.set_error_behavior("/audio/p01_multi_e03.mp3", 404)
# ... run pipeline ...
# assert index.json has one failed episode with error_type, error_message, error_stage
e2e_server.clear_error_behavior("/audio/p01_multi_e03.mp3")
Allowed podcasts (advanced):
| Method | Description |
|---|---|
e2e_server.set_allowed_podcasts(podcasts) |
Restrict which feed names are served. podcasts: set of names or None for all. Normally set by conftest from E2E_TEST_MODE. |
Fixture mode:
- When fast fixtures are on, feeds resolve via
PODCAST_RSS_MAP_FAST(e.g.podcast1→p01_fast.xml). - When off (e.g. nightly mode), feeds use
PODCAST_RSS_MAP(e.g.podcast1→p01_mtb.xml). - Conftest sets this from
E2E_TEST_MODE; teardown clears error behaviors and resets fast-fixtures mode.
Served Content¶
Content is served from tests/fixtures/:
- RSS feeds:
tests/fixtures/rss/*.xml - Audio files:
tests/fixtures/audio/*.mp3 - Transcripts:
tests/fixtures/transcripts/*.txt
OpenAI Mock Endpoints¶
For API providers (OpenAI), the E2E server provides mock endpoints:
def test_openai_provider(e2e_server):
cfg = Config(
rss_url=e2e_server.urls.feed("podcast1"),
transcription_provider="openai",
openai_api_key="sk-test123",
openai_api_base=e2e_server.urls.openai_api_base(), # Use mock
)
result = run_pipeline(cfg)
assert result.success
Mock Endpoints¶
| Endpoint | Purpose |
|---|---|
/v1/chat/completions |
Summarization, speaker detection, GIL evidence (extract_quotes, score_entailment) |
/v1/audio/transcriptions |
Transcription |
/v1/messages (Anthropic) |
Summarization, speaker detection, GIL evidence (extract_quotes, score_entailment) |
/v1beta/models/{model}:generateContent (Gemini) |
Summarization, speaker detection, GIL evidence (extract_quotes, score_entailment) |
See tests/e2e/fixtures/e2e_http_server.py for implementation.
ML Model Usage¶
E2E tests use real ML models - no mocking allowed.
Test Model Defaults¶
Tests use smaller, faster models for speed:
| Component | Test Model | Production Model |
|---|---|---|
| Whisper | tiny.en |
base.en |
| spaCy | en_core_web_sm |
en_core_web_sm |
| Transformers MAP | facebook/bart-base |
facebook/bart-large-cnn |
| Transformers REDUCE | allenai/led-base-16384 |
allenai/led-large-16384 |
Model Cache Requirements¶
Tests require models to be pre-cached:
# Preload all required models
make preload-ml-models
Use cache helpers to skip gracefully if not cached:
from tests.integration.ml_model_cache_helpers import (
require_whisper_model_cached,
require_transformers_model_cached,
)
def test_with_real_models(e2e_server):
require_whisper_model_cached(config.TEST_DEFAULT_WHISPER_MODEL)
require_transformers_model_cached(config.TEST_DEFAULT_SUMMARY_MODEL, None)
# Test with real models...
Network Guard¶
E2E tests use network isolation to prevent external calls:
pytest tests/e2e/ --disable-socket --allow-hosts=127.0.0.1,localhost
If a test attempts external network access:
SocketBlockedError: A]socket.socket call was blocked
Test Patterns¶
CLI E2E Test¶
@pytest.mark.e2e
def test_cli_transcript_download(e2e_server, tmp_path):
"""Test CLI transcript download command."""
rss_url = e2e_server.urls.feed("podcast1_with_transcript")
result = subprocess.run([
"podcast-scraper", rss_url,
"--output-dir", str(tmp_path),
], capture_output=True)
assert result.returncode == 0
assert (tmp_path / "0001 - Episode 1.txt").exists()
Library API E2E Test¶
@pytest.mark.e2e
def test_run_pipeline(e2e_server, tmp_path):
"""Test run_pipeline() library API."""
cfg = Config(
rss_url=e2e_server.urls.feed("podcast1"),
output_dir=str(tmp_path),
)
result = run_pipeline(cfg)
assert result.success
Service API E2E Test¶
@pytest.mark.e2e
def test_service_run(e2e_server, tmp_path):
"""Test service.run() API."""
cfg = Config(
rss_url=e2e_server.urls.feed("podcast1"),
output_dir=str(tmp_path),
)
result = service.run(cfg)
assert result.success
Full Pipeline with ML¶
@pytest.mark.e2e
@pytest.mark.ml_models
def test_full_pipeline_with_summarization(e2e_server, tmp_path):
"""Test complete pipeline with real ML models."""
require_whisper_model_cached(config.TEST_DEFAULT_WHISPER_MODEL)
require_transformers_model_cached(config.TEST_DEFAULT_SUMMARY_MODEL, None)
cfg = Config(
rss_url=e2e_server.urls.feed("podcast1"),
output_dir=str(tmp_path),
generate_summaries=True,
summary_model=config.TEST_DEFAULT_SUMMARY_MODEL,
)
result = run_pipeline(cfg)
assert result.success
# Verify summary was generated
Test Modes¶
E2E tests support different modes via the E2E_TEST_MODE environment variable (set by the Makefile). Mode controls which feeds are allowed and whether fast or full fixtures are used; see E2E Feeds (RSS) and Allowed feeds per test mode.
| Mode | Episodes | Fixtures | Use Case |
|---|---|---|---|
fast |
1 per test (via monkeypatch) | Fast | Quick feedback, critical path |
multi_episode |
No limit (e.g. 5) | Fast | Full validation |
nightly |
No limit (e.g. 15 across p01–p05) | Full | Nightly suite |
Markers can override effective mode: tests marked @pytest.mark.nightly use nightly when E2E_TEST_MODE is unset; tests marked @pytest.mark.critical_path use fast when unset.
# Run with multi-episode mode
E2E_TEST_MODE=multi_episode make test-e2e
# Run fast E2E (critical path only, 1 episode per test)
make test-e2e-fast
make test-fast / make ci-fast and E2E progress¶
The Makefile runs two pytest passes for critical-path E2E: tests without @pytest.mark.ml_models use parallel workers (-n); tests with ml_models run sequentially (-n 1). That avoids pytest-xdist showing a long flat progress bar while a single worker runs Whisper, spaCy, or Transformers (it looked like a hang around 70–80% even though work was still running). The ML phase can still take many minutes on CPU; ensure the Whisper test model is cached (make preload-ml-models or CI cache) so runs fail fast instead of downloading.
make test-e2e-fast uses the same split (not ml_models then ml_models).
Test Files¶
| Purpose | Test File |
|---|---|
| Network guard | test_network_guard.py |
| OpenAI mocking | test_openai_mock.py |
| E2E server | test_e2e_server.py |
| Fixture mapping | test_fixture_mapping.py |
| Basic workflows | test_basic_e2e.py |
| CLI commands | test_cli_e2e.py |
| Library API | test_library_api_e2e.py |
| Service API | test_service_api_e2e.py |
| Whisper | test_whisper_e2e.py |
| ML models | test_ml_models_e2e.py |
| Error handling | test_error_handling_e2e.py |
| Edge cases | test_edge_cases_e2e.py |
| HTTP behaviors | test_http_behaviors_e2e.py |
| Ollama providers | test_ollama_provider_integration_e2e.py |
Running E2E Tests¶
# All E2E tests
make test-e2e
# Fast critical path (parallel non-ML, then sequential ML; see Test Modes above)
make test-e2e-fast
# Sequential (for debugging)
pytest tests/e2e/ -n 0
# Specific test file
pytest tests/e2e/test_basic_e2e.py -v -m e2e --disable-socket --allow-hosts=127.0.0.1,localhost
Test Markers¶
@pytest.mark.e2e-- Required for all E2E tests@pytest.mark.ml_models-- Tests requiring real ML models (E2E only)@pytest.mark.critical_path-- Critical path tests (run in fast suite). See Critical Path Testing Guide
@pytest.mark.ml_models belongs only on E2E tests. make check-test-policy
(rule I1) enforces that integration tests do not carry this marker.
@pytest.mark.multi_episode- Multi-episode tests
Provider Testing¶
For provider-specific E2E testing (E2E server endpoints, full pipeline with providers):
→ Provider Implementation Guide - Testing Your Provider
Covers:
- E2E server mock endpoint implementation
- Provider works in full pipeline
- Multiple providers work together
- E2E test checklist for new providers
Real API Testing (Manual Mode)¶
Some providers support real API testing for manual validation:
Ollama (Local Server):
# Prerequisites: Ollama installed and running
ollama serve # Start server
ollama pull llama3.3:latest # Pull models
# Run tests with real Ollama
USE_REAL_OLLAMA_API=1 \
pytest tests/e2e/test_ollama_provider_integration_e2e.py -v
OpenAI/Gemini (Cloud APIs):
# Set environment variable to use real APIs
USE_REAL_OPENAI_API=1 pytest tests/e2e/test_openai_provider_integration_e2e.py
USE_REAL_GEMINI_API=1 pytest tests/e2e/test_gemini_provider_integration_e2e.py
Note: Real API mode preserves test output for inspection and will incur costs for cloud APIs. See Ollama Provider Guide for detailed Ollama setup and troubleshooting.
Coverage Targets¶
- Total tests: ~230
- Focus: Complete user workflows, production-like scenarios
- Line coverage (pytest E2E): Full
podcast_scraperpackage in the coverage denominator (samepyproject.toml[tool.coverage.run]as other tiers; no subtreeomitfile). Threshold and CI wiring: Testing Guide — coverage thresholds. Roles of pytest E2E vs HTTP integration vs Playwright: Testing Strategy — layer roles.