E2E Testing Guide¶

See also:

Testing Strategy - High-level testing philosophy and test pyramid

Testing Guide - Quick reference and test execution commands

RSS and feed ingestion - Production RSS path (HTTP, parsing, episode selection); contrasts with local e2e_server fixture feeds below

This guide covers pytest E2E test implementation: real HTTP client, E2E server, ML model usage, and OpenAI mock endpoints.

For where Playwright fits in the overall strategy (pyramid, CI jobs, pytest vs browser), see Testing Strategy — Browser UI E2E (Playwright).

Browser E2E (Playwright)¶

The GI/KG Vue viewer (web/gi-kg-viewer) uses Playwright (TypeScript, Firefox), not pytest. This section summarizes the browser stack only; everything below Overview in this file remains pytest E2E.

Topic	Detail
Run from repo root	`make test-ui-e2e` (`npm install`, `playwright install firefox`, `npm run test:e2e`)
Run in package	`cd web/gi-kg-viewer && npm run test:e2e`
Config	`web/gi-kg-viewer/playwright.config.ts` — `testDir: ./e2e`, `webServer` runs Vite on 127.0.0.1:5174 with `reuseExistingServer: true` so a dev server already bound to that port is reused (helps when `CI=true` is set locally and would otherwise force a second strictPort bind)
Specs	`web/gi-kg-viewer/e2e/*.spec.ts` (+ `fixtures.ts`, `helpers.ts`)
Surface map	E2E_SURFACE_MAP.md — surfaces, fixtures, stable Playwright selectors (update with UI/E2E changes)
CI	Workflow job `viewer-e2e` (same commands as `make test-ui-e2e`)
vs pytest E2E	pytest proves CLI/pipeline + `e2e_server`; Playwright proves browser UX (graph shell, search UI, a11y paths)
vs FastAPI unit tests	`tests/unit/podcast_scraper/server/test_viewer_.py` cover `/api/` JSON contracts; use Playwright when behavior depends on the SPA**
vs Vitest	`web/gi-kg-viewer/src/utils/.test.ts` cover pure TS logic* (parsing, merge, metrics); `make test-ui` (~150 ms, no browser). Use Playwright for rendered UI behavior

Debugging UI issues and interpreting failures¶

The surface map is the shared contract for accessible names, regions, and user entry paths. When a Playwright assertion fails, when you reproduce a bug manually, or when an agent drives the app via Chrome DevTools MCP or Playwright MCP (a11y snapshots), use E2E_SURFACE_MAP.md to see what label or region should appear, which spec owns the surface, and how to disambiguate controls that share a visible name. It does not replace UXS for visual design. For the full agent-browser workflow (symmetry between reproduction and fix validation), see Agent-Browser Closed Loop Guide.

When you change viewer UX (required workflow)¶

Applies to humans and AI agents editing web/gi-kg-viewer/ (Vue UI: copy, layout, routes, theme tokens, accessible names, or flows that Playwright exercises). Do not ship UI-only PRs without walking this list in order:

e2e/E2E_SURFACE_MAP.md — Update if anything E2E-visible or selector-related changed (including getByRole strings, #search-q, .graph-canvas, file-picker vs list flows).
Playwright — Update e2e/*.spec.ts, helpers.ts, and/or fixtures.ts; run make test-ui-e2e.
docs/uxs/ — Update VIEWER_IA.md when shell information architecture changes (regions, navigation axes, persistence, clearing, first-run). Update UXS-001 when shared tokens, typography, or shell-wide visual rules change; update the relevant feature UXS (Digest, Library, Graph, Search, Dashboard, …) when a surface-specific visual contract changes, even if tests still pass. After merge, Active UXS should describe the shipped viewer for that release (see UX specifications index — Living documents and ship boundary).

Also documented in DEVELOPMENT_GUIDE.md (GI / KG browser viewer), TESTING_GUIDE.md (Browser E2E), UX specifications index, .cursorrules (GI/KG viewer UX), and .ai-coding-guidelines.md (GI/KG browser viewer).

Further reading: Polyglot repository guide (root vs web/gi-kg-viewer/), Testing Guide — Browser E2E, ADR-066, web/gi-kg-viewer/README.md.

Overview¶

E2E tests test complete user workflows with real implementations. No mocking allowed (except network isolation).

Aspect	Requirement
Speed	< 60 seconds per test
Scope	Complete user workflow
Entry points	CLI commands, `run_pipeline()`, `service.run()`
HTTP	Real client with local E2E server
Filesystem	Real file operations
ML Models	Real (Whisper, spaCy, Transformers) - NO mocks

Manual CLI runs against the fixture server¶

For human multi-feed checks without real RSS, use the same HTTP handler as pytest’s e2e_server:

From repo root (venv on PYTHONPATH includes repo root so tests.e2e resolves): make serve-e2e-mock (default port 18765; override with E2E_MOCK_PORT).
In another terminal: python -m podcast_scraper.cli --profile <preset> --config your_operator.yaml --feeds-spec path/to/your_fixture_feeds.yaml (add --output-dir if not already in the operator YAML). Same three-way split as production runs; see CLI.md — Quick Start.

That feeds document should list the five primary mock feeds (podcast1–podcast5) plus long-form fixtures podcast7_sustainability, podcast8_solar, and podcast9_solo (p07–p09; p06 edge-case feed is intentionally omitted), each at http://127.0.0.1:<port>/feeds/.../feed.xml (E2E_MOCK_PORT, default 18765). This is not the same contract as CI pytest E2E (no network guard, you choose ML cost); it reuses fixture XML/audio only.

Core Principle: No Mocking¶

E2E tests use real implementations throughout:

Real HTTP client (with local server)
Real filesystem I/O
Real ML models (Whisper, spaCy, Transformers)
Real providers (MLProvider, OpenAIProvider)
No external network (blocked by network guard)
No Whisper mocks
No ML model mocks

E2E Server¶

Ports: Pytest’s e2e_server binds an ephemeral port (not fixed). The FastAPI app from make serve-api defaults to 8000. For manual runs, the same fixture HTTP handler is exposed on a fixed port via make serve-e2e-mock, default 18765 (E2E_MOCK_PORT in the Makefile), so the RSS/mock API server can run alongside serve-api without colliding.

The e2e_server fixture provides a local HTTP server serving test fixtures:

def test_basic_workflow(e2e_server):
    # Get URLs for test resources
    rss_url = e2e_server.urls.feed("podcast1")
    audio_url = e2e_server.urls.audio("p01_e01")
    transcript_url = e2e_server.urls.transcript("p01_e01")

    # Run complete workflow
    result = run_pipeline(rss_url, output_dir)
    assert result.success

Available URLs¶

Method	Returns
`e2e_server.urls.feed(podcast_name)`	RSS feed URL (e.g. `/feeds/podcast1/feed.xml`)
`e2e_server.urls.audio(episode_id)`	Audio file URL (e.g. `/audio/p01_e01.mp3`)
`e2e_server.urls.transcript(episode_id)`	Transcript URL (e.g. `/transcripts/p01_e01.txt`)
`e2e_server.urls.base()`	Server base URL
`e2e_server.urls.openai_api_base()`	OpenAI mock API base (`/v1`)
`e2e_server.urls.gemini_api_base()`	Gemini mock API base (`/v1beta`)
`e2e_server.urls.mistral_api_base()`	Mistral mock API base (`/v1`)
`e2e_server.urls.grok_api_base()`	Grok mock API base (`/v1`)
`e2e_server.urls.deepseek_api_base()`	DeepSeek mock API base (`/v1`)
`e2e_server.urls.ollama_api_base()`	Ollama mock API base (`/v1`)
`e2e_server.urls.anthropic_api_base()`	Anthropic mock API base (base URL, no `/v1`)

Download resilience E2E¶

tests/e2e/test_download_resilience_e2e.py

Transient HTTP on transcript URLs (set_transient_error with fail_count), plus permanent set_error_behavior.
fetch_url / downloader retry totals (configure_downloader, http_retry_total on Config).
Single-feed pipeline: run.json may include failure_summary when some episodes fail (see test_partial_failure_produces_summary).
Multi-feed isolation when one RSS feed is broken: second feed still runs; with multi_feed_strict=True the batch reports failure (test_one_feed_down_others_continue).

tests/e2e/test_multi_feed_resilience_e2e.py (GitHub #560, offline only)

corpus_run_summary.json at the corpus parent: per-feed ok, error, failure_kind (soft vs hard, #559), overall_ok, schema 1.1.0 with batch_incidents (rollup of corpus_incidents.jsonl for that batch) and per-feed episode_incidents_unique so episodes_processed: 0 with ok: true is not read as “no issues.”
Lenient default vs multi_feed_strict / --multi-feed-strict: service and CLI exit semantics when all failures are soft-classified (RSS HTTP errors, unknown slug 404, wrong path under /feeds/...).
Unknown slug and wrong filename under a known feed (both 404 on the mock server, no DNS).
Transient RSS 503 on one feed’s feed.xml with RSS retries; batch overall_ok true when retries succeed.
Corpus lock: pre-acquire LOCK_BASENAME, assert a blocked service.run, then success after release.
Multi_episode mode (E2E_TEST_MODE=multi_episode, not fast): two feeds, max_episodes greater than 1, transcript 404 on a shared fixture path; asserts per-feed metrics.json skipped counts and matching run.json metrics.episodes_skipped_total (skipped transcript is not always a run-index failure, so failure_summary may be absent).

Handler API: E2EHTTPRequestHandler.set_transient_error(path, status=..., fail_count=...) and set_error_behavior(path, status=...). See CONFIGURATION.md — Download resilience.

Fast vs multi_episode: tests marked critical_path run under make test-e2e-fast (E2E_TEST_MODE=fast). The multi-episode partial-failure case above skips when E2E_TEST_MODE=fast; run make test-e2e (multi_episode) for full coverage.

E2E Feeds (RSS)¶

Feed names and RSS file mapping. Which feed name you can use depends on test mode (see Test Modes). For how the real pipeline fetches and parses RSS (retries, conditional GET, circuit breaker, multi-feed), see RSS and feed ingestion.

Full fixtures (used in nightly mode; mapping from PODCAST_RSS_MAP):

Feed name	RSS file	Description
`podcast1`	`p01_mtb.xml`	Main podcast (MTB)
`podcast2`	`p02_software.xml`	Software podcast
`podcast3`	`p03_scuba.xml`	Scuba podcast
`podcast4`	`p04_photo.xml`	Photo podcast
`podcast5`	`p05_investing.xml`	Investing podcast
`edgecases`	`p06_edge_cases.xml`	Edge-case episodes
`podcast1_multi_episode`	`p01_multi.xml`	5 short episodes (multi-episode tests)
`podcast1_episode_selection`	`p01_episode_selection.xml`	3 items, newest-first, all Path 1 transcripts (#521)
`podcast9_solo`	`p09_biohacking.xml`	Solo speaker (host only)
`podcast7_sustainability`	`p07_sustainability.xml`	Long-form (~15k words; Issue #283)
`podcast8_solar`	`p08_solar.xml`	Long-form (~20k words; Issue #283)

Fast fixtures (used in fast and multi_episode when set_use_fast_fixtures(True); mapping from PODCAST_RSS_MAP_FAST):

Feed name	RSS file	Description
`podcast1`	`p01_fast.xml`	1 short episode (Path 2: transcription)
`podcast1_with_transcript`	`p01_fast_with_transcript.xml`	1 episode with transcript URL (Path 1: download)
`podcast1_multi_episode`	`p01_multi.xml`	Same 5-episode feed
`podcast1_episode_selection`	`p01_episode_selection.xml`	Same as full map (episode selection E2E)
`podcast9_solo`	`p09_biohacking.xml`	Solo speaker
`podcast7_sustainability`	`p07_sustainability.xml`	Long-form
`podcast8_solar`	`p08_solar.xml`	Long-form

Allowed feeds per test mode¶

Set automatically by conftest from E2E_TEST_MODE.

Mode	Allowed feed names
`fast`	`podcast1`, `podcast1_with_transcript`, `podcast1_multi_episode`, `podcast1_episode_selection`, `podcast9_solo`, `podcast7_sustainability`, `podcast8_solar`
`multi_episode`	`podcast1_multi_episode`, `podcast1_episode_selection`, `podcast1_with_transcript`, `edgecases`, `podcast7_sustainability`, `podcast8_solar`
`nightly`	`podcast1`, `podcast2`, `podcast3`, `podcast4`, `podcast5`, `podcast1_episode_selection` (full fixtures)

Use e2e_server.urls.feed("podcast1_multi_episode") or e2e_server.urls.feed("podcast1_episode_selection") etc. Only feeds in the allowed set for the current mode are served; others return 404.

E2E Server Options¶

The e2e_server fixture (and the handler class) support these options for controlling behavior:

Error injection (chaos / failure testing):

Method	Description
`e2e_server.set_error_behavior(url_path, status, delay=0.0)`	For a given path (e.g. `"/audio/p01_multi_e03.mp3"`), return HTTP `status` (e.g. 404, 500). Optional `delay` in seconds.
`e2e_server.clear_error_behavior(url_path)`	Remove error behavior for that path.
`e2e_server.reset()`	Clear all error behaviors and set allowed podcasts to None.

Example: simulate 404 on audio so the run index records a failed episode:

e2e_server.set_error_behavior("/audio/p01_multi_e03.mp3", 404)
# ... run pipeline ...
# assert index.json has one failed episode with error_type, error_message, error_stage
e2e_server.clear_error_behavior("/audio/p01_multi_e03.mp3")

Allowed podcasts (advanced):

Method	Description
`e2e_server.set_allowed_podcasts(podcasts)`	Restrict which feed names are served. `podcasts`: set of names or `None` for all. Normally set by conftest from `E2E_TEST_MODE`.

Fixture mode:

When fast fixtures are on, feeds resolve via PODCAST_RSS_MAP_FAST (e.g. podcast1 → p01_fast.xml).
When off (e.g. nightly mode), feeds use PODCAST_RSS_MAP (e.g. podcast1 → p01_mtb.xml).
Conftest sets this from E2E_TEST_MODE; teardown clears error behaviors and resets fast-fixtures mode.

Served Content¶

Content is served from tests/fixtures/:

RSS feeds: tests/fixtures/rss/*.xml
Audio files: tests/fixtures/audio/*.mp3
Transcripts: tests/fixtures/transcripts/*.txt

OpenAI Mock Endpoints¶

For API providers (OpenAI), the E2E server provides mock endpoints:

def test_openai_provider(e2e_server):
    cfg = Config(
        rss_url=e2e_server.urls.feed("podcast1"),
        transcription_provider="openai",
        openai_api_key="sk-test123",
        openai_api_base=e2e_server.urls.openai_api_base(),  # Use mock
    )
    result = run_pipeline(cfg)
    assert result.success

Mock Endpoints¶

Endpoint	Purpose
`/v1/chat/completions`	Summarization, speaker detection, GIL evidence (extract_quotes, score_entailment)
`/v1/audio/transcriptions`	Transcription
`/v1/messages` (Anthropic)	Summarization, speaker detection, GIL evidence (extract_quotes, score_entailment)
`/v1beta/models/{model}:generateContent` (Gemini)	Summarization, speaker detection, GIL evidence (extract_quotes, score_entailment)

See tests/e2e/fixtures/e2e_http_server.py for implementation.

ML Model Usage¶

E2E tests use real ML models - no mocking allowed.

Test Model Defaults¶

Tests use smaller, faster models for speed:

Component	Test Model	Production Model
Whisper	`tiny.en`	`base.en`
spaCy	`en_core_web_sm`	`en_core_web_sm`
Transformers MAP	`facebook/bart-base`	`facebook/bart-large-cnn`
Transformers REDUCE	`allenai/led-base-16384`	`allenai/led-large-16384`

Model Cache Requirements¶

Tests require models to be pre-cached:

# Preload all required models

make preload-ml-models

Use cache helpers to skip gracefully if not cached:

from tests.integration.ml_model_cache_helpers import (
    require_whisper_model_cached,
    require_transformers_model_cached,
)

def test_with_real_models(e2e_server):
    require_whisper_model_cached(config.TEST_DEFAULT_WHISPER_MODEL)
    require_transformers_model_cached(config.TEST_DEFAULT_SUMMARY_MODEL, None)
    # Test with real models...

Network Guard¶

E2E tests use network isolation to prevent external calls:

pytest tests/e2e/ --disable-socket --allow-hosts=127.0.0.1,localhost

If a test attempts external network access:

SocketBlockedError: A]socket.socket call was blocked

Test Patterns¶

CLI E2E Test¶

@pytest.mark.e2e
def test_cli_transcript_download(e2e_server, tmp_path):
    """Test CLI transcript download command."""
    rss_url = e2e_server.urls.feed("podcast1_with_transcript")

    result = subprocess.run([
        "podcast-scraper", rss_url,
        "--output-dir", str(tmp_path),
    ], capture_output=True)

    assert result.returncode == 0
    assert (tmp_path / "0001 - Episode 1.txt").exists()

Library API E2E Test¶

@pytest.mark.e2e
def test_run_pipeline(e2e_server, tmp_path):
    """Test run_pipeline() library API."""
    cfg = Config(
        rss_url=e2e_server.urls.feed("podcast1"),
        output_dir=str(tmp_path),
    )
    result = run_pipeline(cfg)
    assert result.success

Service API E2E Test¶

@pytest.mark.e2e
def test_service_run(e2e_server, tmp_path):
    """Test service.run() API."""
    cfg = Config(
        rss_url=e2e_server.urls.feed("podcast1"),
        output_dir=str(tmp_path),
    )
    result = service.run(cfg)
    assert result.success

Full Pipeline with ML¶

@pytest.mark.e2e
@pytest.mark.ml_models
def test_full_pipeline_with_summarization(e2e_server, tmp_path):
    """Test complete pipeline with real ML models."""
    require_whisper_model_cached(config.TEST_DEFAULT_WHISPER_MODEL)
    require_transformers_model_cached(config.TEST_DEFAULT_SUMMARY_MODEL, None)

    cfg = Config(
        rss_url=e2e_server.urls.feed("podcast1"),
        output_dir=str(tmp_path),
        generate_summaries=True,
        summary_model=config.TEST_DEFAULT_SUMMARY_MODEL,
    )
    result = run_pipeline(cfg)
    assert result.success
    # Verify summary was generated

Test Modes¶

E2E tests support different modes via the E2E_TEST_MODE environment variable (set by the Makefile). Mode controls which feeds are allowed and whether fast or full fixtures are used; see E2E Feeds (RSS) and Allowed feeds per test mode.

Mode	Episodes	Fixtures	Use Case
`fast`	1 per test (via monkeypatch)	Fast	Quick feedback, critical path
`multi_episode`	No limit (e.g. 5)	Fast	Full validation
`nightly`	No limit (e.g. 15 across p01–p05)	Full	Nightly suite

Markers can override effective mode: tests marked @pytest.mark.nightly use nightly when E2E_TEST_MODE is unset; tests marked @pytest.mark.critical_path use fast when unset.

# Run with multi-episode mode
E2E_TEST_MODE=multi_episode make test-e2e

# Run fast E2E (critical path only, 1 episode per test)
make test-e2e-fast

`make test-fast` / `make ci-fast` and E2E progress¶

The Makefile runs two pytest passes for critical-path E2E: tests without @pytest.mark.ml_models use parallel workers (-n); tests with ml_models run sequentially (-n 1). That avoids pytest-xdist showing a long flat progress bar while a single worker runs Whisper, spaCy, or Transformers (it looked like a hang around 70–80% even though work was still running). The ML phase can still take many minutes on CPU; ensure the Whisper test model is cached (make preload-ml-models or CI cache) so runs fail fast instead of downloading.

make test-e2e-fast uses the same split (not ml_models then ml_models).

Test Files¶

Purpose	Test File
Network guard	`test_network_guard.py`
OpenAI mocking	`test_openai_mock.py`
E2E server	`test_e2e_server.py`
Fixture mapping	`test_fixture_mapping.py`
Basic workflows	`test_basic_e2e.py`
CLI commands	`test_cli_e2e.py`
Library API	`test_library_api_e2e.py`
Service API	`test_service_api_e2e.py`
Whisper	`test_whisper_e2e.py`
ML models	`test_ml_models_e2e.py`
Error handling	`test_error_handling_e2e.py`
Edge cases	`test_edge_cases_e2e.py`
HTTP behaviors	`test_http_behaviors_e2e.py`
Ollama providers	`test_ollama_provider_integration_e2e.py`

Running E2E Tests¶

# All E2E tests

make test-e2e

# Fast critical path (parallel non-ML, then sequential ML; see Test Modes above)

make test-e2e-fast

# Sequential (for debugging)

pytest tests/e2e/ -n 0

# Specific test file

pytest tests/e2e/test_basic_e2e.py -v -m e2e --disable-socket --allow-hosts=127.0.0.1,localhost

Test Markers¶

@pytest.mark.e2e -- Required for all E2E tests
@pytest.mark.ml_models -- Tests requiring real ML models (E2E only)
@pytest.mark.critical_path -- Critical path tests (run in fast suite). See Critical Path Testing Guide

@pytest.mark.ml_models belongs only on E2E tests. make check-test-policy (rule I1) enforces that integration tests do not carry this marker.

@pytest.mark.multi_episode - Multi-episode tests

Provider Testing¶

For provider-specific E2E testing (E2E server endpoints, full pipeline with providers):

→ Provider Implementation Guide - Testing Your Provider

Covers:

E2E server mock endpoint implementation
Provider works in full pipeline
Multiple providers work together
E2E test checklist for new providers

Real API Testing (Manual Mode)¶

Some providers support real API testing for manual validation:

Ollama (Local Server):

# Prerequisites: Ollama installed and running
ollama serve  # Start server
ollama pull llama3.3:latest  # Pull models

# Run tests with real Ollama
USE_REAL_OLLAMA_API=1 \
pytest tests/e2e/test_ollama_provider_integration_e2e.py -v

OpenAI/Gemini (Cloud APIs):

# Set environment variable to use real APIs
USE_REAL_OPENAI_API=1 pytest tests/e2e/test_openai_provider_integration_e2e.py
USE_REAL_GEMINI_API=1 pytest tests/e2e/test_gemini_provider_integration_e2e.py

Note: Real API mode preserves test output for inspection and will incur costs for cloud APIs. See Ollama Provider Guide for detailed Ollama setup and troubleshooting.

Coverage Targets¶

Total tests: ~230
Focus: Complete user workflows, production-like scenarios
Line coverage (pytest E2E): Full podcast_scraper package in the coverage denominator (same pyproject.toml [tool.coverage.run] as other tiers; no subtree omit file). Threshold and CI wiring: Testing Guide — coverage thresholds. Roles of pytest E2E vs HTTP integration vs Playwright: Testing Strategy — layer roles.