Integration Testing Guide¶

See also:

Testing Strategy - High-level testing philosophy and test pyramid

Testing Guide - Quick reference and test execution commands

This guide covers integration test implementation: what to mock vs use real, component interaction testing, and mocking guidelines.

Overview¶

Integration tests verify component interactions with limited mocking: use real internal implementations and mock external dependencies.

Aspect	Requirement
Speed	< 5 seconds per test
Scope	Multiple components working together
Internal implementations	Real (Config, factories, providers, workflow)
Filesystem	Real (temp directories)
External HTTP	Mocked (or local test server)
ML/AI models and APIs	Always mocked (real ML/AI is E2E only)

Mocking Philosophy¶

Always Mock¶

HTTP Requests (External Network)

from unittest.mock import patch

@patch("podcast_scraper.rss.downloader.fetch_url")
def test_component_workflow(self, mock_fetch):
    mock_fetch.return_value = b"<rss>...</rss>"
    # Test component interactions

External API Calls (OpenAI, etc.)

@patch("podcast_scraper.providers.openai.openai_provider.OpenAI")
def test_openai_provider_integration(self, mock_client):
    # Mock API client, test provider integration
    ...

Always Mock: ML/AI models and APIs¶

All ML models (Whisper, spaCy, Transformers) and all AI APIs (OpenAI, Gemini, Ollama, etc.) are always mocked in integration tests. Real ML inference and real API calls belong exclusively in E2E tests.

@pytest.mark.ml_models must not appear on integration tests. If a test loads a real ML model or calls a real AI API, it is an E2E test and belongs in tests/e2e/.

# Integration: mock ML, test component wiring
@pytest.mark.integration
def test_config_to_provider_creation(self):
    with patch("podcast_scraper.providers.ml.ml_provider._import_third_party_whisper"):
        provider = create_transcription_provider(cfg)
        # Test provider creation, not ML execution

# Integration: mock summarization, test workflow logic
@pytest.mark.integration
def test_summarization_workflow(self):
    with patch("podcast_scraper.providers.ml.summarizer.SummaryModel") as mock_model:
        mock_model.return_value.summarize.return_value = {"summary": "test"}
        # Test workflow orchestration with mocked ML
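
The same idea as a self-contained sketch that runs without the project installed. `run_summary_stage` and the stub's `summarize` signature are hypothetical stand-ins, not the project's workflow API. The point is that the assertions target orchestration (what was called, and with what), never model output.

```python
from unittest.mock import MagicMock


def run_summary_stage(model, transcripts):
    """Hypothetical workflow step: summarize each non-empty transcript."""
    results = {}
    for episode_id, text in transcripts.items():
        if not text.strip():
            # Skip empty transcripts instead of calling the model.
            continue
        results[episode_id] = model.summarize(text)["summary"]
    return results


# The ML boundary is a mock; the workflow code around it is real.
mock_model = MagicMock()
mock_model.summarize.return_value = {"summary": "test"}

summaries = run_summary_stage(mock_model, {"ep1": "hello world", "ep2": "   "})

# Assert on wiring: one call, made with the non-empty transcript only.
mock_model.summarize.assert_called_once_with("hello world")
```

If the workflow started calling the model for empty transcripts, `assert_called_once_with` would fail even though the mocked summary text never changes.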

Never Mock¶

Internal Implementations: Config, factories, providers, RSS parser, metadata generation. These are what we're testing.
Filesystem I/O: Use tempfile.TemporaryDirectory for isolation. Test actual file operations.
Component Interactions: Provider → metadata, workflow → providers. This is the integration we're testing.
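
A minimal sketch of the filesystem rule, using only the standard library. `write_metadata` and the `.metadata.json` file name are hypothetical, chosen for illustration; the real metadata writer in the project may look quite different.

```python
import json
import tempfile
from pathlib import Path


def write_metadata(output_dir: Path, episode_id: str, metadata: dict) -> Path:
    """Hypothetical helper: write one episode's metadata as JSON."""
    path = output_dir / f"{episode_id}.metadata.json"
    path.write_text(json.dumps(metadata), encoding="utf-8")
    return path


# Real file operations, isolated in a directory that is removed on exit.
with tempfile.TemporaryDirectory() as tmp:
    written = write_metadata(Path(tmp), "ep1", {"title": "Pilot"})
    file_existed = written.exists()
    loaded = json.loads(written.read_text(encoding="utf-8"))

# After the context manager exits, nothing is left behind.
cleaned_up = not written.exists()
```

Because the directory is created per test and deleted afterwards, tests stay independent of each other and of the developer's working tree.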

Real ML Models Belong in E2E Only¶

Do not use real ML models in integration tests. If a test loads a real ML model (Whisper, spaCy, Transformers) or calls a real AI API (OpenAI, Gemini, Ollama), it belongs in tests/e2e/ with @pytest.mark.e2e and @pytest.mark.ml_models.

make check-test-policy (rule I1-ml-models-marker) enforces this automatically.

Integration tests verify how our components wire together. The ML/AI boundary is always a mock or stub at this layer.
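
To make the rule concrete, here is a rough sketch of what a marker check of this kind could look like. It is an assumption for illustration only and is not necessarily how `make check-test-policy` implements rule I1; `find_ml_marker_violations` is a hypothetical name, and the regex only catches the decorator form.

```python
import re
import tempfile
from pathlib import Path

# Matches the decorator form only; a stricter checker would likely parse the AST.
_ML_MARKER = re.compile(r"^\s*@pytest\.mark\.ml_models\b", re.MULTILINE)


def find_ml_marker_violations(integration_root: Path) -> list[str]:
    """Return relative paths of test files that carry the ml_models marker."""
    violations = []
    for path in sorted(integration_root.rglob("test_*.py")):
        if _ML_MARKER.search(path.read_text(encoding="utf-8")):
            violations.append(path.relative_to(integration_root).as_posix())
    return violations


# Demonstration on a throwaway tree standing in for tests/integration/.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "workflow").mkdir()
    (root / "workflow" / "test_ok.py").write_text(
        "@pytest.mark.integration\ndef test_a(): pass\n", encoding="utf-8"
    )
    (root / "test_bad.py").write_text(
        "@pytest.mark.ml_models\ndef test_b(): pass\n", encoding="utf-8"
    )
    violations = find_ml_marker_violations(root)
```

In this sketch only `test_bad.py` is reported; the fix for a real violation is to move the test to tests/e2e/ rather than to drop the marker.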

Test Patterns¶

Component Workflow Test¶

@pytest.mark.integration
def test_rss_to_provider_workflow(self):
    """Test RSS parsing → Episode creation → Provider processing."""
    # Use real internal implementations
    feed = parse_rss_feed(rss_content)
    episodes = create_episodes(feed)

    # Mock external HTTP
    with patch("podcast_scraper.rss.downloader.fetch_url") as mock_fetch:
        mock_fetch.return_value = b"transcript content"
        result = process_episodes(episodes, cfg)

    assert result.success

Provider Integration Test (mocked ML)¶

@pytest.mark.integration
def test_transcription_workflow(self):
    """Test transcription provider wiring with mocked Whisper."""
    with patch("podcast_scraper.providers.ml.ml_provider._import_third_party_whisper"):
        provider = create_transcription_provider(cfg)
        provider.initialize()
        # Verify provider creation and lifecycle, not ML output

Mocking LLM SDKs at `sys.modules` (scoped, not module-global)¶

Always pair patch.dict("sys.modules", …).start() with a matching .stop() in tearDownModule. Otherwise the mock leaks across test files and breaks downstream tests that need the real SDK.

When an integration test needs to construct a real provider class (OpenAIProvider, AnthropicProvider, GeminiProvider, etc.) but the test machine should not need the [llm] extras for collection, the canonical pattern is to mock the SDK package in sys.modules at import time. Provider modules use soft imports (try: from openai import OpenAI; except ImportError: openai = None), so mocking openai in sys.modules makes the import succeed and the symbol non-None.

Wrong (the unscoped .start() leaks the mock for the rest of the pytest process — including xdist workers — and a later test like tests/integration/infrastructure/test_e2e_server.py ends up with a MagicMock() openai client instead of the real one):

mock_openai = MagicMock()
patch.dict("sys.modules", {"openai": mock_openai}).start()  # ❌ leak

Right (scope to this module via setUpModule / tearDownModule):

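import importlib.util
from unittest.mock import MagicMock, patch

mock_openai = MagicMock()
# Give the mock a module spec so import machinery that inspects __spec__
# (for example importlib.util.find_spec) accepts it.
mock_openai.__spec__ = importlib.util.spec_from_loader("openai", loader=None)
_patch_openai = patch.dict("sys.modules", {"openai": mock_openai})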


def setUpModule():
    # Scope the SDK mock to this module only — otherwise it leaks into
    # other integration test files that need the real SDK.
    _patch_openai.start()


def tearDownModule():
    _patch_openai.stop()


# Provider import happens AFTER the patch object exists but BEFORE setUpModule
# fires. That's fine: the integration tier has the SDK installed for real, so
# the import succeeds with the real package. The mock activates for tests that
# need to bypass real SDK constructors via ``provider.client = MagicMock()``.
from podcast_scraper.providers.openai.openai_provider import OpenAIProvider
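
The scoping behaviour can be checked in isolation. The sketch below uses a made-up package name (`fake_llm_sdk`, an assumption so that it cannot collide with anything installed) and calls `start()` and `stop()` directly, which is what `setUpModule` and `tearDownModule` do in the pattern above.

```python
import sys
from unittest.mock import MagicMock, patch

SDK_NAME = "fake_llm_sdk"  # hypothetical package name, not a real SDK

_patch_sdk = patch.dict("sys.modules", {SDK_NAME: MagicMock()})

# Creating the patch object changes nothing yet.
present_before = SDK_NAME in sys.modules

_patch_sdk.start()  # the setUpModule() step
present_during = SDK_NAME in sys.modules

_patch_sdk.stop()  # the tearDownModule() step: sys.modules is restored
present_after = SDK_NAME in sys.modules
```

Without the `stop()` call, `present_after` would stay true for the rest of the process, which is exactly the leak described in the "Wrong" example.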

Canonical examples in the repo:

tests/integration/providers/llm/test_gemini_provider.py — Gemini (google.genai)
tests/integration/providers/llm/test_openai_provider.py — OpenAI
tests/integration/providers/llm/test_ollama_provider.py — Ollama (openai + httpx)
tests/integration/providers/llm/test_*_bundled_methods.py — #698 bundled-method tests, same pattern

Don't mock httpx at the integration tier. Several provider modules do real isinstance(base, httpx.Timeout) checks in timeout_config.py, and mocking httpx in sys.modules makes that type check raise TypeError: isinstance() arg 2 must be a type. The integration tier has [llm] (which includes httpx) installed for real, so let it through. The one special case is OllamaProvider, which has an if httpx is None runtime check in __init__; the real httpx being importable already satisfies it.
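
The failure is easy to reproduce with the standard library alone. In this sketch `fake_httpx` stands in for a mocked `httpx` module; the attribute access returns another `MagicMock`, which is an instance rather than a class, so `isinstance` rejects it.

```python
from unittest.mock import MagicMock

fake_httpx = MagicMock()  # stands in for a mocked `httpx` module

try:
    # fake_httpx.Timeout is a MagicMock instance, not a type.
    isinstance(5.0, fake_httpx.Timeout)
    outcome = "no error"
    message = ""
except TypeError as exc:
    outcome = "TypeError"
    message = str(exc)
```

The exact wording of the message varies between Python versions, but it begins with `isinstance() arg 2 must be a type`.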

Local HTTP Server Test¶

@pytest.mark.integration
def test_http_client_behavior(self, local_http_server):
    """Test HTTP client with local server."""
    url = local_http_server.url_for("/test")
    response = http_get(url, user_agent, timeout)
    assert response.status_code == 200
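
For readers unfamiliar with the approach, the following standard-library sketch shows one way a loopback test server can work. It is not the project's `local_http_server` fixture, whose implementation is not shown here; `_OkHandler` is a hypothetical handler, and `urllib` replaces the project's `http_get` helper.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class _OkHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answer every GET with 200 and a tiny body."""

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep test output quiet


# Port 0 asks the OS for any free port; bind to loopback only.
server = HTTPServer(("127.0.0.1", 0), _OkHandler)
thread = threading.Thread(target=server.serve_forever, daemon=True)
thread.start()
try:
    url = f"http://127.0.0.1:{server.server_address[1]}/test"
    # An empty ProxyHandler bypasses any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=5) as response:
        status = response.status
        payload = response.read()
finally:
    server.shutdown()
    server.server_close()
    thread.join(timeout=5)
```

Wrapped in a pytest fixture, the setup before `try` and the cleanup in `finally` would become the code before and after `yield`.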

Model Cache Helpers (E2E only)¶

Real-ML tests live in tests/e2e/ (not here). They use cache helpers to skip gracefully when models are not downloaded:

from tests.integration.ml_model_cache_helpers import (
    require_whisper_model_cached,
    require_transformers_model_cached,
    require_spacy_model_cached,
)

@pytest.mark.e2e
@pytest.mark.ml_models
def test_with_real_models(self):
    require_whisper_model_cached(config.TEST_DEFAULT_WHISPER_MODEL)
    require_transformers_model_cached(config.TEST_DEFAULT_SUMMARY_MODEL, None)
    # Test with real models...
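
A rough sketch of how a "skip when not cached" helper can be written. `require_snapshot_cached` is hypothetical and is not one of the project's helpers; the `models--<org>--<name>` folder layout is an assumption modelled on the Hugging Face hub cache.

```python
import tempfile
import unittest
from pathlib import Path


def require_snapshot_cached(cache_dir: Path, repo_id: str) -> Path:
    """Hypothetical helper: skip the calling test if a model snapshot is absent."""
    # Assumed layout: hub-style "models--<org>--<name>" folders.
    snapshot = cache_dir / ("models--" + repo_id.replace("/", "--"))
    if not snapshot.is_dir():
        # pytest reports unittest.SkipTest as a skip, so pytest is not imported here.
        raise unittest.SkipTest(f"{repo_id} is not cached under {cache_dir}")
    return snapshot


# Demonstration with a throwaway cache containing one of two models.
with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    (cache / "models--facebook--bart-base").mkdir()
    found = require_snapshot_cached(cache, "facebook/bart-base").name
    try:
        require_snapshot_cached(cache, "allenai/led-base-16384")
        skipped = False
    except unittest.SkipTest:
        skipped = True
```

Skipping rather than failing keeps a machine without downloaded models green while still running the test wherever the cache is populated.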

Directory Organization¶

Integration tests are organized by domain subsystem — the area of functionality being exercised — not by source module. This differs from unit tests, which mirror the src/ tree 1:1.

Why domain-based? An integration test for "provider factory creates Ollama provider, initializes, and summarizes" spans config.py, summarization/factory.py, providers/ollama/, and prompts/store.py. No single source module owns it. The right grouping is the subsystem under test.

tests/integration/
├── providers/               # Provider factories, protocols, error handling
│   ├── llm/                # LLM provider integration (Anthropic, OpenAI, …)
│   ├── ml/                 # ML model loading, embedding, QA, NLI, summarizer
│   └── ollama/             # Ollama model-specific tests
├── workflow/                # Orchestration, stages, resume, parallelism, metadata
├── gi/                      # GI artifacts, KG artifacts, evidence stack
├── server/                  # FastAPI viewer: wired app, corpus library, index rebuild/stats
├── search/                  # FAISS indexing, corpus search
├── rss/                     # RSS parsing, HTTP fetching
├── eval/                    # Evaluation framework
├── infrastructure/          # Fixture mapping, infra concerns
├── tools/                   # CLI tools
└── (root)                   # Cross-cutting: filesystem, retry, cache, audio

Rules of thumb:

A domain folder is created when 3+ test files share a subsystem.
Truly cross-cutting tests (filesystem helpers, retry, transcript cache, audio preprocessing) stay in root — they span multiple subsystems.
Each folder has an __init__.py (empty) for pytest collection.

Comparison with unit test layout¶

Aspect	Unit tests	Integration tests
Axis	Source module (mirrors `src/`)	Domain subsystem
Depth	Deep (matches package nesting)	Shallow (1–2 levels)
Finding tests	"Where's the test for this file?"	"Where are tests for this subsystem?"
Duplication	1:1 with source files	One folder may cover many source files

Test Files by Domain¶

providers/¶

Subfolder	Purpose	Example files
`llm/`	LLM provider integration	`test_anthropic_providers.py`, `test_openai_providers.py`
`ml/`	ML model loading, embedding, QA, NLI (mocked inference)	`test_embedding_loader_integration.py`, `test_model_loader_integration.py`
`ollama/`	Ollama model-specific tests	`test_gemma2_9b_summary.py`, `test_llama3_1_8b_speaker.py`
(root)	Cross-provider: factories, protocols, capabilities	`test_capabilities_integration.py`, `test_fallback_behavior.py`

workflow/¶

Purpose	Example files
Orchestration and stages	`test_workflow_integration.py`, `test_workflow_stages_integration.py`
Metadata generation	`test_metadata_integration.py`, `test_kg_metadata_integration.py`
Resume and parallelism	`test_resume_behavior.py`, `test_parallel_summarization.py`
Queue and MPS	`test_bounded_queue_integration.py`, `test_mps_exclusive_integration.py`

gi/¶

Purpose	Example files
GI artifacts	`test_gi_integration.py`
KG artifacts	`test_kg_integration.py`
Evidence stack	`test_evidence_stack_integration.py`

Root (cross-cutting)¶

Purpose	File
Filesystem helpers	`test_filesystem_integration.py`
Retry with metrics	`test_retry_integration.py`
Transcript cache	`test_transcript_cache_integration.py`
Audio preprocessing	`test_audio_preprocessing_integration.py`
Summary schema	`test_summary_schema_integration.py`
Protocol verification	`test_protocol_verification_integration.py`

Real HTTP client integration (local server)¶

tests/integration/rss/test_http_integration.py exercises podcast_scraper.rss.downloader against a local http.server on 127.0.0.1 (marker integration_http). There is no external network; pytest allows localhost sockets for this suite.

Global downloader state: The module uses thread-local requests.Session objects with urllib3 Retry adapters. Production defaults retry many times on 5xx with exponential backoff, so a test that hits a handler returning only 500 can look hung. This file uses an autouse fixture that calls configure_http_policy(), caps retries with configure_downloader(...), and calls downloader.reset_http_sessions() so that each test builds sessions with bounded retries. Teardown clears the downloader overrides.

If you add integration tests that call fetch_url / fetch_rss_feed_url for real HTTP, reuse the same pattern (or mock HTTP). See CONFIGURATION.md — Download resilience (threading and metrics) for how configuration applies to sessions.
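
Some quick arithmetic shows why bounded retries matter. The sketch below models a generic capped exponential backoff; the retry counts, factors, and cap are illustrative assumptions, not the project's production defaults and not an exact reproduction of urllib3's schedule.

```python
def total_backoff_seconds(retries: int, factor: float, cap: float = 120.0) -> float:
    """Sum of sleeps for a generic capped exponential backoff schedule."""
    # Sleep before retry k (1-based) is factor * 2**(k - 1), capped at `cap`.
    return sum(min(cap, factor * 2 ** (k - 1)) for k in range(1, retries + 1))


# Eight retries at factor 1.0: 1 + 2 + 4 + 8 + 16 + 32 + 64 + 120 (capped).
unbounded = total_backoff_seconds(retries=8, factor=1.0)

# One retry at factor 0.1: a single short sleep.
bounded = total_backoff_seconds(retries=1, factor=0.1)
```

Under these assumed numbers, a handler that always returns 500 keeps one test waiting about four minutes, against a tenth of a second with the capped settings.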

FastAPI viewer, CIL, `bridge.json`, and semantic search lift¶

Wired HTTP coverage for the GI/KG viewer stack lives under tests/integration/server/ (not the legacy root-level test_server_api.py path). Typical modules:

test_server_api.py — health, artifacts (including *.bridge.json), index stats, search, explore, factory edge cases.
test_cil_api.py — GET /api/persons/* and GET /api/topics/* (RFC-072 CIL) against a minimal corpus with sibling GI / KG / bridge files.
test_viewer_corpus_library.py — Corpus Library responses that surface has_bridge and bridge_relative_path.

Bridge assembly (builder invariants, not HTTP): tests/integration/test_bridge_integration.py.

Semantic search lift and offset verification are primarily unit-tested under tests/unit/podcast_scraper/search/ (transcript_chunk_lift, gil_chunk_offset_verify). Operator validation on a real indexed corpus uses make verify-gil-offsets-strict or python -m podcast_scraper.cli verify-gil-chunk-offsets — see Semantic Search Guide — Chunk-to-Insight lift and GIL / KG / CIL cross-layer.

Transformers cache (E2E `ml_models` only)¶

Root tests/conftest.py sets HF_HUB_OFFLINE=1 / TRANSFORMERS_OFFLINE=1 for pytest. Integration tests do not load real Transformers weights. E2E tests marked ml_models that need facebook/bart-base or allenai/led-base-16384 (MAP/REDUCE test defaults; config aliases bart-small and long-fast) call require_transformers_model_cached and pytest.skip when those snapshots are missing under get_transformers_cache_dir() (project .cache/huggingface/hub or HF_HUB_CACHE).

Satisfy cache locally (needs network once):

make preload-ml-models

CI: The workflow preloads models and sets ML_MODELS_VALIDATED=true so workers trust the cache after the shell validation step.

Examples: tests/e2e/test_ml_models_e2e.py (summarization, preload helpers, and related).

Running Integration Tests¶

# All integration tests
make test-integration

# Fast (excludes ml_models)
make test-integration-fast

# Sequential (for debugging)
pytest tests/integration/ -n 0

# Specific domain
pytest tests/integration/providers/ -v
pytest tests/integration/workflow/ -v
pytest tests/integration/gi/ -v

# Specific test file
pytest tests/integration/workflow/test_component_workflows.py -v

Test Markers¶

@pytest.mark.integration -- Required for all integration tests
@pytest.mark.critical_path -- Critical path tests (run in fast suite). See Critical Path Testing Guide

@pytest.mark.ml_models must not appear on integration tests (enforced by make check-test-policy, rule I1). Real-ML tests belong in tests/e2e/.

Provider Testing¶

For provider-specific integration testing (E2E server mock endpoints, provider switching):

→ Provider Implementation Guide - Testing Your Provider

Covers:

Provider works with E2E server mock endpoints
Provider switching tests
Error handling in workflow context
Integration test checklist for new providers

Coverage Targets¶

Total tests: ~530
Focus: Critical paths, component interactions, edge cases