Integration Testing Guide

This guide covers how to implement integration tests: what to mock and what to run for real, how to test component interactions, and mocking guidelines.

Overview

Integration tests exercise component interactions with limited mocking: use real internal implementations and mock external dependencies.

| Aspect | Requirement |
| --- | --- |
| Speed | < 5 seconds per test |
| Scope | Multiple components working together |
| Internal implementations | Real (Config, factories, providers, workflow) |
| Filesystem | Real (temp directories) |
| External HTTP | Mocked (or local test server) |
| ML/AI models and APIs | Always mocked (real ML/AI is E2E only) |

Mocking Philosophy

Always Mock

  1. HTTP Requests (External Network)

@patch("podcast_scraper.rss.downloader.fetch_url")
def test_component_workflow(self, mock_fetch):
    mock_fetch.return_value = b"<rss>...</rss>"
    # Test component interactions

  2. External API Calls (OpenAI, etc.)

@patch("podcast_scraper.providers.openai.openai_provider.OpenAI")
def test_openai_provider_integration(self, mock_client):
    # Mock the API client; test provider integration
    ...

Always Mock: ML/AI models and APIs

All ML models (Whisper, spaCy, Transformers) and all AI APIs (OpenAI, Gemini, Ollama, etc.) are always mocked in integration tests. Real ML inference and real API calls belong exclusively in E2E tests.

@pytest.mark.ml_models must not appear on integration tests. If a test loads a real ML model or calls a real AI API, it is an E2E test and belongs in tests/e2e/.

# Integration: mock ML, test component wiring
@pytest.mark.integration
def test_config_to_provider_creation(self):
    with patch("podcast_scraper.providers.ml.ml_provider._import_third_party_whisper"):
        provider = create_transcription_provider(cfg)
        # Test provider creation, not ML execution

# Integration: mock summarization, test workflow logic
@pytest.mark.integration
def test_summarization_workflow(self):
    with patch("podcast_scraper.providers.ml.summarizer.SummaryModel") as mock_model:
        mock_model.return_value.summarize.return_value = {"summary": "test"}
        # Test workflow orchestration with mocked ML
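The mock chain in the summarization example can look opaque. mock_model.return_value is the object the code receives when it calls SummaryModel(...), and .summarize.return_value is what that object's summarize() returns. The sketch below reproduces the same chain with a plain MagicMock. summarize_all and the model name are hypothetical stand-ins, not project code.

```python
from unittest.mock import MagicMock


def summarize_all(model_cls, transcripts):
    # Hypothetical orchestration code: build one model, summarize each transcript.
    model = model_cls("hypothetical-model-name")
    return [model.summarize(text)["summary"] for text in transcripts]


# Calling mock_model_cls(...) returns mock_model_cls.return_value (the "instance"),
# and that instance's summarize() returns the dict configured below.
mock_model_cls = MagicMock()
mock_model_cls.return_value.summarize.return_value = {"summary": "test"}

summaries = summarize_all(mock_model_cls, ["episode one", "episode two"])

# Assert on wiring, not on ML output: one model built, one call per transcript.
mock_model_cls.assert_called_once_with("hypothetical-model-name")
assert mock_model_cls.return_value.summarize.call_count == 2
```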

Never Mock

  1. Internal implementations
      • Config, factories, providers, RSS parser, metadata generation
      • These are what we're testing
  2. Filesystem I/O
      • Use tempfile.TemporaryDirectory for isolation
      • Test actual file operations
  3. Component interactions
      • Provider → metadata, workflow → providers
      • This is the integration we're testing
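Here is a minimal sketch of the filesystem rule. It assumes a hypothetical write_episode_metadata helper in place of the project's real metadata code. The test writes real files into a TemporaryDirectory, reads them back, and relies on the directory being deleted afterwards. Inside pytest, the built-in tmp_path fixture gives the same isolation.

```python
import json
import tempfile
from pathlib import Path


def write_episode_metadata(output_dir: Path, episode_id: str, title: str) -> Path:
    # Hypothetical stand-in for real metadata generation: writes JSON to disk.
    path = output_dir / "metadata" / f"{episode_id}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({"id": episode_id, "title": title}), encoding="utf-8")
    return path


def metadata_round_trip() -> dict:
    with tempfile.TemporaryDirectory() as tmp:
        # Real file I/O, isolated in a directory no other test uses.
        written = write_episode_metadata(Path(tmp), "ep-001", "Pilot")
        data = json.loads(written.read_text(encoding="utf-8"))
    # The directory and everything in it is deleted when the block exits.
    assert not written.exists()
    return data
```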

Real ML Models Belong in E2E Only

Do not use real ML models in integration tests. If a test loads a real ML model (Whisper, spaCy, Transformers) or calls a real AI API (OpenAI, Gemini, Ollama), it belongs in tests/e2e/ with @pytest.mark.e2e and @pytest.mark.ml_models.

make check-test-policy (rule I1-ml-models-marker) enforces this automatically.

Integration tests verify how our components wire together. The ML/AI boundary is always a mock or stub at this layer.

Test Patterns

Component Workflow Test

@pytest.mark.integration
def test_rss_to_provider_workflow(self):
    """Test RSS parsing → Episode creation → Provider processing."""
    # Use real internal implementations
    feed = parse_rss_feed(rss_content)
    episodes = create_episodes(feed)

    # Mock external HTTP
    with patch("podcast_scraper.rss.downloader.fetch_url") as mock_fetch:
        mock_fetch.return_value = b"transcript content"
        result = process_episodes(episodes, cfg)

    assert result.success
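The same pattern works as a self-contained sketch: feed parsing and orchestration run for real, and only the network boundary is replaced. Downloader, parse_enclosures and download_episodes are hypothetical stand-ins for the project's RSS and download code. The sketch applies patch.object to the stand-in class rather than patching the real podcast_scraper.rss.downloader.fetch_url target.

```python
import xml.etree.ElementTree as ET
from unittest.mock import patch


class Downloader:
    """Hypothetical stand-in for the external HTTP boundary."""

    def fetch(self, url: str) -> bytes:
        raise RuntimeError("real network access is not allowed here")


def parse_enclosures(xml_bytes: bytes) -> list[str]:
    # Real internal logic: collect episode enclosure URLs from RSS items.
    root = ET.fromstring(xml_bytes)
    return [enc.get("url") for enc in root.iter("enclosure")]


def download_episodes(downloader: Downloader, feed_url: str) -> list[bytes]:
    # Real orchestration: fetch the feed, parse it, then fetch each episode.
    urls = parse_enclosures(downloader.fetch(feed_url))
    return [downloader.fetch(url) for url in urls]


RSS = b"""<rss><channel>
<item><enclosure url="https://example.com/ep1.mp3"/></item>
<item><enclosure url="https://example.com/ep2.mp3"/></item>
</channel></rss>"""


def fake_fetch(url: str) -> bytes:
    # Serve the feed for the feed URL and placeholder bytes for each episode.
    return RSS if url.endswith(".xml") else f"audio:{url}".encode()


def run_workflow() -> list[bytes]:
    # Only the boundary is patched; parsing and orchestration run unmodified.
    with patch.object(Downloader, "fetch", side_effect=fake_fetch) as mock_fetch:
        result = download_episodes(Downloader(), "https://example.com/feed.xml")
    assert mock_fetch.call_count == 3  # one feed request + two episode requests
    return result
```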

Provider Integration Test (mocked ML)

@pytest.mark.integration
def test_transcription_workflow(self):
    """Test transcription provider wiring with mocked Whisper."""
    with patch("podcast_scraper.providers.ml.ml_provider._import_third_party_whisper"):
        provider = create_transcription_provider(cfg)
        provider.initialize()
        # Verify provider creation and lifecycle, not ML output

Mocking LLM SDKs at sys.modules (scoped, not module-global)

Always pair patch.dict("sys.modules", …).start() with a matching .stop() in tearDownModule. Otherwise the mock leaks across test files and breaks downstream tests that need the real SDK.

When an integration test needs to construct a real provider class (OpenAIProvider, AnthropicProvider, GeminiProvider, etc.) but the test machine should not need the [llm] extras just to collect tests, the canonical pattern is to mock the SDK package in sys.modules at import time. Provider modules use soft imports (try: from openai import OpenAI; except ImportError: openai = None), so placing a mock openai in sys.modules makes the import succeed and leaves the symbol non-None.

Wrong: the unscoped .start() leaks the mock for the rest of the pytest process (in every xdist worker, too), so a later test such as tests/integration/infrastructure/test_e2e_server.py ends up with a MagicMock() openai client instead of the real one:

mock_openai = MagicMock()
patch.dict("sys.modules", {"openai": mock_openai}).start()  # ❌ leak

Right (scope to this module via setUpModule / tearDownModule):

import importlib.util
from unittest.mock import MagicMock, patch

mock_openai = MagicMock()
mock_openai.__spec__ = importlib.util.spec_from_loader("openai", loader=None)
_patch_openai = patch.dict("sys.modules", {"openai": mock_openai})


def setUpModule():
    # Scope the SDK mock to this module only — otherwise it leaks into
    # other integration test files that need the real SDK.
    _patch_openai.start()


def tearDownModule():
    _patch_openai.stop()


# Provider import happens AFTER the patch object exists but BEFORE setUpModule
# fires. That's fine: the integration tier has the SDK installed for real, so
# the import succeeds with the real package. The mock activates for tests that
# need to bypass real SDK constructors via ``provider.client = MagicMock()``.
from podcast_scraper.providers.openai.openai_provider import OpenAIProvider

Canonical examples in the repo:

  • tests/integration/providers/llm/test_gemini_provider.py — Gemini (google.genai)
  • tests/integration/providers/llm/test_openai_provider.py — OpenAI
  • tests/integration/providers/llm/test_ollama_provider.py — Ollama (openai + httpx)
  • tests/integration/providers/llm/test_*_bundled_methods.py — #698 bundled-method tests, same pattern

Don't mock httpx at the integration tier. Several provider modules perform real isinstance(base, httpx.Timeout) checks in timeout_config.py, and mocking httpx in sys.modules makes those checks raise TypeError: isinstance() arg 2 must be a type. The integration tier has the [llm] extras (which include httpx) installed for real, so let the real package through. OllamaProvider is a related case: it has an if httpx is None runtime check in __init__, which the real, importable httpx already satisfies.
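You can reproduce the failure mode without httpx itself. In this sketch, fake_http_lib, Timeout and is_timeout are hypothetical, and is_timeout mirrors an isinstance(base, httpx.Timeout)-style check. With a real module the check works. With a MagicMock standing in for the module, lib.Timeout is a MagicMock instance rather than a class, so isinstance() raises TypeError.

```python
import types
from unittest.mock import MagicMock


class Timeout:
    """Hypothetical stand-in for a library class such as httpx.Timeout."""

    def __init__(self, seconds: float) -> None:
        self.seconds = seconds


# A real module object exposing a real class.
real_lib = types.ModuleType("fake_http_lib")
real_lib.Timeout = Timeout

# What patch.dict("sys.modules", {"fake_http_lib": MagicMock()}) would install:
# every attribute, including .Timeout, is a MagicMock instance, not a class.
mocked_lib = MagicMock()


def is_timeout(lib, value) -> bool:
    # Mirrors a check like isinstance(base, httpx.Timeout).
    return isinstance(value, lib.Timeout)


def raises_type_error(lib) -> bool:
    try:
        is_timeout(lib, Timeout(5.0))
    except TypeError:
        return True
    return False
```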

Local HTTP Server Test

@pytest.mark.integration
def test_http_client_behavior(self, local_http_server):
    """Test HTTP client with local server."""
    url = local_http_server.url_for("/test")
    response = http_get(url, user_agent, timeout)
    assert response.status_code == 200
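If you need a stand-in for a local HTTP server fixture outside this repo's conftest, a standard-library sketch looks like the one below. local_http_server, LocalServer and http_get_status are hypothetical. They only mimic the url_for() call used above and are not the project's fixture. Binding to port 0 lets the OS pick a free port, and the opener disables proxies so 127.0.0.1 traffic stays on the machine.

```python
import threading
import urllib.request
from contextlib import contextmanager
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class _OkHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging to stderr


class LocalServer:
    def __init__(self, httpd: ThreadingHTTPServer) -> None:
        self.httpd = httpd

    def url_for(self, path: str) -> str:
        host, port = self.httpd.server_address[:2]
        return f"http://{host}:{port}{path}"


@contextmanager
def local_http_server():
    # Port 0 asks the OS for any free port; binding to 127.0.0.1 keeps it local.
    httpd = ThreadingHTTPServer(("127.0.0.1", 0), _OkHandler)
    thread = threading.Thread(target=httpd.serve_forever, daemon=True)
    thread.start()
    try:
        yield LocalServer(httpd)
    finally:
        httpd.shutdown()
        httpd.server_close()
        thread.join()


def http_get_status(url: str) -> tuple[int, bytes]:
    # An empty ProxyHandler ignores *_proxy environment variables.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=5) as response:
        return response.status, response.read()
```

In pytest, you would wrap the context manager in a @pytest.fixture that yields the server.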

Model Cache Helpers (E2E only)

Real-ML tests live in tests/e2e/ (not here). They use cache helpers to skip gracefully when models are not downloaded:

from tests.integration.ml_model_cache_helpers import (
    require_whisper_model_cached,
    require_transformers_model_cached,
    require_spacy_model_cached,
)

@pytest.mark.e2e
@pytest.mark.ml_models
def test_with_real_models(self):
    require_whisper_model_cached(config.TEST_DEFAULT_WHISPER_MODEL)
    require_transformers_model_cached(config.TEST_DEFAULT_SUMMARY_MODEL, None)
    # Test with real models...

Directory Organization

Integration tests are organized by domain subsystem — the area of functionality being exercised — not by source module. This differs from unit tests, which mirror the src/ tree 1:1.

Why domain-based? An integration test for "provider factory creates Ollama provider, initializes, and summarizes" spans config.py, summarization/factory.py, providers/ollama/, and prompts/store.py. No single source module owns it. The right grouping is the subsystem under test.

tests/integration/
├── providers/               # Provider factories, protocols, error handling
│   ├── llm/                # LLM provider integration (Anthropic, OpenAI, …)
│   ├── ml/                 # ML model loading, embedding, QA, NLI, summarizer
│   └── ollama/             # Ollama model-specific tests
├── workflow/                # Orchestration, stages, resume, parallelism, metadata
├── gi/                      # GI artifacts, KG artifacts, evidence stack
├── server/                  # FastAPI viewer: wired app, corpus library, index rebuild/stats
├── search/                  # FAISS indexing, corpus search
├── rss/                     # RSS parsing, HTTP fetching
├── eval/                    # Evaluation framework
├── infrastructure/          # Fixture mapping, infra concerns
├── tools/                   # CLI tools
└── (root)                   # Cross-cutting: filesystem, retry, cache, audio

Rules of thumb:

  • A domain folder is created when 3+ test files share a subsystem.
  • Truly cross-cutting tests (filesystem helpers, retry, transcript cache, audio preprocessing) stay in root — they span multiple subsystems.
  • Each folder has an __init__.py (empty) for pytest collection.

Comparison with unit test layout

| Aspect | Unit tests | Integration tests |
| --- | --- | --- |
| Axis | Source module (mirrors src/) | Domain subsystem |
| Depth | Deep (matches package nesting) | Shallow (1–2 levels) |
| Finding tests | "Where's the test for this file?" | "Where are tests for this subsystem?" |
| Mapping to source | 1:1 with source files | One folder may cover many source files |

Test Files by Domain

providers/

| Subfolder | Purpose | Example files |
| --- | --- | --- |
| llm/ | LLM provider integration | test_anthropic_providers.py, test_openai_providers.py |
| ml/ | ML model loading, embedding, QA, NLI (mocked inference) | test_embedding_loader_integration.py, test_model_loader_integration.py |
| ollama/ | Ollama model-specific tests | test_gemma2_9b_summary.py, test_llama3_1_8b_speaker.py |
| (root) | Cross-provider: factories, protocols, capabilities | test_capabilities_integration.py, test_fallback_behavior.py |

workflow/

| Purpose | Example files |
| --- | --- |
| Orchestration and stages | test_workflow_integration.py, test_workflow_stages_integration.py |
| Metadata generation | test_metadata_integration.py, test_kg_metadata_integration.py |
| Resume and parallelism | test_resume_behavior.py, test_parallel_summarization.py |
| Queue and MPS | test_bounded_queue_integration.py, test_mps_exclusive_integration.py |

gi/

| Purpose | Example files |
| --- | --- |
| GI artifacts | test_gi_integration.py |
| KG artifacts | test_kg_integration.py |
| Evidence stack | test_evidence_stack_integration.py |

Root (cross-cutting)

| Purpose | File |
| --- | --- |
| Filesystem helpers | test_filesystem_integration.py |
| Retry with metrics | test_retry_integration.py |
| Transcript cache | test_transcript_cache_integration.py |
| Audio preprocessing | test_audio_preprocessing_integration.py |
| Summary schema | test_summary_schema_integration.py |
| Protocol verification | test_protocol_verification_integration.py |

Real HTTP client integration (local server)

tests/integration/rss/test_http_integration.py exercises podcast_scraper.rss.downloader against a local http.server on 127.0.0.1 (marker integration_http). There is no external network; pytest allows localhost sockets for this suite.

Global downloader state: The module uses thread-local requests.Session objects with urllib3 Retry adapters. Production defaults retry many times on 5xx with exponential backoff, which can make a test that hits a handler returning only 500 look hung. This file uses an autouse fixture that calls configure_http_policy(), caps retries with configure_downloader(...), and downloader.reset_http_sessions() so each test builds sessions with bounded retries. Teardown clears downloader overrides.
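A stripped-down model of per-thread session caching shows why the reset matters. Everything here is hypothetical: RetryPolicy, FakeSession, configure_retries, get_session, reset_sessions, bounded_retries, and the default retry numbers. It is not podcast_scraper.rss.downloader's code and uses no real requests or urllib3 objects. A session built before the retry policy changes keeps the old policy until the cache is invalidated, which is why the fixture both reconfigures and resets.

```python
import threading
from contextlib import contextmanager
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryPolicy:
    total: int
    backoff_factor: float


# Hypothetical "production" defaults: many retries with exponential backoff.
DEFAULT_POLICY = RetryPolicy(total=8, backoff_factor=2.0)

_policy = DEFAULT_POLICY
_generation = 0  # bumped by reset_sessions() to invalidate every thread's cache
_lock = threading.Lock()
_local = threading.local()


class FakeSession:
    """Stands in for a requests.Session with a retrying adapter mounted."""

    def __init__(self, policy: RetryPolicy) -> None:
        self.policy = policy


def configure_retries(policy: RetryPolicy) -> None:
    global _policy
    with _lock:
        _policy = policy


def reset_sessions() -> None:
    global _generation
    with _lock:
        _generation += 1


def get_session() -> FakeSession:
    # Reuse this thread's session unless a reset happened since it was built.
    with _lock:
        generation, policy = _generation, _policy
    cached = getattr(_local, "entry", None)
    if cached is None or cached[0] != generation:
        cached = (generation, FakeSession(policy))
        _local.entry = cached
    return cached[1]


@contextmanager
def bounded_retries(total: int = 1):
    # Mirrors the autouse-fixture idea: cap retries, rebuild sessions, then restore.
    configure_retries(RetryPolicy(total=total, backoff_factor=0.0))
    reset_sessions()
    try:
        yield
    finally:
        configure_retries(DEFAULT_POLICY)
        reset_sessions()
```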

If you add integration tests that call fetch_url / fetch_rss_feed_url for real HTTP, reuse the same pattern (or mock HTTP). See CONFIGURATION.md — Download resilience (threading and metrics) for how configuration applies to sessions.

FastAPI viewer, CIL, bridge.json, and semantic search lift

Wired HTTP coverage for the GI/KG viewer stack lives under tests/integration/server/ (not the legacy root-level test_server_api.py path). Typical modules:

  • test_server_api.py: health, artifacts (including *.bridge.json), index stats, search, explore, factory edge cases.
  • test_cil_api.py: GET /api/persons/* and GET /api/topics/* (RFC-072 CIL) against a minimal corpus with sibling GI / KG / bridge files.
  • test_viewer_corpus_library.py: Corpus Library responses that surface has_bridge and bridge_relative_path.

Bridge assembly (builder invariants, not HTTP): tests/integration/test_bridge_integration.py.

Semantic search lift and offset verification are primarily unit-tested under tests/unit/podcast_scraper/search/ (transcript_chunk_lift, gil_chunk_offset_verify). Operator validation on a real indexed corpus uses make verify-gil-offsets-strict or python -m podcast_scraper.cli verify-gil-chunk-offsets — see Semantic Search Guide — Chunk-to-Insight lift and GIL / KG / CIL cross-layer.

Transformers cache (E2E ml_models only)

Root tests/conftest.py sets HF_HUB_OFFLINE=1 / TRANSFORMERS_OFFLINE=1 for pytest. Integration tests do not load real Transformers weights. E2E tests marked ml_models that need facebook/bart-base or allenai/led-base-16384 (MAP/REDUCE test defaults; config aliases bart-small and long-fast) call require_transformers_model_cached and pytest.skip when those snapshots are missing under get_transformers_cache_dir() (project .cache/huggingface/hub or HF_HUB_CACHE).
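Here is a rough sketch of the kind of check a helper like require_transformers_model_cached might perform. It assumes the standard Hugging Face Hub cache layout (models--{org}--{name}/snapshots/<revision>/). hf_repo_cache_dir and is_model_cached are hypothetical names. The project's helper may verify more than the presence of a snapshot directory, and a test would call pytest.skip() when the check fails.

```python
from pathlib import Path


def hf_repo_cache_dir(cache_dir: Path, repo_id: str) -> Path:
    # Hub cache naming: "facebook/bart-base" -> "models--facebook--bart-base"
    return Path(cache_dir) / ("models--" + repo_id.replace("/", "--"))


def is_model_cached(cache_dir: Path, repo_id: str) -> bool:
    # Treat the model as cached if at least one snapshot revision directory exists.
    snapshots = hf_repo_cache_dir(cache_dir, repo_id) / "snapshots"
    return snapshots.is_dir() and any(p.is_dir() for p in snapshots.iterdir())
```

A test would then do something like: if not is_model_cached(cache_dir, "facebook/bart-base"), call pytest.skip() with a message pointing at make preload-ml-models.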

Satisfy cache locally (needs network once):

make preload-ml-models

CI: The workflow preloads models and sets ML_MODELS_VALIDATED=true so workers trust the cache after the shell validation step.

Examples: tests/e2e/test_ml_models_e2e.py (summarization, preload helpers, and related).

Running Integration Tests

# All integration tests
make test-integration

# Fast (excludes ml_models)
make test-integration-fast

# Sequential (for debugging)
pytest tests/integration/ -n 0

# Specific domain
pytest tests/integration/providers/ -v
pytest tests/integration/workflow/ -v
pytest tests/integration/gi/ -v

# Specific test file
pytest tests/integration/workflow/test_component_workflows.py -v

Test Markers

  • @pytest.mark.integration -- Required for all integration tests
  • @pytest.mark.critical_path -- Critical path tests (run in fast suite). See Critical Path Testing Guide

@pytest.mark.ml_models must not appear on integration tests (enforced by make check-test-policy, rule I1). Real-ML tests belong in tests/e2e/.

Provider Testing

For provider-specific integration testing (E2E server mock endpoints, provider switching):

Provider Implementation Guide - Testing Your Provider

Covers:

  • Provider works with E2E server mock endpoints
  • Provider switching tests
  • Error handling in workflow context
  • Integration test checklist for new providers

Coverage Targets

  • Total tests: ~530
  • Focus: Critical paths, component interactions, edge cases