Unit Testing Guide¶

See also:

Testing Strategy - High-level testing philosophy and test pyramid

Testing Guide - Quick reference and test execution commands

This guide covers unit test implementation details: what to mock, isolation patterns, and testing practices.

Overview¶

Unit tests exercise individual functions or modules in isolation, with all dependencies mocked.

Aspect	Requirement
Speed	< 100ms per test
Scope	Single function or class
Dependencies	All mocked
Network	Blocked (enforced by pytest plugin)
Filesystem	Blocked except tempfile (enforced by pytest plugin)
ML Models	Not loaded (mocked before import)

Pyproject extras: what unit tests may depend on¶

Rule: Tests under tests/unit/ must depend only on [dev]. They must not pull in [ml], [llm], [compare], real model stacks, or FastAPI TestClient / create_app wiring (those live in tests/integration/). Anything declared under [project.optional-dependencies].dev in pyproject.toml (including FastAPI) may be installed, but check_test_policy.py still forbids FastAPI imports in tests/unit/. If a unit test needs such a module, stub it with unittest.mock.patch / patch.dict(sys.modules, …) or move the test to tests/integration/.

Baseline extra: [dev] — this is what check_unit_test_imports.py and “no ML at import time” checks target. Treat anything declared under [project.optional-dependencies].dev in pyproject.toml (and its transitive wheels) as allowed for unit tests. Do not assume [ml] is installed.

Why this matters: The CI test-unit job runs pip install -e .[dev] only. Any test in tests/unit/ that needs a non-[dev] package is either silently skipped (via importorskip) or fails outright, so it never validates anything in CI. Integration CI jobs install .[dev,ml,llm], so tests there run with the full dependency set.

Viewer / FastAPI tests: Place in tests/integration/server/ (not tests/unit/). Use pytest.importorskip("fastapi") there. CI integration jobs install .[dev,ml,llm], so these tests run. Prefer thin HTTP boundaries (domain exceptions, lazy imports, patching FaissVectorStore.load, etc.) so most server logic can be tested without real FAISS or ML stacks. Reserve TestClient + create_app for route/contract checks in integration tests.

Local CI parity: make venv-dev-init creates .venv-dev with pip install -e .[dev] only (same extras as GitHub test-unit). Then make test-unit-dev-venv runs check_unit_test_imports + pytest tests/unit/ inside that env. Override path: make venv-dev-init VENVDEV=.venv-ci-unit. Install ffmpeg locally if audio-related unit tests fail (CI installs it in the unit job).

Anti-patterns for unit tests:

pytest.importorskip() for any non-[dev] package -- causes silent skips in CI, test never runs.
Top-level imports of modules that require [ml] / [llm] before mocks are applied.
TestClient / create_app calls that need FastAPI -- these belong in integration tests.
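
The first two anti-patterns can often be avoided by stubbing the optional package in sys.modules for the duration of a single test. The following is a minimal sketch under stated assumptions: `fake_ml_pkg` is a made-up stand-in for a package outside [dev], and `load_model_name` is a hypothetical library function that imports it lazily. Neither name comes from the project.

```python
import sys
import unittest
from unittest.mock import MagicMock, patch


def load_model_name():
    """Hypothetical library function that imports an optional package lazily."""
    import fake_ml_pkg  # made-up stand-in for a package outside [dev]

    return fake_ml_pkg.load("tiny").name


class TestLoadModelName(unittest.TestCase):
    def test_runs_without_optional_package_installed(self):
        stub = MagicMock()
        stub.load.return_value.name = "tiny-stub"
        # patch.dict restores sys.modules when the block exits, so the stub
        # cannot leak into other tests.
        with patch.dict(sys.modules, {"fake_ml_pkg": stub}):
            self.assertEqual(load_model_name(), "tiny-stub")
        self.assertNotIn("fake_ml_pkg", sys.modules)
        stub.load.assert_called_once_with("tiny")
```

Unlike pytest.importorskip(), this test always runs in the [dev]-only environment, so a regression in the library function is reported instead of skipped.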

See also: Testing Strategy — Unit tests and optional extras for CI alignment and rationale.

What to Mock¶

Always Mock¶

HTTP/Network Calls

@patch("podcast_scraper.rss.downloader.requests.get")
def test_download(self, mock_get):
    mock_get.return_value.status_code = 200
    mock_get.return_value.content = b"test content"
    # ...

ML Models (Whisper, spaCy, Transformers)

```python
# Mock before importing dependent modules
@patch("podcast_scraper.providers.ml.ml_provider._import_third_party_whisper")
@patch("podcast_scraper.providers.ml.ml_provider.speaker_detection.get_ner_model")
@patch("podcast_scraper.providers.ml.ml_provider.summarizer.SummaryModel")
def test_provider_creation(self, mock_summary, mock_ner, mock_whisper):
    # Test provider creation without loading real models.
    # Mocks are passed bottom-up: the last decorator supplies the first argument.
    ...
```

External API Clients (OpenAI, etc.)

@patch("podcast_scraper.providers.openai.openai_provider.OpenAI")
def test_openai_provider(self, mock_client):
    mock_client.return_value.chat.completions.create.return_value = ...

Filesystem Operations (when testing logic, not file operations)

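```python
# mock_open comes from unittest.mock; because a replacement object is passed
# to @patch, no extra mock argument is injected into the test.
@patch("builtins.open", mock_open(read_data="test content"))
def test_file_reading(self):
    # Test file reading logic
    ...
```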

Never Mock in Unit Tests¶

The function/class being tested - That's what we're testing
Pure helper functions - Test them directly
Data classes/models - Create real instances
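
As an illustration of the last two points, the sketch below tests a pure helper with a real data class instance and no mocks at all. `Episode` and `format_duration` are hypothetical names invented for this example, not project code.

```python
import unittest
from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    """Hypothetical data class; real instances are cheap, so no mock is needed."""

    title: str
    duration_seconds: int


def format_duration(episode: Episode) -> str:
    """Hypothetical pure helper: formats a duration as H:MM:SS."""
    minutes, seconds = divmod(episode.duration_seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


class TestFormatDuration(unittest.TestCase):
    def test_formats_hours_minutes_seconds(self):
        # 3725 seconds is 1 hour, 2 minutes, 5 seconds.
        episode = Episode(title="Pilot", duration_seconds=3725)
        self.assertEqual(format_duration(episode), "1:02:05")
```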

Isolation Enforcement¶

Unit tests automatically enforce isolation via pytest plugins in tests/unit/conftest.py:

Network Isolation¶

All network calls are blocked. If a test attempts network access:

NetworkCallDetectedError: Attempt to make network call detected in unit test

Blocked:

requests.get(), requests.post(), requests.Session() methods
urllib.request.urlopen()
urllib3.PoolManager()
socket.create_connection()

Filesystem Isolation¶

Filesystem I/O is blocked except in the allowed locations listed below (chiefly tempfile). If a test attempts I/O elsewhere:

FilesystemIODetectedError: Attempt to perform filesystem I/O in unit test

Blocked:

open() for file operations (outside temp directories)
os.makedirs(), os.remove(), os.unlink(), os.rmdir(), os.rename()
shutil.copy(), shutil.move(), shutil.rmtree()
Path.write_text(), Path.write_bytes(), Path.mkdir(), Path.unlink()

Allowed:

tempfile.mkdtemp(), tempfile.NamedTemporaryFile()
Operations within temp directories
Cache directories (~/.cache/, ~/.local/share/)
Site-packages (read-only)
Python cache files (.pyc, __pycache__/)
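
The tempfile exemption amounts to a path check. The sketch below shows one way such a check might look; `is_inside_temp_dir` is a hypothetical helper, it models only the temp-directory rule, and the project's plugin may decide this differently.

```python
import tempfile
from pathlib import Path


def is_inside_temp_dir(path) -> bool:
    """Return True if `path` resolves to the system temp dir or a location under it.

    Hypothetical helper: it models only the tempfile exemption, not the
    cache, site-packages, or bytecode exemptions listed above.
    """
    temp_root = Path(tempfile.gettempdir()).resolve()
    resolved = Path(path).resolve()
    return resolved == temp_root or temp_root in resolved.parents


with tempfile.TemporaryDirectory() as scratch:
    # A file under a fresh scratch directory, and a path next to the temp root.
    inside = Path(scratch) / "episode.json"
    outside = Path(tempfile.gettempdir()).resolve().parent / "not-temp" / "episode.json"
    print(is_inside_temp_dir(inside))
    print(is_inside_temp_dir(outside))
```

Resolving both paths before comparing them avoids false results when the temp directory is reached through a symlink.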

ML Dependency Mocking¶

Unit tests must run without ML packages installed (for CI speed). Mock ML modules before importing dependent code:

import sys
from unittest.mock import MagicMock

# Mock ML modules before import

sys.modules["whisper"] = MagicMock()
sys.modules["spacy"] = MagicMock()
sys.modules["torch"] = MagicMock()
sys.modules["transformers"] = MagicMock()

# Now import the module that uses these

from podcast_scraper.providers.ml import ml_provider

Direct sys.modules[...] = MagicMock() is fine in unit tests because each unit-test module is small-scoped and tests/unit/ is the leaf of the test pyramid (nothing downstream needs the real package). For integration tests that mock SDKs at sys.modules, use the scoped setUpModule / tearDownModule pattern instead — see Integration Testing Guide → Mocking LLM SDKs at sys.modules. Otherwise the mock leaks into other integration files that need the real SDK.

CI Verification:

scripts/tools/check_unit_test_imports.py (make check-unit-imports) -- verifies library modules can import without ML deps at import time.
scripts/tools/check_test_policy.py (make check-test-policy) -- enforces the 3-tier ML/AI boundary policy: no pytest.importorskip() in unit tests (rule U1), no *_AVAILABLE skip guards in unit tests (rule U2), no @pytest.mark.ml_models in integration tests (rule I1), and no empty test files anywhere (rule G1).

Both scripts run automatically in make ci and make ci-fast.
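
To give a feel for what a rule like U1 involves, the sketch below scans source text for pytest.importorskip() calls using the standard ast module. It is an illustration only: `find_importorskip_calls` is a hypothetical function, and the real check_test_policy.py may be implemented quite differently.

```python
import ast


def find_importorskip_calls(source: str) -> list[int]:
    """Return the line numbers of pytest.importorskip(...) calls in `source`."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "importorskip"
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == "pytest"
        ):
            lines.append(node.lineno)
    return sorted(lines)


sample = 'import pytest\ntorch = pytest.importorskip("torch")\n'
print(find_importorskip_calls(sample))  # [2]
```

Working on the syntax tree rather than on raw text means a mention of importorskip inside a comment or string is not flagged.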

Test Structure¶

import shutil
import tempfile
import unittest
from unittest.mock import patch


class TestModuleName(unittest.TestCase):
    """Test module_name module."""

    def setUp(self):
        """Set up test fixtures."""
        self.temp_dir = tempfile.mkdtemp()

    def tearDown(self):
        """Clean up test fixtures."""
        shutil.rmtree(self.temp_dir, ignore_errors=True)

    @patch("module.dependency")
    def test_function_success(self, mock_dependency):
        """Test successful function execution."""
        # Arrange
        mock_dependency.return_value = expected_value

        # Act
        result = function_under_test(input)

        # Assert
        self.assertEqual(result, expected_result)
        mock_dependency.assert_called_once_with(...)

    def test_function_error_handling(self):
        """Test function error handling."""
        with self.assertRaises(ExpectedError):
            function_under_test(invalid_input)

Provider Testing Patterns¶

Standalone Provider Tests¶

Test MLProvider/OpenAIProvider directly with mocked dependencies:

class TestMLProvider(unittest.TestCase):
    """Test MLProvider standalone."""

    @patch("podcast_scraper.providers.ml.ml_provider._import_third_party_whisper")
    def test_transcription_initialization(self, mock_whisper):
        """Test transcription capability initialization."""
        provider = MLProvider(cfg)
        provider.initialize()
        mock_whisper.assert_called_once()

Factory Tests¶

Test that factories create the correct unified providers:

def test_create_transcription_provider_ml():
    """Test factory creates MLProvider for 'whisper'."""
    provider = create_transcription_provider(cfg)
    assert hasattr(provider, "transcribe")  # Protocol compliance

Key Principle: Verify protocol compliance, not class names.
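
One way to express protocol compliance more strictly than hasattr is a runtime-checkable typing.Protocol. The sketch below assumes a protocol with a single `transcribe` method; `TranscriptionProvider`, `FakeWhisperProvider`, and `NotAProvider` are hypothetical names, and the project's actual protocol may declare more.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class TranscriptionProvider(Protocol):
    """Hypothetical protocol; the project's real protocol may declare more."""

    def transcribe(self, audio_path: str) -> str: ...


class FakeWhisperProvider:
    def transcribe(self, audio_path: str) -> str:
        return f"transcript of {audio_path}"


class NotAProvider:
    pass


def test_factory_result_satisfies_protocol():
    # Stands in for the object returned by create_transcription_provider(cfg).
    provider = FakeWhisperProvider()
    # isinstance against a runtime_checkable Protocol checks that the
    # methods exist, not that their signatures match.
    assert isinstance(provider, TranscriptionProvider)
    assert not isinstance(NotAProvider(), TranscriptionProvider)
```

Because the assertion never names a concrete class, the factory can later return a different implementation without the test changing.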

Common Test Fixtures¶

from unittest.mock import MagicMock

# Mock HTTP Response

class MockHTTPResponse:
    def __init__(self, content, status_code=200, headers=None):
        self.content = content
        self.status_code = status_code
        self.headers = headers or {}

# Mock Whisper Model

mock_whisper_model = MagicMock()
mock_whisper_model.transcribe.return_value = {
    "text": "transcribed text",
    "segments": []
}

# Mock spaCy NLP: calling nlp(text) returns a Doc whose .ents holds the entities

mock_nlp = MagicMock()
mock_nlp.return_value.ents = [MagicMock(text="John", label_="PERSON")]

Test Files¶

Module	Test File
`config.py`	`tests/unit/podcast_scraper/test_config.py`
`filesystem.py`	`tests/unit/podcast_scraper/test_filesystem.py`
`rss_parser.py`	`tests/unit/podcast_scraper/test_rss_parser.py`
`rss/downloader.py`	`tests/unit/podcast_scraper/test_downloader.py`
`service.py`	`tests/unit/podcast_scraper/test_service.py`
`providers/ml/summarizer.py`	`tests/unit/podcast_scraper/test_summarizer.py`
`speaker_detection.py`	`tests/unit/podcast_scraper/test_speaker_detection.py`
`metadata.py`	`tests/unit/podcast_scraper/test_metadata.py`
Provider factories	`tests/unit/podcast_scraper//test__provider.py`
MLProvider	`tests/unit/podcast_scraper/ml/test_ml_provider.py`
OpenAIProvider	`tests/unit/podcast_scraper/openai/test_openai_provider.py`

Running Unit Tests¶

# All unit tests

make test-unit

# Specific module

pytest tests/unit/podcast_scraper/test_config.py -v

# With coverage

pytest tests/unit/ --cov=podcast_scraper --cov-report=term-missing

Provider Testing¶

For provider-specific testing patterns (unit tests for MLProvider, OpenAIProvider, factories):

→ Provider Implementation Guide - Testing Your Provider

Covers:

Provider creation and initialization tests
Mock API client patterns
Factory tests and protocol compliance
Testing checklist for new providers

Best Practices¶

Test Behavior, Not Implementation¶

Focus on testing what the code does, not how it does it:

# Good: Tests behavior/outcome

def test_summarizer_returns_shortened_text(self):
    result = summarize(long_text)
    assert len(result) < len(long_text)
    assert "key point" in result

# Bad: Tests implementation details

def test_summarizer_calls_tokenizer_twice(self):
    summarize(long_text)
    assert mock_tokenizer.call_count == 2  # Brittle!

Test Error Paths and Edge Cases¶

Don't just test happy paths. Cover:

Error conditions: Invalid input, missing files, network failures
Edge cases: Empty input, very large input, special characters
Boundary conditions: Zero, one, max values

@patch("podcast_scraper.rss.downloader.requests.get")
def test_download_handles_404(self, mock_get):
    """Test graceful handling of missing file."""
    mock_get.return_value.status_code = 404
    with self.assertRaises(DownloadError):
        download_file(url)

def test_parse_empty_transcript(self):
    """Test empty transcript handling."""
    result = parse_transcript("")
    assert result.segments == []
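
Boundary conditions lend themselves to table-style tests. The following sketch uses unittest's subTest to cover values on and just outside the limits. `validate_workers` and its 1 to 8 range are invented for the example and are not the project's actual validation rule.

```python
import unittest


def validate_workers(value: int, maximum: int = 8) -> int:
    """Hypothetical validator: accepts 1..maximum, rejects everything else."""
    if not 1 <= value <= maximum:
        raise ValueError(f"workers must be between 1 and {maximum}, got {value}")
    return value


class TestValidateWorkers(unittest.TestCase):
    def test_accepts_boundary_values(self):
        for value in (1, 8):
            with self.subTest(value=value):
                self.assertEqual(validate_workers(value), value)

    def test_rejects_values_outside_range(self):
        for value in (-1, 0, 9):
            with self.subTest(value=value):
                with self.assertRaises(ValueError):
                    validate_workers(value)
```

Each subTest is reported separately, so one failing boundary does not hide the others.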

Use Descriptive Test Names¶

Test names should explain what is being tested:

# Good: Descriptive names

def test_config_validation_rejects_negative_workers(self): ...
def test_rss_parser_extracts_transcript_url_from_feed(self): ...
def test_whisper_provider_falls_back_on_timeout(self): ...

# Bad: Vague names

def test_config(self): ...
def test_parse(self): ...
def test_error(self): ...

Keep Tests Fast¶

Unit tests should run in < 100ms each:

Mock all I/O operations
Don't load real ML models
Use minimal test data
Avoid time.sleep() in tests
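
When the code under test itself sleeps (for example, between retries), the delay can be patched out rather than waited for. A minimal sketch, assuming a hypothetical `fetch_with_retry` helper that is not part of the project:

```python
import time
import unittest
from unittest.mock import MagicMock, patch


def fetch_with_retry(fetch, attempts=3, delay_seconds=2.0):
    """Hypothetical retry helper that sleeps between failed attempts."""
    for attempt in range(attempts):
        try:
            return fetch()
        except ConnectionError:
            if attempt == attempts - 1:
                raise
            time.sleep(delay_seconds)


class TestFetchWithRetry(unittest.TestCase):
    @patch("time.sleep")
    def test_retries_without_real_delay(self, mock_sleep):
        # Fail twice, then succeed on the third call.
        fetch = MagicMock(side_effect=[ConnectionError(), ConnectionError(), "ok"])
        self.assertEqual(fetch_with_retry(fetch), "ok")
        self.assertEqual(fetch.call_count, 3)
        # Two failures mean two (mocked) sleeps of the configured length.
        self.assertEqual(mock_sleep.call_count, 2)
        mock_sleep.assert_called_with(2.0)
```

The test still verifies the back-off behavior through the mock's call record, but finishes in well under the 100ms budget.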

Coverage Targets¶

Overall: >80%
Critical modules: >90% (config, workflow, episode_processor)
Total tests: ~3,000