Unit Testing Guide¶

See also:

Testing Strategy - High-level testing philosophy and test pyramid

Testing Guide - Quick reference and test execution commands

This guide covers unit test implementation details: what to mock, isolation patterns, and testing practices.

Overview¶

Unit tests test individual functions/modules in isolation with all dependencies mocked.

Aspect	Requirement
Speed	< 100ms per test
Scope	Single function or class
Dependencies	All mocked
Network	Blocked (enforced by pytest plugin)
Filesystem	Blocked except tempfile (enforced by pytest plugin)
ML Models	Not loaded (mocked before import)

What to Mock¶

Always Mock¶

HTTP/Network Calls

@patch("podcast_scraper.rss.downloader.requests.get")
def test_download(self, mock_get):
    mock_get.return_value.status_code = 200
    mock_get.return_value.content = b"test content"
    # ...

ML Models (Whisper, spaCy, Transformers)

```python # Mock before importing dependent modules @patch("podcast_scraper.providers.ml.ml_provider._import_third_party_whisper") @patch("podcast_scraper.providers.ml.ml_provider.speaker_detection.get_ner_model") @patch("podcast_scraper.providers.ml.ml_provider.summarizer.SummaryModel") def test_provider_creation(self, mock_summary, mock_ner, mock_whisper):

   # Test provider creation without loading real models

External API Clients (OpenAI, etc.)

@patch("podcast_scraper.providers.openai.openai_provider.OpenAI")
def test_openai_provider(self, mock_client):
    mock_client.return_value.chat.completions.create.return_value = ...

Filesystem Operations (when testing logic, not file operations)

```python @patch("builtins.open", mock_open(read_data="test content")) def test_file_reading(self): # Test file reading logic

Never Mock in Unit Tests¶

The function/class being tested - That's what we're testing
Pure helper functions - Test them directly
Data classes/models - Create real instances

Isolation Enforcement¶

Unit tests automatically enforce isolation via pytest plugins in tests/unit/conftest.py:

Network Isolation¶

All network calls are blocked. If a test attempts network access:

NetworkCallDetectedError: Attempt to make network call detected in unit test

Blocked:

requests.get(), requests.post(), requests.Session() methods
urllib.request.urlopen()
urllib3.PoolManager()
socket.create_connection()

Filesystem Isolation¶

All filesystem I/O is blocked (except tempfile). If a test attempts I/O:

FilesystemIODetectedError: Attempt to perform filesystem I/O in unit test

Blocked:

open() for file operations (outside temp directories)
os.makedirs(), os.remove(), os.unlink(), os.rmdir(), os.rename()
shutil.copy(), shutil.move(), shutil.rmtree()
Path.write_text(), Path.write_bytes(), Path.mkdir(), Path.unlink()

Allowed:

tempfile.mkdtemp(), tempfile.NamedTemporaryFile()
Operations within temp directories
Cache directories (~/.cache/, ~/.local/share/)
Site-packages (read-only)
Python cache files (.pyc, __pycache__/)

ML Dependency Mocking¶

Unit tests must run without ML packages installed (for CI speed). Mock ML modules before importing dependent code:

import sys
from unittest.mock import MagicMock

# Mock ML modules before import

sys.modules["whisper"] = MagicMock()
sys.modules["spacy"] = MagicMock()
sys.modules["torch"] = MagicMock()
sys.modules["transformers"] = MagicMock()

# Now import the module that uses these

from podcast_scraper.providers.ml import ml_provider

CI Verification: scripts/tools/check_unit_test_imports.py verifies modules can import without ML deps.

Test Structure¶

class TestModuleName(unittest.TestCase):
    """Test module_name module."""

    def setUp(self):
        """Set up test fixtures."""
        self.temp_dir = tempfile.mkdtemp()

    def tearDown(self):
        """Clean up test fixtures."""
        shutil.rmtree(self.temp_dir, ignore_errors=True)

    @patch("module.dependency")
    def test_function_success(self, mock_dependency):
        """Test successful function execution."""
        # Arrange
        mock_dependency.return_value = expected_value

        # Act
        result = function_under_test(input)

        # Assert
        self.assertEqual(result, expected_result)
        mock_dependency.assert_called_once_with(...)

    def test_function_error_handling(self):
        """Test function error handling."""
        with self.assertRaises(ExpectedError):
            function_under_test(invalid_input)

Provider Testing Patterns¶

Standalone Provider Tests¶

Test MLProvider/OpenAIProvider directly with mocked dependencies:

class TestMLProvider(unittest.TestCase):
    """Test MLProvider standalone."""

    @patch("podcast_scraper.providers.ml.ml_provider._import_third_party_whisper")
    def test_transcription_initialization(self, mock_whisper):
        """Test transcription capability initialization."""
        provider = MLProvider(cfg)
        provider.initialize()
        mock_whisper.assert_called_once()

Factory Tests¶

Test factories create correct unified providers:

def test_create_transcription_provider_ml():
    """Test factory creates MLProvider for 'whisper'."""
    provider = create_transcription_provider(cfg)
    assert hasattr(provider, "transcribe")  # Protocol compliance

Key Principle: Verify protocol compliance, not class names.

Common Test Fixtures¶

# Mock HTTP Response

class MockHTTPResponse:
    def __init__(self, content, status_code=200, headers=None):
        self.content = content
        self.status_code = status_code
        self.headers = headers or {}

# Mock Whisper Model

mock_whisper_model = MagicMock()
mock_whisper_model.transcribe.return_value = {
    "text": "transcribed text",
    "segments": []
}

# Mock spaCy NLP

mock_nlp = MagicMock()
mock_nlp.return_value = [MagicMock(text="John", label_="PERSON")]

Test Files¶

Module	Test File
`config.py`	`tests/unit/podcast_scraper/test_config.py`
`filesystem.py`	`tests/unit/podcast_scraper/test_filesystem.py`
`rss_parser.py`	`tests/unit/podcast_scraper/test_rss_parser.py`
`rss/downloader.py`	`tests/unit/podcast_scraper/test_downloader.py`
`service.py`	`tests/unit/podcast_scraper/test_service.py`
`providers/ml/summarizer.py`	`tests/unit/podcast_scraper/test_summarizer.py`
`speaker_detection.py`	`tests/unit/podcast_scraper/test_speaker_detection.py`
`metadata.py`	`tests/unit/podcast_scraper/test_metadata.py`
Provider factories	`tests/unit/podcast_scraper//test__provider.py`
MLProvider	`tests/unit/podcast_scraper/ml/test_ml_provider.py`
OpenAIProvider	`tests/unit/podcast_scraper/openai/test_openai_provider.py`

Running Unit Tests¶

# All unit tests

make test-unit

# Specific module

pytest tests/unit/podcast_scraper/test_config.py -v

# With coverage

pytest tests/unit/ --cov=podcast_scraper --cov-report=term-missing

Provider Testing¶

For provider-specific testing patterns (unit tests for MLProvider, OpenAIProvider, factories):

→ Provider Implementation Guide - Testing Your Provider

Covers:

Provider creation and initialization tests
Mock API client patterns
Factory tests and protocol compliance
Testing checklist for new providers

Best Practices¶

Test Behavior, Not Implementation¶

Focus on testing what the code does, not how it does it:

# ✅ Good: Tests behavior/outcome

def test_summarizer_returns_shortened_text(self):
    result = summarize(long_text)
    assert len(result) < len(long_text)
    assert "key point" in result

# ❌ Bad: Tests implementation details

def test_summarizer_calls_tokenizer_twice(self):
    summarize(long_text)
    assert mock_tokenizer.call_count == 2  # Brittle!

Test Error Paths and Edge Cases¶

Don't just test happy paths. Cover:

Error conditions: Invalid input, missing files, network failures
Edge cases: Empty input, very large input, special characters
Boundary conditions: Zero, one, max values

def test_download_handles_404(self):
    """Test graceful handling of missing file."""
    mock_response.status_code = 404
    with self.assertRaises(DownloadError):
        download_file(url)

def test_parse_empty_transcript(self):
    """Test empty transcript handling."""
    result = parse_transcript("")
    assert result.segments == []

Use Descriptive Test Names¶

Test names should explain what is being tested:

# ✅ Good: Descriptive names

def test_config_validation_rejects_negative_workers(self):
def test_rss_parser_extracts_transcript_url_from_feed(self):
def test_whisper_provider_falls_back_on_timeout(self):

# ❌ Bad: Vague names

def test_config(self):
def test_parse(self):
def test_error(self):

Keep Tests Fast¶

Unit tests should run in < 100ms each:

Mock all I/O operations
Don't load real ML models
Use minimal test data
Avoid time.sleep() in tests

Coverage Targets¶

Overall: >80%
Critical modules: >90% (config, workflow, episode_processor)
Total tests: ~3,000