Unit Testing Guide

This guide covers unit test implementation details: what to mock, isolation patterns, and testing practices.

Overview

Unit tests exercise individual functions or modules in isolation, with every dependency mocked.

| Aspect | Requirement |
| --- | --- |
| Speed | < 100 ms per test |
| Scope | Single function or class |
| Dependencies | All mocked |
| Network | Blocked (enforced by pytest plugin) |
| Filesystem | Blocked except tempfile (enforced by pytest plugin) |
| ML Models | Not loaded (mocked before import) |

What to Mock

Always Mock

  1. HTTP/Network Calls
@patch("podcast_scraper.rss.downloader.requests.get")
def test_download(self, mock_get):
    mock_get.return_value.status_code = 200
    mock_get.return_value.content = b"test content"
    # ...

  2. ML Models (Whisper, spaCy, Transformers)
# Mock before importing dependent modules
@patch("podcast_scraper.providers.ml.ml_provider._import_third_party_whisper")
@patch("podcast_scraper.providers.ml.ml_provider.speaker_detection.get_ner_model")
@patch("podcast_scraper.providers.ml.ml_provider.summarizer.SummaryModel")
def test_provider_creation(self, mock_summary, mock_ner, mock_whisper):
    # Test provider creation without loading real models
    ...

  3. External API Clients (OpenAI, etc.)
@patch("podcast_scraper.providers.openai.openai_provider.OpenAI")
def test_openai_provider(self, mock_client):
    mock_client.return_value.chat.completions.create.return_value = ...

  4. Filesystem Operations (when testing logic, not file operations)
@patch("builtins.open", mock_open(read_data="test content"))
def test_file_reading(self):
    # Test file reading logic
    ...
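The HTTP case above can be sketched end to end. This is a minimal, self-contained illustration rather than the project's actual downloader: `fetch_bytes` is a hypothetical function, and it uses the standard library's `urllib.request` instead of `requests` so the example runs without third-party packages.

```python
import unittest
import urllib.request
from unittest.mock import MagicMock, patch


def fetch_bytes(url: str) -> bytes:
    """Hypothetical downloader: return the response body for a URL."""
    with urllib.request.urlopen(url) as response:
        return response.read()


class TestFetchBytes(unittest.TestCase):
    @patch("urllib.request.urlopen")
    def test_returns_body(self, mock_urlopen):
        # Arrange: urlopen() is used as a context manager, so the fake
        # response has to be returned from __enter__.
        mock_response = MagicMock()
        mock_response.read.return_value = b"test content"
        mock_urlopen.return_value.__enter__.return_value = mock_response

        # Act
        body = fetch_bytes("https://example.invalid/feed.xml")

        # Assert
        self.assertEqual(body, b"test content")
        mock_urlopen.assert_called_once_with("https://example.invalid/feed.xml")
```

The patch target is the name as the code under test looks it up (`urllib.request.urlopen` here), which is the same reason the examples above patch `podcast_scraper.rss.downloader.requests.get` rather than `requests.get`.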

Never Mock in Unit Tests

  • The function/class being tested - That's what we're testing
  • Pure helper functions - Test them directly
  • Data classes/models - Create real instances
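As a small illustration of the last two points, the sketch below builds a real data class instance and calls a pure helper directly, with no mocks involved. `Episode` and `format_duration` are hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    """Hypothetical data class; tests build real instances of it."""

    title: str
    duration_seconds: int


def format_duration(episode: Episode) -> str:
    """Hypothetical pure helper: render a duration as H:MM:SS."""
    minutes, seconds = divmod(episode.duration_seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


def test_format_duration_uses_real_episode():
    # No patching: the data class and the helper are both cheap and pure.
    episode = Episode(title="Pilot", duration_seconds=3725)
    assert format_duration(episode) == "1:02:05"
```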

Isolation Enforcement

Unit tests automatically enforce isolation via pytest plugins in tests/unit/conftest.py:

Network Isolation

All network calls are blocked. If a test attempts network access:

NetworkCallDetectedError: Attempt to make network call detected in unit test

Blocked:

  • requests.get(), requests.post(), requests.Session() methods
  • urllib.request.urlopen()
  • urllib3.PoolManager()
  • socket.create_connection()
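One way such a guard could work is sketched below. It is an assumption about the general approach, not the contents of tests/unit/conftest.py: it covers only `socket.create_connection()`, and `block_network` is a hypothetical helper name.

```python
import socket
from contextlib import contextmanager
from unittest.mock import patch


class NetworkCallDetectedError(RuntimeError):
    """Raised when code under test tries to open a network connection."""


def _blocked(*args, **kwargs):
    raise NetworkCallDetectedError(
        "Attempt to make network call detected in unit test"
    )


@contextmanager
def block_network():
    """Swap socket.create_connection for a function that always raises."""
    with patch.object(socket, "create_connection", _blocked):
        yield
    # On exit, patch.object restores the real socket.create_connection.
```

In a conftest.py, a context manager like this could be entered from an autouse pytest fixture so that it wraps every test in the directory.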

Filesystem Isolation

All filesystem I/O is blocked (except tempfile). If a test attempts I/O:

FilesystemIODetectedError: Attempt to perform filesystem I/O in unit test

Blocked:

  • open() for file operations (outside temp directories)
  • os.makedirs(), os.remove(), os.unlink(), os.rmdir(), os.rename()
  • shutil.copy(), shutil.move(), shutil.rmtree()
  • Path.write_text(), Path.write_bytes(), Path.mkdir(), Path.unlink()

Allowed:

  • tempfile.mkdtemp(), tempfile.NamedTemporaryFile()
  • Operations within temp directories
  • Cache directories (~/.cache/, ~/.local/share/)
  • Site-packages (read-only)
  • Python cache files (.pyc, __pycache__/)

ML Dependency Mocking

Unit tests must run without ML packages installed (for CI speed). Mock ML modules before importing dependent code:

import sys
from unittest.mock import MagicMock

# Mock ML modules before import
sys.modules["whisper"] = MagicMock()
sys.modules["spacy"] = MagicMock()
sys.modules["torch"] = MagicMock()
sys.modules["transformers"] = MagicMock()

# Now import the module that uses these
from podcast_scraper.providers.ml import ml_provider

CI Verification: scripts/tools/check_unit_test_imports.py verifies modules can import without ML deps.
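Assigning to `sys.modules` directly leaves the mocks in place for the rest of the test session. Where that is a concern, one possible variant is to scope them with `patch.dict`, as in the sketch below. `import_with_mocked_ml` is a hypothetical helper, not part of the project.

```python
import importlib
import sys
from unittest.mock import MagicMock, patch

ML_MODULES = ("whisper", "spacy", "torch", "transformers")


def import_with_mocked_ml(module_name: str):
    """Import module_name while the ML packages are replaced by MagicMock.

    patch.dict restores sys.modules to its earlier state on exit, so the
    mocks do not leak into other tests. The freshly imported module is
    dropped from sys.modules as well, but the returned object stays usable.
    """
    fake_modules = {name: MagicMock() for name in ML_MODULES}
    with patch.dict(sys.modules, fake_modules):
        return importlib.import_module(module_name)
```

A test would call `import_with_mocked_ml(...)` with the dotted name of the module under test and then work with the returned module object.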

Test Structure

import shutil
import tempfile
import unittest
from unittest.mock import patch


class TestModuleName(unittest.TestCase):
    """Test module_name module."""

    def setUp(self):
        """Set up test fixtures."""
        self.temp_dir = tempfile.mkdtemp()

    def tearDown(self):
        """Clean up test fixtures."""
        shutil.rmtree(self.temp_dir, ignore_errors=True)

    @patch("module.dependency")
    def test_function_success(self, mock_dependency):
        """Test successful function execution."""
        # Arrange
        mock_dependency.return_value = expected_value

        # Act
        result = function_under_test(input_value)

        # Assert
        self.assertEqual(result, expected_result)
        mock_dependency.assert_called_once_with(...)

    def test_function_error_handling(self):
        """Test function error handling."""
        with self.assertRaises(ExpectedError):
            function_under_test(invalid_input)

Provider Testing Patterns

Standalone Provider Tests

Test MLProvider/OpenAIProvider directly with mocked dependencies:

class TestMLProvider(unittest.TestCase):
    """Test MLProvider standalone."""

    @patch("podcast_scraper.providers.ml.ml_provider._import_third_party_whisper")
    def test_transcription_initialization(self, mock_whisper):
        """Test transcription capability initialization."""
        provider = MLProvider(cfg)
        provider.initialize()
        mock_whisper.assert_called_once()

Factory Tests

Test that factories create the correct unified providers:

def test_create_transcription_provider_ml():
    """Test factory creates MLProvider for 'whisper'."""
    provider = create_transcription_provider(cfg)
    assert hasattr(provider, "transcribe")  # Protocol compliance

Key Principle: Verify protocol compliance, not class names.
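The `hasattr` check can also be expressed with a runtime-checkable `typing.Protocol`. The sketch below is one way to do that under stated assumptions: `TranscriptionProvider`, `FakeWhisperProvider`, and `make_provider` are hypothetical names, and the project's real protocol and factory signatures may differ.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class TranscriptionProvider(Protocol):
    """Hypothetical protocol: anything with a transcribe() method."""

    def transcribe(self, audio_path: str) -> str: ...


class FakeWhisperProvider:
    """Hypothetical provider; it does not inherit from the protocol."""

    def transcribe(self, audio_path: str) -> str:
        return f"transcript of {audio_path}"


def make_provider(name: str) -> TranscriptionProvider:
    """Hypothetical factory keyed by provider name."""
    if name == "whisper":
        return FakeWhisperProvider()
    raise ValueError(f"Unknown transcription provider: {name}")


def test_factory_returns_protocol_compliant_provider():
    provider = make_provider("whisper")
    # isinstance() against a runtime_checkable Protocol only checks that
    # the methods exist; it does not validate signatures or return types.
    assert isinstance(provider, TranscriptionProvider)
```

Because the assertion never names `FakeWhisperProvider`, the test keeps passing if the factory later returns a different class that satisfies the same protocol.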

Common Test Fixtures

from unittest.mock import MagicMock


# Mock HTTP Response
class MockHTTPResponse:
    def __init__(self, content, status_code=200, headers=None):
        self.content = content
        self.status_code = status_code
        self.headers = headers or {}


# Mock Whisper Model
mock_whisper_model = MagicMock()
mock_whisper_model.transcribe.return_value = {
    "text": "transcribed text",
    "segments": []
}

# Mock spaCy NLP
mock_nlp = MagicMock()
mock_nlp.return_value = [MagicMock(text="John", label_="PERSON")]
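The spaCy fixture above makes `nlp(text)` return a plain list. If the code under test instead reads entities from `doc.ents`, which is where spaCy's `Doc` exposes them, the mock needs to be configured one level deeper. The sketch below shows that variant. `extract_person_names` and `make_mock_nlp` are hypothetical helpers, and whether the project's speaker detection uses `doc.ents` is an assumption.

```python
from unittest.mock import MagicMock


def extract_person_names(nlp, text: str) -> list[str]:
    """Hypothetical helper that reads named entities from doc.ents."""
    doc = nlp(text)
    return [ent.text for ent in doc.ents if ent.label_ == "PERSON"]


def make_mock_nlp(entities):
    """Build a mock pipeline whose returned doc exposes the given entities.

    entities is an iterable of (text, label) pairs.
    """
    mock_nlp = MagicMock()
    mock_nlp.return_value.ents = [
        MagicMock(text=text, label_=label) for text, label in entities
    ]
    return mock_nlp


def test_extracts_only_person_entities():
    mock_nlp = make_mock_nlp([("John", "PERSON"), ("Acme", "ORG")])
    assert extract_person_names(mock_nlp, "John works at Acme") == ["John"]
```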

Test Files

| Module | Test File |
| --- | --- |
| config.py | tests/unit/podcast_scraper/test_config.py |
| filesystem.py | tests/unit/podcast_scraper/test_filesystem.py |
| rss_parser.py | tests/unit/podcast_scraper/test_rss_parser.py |
| rss/downloader.py | tests/unit/podcast_scraper/test_downloader.py |
| service.py | tests/unit/podcast_scraper/test_service.py |
| providers/ml/summarizer.py | tests/unit/podcast_scraper/test_summarizer.py |
| speaker_detection.py | tests/unit/podcast_scraper/test_speaker_detection.py |
| metadata.py | tests/unit/podcast_scraper/test_metadata.py |
| Provider factories | tests/unit/podcast_scraper/*/test_*_provider.py |
| MLProvider | tests/unit/podcast_scraper/ml/test_ml_provider.py |
| OpenAIProvider | tests/unit/podcast_scraper/openai/test_openai_provider.py |

Running Unit Tests

# All unit tests
make test-unit

# Specific module
pytest tests/unit/podcast_scraper/test_config.py -v

# With coverage
pytest tests/unit/ --cov=podcast_scraper --cov-report=term-missing

Provider Testing

For provider-specific testing patterns (unit tests for MLProvider, OpenAIProvider, and factories), see Provider Implementation Guide - Testing Your Provider. That guide covers:

  • Provider creation and initialization tests
  • Mock API client patterns
  • Factory tests and protocol compliance
  • Testing checklist for new providers

Best Practices

Test Behavior, Not Implementation

Focus on testing what the code does, not how it does it:

# ✅ Good: Tests behavior/outcome
def test_summarizer_returns_shortened_text(self):
    result = summarize(long_text)
    assert len(result) < len(long_text)
    assert "key point" in result

# ❌ Bad: Tests implementation details
def test_summarizer_calls_tokenizer_twice(self):
    summarize(long_text)
    assert mock_tokenizer.call_count == 2  # Brittle!

Test Error Paths and Edge Cases

Don't just test happy paths. Cover:

  • Error conditions: Invalid input, missing files, network failures
  • Edge cases: Empty input, very large input, special characters
  • Boundary conditions: Zero, one, max values

@patch("podcast_scraper.rss.downloader.requests.get")
def test_download_handles_404(self, mock_get):
    """Test graceful handling of missing file."""
    mock_get.return_value.status_code = 404
    with self.assertRaises(DownloadError):
        download_file("https://example.com/missing.mp3")

def test_parse_empty_transcript(self):
    """Test empty transcript handling."""
    result = parse_transcript("")
    assert result.segments == []
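Boundary conditions are often easiest to cover with a table of values and `subTest`, so that each value is reported separately on failure. The sketch below assumes a hypothetical `validate_workers` function with an allowed range of 1 to 8. Neither the name nor the range comes from the project.

```python
import unittest


def validate_workers(workers: int, max_workers: int = 8) -> int:
    """Hypothetical validator: accept 1..max_workers, reject everything else."""
    if not 1 <= workers <= max_workers:
        raise ValueError(f"workers must be between 1 and {max_workers}")
    return workers


class TestValidateWorkers(unittest.TestCase):
    def test_accepts_boundary_values(self):
        # Lowest and highest values that should still be accepted.
        for workers in (1, 8):
            with self.subTest(workers=workers):
                self.assertEqual(validate_workers(workers), workers)

    def test_rejects_out_of_range_values(self):
        # Negative, zero, and one past the maximum.
        for workers in (-1, 0, 9):
            with self.subTest(workers=workers):
                with self.assertRaises(ValueError):
                    validate_workers(workers)
```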

Use Descriptive Test Names

Test names should explain what is being tested:

# ✅ Good: Descriptive names
def test_config_validation_rejects_negative_workers(self): ...
def test_rss_parser_extracts_transcript_url_from_feed(self): ...
def test_whisper_provider_falls_back_on_timeout(self): ...

# ❌ Bad: Vague names
def test_config(self): ...
def test_parse(self): ...
def test_error(self): ...

Keep Tests Fast

Unit tests should run in < 100ms each:

  • Mock all I/O operations
  • Don't load real ML models
  • Use minimal test data
  • Avoid time.sleep() in tests
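When the code under test itself sleeps, for example between retries, the delay can be patched out rather than waited for. The sketch below assumes a hypothetical `call_with_retry` helper. It is not taken from the project.

```python
import time
from unittest.mock import MagicMock, patch


def call_with_retry(func, attempts: int = 3, delay_seconds: float = 2.0):
    """Hypothetical retry helper: sleep between failed attempts."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except ConnectionError:
            if attempt == attempts:
                raise
            time.sleep(delay_seconds)


def test_retry_returns_result_without_real_sleep():
    # Fails twice, then succeeds on the third call.
    flaky = MagicMock(side_effect=[ConnectionError(), ConnectionError(), "ok"])
    # With time.sleep patched, the two 2-second delays cost no real time.
    with patch("time.sleep"):
        assert call_with_retry(flaky) == "ok"
```

The test asserts only the outcome, in keeping with the behavior-over-implementation guidance above.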

Coverage Targets

  • Overall: >80%
  • Critical modules: >90% (config, workflow, episode_processor)
  • Total tests: ~3,000