# Unit Testing Guide

See also:

- Testing Strategy - High-level testing philosophy and test pyramid
- Testing Guide - Quick reference and test execution commands

This guide covers unit test implementation details: what to mock, isolation patterns, and testing practices.
## Overview

Unit tests exercise individual functions or modules in isolation, with all external dependencies mocked.
| Aspect | Requirement |
|---|---|
| Speed | < 100ms per test |
| Scope | Single function or class |
| Dependencies | All mocked |
| Network | Blocked (enforced by pytest plugin) |
| Filesystem | Blocked except tempfile (enforced by pytest plugin) |
| ML Models | Not loaded (mocked before import) |
## What to Mock

### Always Mock

- **HTTP/Network Calls**

  ```python
  @patch("podcast_scraper.rss.downloader.requests.get")
  def test_download(self, mock_get):
      mock_get.return_value.status_code = 200
      mock_get.return_value.content = b"test content"
      # ...
  ```

- **ML Models (Whisper, spaCy, Transformers)**

  ```python
  # Mock before importing dependent modules
  @patch("podcast_scraper.providers.ml.ml_provider._import_third_party_whisper")
  @patch("podcast_scraper.providers.ml.ml_provider.speaker_detection.get_ner_model")
  @patch("podcast_scraper.providers.ml.ml_provider.summarizer.SummaryModel")
  def test_provider_creation(self, mock_summary, mock_ner, mock_whisper):
      # Test provider creation without loading real models
      ...
  ```

- **External API Clients (OpenAI, etc.)**

  ```python
  @patch("podcast_scraper.providers.openai.openai_provider.OpenAI")
  def test_openai_provider(self, mock_client):
      mock_client.return_value.chat.completions.create.return_value = ...
  ```

- **Filesystem Operations** (when testing logic, not file operations)

  ```python
  @patch("builtins.open", mock_open(read_data="test content"))
  def test_file_reading(self):
      # Test file reading logic
      ...
  ```
### Never Mock in Unit Tests

- **The function/class being tested** - that's what we're testing
- **Pure helper functions** - test them directly
- **Data classes/models** - create real instances
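For example, a pure helper needs no mocks at all: call it and assert on the result. (`slugify` below is a hypothetical helper for illustration, not a function from this codebase.)

```python
def slugify(title: str) -> str:
    # Hypothetical pure helper: deterministic, no I/O, no dependencies
    return "-".join(title.lower().split())

# Test it directly -- no @patch, no MagicMock needed
assert slugify("My Podcast Episode") == "my-podcast-episode"
```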
## Isolation Enforcement

Unit tests automatically enforce isolation via pytest plugins in `tests/unit/conftest.py`:
### Network Isolation

All network calls are blocked. If a test attempts network access:

```
NetworkCallDetectedError: Attempt to make network call detected in unit test
```

Blocked:

- `requests.get()`, `requests.post()`, `requests.Session()` methods
- `urllib.request.urlopen()`
- `urllib3.PoolManager()`
- `socket.create_connection()`
### Filesystem Isolation

All filesystem I/O is blocked (except `tempfile`). If a test attempts I/O:

```
FilesystemIODetectedError: Attempt to perform filesystem I/O in unit test
```

Blocked:

- `open()` for file operations (outside temp directories)
- `os.makedirs()`, `os.remove()`, `os.unlink()`, `os.rmdir()`, `os.rename()`
- `shutil.copy()`, `shutil.move()`, `shutil.rmtree()`
- `Path.write_text()`, `Path.write_bytes()`, `Path.mkdir()`, `Path.unlink()`

Allowed:

- `tempfile.mkdtemp()`, `tempfile.NamedTemporaryFile()` - operations within temp directories
- Cache directories (`~/.cache/`, `~/.local/share/`)
- Site-packages (read-only)
- Python cache files (`.pyc`, `__pycache__/`)
## ML Dependency Mocking

Unit tests must run without ML packages installed (for CI speed). Mock ML modules before importing dependent code:

```python
import sys
from unittest.mock import MagicMock

# Mock ML modules before import
sys.modules["whisper"] = MagicMock()
sys.modules["spacy"] = MagicMock()
sys.modules["torch"] = MagicMock()
sys.modules["transformers"] = MagicMock()

# Now import the module that uses these
from podcast_scraper.providers.ml import ml_provider
```

**CI Verification:** `scripts/tools/check_unit_test_imports.py` verifies modules can import without ML deps.
## Test Structure

```python
class TestModuleName(unittest.TestCase):
    """Test module_name module."""

    def setUp(self):
        """Set up test fixtures."""
        self.temp_dir = tempfile.mkdtemp()

    def tearDown(self):
        """Clean up test fixtures."""
        shutil.rmtree(self.temp_dir, ignore_errors=True)

    @patch("module.dependency")
    def test_function_success(self, mock_dependency):
        """Test successful function execution."""
        # Arrange
        mock_dependency.return_value = expected_value

        # Act
        result = function_under_test(input)

        # Assert
        self.assertEqual(result, expected_result)
        mock_dependency.assert_called_once_with(...)

    def test_function_error_handling(self):
        """Test function error handling."""
        with self.assertRaises(ExpectedError):
            function_under_test(invalid_input)
```
## Provider Testing Patterns

### Standalone Provider Tests

Test `MLProvider`/`OpenAIProvider` directly with mocked dependencies:

```python
class TestMLProvider(unittest.TestCase):
    """Test MLProvider standalone."""

    @patch("podcast_scraper.providers.ml.ml_provider._import_third_party_whisper")
    def test_transcription_initialization(self, mock_whisper):
        """Test transcription capability initialization."""
        provider = MLProvider(cfg)
        provider.initialize()
        mock_whisper.assert_called_once()
```
### Factory Tests

Test that factories create the correct unified providers:

```python
def test_create_transcription_provider_ml():
    """Test factory creates MLProvider for 'whisper'."""
    provider = create_transcription_provider(cfg)
    assert hasattr(provider, "transcribe")  # Protocol compliance
```

**Key Principle:** Verify protocol compliance, not class names.
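If the provider protocols are defined with `typing.Protocol`, the `hasattr` check can be tightened into a structural `isinstance` check. A hedged sketch, where `TranscriptionProvider` and `FakeWhisperProvider` are illustrative names rather than this project's actual classes:

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class TranscriptionProvider(Protocol):
    """Illustrative protocol; the real one would live in the providers package."""
    def transcribe(self, audio_path: str) -> str: ...

class FakeWhisperProvider:
    """Any class with a matching transcribe() satisfies the protocol."""
    def transcribe(self, audio_path: str) -> str:
        return "transcribed text"

provider = FakeWhisperProvider()

# Structural check: passes for any class exposing transcribe(),
# regardless of its name or inheritance
assert isinstance(provider, TranscriptionProvider)
```

Note that `runtime_checkable` `isinstance` checks only verify the method exists, not its signature, so they are a convenience on top of, not a replacement for, behavioral tests.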
## Common Test Fixtures

```python
# Mock HTTP response
class MockHTTPResponse:
    def __init__(self, content, status_code=200, headers=None):
        self.content = content
        self.status_code = status_code
        self.headers = headers or {}

# Mock Whisper model
mock_whisper_model = MagicMock()
mock_whisper_model.transcribe.return_value = {
    "text": "transcribed text",
    "segments": [],
}

# Mock spaCy NLP
mock_nlp = MagicMock()
mock_nlp.return_value = [MagicMock(text="John", label_="PERSON")]
```
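A response fixture like `MockHTTPResponse` slots into tests by injecting it wherever real HTTP would occur. A sketch, where `fetch_feed` is a hypothetical stand-in for the real download logic:

```python
from unittest.mock import MagicMock

class MockHTTPResponse:
    def __init__(self, content, status_code=200, headers=None):
        self.content = content
        self.status_code = status_code
        self.headers = headers or {}

def fetch_feed(getter, url):
    # Hypothetical stand-in for real download logic
    response = getter(url)
    if response.status_code != 200:
        raise RuntimeError(f"HTTP {response.status_code}")
    return response.content

# Inject a mock in place of requests.get-style callables
mock_get = MagicMock(return_value=MockHTTPResponse(b"<rss/>"))
content = fetch_feed(mock_get, "https://example.com/feed.xml")

assert content == b"<rss/>"
mock_get.assert_called_once_with("https://example.com/feed.xml")
```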
## Test Files

| Module | Test File |
|---|---|
| `config.py` | `tests/unit/podcast_scraper/test_config.py` |
| `filesystem.py` | `tests/unit/podcast_scraper/test_filesystem.py` |
| `rss_parser.py` | `tests/unit/podcast_scraper/test_rss_parser.py` |
| `rss/downloader.py` | `tests/unit/podcast_scraper/test_downloader.py` |
| `service.py` | `tests/unit/podcast_scraper/test_service.py` |
| `providers/ml/summarizer.py` | `tests/unit/podcast_scraper/test_summarizer.py` |
| `speaker_detection.py` | `tests/unit/podcast_scraper/test_speaker_detection.py` |
| `metadata.py` | `tests/unit/podcast_scraper/test_metadata.py` |
| Provider factories | `tests/unit/podcast_scraper/*/test_*_provider.py` |
| MLProvider | `tests/unit/podcast_scraper/ml/test_ml_provider.py` |
| OpenAIProvider | `tests/unit/podcast_scraper/openai/test_openai_provider.py` |
## Running Unit Tests

```shell
# All unit tests
make test-unit

# Specific module
pytest tests/unit/podcast_scraper/test_config.py -v

# With coverage
pytest tests/unit/ --cov=podcast_scraper --cov-report=term-missing
```
## Provider Testing

For provider-specific testing patterns (unit tests for MLProvider, OpenAIProvider, factories):

→ Provider Implementation Guide - Testing Your Provider

Covers:

- Provider creation and initialization tests
- Mock API client patterns
- Factory tests and protocol compliance
- Testing checklist for new providers
## Best Practices

### Test Behavior, Not Implementation

Focus on testing what the code does, not how it does it:

```python
# ✅ Good: tests behavior/outcome
def test_summarizer_returns_shortened_text(self):
    result = summarize(long_text)
    assert len(result) < len(long_text)
    assert "key point" in result

# ❌ Bad: tests implementation details
def test_summarizer_calls_tokenizer_twice(self):
    summarize(long_text)
    assert mock_tokenizer.call_count == 2  # Brittle!
```
### Test Error Paths and Edge Cases

Don't just test happy paths. Cover:

- **Error conditions:** invalid input, missing files, network failures
- **Edge cases:** empty input, very large input, special characters
- **Boundary conditions:** zero, one, max values

```python
def test_download_handles_404(self):
    """Test graceful handling of missing file."""
    mock_response.status_code = 404
    with self.assertRaises(DownloadError):
        download_file(url)

def test_parse_empty_transcript(self):
    """Test empty transcript handling."""
    result = parse_transcript("")
    assert result.segments == []
```
### Use Descriptive Test Names

Test names should explain what is being tested:

```python
# ✅ Good: descriptive names
def test_config_validation_rejects_negative_workers(self): ...
def test_rss_parser_extracts_transcript_url_from_feed(self): ...
def test_whisper_provider_falls_back_on_timeout(self): ...

# ❌ Bad: vague names
def test_config(self): ...
def test_parse(self): ...
def test_error(self): ...
```
### Keep Tests Fast

Unit tests should run in < 100ms each:

- Mock all I/O operations
- Don't load real ML models
- Use minimal test data
- Avoid `time.sleep()` in tests
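When the code under test sleeps between retries, patch the sleep call so the test finishes instantly. A sketch, where `retry_with_backoff` and `flaky` are hypothetical examples rather than project code:

```python
import time
from unittest.mock import patch

def retry_with_backoff(func, attempts=3, delay=1.0):
    # Hypothetical helper: retries func, sleeping between attempts
    for attempt in range(attempts):
        try:
            return func()
        except RuntimeError:
            if attempt == attempts - 1:
                raise
            time.sleep(delay)

calls = {"n": 0}

def flaky():
    # Fails twice, then succeeds
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient")
    return "ok"

with patch("time.sleep") as mock_sleep:  # no real waiting
    result = retry_with_backoff(flaky)
```

The test still verifies the retry behavior (the mock records two sleep calls) but runs in microseconds instead of two real seconds.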
## Coverage Targets

- Overall: >80%
- Critical modules: >90% (config, workflow, episode_processor)
- Total tests: ~3,000