GitHub Actions Resource Usage Strategy¶
Overview¶
This document outlines our conservative resource usage strategy for GitHub Actions to avoid hitting limits and ensure reliable CI/CD.
GitHub Actions Limits¶
Free Tier Limits¶
- Minutes per month: 2,000 minutes
- Concurrent jobs: 20 jobs
- Job timeout: 6 hours (default, can be set lower)
- Cache size: 10 GB per repository
Paid Tier Limits¶
- Minutes per month: Varies by plan
- Concurrent jobs: Varies by plan
- Job timeout: 6 hours (default)
Our Conservative Strategy¶
1. Parallelism Reduction¶
Local Development (Makefile):
- Uses
auto - 2for pytest-xdist (reserves 2 cores for system) - Sequential summarization in test environments (1 worker)
GitHub Actions:
- MUST match local strategy to ensure consistent behavior
- Uses
auto - 2for pytest-xdist (reserves 2 cores for system) - Sequential summarization in test environments (1 worker)
Rationale:
- Reduces memory pressure
- Prevents resource exhaustion
- More predictable execution times
- Better for shared CI runners
2. Cache Strategy¶
Model Cache Locations:
- Local development:
.cache/in repo root (versioned structure, contents ignored) - GitHub Actions:
~/.cache/whisper,~/.local/share/spacy,~/.cache/huggingface(standard locations)
Cache Key Strategy:
- Versioned keys (
ml-models-${{ runner.os }}-v1) for cache invalidation - OS-specific keys for cross-platform compatibility
- Restore keys for fallback to older cache
Cache Size:
- Whisper models: ~150 MB
- spaCy models: ~50 MB (installed as packages)
- Transformers models: ~3.5 GB (facebook/bart-base + allenai/led-base-16384)
- Total: ~3.7 GB (well under 10 GB limit)
3. Job Timeouts¶
Recommended Timeouts:
- Lint job: 10 minutes (fast, no ML dependencies)
- Unit tests: 15 minutes (fast, no ML dependencies)
- Integration tests: 30 minutes (ML models, parallel execution)
- E2E tests: 45 minutes (ML models, parallel execution, multiple episodes)
- Nightly tests: 60 minutes (comprehensive suite with metrics)
Implementation:
timeout-minutes: 30 # Add to each job
4. Job Dependencies¶
Optimized Job Graph:
test-unit (parallel) ──┐
preload-ml-models (parallel) ──┐
docs (parallel) ──┐
build (parallel) ──┐
│
├──> test-integration
└──> test-e2e
Benefits:
- Fast jobs (lint, unit tests) run immediately
- ML model preloading runs in parallel (doesn't block)
- Heavy jobs (integration, E2E) wait for model cache
5. Conditional Execution¶
Path-based filtering:
- Only run tests when relevant files change
- Skip ML-heavy jobs for documentation-only PRs
- Skip full test suite for markdown-only changes
Branch-based filtering:
- Full test suite on
mainbranch - Full test suite on PRs (except docs-only)
- Nightly tests only on schedule/manual trigger
Resource Usage Estimates¶
Per PR (typical)¶
- Lint: ~3 minutes
- Unit tests: ~5 minutes
- Preload models: ~2 minutes (cache hit) or ~15 minutes (cache miss)
- Integration tests: ~20 minutes
- E2E tests: ~25 minutes
- Total: ~55 minutes per PR (with cache) or ~70 minutes (cache miss)
Per Nightly Run¶
- Comprehensive tests: ~45 minutes
- Data quality tests: ~15 minutes
- Total: ~60 minutes per nightly run
Monthly Estimates¶
- PRs: ~10 PRs/month × 55 minutes = 550 minutes
- Main branch pushes: ~5 pushes/month × 55 minutes = 275 minutes
- Nightly runs: ~30 runs/month × 60 minutes = 1,800 minutes
- Total: ~2,625 minutes/month (exceeds free tier limit)
Recommendation: Disable nightly schedule, use manual triggers only for now.
Monitoring¶
Key Metrics to Track¶
- Job duration: Should decrease with cache hits
- Cache hit rate: Should be >80% after initial setup
- Memory usage: Monitor for leaks or excessive usage
- Concurrent jobs: Should stay well under 20
Actions to Take if Limits Approached¶
- Reduce parallelism further (e.g.,
auto - 4) - Disable non-critical jobs
- Increase cache hit rate
- Split jobs into smaller chunks
- Consider GitHub Actions paid tier
Best Practices¶
- Always use conservative parallelism (
auto - 2minimum) - Set job timeouts to prevent runaway jobs
- Monitor cache hit rates and optimize cache keys
- Use path-based filtering to skip unnecessary runs
- Test locally first before pushing (saves CI minutes)
- Use
make ci-fastfor quick local validation
References¶
- GitHub Actions Usage Limits
- GitHub Actions Cache
- Pytest-xdist Documentation
- Workflows - Detailed workflow documentation