# Performance Report: E2E WIP v1 (April 2026)
Informal RFC-064 sweep: eleven frozen profiles on the same host, E2E fixture
`podcast1_mtb`, two episodes per run. It is intended to exercise `profile-freeze` presets and `profile-diff`, not to serve as a release baseline.
| Field | Value |
|---|---|
| Date | April 2026 |
| Dataset label | e2e_podcast1_mtb_n2 (2 episodes, podcast1_mtb) |
| RSS | Mock (E2EHTTPServer); configs under config/profiles/capture_e2e_*.yaml |
| Warm-up | SKIP_WARMUP=1 for all runs below (faster WIP; not release-standard) |
| Host | Markos-MacBook-Pro.local (from profile YAML; your numbers will differ on other machines) |
| Artifacts | data/profiles/v2.6-wip-*.yaml in repo when committed |
| Methodology | Performance reports index, Performance Profile Guide |
## Totals (headline)
Sorted by `totals.wall_time_s` ascending. Peak RSS (MB) is `totals.peak_rss_mb`; s/episode is wall time divided by the two episodes per run.
| Release tag | Config (preset) | Wall (s) | Peak RSS (MB) | s/episode |
|---|---|---|---|---|
| v2.6-wip-gemini | capture_e2e_gemini.yaml | 7.89 | 1134 | 3.94 |
| v2.6-wip-anthropic | capture_e2e_anthropic.yaml | 12.23 | 1068 | 6.12 |
| v2.6-wip-mistral | capture_e2e_mistral.yaml | 17.23 | 1149 | 8.61 |
| v2.6-wip-deepseek | capture_e2e_deepseek.yaml | 44.86 | 1109 | 22.43 |
| v2.6-wip-ollama-llama32 | capture_e2e_ollama_llama32.yaml | 52.87 | 607 | 26.44 |
| v2.6-wip-grok | capture_e2e_grok.yaml | 92.62 | 1144 | 46.31 |
| v2.6-wip-openai | capture_e2e_openai.yaml | 95.38 | 617 | 47.69 |
| v2.6-wip-ollama-llama31 | capture_e2e_ollama_llama31_8b.yaml | 96.65 | 1056 | 48.33 |
| v2.6-wip-ml-dev | capture_e2e_ml_dev.yaml | 108.12 | 3282 | 54.06 |
| v2.6-wip-ml-prod | capture_e2e_ml_prod.yaml | 111.99 | 7159 | 56.00 |
| v2.6-wip-ollama-qwen35 | capture_e2e_ollama_qwen35.yaml | 123.57 | 995 | 61.79 |
Rounding: wall times shown to two decimals; YAML may carry more precision.
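The headline table can be regenerated from frozen profiles with a few lines of Python. This is a minimal sketch, assuming each profile has already been parsed (for example with PyYAML's `yaml.safe_load`) into a dict exposing the `totals.wall_time_s` and `totals.peak_rss_mb` fields named above. The helper `headline_rows` is hypothetical and not part of the repo's tooling.

```python
from typing import Any


def headline_rows(
    profiles: dict[str, dict[str, Any]], episodes: int = 2
) -> list[tuple[str, float, float, float]]:
    """Build (tag, wall_s, peak_rss_mb, s_per_episode) rows sorted by wall time.

    `profiles` maps a release tag (e.g. "v2.6-wip-gemini") to a parsed profile
    dict. Only the `totals` section is read; this layout is an assumption based
    on the field names in the report, not on the repo's code.
    """
    if episodes <= 0:
        raise ValueError("episodes must be positive")
    rows = []
    for tag, profile in profiles.items():
        totals = profile["totals"]
        wall = float(totals["wall_time_s"])
        rss = float(totals["peak_rss_mb"])
        rows.append((tag, wall, rss, wall / episodes))
    # Ascending wall time, matching the "sorted by totals.wall_time_s" rule.
    rows.sort(key=lambda row: row[1])
    return rows


if __name__ == "__main__":
    # Three rows copied from the headline table, deliberately out of order.
    sample = {
        "v2.6-wip-grok": {"totals": {"wall_time_s": 92.62, "peak_rss_mb": 1144}},
        "v2.6-wip-gemini": {"totals": {"wall_time_s": 7.89, "peak_rss_mb": 1134}},
        "v2.6-wip-openai": {"totals": {"wall_time_s": 95.38, "peak_rss_mb": 617}},
    }
    for tag, wall, rss, per_ep in headline_rows(sample):
        print(f"| {tag} | {wall:.2f} | {rss:.0f} | {per_ep:.2f} |")
```

Dividing by `episodes=2` reproduces the s/episode column (e.g. 92.62 / 2 = 46.31 for grok); last-digit differences against the table can come from the extra precision the YAML carries.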
## Pairs we diffed (same host)
Illustrative make profile-diff comparisons from this campaign:
| From | To | Note |
|---|---|---|
| v2.6-wip-openai | v2.6-wip-anthropic | Different API stacks; transcript cache on both |
| v2.6-wip-ml-dev | v2.6-wip-ml-prod | Prod stack much higher peak RSS (~+118% in-table delta) |
| v2.6-wip-ollama-qwen35 | v2.6-wip-ollama-llama32 | Smaller Ollama model faster, lower RSS |
| v2.6-wip-gemini | v2.6-wip-grok | Very different wall totals; check parallelism / stage attribution |
| v2.6-wip-grok | v2.6-wip-mistral | Grok preset uses grok-3-mini (x.ai model IDs evolve) |
| v2.6-wip-mistral | v2.6-wip-deepseek | Two “budget cloud text” stacks (mistral-large-latest vs deepseek-chat) |
| v2.6-wip-ollama-llama31 | v2.6-wip-ollama-llama32 | Guide's “privacy default” (8B) vs 3B |
Re-run make profile-diff FROM=… TO=… locally to reproduce the Rich tables.
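The percentage in the ml-dev to ml-prod row is a plain relative change on `totals.peak_rss_mb`: (7159 - 3282) / 3282 ≈ 1.18, i.e. roughly +118%. A minimal sketch of that arithmetic follows; `pct_delta` is a hypothetical helper, and `make profile-diff` may compute or round its delta column differently.

```python
def pct_delta(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    if before == 0:
        raise ZeroDivisionError("baseline is zero; percent change is undefined")
    return (after - before) / before * 100.0


if __name__ == "__main__":
    # Peak RSS (MB) from the headline table: ml-dev -> ml-prod.
    rss_delta = pct_delta(3282, 7159)
    # Wall time (s) for the same pair, for comparison.
    wall_delta = pct_delta(108.12, 111.99)
    print(f"peak RSS: {rss_delta:+.0f}%  wall: {wall_delta:+.1f}%")
```

Applied to the wall columns of the same pair, the helper gives about +3.6%, which is small next to the RSS jump.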
## Caveats (read before drawing product conclusions)
- WIP / non-release: `SKIP_WARMUP=1` skips the default cold-start scrub; official release captures should omit it unless debugging.
- Not apples-to-apples across rows: presets differ (e.g. OpenAI uses API transcription; Anthropic uses Whisper + API text; ML prod loads Pegasus + trf).
- Transcript cache: repeated runs hit the disk cache, so the transcription stage may be absent or near-zero in the YAML even though episodes were processed.
- Stage RSS/CPU: per-stage figures come from proportional sampling, so short stages may show 0 MB or 0% CPU. See Interpreting the profile.
- Grok models: `capture_e2e_grok.yaml` uses `grok-3-mini` for summary and cleaning because `grok-2`/`grok-beta` returned HTTP 400 on the capture API at the time.
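One way to see why short stages can report 0 MB or 0% CPU: if resource usage is sampled at a fixed interval and each sample is credited to whichever stage is running at that instant, a stage shorter than the interval can receive no samples at all. The sketch below assumes that fixed-interval model, which may not match the profiler's actual "proportional sampling"; the `Stage` type, the stage names, the 1-second interval, and the timings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    start_s: float
    end_s: float


def samples_per_stage(stages: list[Stage], interval_s: float) -> dict[str, int]:
    """Count fixed-interval samples (t = 0, interval, 2*interval, ...) in each [start, end) window."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    counts = {stage.name: 0 for stage in stages}
    end_of_run = max(stage.end_s for stage in stages)
    tick = 0
    t = 0.0
    while t < end_of_run:
        for stage in stages:
            if stage.start_s <= t < stage.end_s:
                counts[stage.name] += 1
        tick += 1
        t = tick * interval_s  # multiply instead of += to avoid drift
    return counts


if __name__ == "__main__":
    run = [
        Stage("load_models", 0.0, 2.5),
        # A 0.3 s cache-hit transcription falls between ticks and gets no samples.
        Stage("transcribe_cached", 2.5, 2.8),
        Stage("summarize", 2.8, 7.0),
    ]
    print(samples_per_stage(run, interval_s=1.0))
```

With zero samples, any per-stage RSS or CPU figure derived from those samples would read as 0 even though the stage ran.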