Performance Report: E2E WIP v1 (April 2026)¶

Informal RFC-064 sweep — eleven frozen profiles on the same host, E2E fixture podcast1_mtb, two episodes per run. Intended to exercise profile-freeze presets and profile-diff, not as a release baseline.

Field	Value
Date	April 2026
Dataset label	`e2e_podcast1_mtb_n2` (2 episodes, `podcast1_mtb`)
RSS	Mock (`E2EHTTPServer`); configs under `config/profiles/capture_e2e_*.yaml`
Warm-up	`SKIP_WARMUP=1` for all runs below (faster WIP; not release-standard)
Host	`Markos-MacBook-Pro.local` (from profile YAML; your numbers will differ on other machines)
Artifacts	`data/profiles/v2.6-wip-*.yaml` in repo when committed
Methodology	Performance reports index, Performance Profile Guide

Totals (headline)¶

Sorted by totals.wall_time_s ascending. MB = totals.peak_rss_mb.

Release tag	Config (preset)	Wall (s)	Peak RSS (MB)	s/episode
`v2.6-wip-gemini`	`capture_e2e_gemini.yaml`	7.89	1134	3.94
`v2.6-wip-anthropic`	`capture_e2e_anthropic.yaml`	12.23	1068	6.12
`v2.6-wip-mistral`	`capture_e2e_mistral.yaml`	17.23	1149	8.61
`v2.6-wip-deepseek`	`capture_e2e_deepseek.yaml`	44.86	1109	22.43
`v2.6-wip-ollama-llama32`	`capture_e2e_ollama_llama32.yaml`	52.87	607	26.44
`v2.6-wip-grok`	`capture_e2e_grok.yaml`	92.62	1144	46.31
`v2.6-wip-openai`	`capture_e2e_openai.yaml`	95.38	617	47.69
`v2.6-wip-ollama-llama31`	`capture_e2e_ollama_llama31_8b.yaml`	96.65	1056	48.33
`v2.6-wip-ml-dev`	`capture_e2e_ml_dev.yaml`	108.12	3282	54.06
`v2.6-wip-ml-prod`	`capture_e2e_ml_prod.yaml`	111.99	7159	56.00
`v2.6-wip-ollama-qwen35`	`capture_e2e_ollama_qwen35.yaml`	123.57	995	61.79

Rounding: wall times shown to two decimals; YAML may carry more precision.

Pairs we diffed (same host)¶

Illustrative make profile-diff comparisons from this campaign:

From	To	Note
`v2.6-wip-openai`	`v2.6-wip-anthropic`	Different API stacks; transcript cache on both
`v2.6-wip-ml-dev`	`v2.6-wip-ml-prod`	Prod stack much higher peak RSS (~+118% in-table delta)
`v2.6-wip-ollama-qwen35`	`v2.6-wip-ollama-llama32`	Smaller Ollama model faster, lower RSS
`v2.6-wip-gemini`	`v2.6-wip-grok`	Very different wall totals; check parallelism / stage attribution
`v2.6-wip-grok`	`v2.6-wip-mistral`	Grok preset uses `grok-3-mini` (x.ai model IDs evolve)
`v2.6-wip-mistral`	`v2.6-wip-deepseek`	Two “budget cloud text” stacks (`mistral-large-latest` vs `deepseek-chat`)
`v2.6-wip-ollama-llama31`	`v2.6-wip-ollama-llama32`	Guide “privacy default” 8B vs 3B

Re-run make profile-diff FROM=… TO=… locally to reproduce the Rich tables.

Caveats (read before drawing product conclusions)¶

WIP / non-release — SKIP_WARMUP=1 skips the default cold-start scrub; official release captures should omit it unless debugging.
Not apples-to-apples across rows — presets differ (e.g. OpenAI uses API transcription; Anthropic uses Whisper + API text; ML prod loads Pegasus + trf).
Transcript cache — repeated runs hit disk cache; transcription stage may be absent or near-zero in YAML even though episodes were processed.
Stage RSS/CPU — proportional sampling; short stages may show 0 MB or 0% CPU. See Interpreting the profile.
Grok models — capture_e2e_grok.yaml uses grok-3-mini for summary and cleaning because grok-2 / grok-beta returned 400 on the capture API at the time.

Performance Report: E2E WIP v1 (April 2026)¶

Totals (headline)¶

Pairs we diffed (same host)¶

Caveats (read before drawing product conclusions)¶

Related¶