ADR-068: BART+LED as Local ML Production Baseline

  • Status: Accepted
  • Date: 2026-04-03
  • Authors: Podcast Scraper Team
  • Related RFCs: RFC-057
  • Supersedes: ml_prod_authority_v1 (Pegasus+LED) — see ADR-067
  • See Also: ADR-048

Context & Problem Statement

Following Pegasus retirement (ADR-067), the project needed a validated local ML summarization baseline for podcast content. ml_small_authority (BART-small + LED) existed as a development baseline but had not been swept for optimal parameters. RFC-057 Track B defined a greedy one-param-at-a-time sweep ratchet to find the best local ML configuration empirically.

Decision

Promote ml_bart_led_autoresearch_v1 (BART-base MAP + LED-base-16384 REDUCE with autoresearch-tuned parameters) as the canonical local ML production baseline. Register it in model_registry.py and set PROD_DEFAULT_SUMMARY_MODE_ID to it.

Sweep Methodology

RFC-057 Track B uses a greedy ratchet:

  • Accept threshold: ≥ +1% relative ROUGE-L gain over current best
  • Early stop: 3 consecutive rejections within a param group
  • Reference: silver_sonnet46_smoke_v1 (Claude Sonnet 4.6 silver labels)
  • Dataset: curated_5feeds_smoke_v1 (5 episodes, 4 podcast feeds)
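
The ratchet rules above can be sketched roughly as follows. This is a minimal illustration, not the RFC-057 implementation: the function name greedy_ratchet, the evaluate callback, and the toy scorer are all hypothetical, and resetting the rejection counter after an accepted candidate is an assumed reading of "consecutive".

```python
from typing import Callable, Dict, List, Tuple

Config = Dict[str, float]


def greedy_ratchet(
    base: Config,
    groups: List[Tuple[str, List[float]]],
    evaluate: Callable[[Config], float],
    min_rel_gain: float = 0.01,   # accept threshold: >= +1% relative gain
    max_rejections: int = 3,      # early stop after 3 consecutive rejections
) -> Tuple[Config, float]:
    """Tune one parameter at a time, keeping only changes that clear the threshold."""
    best = dict(base)
    best_score = evaluate(best)
    for param, candidates in groups:
        rejections = 0
        for value in candidates:
            trial = {**best, param: value}
            score = evaluate(trial)
            rel_gain = (score - best_score) / best_score
            if rel_gain >= min_rel_gain:
                # Ratchet forward: the accepted trial becomes the new best.
                best, best_score = trial, score
                rejections = 0
            else:
                rejections += 1
                if rejections >= max_rejections:
                    break  # stop exploring this param group
    return best, best_score


def toy_score(cfg: Config) -> float:
    # Hypothetical stand-in for a ROUGE-L evaluation run (not real data).
    return (0.18
            - abs(cfg["max_new_tokens"] - 550) / 10000
            - abs(cfg["num_beams"] - 6) / 1000)


base = {"max_new_tokens": 650, "num_beams": 4}
groups = [("max_new_tokens", [450, 550, 750]), ("num_beams", [6, 8])]
best, score = greedy_ratchet(base, groups, toy_score)
print(best, round(score, 3))
```

Because each accepted change becomes the base for the next candidate, the result depends on the order of parameter groups; that is the usual trade-off of a greedy sweep against a full grid search.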

Sweep Results — Round 1 (Reduce Params)

Base config: baseline_ml_dev_authority (BART-base MAP, num_beams=4; LED REDUCE, max_new_tokens=650)

| Param | Candidate | ROUGE-L | Delta | Decision |
|---|---|---|---|---|
| reduce max_new_tokens | 450 | | | rejected |
| reduce max_new_tokens | 550 | 18.54% | +2.89% | accepted |
| reduce max_new_tokens | 750 | | | rejected |
| reduce num_beams | 6 | 18.82% | +1.15% | accepted |
| reduce num_beams | 8 | | | rejected |
| reduce length_penalty | 1.2 | | | rejected |
| reduce length_penalty | 1.5 | | | rejected |
| reduce length_penalty | 0.8 | | | early stop |

Round 1 outcome: ROUGE-L 18.05% → 18.82% (+4.26% relative); 2 params accepted.

Sweep Results — Round 2 (Map Params)

Base: round-1 winner (max_new_tokens=550, num_beams=6)

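| Param | Candidate | Delta | Decision |
|---|---|---|---|
| map num_beams | 6 | +0.0% | rejected |
| map num_beams | 8 | | early stop |
| reduce no_repeat_ngram_size | 4, 5, 2 | ≤0% | all rejected |
| reduce min_new_tokens | 150, 280, 320 | ≤0% | all rejected |
| reduce repetition_penalty | 1.1, 1.5, 1.0 | ≤0% | all rejected |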

Round 2 outcome: No further gain. Round 1 winner is the stable optimum.

Final Promoted Configuration

Registered as ml_bart_led_autoresearch_v1 in model_registry.py:

map_model:           bart-small (facebook/bart-base)
map_params:          num_beams=4, max_new_tokens=200, min_new_tokens=80,
                     no_repeat_ngram_size=3, repetition_penalty=1.3
reduce_model:        long-fast (allenai/led-base-16384)
reduce_params:       num_beams=6, max_new_tokens=550, min_new_tokens=220,
                     no_repeat_ngram_size=3, repetition_penalty=1.3
preprocessing:       cleaning_v4
chunking:            word_chunking, word_chunk_size=900, word_overlap=150
tokenize:            map_max_input_tokens=1024, reduce_max_input_tokens=4096
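
The chunking line in this configuration (word_chunk_size=900, word_overlap=150) can be illustrated with a small sketch. The function below is hypothetical and only assumes that chunks are fixed-size word windows that advance by chunk size minus overlap; the project's actual word_chunking may treat boundaries differently.

```python
from typing import List


def word_chunks(text: str, chunk_size: int = 900, overlap: int = 150) -> List[str]:
    """Split text into windows of `chunk_size` words that share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # each window starts this many words after the last
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the current window already reaches the end of the text
    return chunks


# Synthetic 2000-word "transcript" used purely for illustration.
transcript = " ".join(f"w{i}" for i in range(2000))
chunks = word_chunks(transcript)
print(len(chunks), [len(c.split()) for c in chunks])
```

With these defaults each window starts 750 words after the previous one, so every MAP input repeats the last 150 words of its predecessor.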

Measured Performance vs. Alternatives

Evaluated on curated_5feeds_smoke_v1 vs. silver_sonnet46_smoke_v1:

| Mode | ROUGE-L F1 | Embedding Cosine | Avg Tokens | Privacy |
|---|---|---|---|---|
| ml_bart_led_autoresearch_v1 | 18.82% | 72.6% | ~230 | 100% local |
| ml_prod_authority_v1 (Pegasus) | ~6.5% | ~41% | ~58 | 100% local |
| ml_small_authority (pre-sweep) | ~16.3% | ~70% | ~185 | 100% local |
| OpenAI GPT-4o (cloud reference) | ~28–32% | ~82% | ~420 | cloud |
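
For readers unfamiliar with the ROUGE-L F1 column, the sketch below shows one common way to compute it: the F1 of precision and recall derived from the longest common subsequence (LCS) of tokens. It assumes lowercased whitespace tokenization with no stemming, which may differ from the project's evaluator, and the example sentences are invented.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence, using a rolling DP row."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1 over lowercased, whitespace-split tokens."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


# Invented sentences, for illustration only.
f1 = rouge_l_f1(
    "the host interviews a guest about sleep",
    "the host talks with a guest about sleep science",
)
print(f"{f1:.2f}")  # 0.75
```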

Key finding: Sweeping just two reduce parameters (max_new_tokens, num_beams) yielded a +4.26% relative ROUGE-L gain over the development baseline. The remaining gap of roughly 10 percentage points to cloud models motivates the hybrid ML architecture (ADR-069).
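
The document mixes relative gains (percent of the old score) with absolute gaps (percentage points). The short sketch below recomputes both from the rounded scores reported in this ADR; the helper names are illustrative, and results derived from rounded inputs can differ from the reported figures in the last digit.

```python
def relative_gain_pct(old: float, new: float) -> float:
    """Relative change from old to new, in percent of old."""
    return (new - old) / old * 100.0


def gap_pp(ours: float, theirs: float) -> float:
    """Absolute difference between two percentage scores, in percentage points."""
    return theirs - ours


sweep_gain = relative_gain_pct(18.05, 18.82)   # Round 1 start vs. promoted config
vs_pegasus = relative_gain_pct(6.5, 18.82)     # approximate Pegasus score vs. promoted config
cloud_gap = (gap_pp(18.82, 28.0), gap_pp(18.82, 32.0))  # against the ~28-32% cloud range

print(f"sweep gain: +{sweep_gain:.2f}% relative")
print(f"vs Pegasus: +{vs_pegasus:.1f}% relative")
print(f"cloud gap:  {cloud_gap[0]:.1f} to {cloud_gap[1]:.1f} pp")
```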

Why BART over Pegasus for Local ML

| Property | BART-base | Pegasus-CNN |
|---|---|---|
| Pretraining objective | Text infilling (BERT-like denoising) | GSG (gap sentence generation) |
| Domain | General (books + web) | News (CNN/DailyMail) |
| Podcast chunk diversity | High: produces topically diverse summaries | Low: near-duplicate summaries (see ADR-067) |
| LED compatibility | Compatible: diverse input does not exhaust the no-repeat n-gram budget | Incompatible: near-duplicate input exhausts the no-repeat n-gram budget |

Consequences

  • Positive: 189% relative ROUGE-L improvement over the retired Pegasus production baseline (~6.5% → 18.82%).
  • Positive: Establishes a reproducible sweep methodology (RFC-057 Track B ratchet) for future model promotions.
  • Neutral: This mode is now the privacy-first fallback. The recommended production path is the hybrid ML pipeline (ADR-069), which surpasses this by a further +22.9%.
  • Neutral: PROD_DEFAULT_SUMMARY_MODE_ID points to this mode as the pure-ML anchor while hybrid validation completes.

Implementation Notes

  • Registry entry: src/podcast_scraper/providers/ml/model_registry.py, _mode_registry["ml_bart_led_autoresearch_v1"]
  • Canonical eval config: data/eval/configs/ml/baseline_ml_bart_led_autoresearch_v1.yaml
  • Sweep TSVs: autoresearch/ml_param_tuning/results/bart_led_sweep_*.tsv
  • Default constant: src/podcast_scraper/config_constants.py, PROD_DEFAULT_SUMMARY_MODE_ID = "ml_bart_led_autoresearch_v1"
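
As a rough illustration of how the registry entry and the default constant might fit together, here is a self-contained sketch. Only the names _mode_registry and PROD_DEFAULT_SUMMARY_MODE_ID and the parameter values come from this ADR; the SummaryMode class, its fields, and resolve_mode are hypothetical and do not reflect the actual contents of model_registry.py.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class SummaryMode:
    """Hypothetical container for one summarization mode (not the project's class)."""
    map_model: str
    map_params: Dict[str, float]
    reduce_model: str
    reduce_params: Dict[str, float]


# Values copied from the promoted configuration in this ADR.
_mode_registry: Dict[str, SummaryMode] = {
    "ml_bart_led_autoresearch_v1": SummaryMode(
        map_model="facebook/bart-base",
        map_params={"num_beams": 4, "max_new_tokens": 200, "min_new_tokens": 80,
                    "no_repeat_ngram_size": 3, "repetition_penalty": 1.3},
        reduce_model="allenai/led-base-16384",
        reduce_params={"num_beams": 6, "max_new_tokens": 550, "min_new_tokens": 220,
                       "no_repeat_ngram_size": 3, "repetition_penalty": 1.3},
    ),
}

PROD_DEFAULT_SUMMARY_MODE_ID = "ml_bart_led_autoresearch_v1"


def resolve_mode(mode_id: str = PROD_DEFAULT_SUMMARY_MODE_ID) -> SummaryMode:
    """Look up a mode by ID, failing with a readable message if it is unknown."""
    try:
        return _mode_registry[mode_id]
    except KeyError:
        raise KeyError(f"unknown summary mode: {mode_id!r}") from None


default_mode = resolve_mode()
print(default_mode.reduce_model, default_mode.reduce_params["max_new_tokens"])
```

Keeping the default as a string ID rather than a direct object reference means that moving the default to another mode, for example once hybrid validation completes, would only require changing the constant.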

References