RFC-053: Adaptive Summarization Routing Based on Episode Profiling¶
- Status: Draft
- Date: 2026-02-05
- Authors:
- Stakeholders: Maintainers, users processing diverse podcast types, developers integrating summarization
- Execution Timing: Phase 4 — implement after RFC-042 (Hybrid ML Platform) and RFC-049 (GIL) are stable. KG extraction (RFC-055) can use the same profiling and routing hooks when the KG pipeline is enabled; KG routing ships when the PRD-019 / RFC-055 artifacts are implemented and is not a blocker for summarization-only routing. This RFC is an optimization layer that makes existing capabilities work better across diverse content types, and it serves as the bridge to multi-content-type expansion beyond podcasts.
- Related PRDs:
  - docs/prd/PRD-005-episode-summarization.md (Episode summarization requirements)
  - docs/prd/PRD-019-knowledge-graph-layer.md (Knowledge Graph — optional routing for KG extraction)
- Related RFCs:
  - docs/rfc/RFC-012-episode-summarization.md (Current summarization implementation)
  - docs/rfc/RFC-042-hybrid-summarization-pipeline.md (Hybrid MAP-REDUCE architecture — provides models)
  - docs/rfc/RFC-044-model-registry.md (Model Registry — model capability lookup)
  - docs/rfc/RFC-049-grounded-insight-layer-core.md (GIL — extraction routing per content type)
  - docs/rfc/RFC-055-knowledge-graph-layer-core.md (KG — entity/topic/relationship extraction; optional routing)
  - docs/rfc/RFC-052-locally-hosted-llm-models-with-prompts.md (Local LLM models — routing targets)
- Related ADRs:
  - docs/adr/ADR-010-hierarchical-summarization-pattern.md (Hierarchical summarization)
Execution Order:
Phase 1: RFC-044 (Model Registry) ~2-3 weeks
▼
Phase 2: RFC-042 (Hybrid ML Platform) ~10 weeks
│ + RFC-052 (Local LLM Prompts) parallel
▼
Phase 3: RFC-049 (GIL) ~6-8 weeks
│ RFC-055 (KG) may overlap / follow (separate feature flag)
▼
Phase 4: RFC-053 (this RFC — Routing) ~4-6 weeks
Optimization + multi-content bridge
Why Phase 4? RFC-053 routes to capabilities that RFC-042 provides (MAP/REDUCE models, FLAN-T5, LLMs) and can also optimize GIL extraction (RFC-049) and, when enabled, KG extraction (RFC-055). It requires those foundations to be stable first. Additionally, the profiling data collected during Phase 3 (GIL extraction on real episodes) provides empirical evidence for tuning routing thresholds. KG uses the same EpisodeProfile so entity/topic strategies align with content shape (dense vs dialogue vs long-form) without merging GIL and KG contracts.
Abstract¶
This RFC proposes an adaptive routing system for podcast summarization that selects optimal summarization strategies based on episode characteristics (duration, structure, content type). Instead of using a single summarization approach for all episodes, the system profiles each episode and routes it to the most appropriate strategy. This enables consistent output quality across diverse podcast formats while keeping system complexity manageable.
Key Principle: Standardize the pipeline and outputs; vary strategy via routing.
GIL vs KG: Routing is orthogonal to the product split in PRD-017 vs PRD-019: the same profile informs summarization, grounded insight extraction, and knowledge-graph extraction, but outputs remain in gi.json vs KG artifacts respectively (no shared JSON contract).
Beyond Podcasts: While v1 focuses on podcast episode profiles, the profiling and routing architecture is content-type-agnostic. The same framework extends to lectures, panel discussions, interviews, debates, audiobooks, and other long-form audio/text content. RFC-053 is the bridge from "podcast scraper" to "content intelligence platform".
Problem Statement¶
The current summarization pipeline (RFC-012) uses a uniform approach for all episodes: BART/LED models with MAP-REDUCE summarization, complex chunking logic, and two-pass aggregation. However, podcasts vary significantly:
- Duration: 10 minutes to multiple hours
- Structure: Monologue vs dialogue vs panel discussions
- Content: Technical vs abstract vs narrative
- Speaker patterns: Single host, interview format, roundtable discussions
A single summarization strategy does not generalize well across all cases:
- Short episodes (< 15 min) don't need complex chunking
- Dialogue-heavy episodes benefit from speaker-aware processing
- Technical content requires extraction-first strategies
- Long monologues need hierarchical chunking with strong reducers
Current limitations:
- One-size-fits-all approach misses optimization opportunities
- No adaptation to episode characteristics
- Quality varies significantly across episode types
- Evaluation metrics are averaged across heterogeneous content
Goals¶
- Support diverse podcast formats with consistent output quality
- Avoid model sprawl and pipeline fragmentation
- Improve faithfulness, coverage, and structure per episode type
- Enable systematic benchmarking and future model swaps
- Maintain simplicity: routing logic should be deterministic and debuggable
Non-Goals¶
- Selecting a single "best" summarization model
- Fully replacing the current pipeline immediately
- Introducing provider-specific dependencies into core logic
- Real-time routing decisions (profiling happens once per episode)
Constraints & Assumptions¶
Constraints:
- Must be backward compatible with existing summarization pipeline
- Routing decisions must be deterministic and reproducible
- Profiling must be fast (< 1s per episode)
- Must work with existing providers (ML, OpenAI, Ollama, etc.)
- Routing logic must be logged for debugging
Assumptions:
- Episode transcripts are available before summarization
- Speaker detection results are available (for dialogue profiling)
- Episode metadata (duration, etc.) is available
- Users accept that different episodes may use different strategies
Design & Implementation¶
1. Episode Profiling¶
Before invoking any ML models, each episode is profiled using inexpensive heuristics:
Profile Metrics:
- Duration (minutes) - From episode metadata
- Transcript token count - From transcript analysis
- Speaker count - From speaker detection results
- Turn-taking rate - Dialogue vs monologue indicator
- Topic drift - Semantic variance over time (optional, requires embeddings)
- Named entity density - Technical content indicator
- Numeric density - Data-heavy content indicator
Profiling Implementation:
```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class EpisodeProfile:
    """Episode characteristics for routing decisions."""

    duration_minutes: float
    token_count: int
    speaker_count: int
    turn_taking_rate: float              # Turns per minute
    entity_density: float                # Entities per 1000 tokens
    numeric_density: float               # Numbers per 1000 tokens
    topic_drift: Optional[float] = None  # Semantic variance (optional)
```
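The metrics above can be computed with plain string heuristics. A minimal sketch, assuming transcript text and a per-turn speaker list are already available; the entity and numeric proxies (capitalized words, digit runs) are illustrative stand-ins for real NER output, and `profile_episode`'s exact signature is an assumption:

```python
import re
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class EpisodeProfile:
    """As defined above; repeated so this sketch is self-contained."""
    duration_minutes: float
    token_count: int
    speaker_count: int
    turn_taking_rate: float
    entity_density: float
    numeric_density: float
    topic_drift: Optional[float] = None

def profile_episode(
    transcript: str, duration_minutes: float, speaker_turns: List[str]
) -> EpisodeProfile:
    """Compute cheap routing heuristics; no ML inference required."""
    tokens = transcript.split()
    token_count = len(tokens)
    # Crude proxies: capitalized alphabetic words stand in for named entities,
    # digit runs for numeric mentions; real profiling would use NER output.
    entities = [t for t in tokens if t[:1].isupper() and t.isalpha()]
    numbers = re.findall(r"\d+(?:\.\d+)?", transcript)
    per_1000 = max(token_count, 1) / 1000.0
    return EpisodeProfile(
        duration_minutes=duration_minutes,
        token_count=token_count,
        speaker_count=len(set(speaker_turns)),
        turn_taking_rate=len(speaker_turns) / max(duration_minutes, 1e-6),
        entity_density=len(entities) / per_1000,
        numeric_density=len(numbers) / per_1000,
    )
```

Because every metric is a pure function of the inputs, repeated profiling of the same episode yields the same profile, which keeps routing reproducible.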
2. Episode Profiles (Routing Categories)¶
The following profiles cover most podcast types:
2.1 Short Monologue (≤15 min)¶
Characteristics:
- Duration ≤ 15 minutes
- Single speaker or minimal dialogue
- Token count < 2000

Strategy:
- Single-pass summary (no chunking)
- Direct summarization with BART or LED
- Minimal processing overhead

Models:
- BART-large (fast, good quality for short content)
- LED-base (if context window allows)

2.2 Short Dialogue (≤30 min)¶
Characteristics:
- Duration ≤ 30 minutes
- Multiple speakers (2-4)
- High turn-taking rate
- Token count < 4000

Strategy:
- Chunk by speaker-turn blocks
- Emphasis on "who said what"
- Speaker-aware summarization

Models:
- BART-large with speaker-aware chunking
- LED-base for longer dialogues

2.3 Long Monologue (30-180+ min)¶
Characteristics:
- Duration > 30 minutes
- Single speaker or minimal dialogue
- Token count > 4000

Strategy:
- Hierarchical chunking
- Strong reducer focus
- MAP-REDUCE with LED or LongT5

Models:
- LED-large or LongT5-large (MAP phase)
- Instruction-tuned LLM (REDUCE phase, RFC-042)

2.4 Long Dialogue / Panel (60-240+ min)¶
Characteristics:
- Duration > 60 minutes
- Multiple speakers (3+)
- High turn-taking rate
- Token count > 8000

Strategy:
- Topic segmentation
- Speaker-position extraction
- Hierarchical MAP-REDUCE

Models:
- LED-large or LongT5-large (MAP phase)
- Instruction-tuned LLM (REDUCE phase, RFC-042)

2.5 Technical / Dense Content¶
Characteristics:
- High entity density (> 10 entities per 1000 tokens)
- High numeric density (> 5 numbers per 1000 tokens)
- Technical terminology

Strategy:
- Extraction-first approach
- Conservative summarization
- Preserve facts, numbers, entities

Models:
- BART-large with extraction prompts
- LED-base for long technical content

2.6 Abstract / Philosophical Content¶
Characteristics:
- Low entity density
- Low numeric density
- High topic drift
- Narrative structure

Strategy:
- Argument and claim mapping
- Narrative synthesis
- Abstractive summarization

Models:
- LED-large (better for abstract content)
- Instruction-tuned LLM (REDUCE phase, RFC-042)
3. Routing Rules (Deterministic)¶
Routing is rule-based and logged for debuggability:
```python
def route_episode(profile: EpisodeProfile) -> SummarizationStrategy:
    """Route episode to appropriate summarization strategy."""
    # Short monologue
    if profile.duration_minutes <= 15 and profile.speaker_count <= 1:
        return SummarizationStrategy.SHORT_MONOLOGUE
    # Short dialogue
    if profile.duration_minutes <= 30 and profile.turn_taking_rate > 2.0:
        return SummarizationStrategy.SHORT_DIALOGUE
    # Technical content
    if profile.entity_density > 10.0 or profile.numeric_density > 5.0:
        return SummarizationStrategy.TECHNICAL
    # Long monologue
    if profile.duration_minutes > 30 and profile.speaker_count <= 2:
        return SummarizationStrategy.LONG_MONOLOGUE
    # Long dialogue/panel
    if profile.duration_minutes > 60 and profile.speaker_count > 2:
        return SummarizationStrategy.LONG_DIALOGUE
    # Default: standard MAP-REDUCE
    return SummarizationStrategy.STANDARD
```
Routing Thresholds (Initial):
- Token count < 2000 → Single-pass strategy
- Speaker turn rate > 2.0 turns/min → Dialogue strategy
- Entity density > 10.0 per 1000 tokens → Technical strategy
- Topic drift > threshold → Topic segmentation
- Duration > 60 min + multiple speakers → Panel strategy
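Because routing must be logged for debugging (see Constraints), each decision can be emitted together with the metric values that triggered it. A minimal sketch with a condensed rule subset so the snippet is self-contained; the logger name and the `route_episode_logged` wrapper are illustrative, not the final API:

```python
import logging
from dataclasses import dataclass, asdict
from enum import Enum

log = logging.getLogger("summarization.routing")

class SummarizationStrategy(Enum):
    SHORT_MONOLOGUE = "short_monologue"
    TECHNICAL = "technical"
    STANDARD = "standard"

@dataclass
class EpisodeProfile:
    """Condensed to the fields this sketch inspects."""
    duration_minutes: float
    speaker_count: int
    entity_density: float

def route_episode_logged(profile: EpisodeProfile) -> SummarizationStrategy:
    """Route and record the decision with its inputs for later audits."""
    # Condensed rule subset; the full rule set is route_episode() above.
    if profile.duration_minutes <= 15 and profile.speaker_count <= 1:
        strategy = SummarizationStrategy.SHORT_MONOLOGUE
    elif profile.entity_density > 10.0:
        strategy = SummarizationStrategy.TECHNICAL
    else:
        strategy = SummarizationStrategy.STANDARD
    # The same profile always yields the same log line: deterministic, replayable.
    log.info("routing decision=%s profile=%s", strategy.value, asdict(profile))
    return strategy
```

Logging the full profile alongside the chosen strategy makes any routing decision reproducible from the log alone.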
4. Model Roles¶
The system is structured around stable roles (compatible with RFC-042):
- Extractor (MAP pass)
  - High recall
  - Structured outputs (facts, bullets, entities)
  - Minimal hallucination
  - Models: BART, LED, LongT5
- Summarizer (MAP pass)
  - Chunk-level narrative summaries
  - Preserves salience and context
  - Models: BART, LED, LongT5, PEGASUS
- Reducer / Synthesizer
  - De-duplication and reconciliation
  - Global coherence
  - Schema and formatting compliance
  - Models: Instruction-tuned LLMs (RFC-042), BART-large, LED-large
- Finalizer (optional)
  - Style, tone, and output normalization
  - Often combined with the reducer
  - Models: Instruction-tuned LLMs
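The role structure can be expressed as a preference-ordered lookup that the routing layer resolves against whatever the Model Registry (RFC-044) reports as available. A sketch; the Hugging Face-style model identifiers are illustrative assumptions, not the registry's actual entries:

```python
from enum import Enum
from typing import Set

class ModelRole(Enum):
    EXTRACTOR = "extractor"
    SUMMARIZER = "summarizer"
    REDUCER = "reducer"
    FINALIZER = "finalizer"

# Candidate models per role, in preference order (names illustrative; actual
# availability and identifiers come from the Model Registry, RFC-044).
ROLE_CANDIDATES = {
    ModelRole.EXTRACTOR: ["facebook/bart-large", "allenai/led-base-16384"],
    ModelRole.SUMMARIZER: ["facebook/bart-large-cnn", "allenai/led-large-16384"],
    ModelRole.REDUCER: ["instruction-tuned-llm", "facebook/bart-large"],
    ModelRole.FINALIZER: ["instruction-tuned-llm"],
}

def pick_model(role: ModelRole, available: Set[str]) -> str:
    """Return the first preferred model for a role that is actually available."""
    for name in ROLE_CANDIDATES[role]:
        if name in available:
            return name
    raise LookupError(f"no model available for role {role.value}")
```

Keeping strategies keyed to roles rather than concrete models is what lets future model swaps happen without touching routing rules.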
5. Extraction-First Intermediate Artifacts¶
All strategies produce structured intermediate outputs before reduction:
```python
from dataclasses import dataclass
from typing import List

@dataclass
class ExtractionArtifacts:
    """Structured intermediate outputs before reduction."""

    key_points: List[str]
    claims: List[Claim]                       # With supporting evidence
    entities: List[Entity]                    # With roles
    numbers: List[Number]                     # Value, unit, context
    notable_quotes: List[Quote]               # Optional timestamps
    speaker_positions: List[SpeakerPosition]  # Dialogue only
    definitions: List[Definition]             # Technical only
```
Reducers operate exclusively on these artifacts, not raw transcripts.
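Because reducers see only artifacts, their duties can be expressed purely over the schema. A sketch of one such duty, de-duplicating key points across chunks; the artifact type is condensed to the field used here, and the case/whitespace normalization rule is an assumption:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ExtractionArtifacts:
    """Condensed to the field this sketch uses; full schema above."""
    key_points: List[str] = field(default_factory=list)

def reduce_key_points(chunks: List[ExtractionArtifacts]) -> List[str]:
    """Merge per-chunk key points, dropping near-duplicates.

    Normalization (lowercase, collapsed whitespace) is a stand-in for
    whatever similarity check the real reducer uses.
    """
    seen, merged = set(), []
    for chunk in chunks:
        for point in chunk.key_points:
            norm = " ".join(point.lower().split())
            if norm not in seen:
                seen.add(norm)
                merged.append(point)  # Keep first-seen original casing
    return merged
```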
6. Integration with Existing Pipeline¶
Backward Compatibility:
- Default routing: Standard MAP-REDUCE (current behavior)
- Profiling is opt-in (can be disabled)
- Existing providers work unchanged
- Routing decisions are logged but don't break existing workflows
Configuration:
```python
# Enable adaptive routing
enable_adaptive_routing: bool = False  # Opt-in for backward compatibility

# Routing thresholds (tunable)
routing_token_threshold: int = 2000
routing_turn_rate_threshold: float = 2.0
routing_entity_density_threshold: float = 10.0
```
Key Decisions¶
- Deterministic Routing
  - Decision: Use rule-based routing, not ML-based classification
  - Rationale: Deterministic, debuggable, reproducible. Fast (< 1 s per episode). No training data needed.
- Profile-Based Metrics
  - Decision: Use inexpensive heuristics (token count, speaker count, etc.)
  - Rationale: Fast profiling, no ML inference required. Sufficient for routing decisions.
- Backward Compatibility
  - Decision: Make routing opt-in, default to current behavior
  - Rationale: No breaking changes. Users can opt in gradually.
- Structured Artifacts
  - Decision: All strategies produce extraction artifacts before reduction
  - Rationale: Enables a consistent reducer interface. Feeds downstream KG construction (RFC-055) when the KG stage consumes the same structured intermediates or transcript slices selected by routing.
- Per-Profile Evaluation
  - Decision: Track metrics per episode profile, not globally
  - Rationale: Avoids misleading averages. Enables profile-specific optimization.
Alternatives Considered¶
- ML-Based Routing
  - Description: Train a classifier to route episodes
  - Pros: Potentially more accurate routing
  - Cons: Requires training data, adds complexity, less debuggable
  - Why Rejected: Deterministic rules are sufficient and more maintainable
- Single Strategy for All
  - Description: Keep current one-size-fits-all approach
  - Pros: Simpler, no routing logic needed
  - Cons: Suboptimal quality for diverse episode types
  - Why Rejected: Quality improvements justify added complexity
- Provider-Specific Routing
  - Description: Different routing per provider (ML vs OpenAI vs Ollama)
  - Pros: Provider-specific optimizations
  - Cons: Fragmentation, harder to maintain
  - Why Rejected: Unified routing is cleaner and more maintainable
Testing Strategy¶
Test Coverage:
- Unit tests: Profile calculation, routing logic, threshold validation
- Integration tests: End-to-end routing with real episodes
- Quality validation: Compare routed vs non-routed summaries per profile
- Performance testing: Profiling overhead (< 1s target)
Test Organization:
- tests/unit/workflow/test_episode_profiling.py (Profile calculation)
- tests/unit/workflow/test_routing.py (Routing logic)
- tests/integration/test_adaptive_routing.py (End-to-end routing)
- tests/integration/test_profile_quality.py (Quality validation per profile)
Test Execution:
- Unit tests run in CI (fast, no ML dependencies)
- Integration tests require real episodes (manual/local testing)
- Quality validation: Compare summaries for 3-5 episodes per profile
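The performance target and determinism constraint can both be pinned in a fast unit test. A sketch using a stub profiler (profile_episode_stub is a hypothetical stand-in so the test needs no ML dependencies, matching the CI constraint above):

```python
import time

def profile_episode_stub(transcript: str) -> dict:
    """Stand-in for the real profiler; heuristics only, no ML inference."""
    tokens = transcript.split()
    return {"token_count": len(tokens)}

def test_profiling_is_fast_and_deterministic():
    transcript = "word " * 200_000  # Roughly a multi-hour episode's worth of tokens
    start = time.perf_counter()
    first = profile_episode_stub(transcript)
    elapsed = time.perf_counter() - start
    assert elapsed < 1.0  # < 1 s budget from Constraints
    assert first == profile_episode_stub(transcript)  # Same input, same profile
```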
Rollout & Monitoring¶
Prerequisites (must be complete before starting):
- RFC-042 (Hybrid ML Platform) — provides model diversity to route to
- RFC-049 (GIL) — stable extraction pipeline for GIL routing
- RFC-055 (KG) — not a hard prerequisite for summarization routing; required before KG routing can run in production
Rollout Plan:
- Phase 4a: Implement profiling and routing logic (opt-in, podcast profiles only)
- Phase 4b: Validate routing decisions on representative episodes (summarization + GIL; + KG when RFC-055 pipeline is available)
- Phase 4c: Enable by default for new episodes
- Phase 4d: Iterate on thresholds based on quality
- Phase 4e (v1.1): Add interview + lecture profiles
- Phase 4f (v2): Multi-content expansion with content-type detection
Monitoring:
- Routing decisions: Log which profile each episode gets
- Quality metrics: Track per-profile quality (faithfulness, coverage, etc.)
- Performance metrics: Profiling time, routing overhead
- Usage tracking: Which profiles are most common
Success Criteria:
- ✅ Profiling completes in < 1s per episode
- ✅ Routing decisions are deterministic and reproducible
- ✅ Quality improves for at least 3 episode profiles
- ✅ No regressions for default (non-routed) behavior
- ✅ Documentation complete (routing guide, threshold tuning)
Integration with GIL (RFC-049)¶
Routing for GIL Extraction¶
Episode profiling benefits GIL extraction, not just summarization. Different content types benefit from different extraction strategies:
| Profile | GIL Strategy | Rationale |
|---|---|---|
| Short Monologue | Single-pass FLAN-T5 | Short enough for direct extraction |
| Short Dialogue | Speaker-aware extraction | "Who said what" matters for quotes |
| Long Monologue | MAP → REDUCE extraction | Chunking needed for long content |
| Long Dialogue | Topic-segmented extraction | Panel insights cluster by topic |
| Technical | Entity-first extraction | Preserve facts/numbers in insights |
| Abstract | Claim-mapping extraction | Focus on arguments and positions |
Implementation: RFC-053 exposes profiling and strategy selectors (e.g. route_summarization(profile), route_gil_extraction(profile)) so RFC-049 can adapt GIL extraction per episode; route_kg_extraction(profile) is the KG analogue for RFC-055 (see § Integration with KG).
Shared Profiling¶
Episode profiling is computed once and reused:
```python
profile = profile_episode(transcript, metadata, speakers)

# Summarization uses the profile for strategy selection
summary_strategy = route_summarization(profile)

# GIL uses the profile for extraction strategy selection
extraction_strategy = route_gil_extraction(profile)

# KG uses the profile for graph extraction strategy (when generate_kg / RFC-055)
kg_strategy = route_kg_extraction(profile)
```
This avoids duplicate work and ensures consistent routing decisions across summarization, GIL, and KG pipelines.
Integration with KG (RFC-055)¶
Routing for KG extraction¶
Episode profiling benefits Knowledge Graph extraction (RFC-055) as well as summarization and GIL. Different content shapes suggest different entity, topic, and relationship strategies (still distinct from GIL insights and quotes):
| Profile | KG strategy (illustrative) | Rationale |
|---|---|---|
| Short Monologue | Lightweight topic + entity pass | Few speakers; small graph |
| Short Dialogue | Speaker-linked entities and co-mentions | Graph edges reflect dialogue |
| Long Monologue | Chunked entity/topic passes with merge | Avoids single-shot limits |
| Long Dialogue / Panel | Topic-segmented graph passes | Aligns clusters with discussion structure |
| Technical | Entity-first, preserve named entities and relations | High density matches KG value |
| Abstract | Topic and theme nodes; sparse entity extraction | Low NER yield; focus on themes |
Implementation: RFC-053 exposes the same profile_episode() result to a route_kg_extraction(profile) selector (name illustrative) so RFC-055 can adapt KG builders per episode without coupling to gi.json or GIL extraction internals.
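Since routing here is deterministic, the table above reduces to a plain lookup. A sketch; the profile and strategy identifiers are illustrative placeholders, not the RFC-055 API:

```python
from enum import Enum

class Profile(Enum):
    SHORT_MONOLOGUE = "short_monologue"
    SHORT_DIALOGUE = "short_dialogue"
    LONG_MONOLOGUE = "long_monologue"
    LONG_DIALOGUE = "long_dialogue"
    TECHNICAL = "technical"
    ABSTRACT = "abstract"

# Illustrative mapping of the table above; strategy names are placeholders.
KG_STRATEGY = {
    Profile.SHORT_MONOLOGUE: "light_topic_entity_pass",
    Profile.SHORT_DIALOGUE: "speaker_linked_comentions",
    Profile.LONG_MONOLOGUE: "chunked_entity_topic_merge",
    Profile.LONG_DIALOGUE: "topic_segmented_graph",
    Profile.TECHNICAL: "entity_first_relations",
    Profile.ABSTRACT: "theme_nodes_sparse_entities",
}

def route_kg_extraction(profile: Profile) -> str:
    """Deterministic profile-to-KG-strategy lookup."""
    return KG_STRATEGY[profile]
```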
Independence¶
- Feature flags: generate_gi and KG generation (per RFC-055 / PRD-019) remain independently toggleable; routing hooks exist for both, but neither requires the other.
- Artifacts: KG output paths and schema follow RFC-055 only; RFC-053 does not define KG node types.
Beyond Podcasts: Multi-Content Expansion¶
RFC-053's profiling and routing architecture is content-type-agnostic. The same framework extends to any long-form audio/text content.
Future Content Profiles¶
Beyond podcast-specific profiles (v1), the system can add content-type profiles:
| Content Type | Key Characteristics | Strategy Adaptations |
|---|---|---|
| Lectures | Single speaker, structured, technical | Section-aware chunking, definition extraction |
| Interviews | Two speakers, Q&A format | Question-answer pairing, interviewer filtering |
| Panel Discussions | 3+ speakers, topic hopping | Topic segmentation, speaker-position mapping |
| Debates | Opposing viewpoints, structured | Claim-counterclaim mapping, position extraction |
| Audiobooks | Narrative, long-form, chapters | Chapter-aware segmentation, narrative synthesis |
| Meetings | Multiple speakers, action items | Decision extraction, action item tracking |
| Earnings Calls | Structured, financial, Q&A | Financial entity extraction, guidance tracking |
Expansion Strategy¶
Phase 4a-4d (v1 — Podcasts):
- Implement profiling + routing for podcast profiles
- Validate on representative episodes
- Tune thresholds based on quality feedback
Phase 4e (v1.1 — Adjacent Content):
- Add interview and lecture profiles (closest to podcasts)
- Test with real interview/lecture transcripts
- Minimal routing rule additions
Phase 4f (v2 — Multi-Content):
- Add panel, debate, meeting profiles
- Introduce content-type detection (auto-classify input)
- Expand extraction strategies for new content types
Key Insight: The profiling metrics (duration, speaker count, turn-taking rate, entity density, topic drift) are universal. What changes per content type is the routing rules and strategy implementations, not the profiling framework itself.
Content-Type Detection (Future)¶
For multi-content support, add auto-detection:
```python
def detect_content_type(
    profile: EpisodeProfile,
    metadata: dict,
) -> ContentType:
    """Auto-detect content type from profile + metadata."""
    # Heuristic-based detection
    if profile.speaker_count == 1 and profile.entity_density > 10:
        return ContentType.LECTURE
    if profile.speaker_count == 2 and profile.turn_taking_rate > 3.0:
        return ContentType.INTERVIEW
    if profile.speaker_count >= 3:
        return ContentType.PANEL
    # Default
    return ContentType.PODCAST
```
This enables the system to handle mixed input without manual content-type specification.
Relationship to Other RFCs¶
This RFC (RFC-053) is the optimization and expansion layer in the overall architecture:
Phase 1: RFC-044 (Model Registry)
▼
Phase 2: RFC-042 (Hybrid ML Platform)
│ + RFC-052 (Local LLM Prompts)
▼
Phase 3: RFC-049 (GIL)
│ RFC-055 (KG) — optional parallel track
▼
Phase 4: RFC-053 (this RFC)
Routing + multi-content bridge
Dependency chain:
- RFC-044 (Phase 1): Model capabilities — RFC-053 uses registry to check which models are available for each routing strategy
- RFC-042 (Phase 2): Hybrid platform — provides the MAP/REDUCE models and FLAN-T5/LLM tiers that RFC-053 routes to
- RFC-052 (Phase 2b): Local LLM prompts — provides model-specific prompts that RFC-053 can select per routing strategy
- RFC-049 (Phase 3): GIL extraction — RFC-053 can route GIL extraction strategies per content type
- RFC-055: KG extraction — RFC-053 can route KG graph-building strategies per content type (when KG is enabled); separate from GIL routing rules
- RFC-053 (Phase 4, this RFC): Routing — selects optimal strategies for summarization, GIL, and optionally KG based on episode/content profiling
Key Distinction:
- RFC-012: Basic summarization pipeline
- RFC-042: Model diversity (MAP/REDUCE, FLAN-T5, LLMs, embedding, QA, NLI)
- RFC-052: Prompt quality for local LLMs
- RFC-049: GIL extraction orchestration
- RFC-055: KG artifact model and extraction (PRD-019)
- RFC-053: Adaptive routing — selects the right strategy from the available capabilities
Together, these provide:
- Complete summarization pipeline (RFC-012)
- High-quality model platform (RFC-042)
- Local LLM options with optimized prompts (RFC-052)
- Evidence-backed insight extraction (RFC-049)
- Knowledge-graph extraction when enabled (RFC-055)
- Adaptive routing for diverse content types (RFC-053)
Benefits¶
- Improved Quality: Better summaries for diverse episode types
- Optimized Performance: Right strategy for each episode
- Systematic Evaluation: Per-profile metrics enable targeted improvements
- Extensibility: Easy to add new profiles and routing rules
- Debuggability: Deterministic routing with logging
Migration Path¶
For Users:
- Opt in to adaptive routing: enable_adaptive_routing: true
- The system automatically profiles episodes and routes them appropriately
- Review routing decisions in logs
- Adjust thresholds if needed (via config)
For Developers:
- Review RFC-053 (this document)
- Implement profiling logic
- Implement routing rules
- Add integration tests
- Validate on representative episodes
Open Questions¶
- Threshold Tuning: What are optimal thresholds for routing rules? Use Phase 3 profiling data to calibrate.
- Topic Drift Calculation: How to efficiently calculate semantic variance? Sentence-transformers (RFC-042) can provide embeddings for this.
- Profile Expansion: When to add new profiles vs adjust existing ones? Start with 6 podcast profiles, expand to content-type profiles in v1.1.
- Evaluation Metrics: What metrics matter most per profile? Per-profile quality (faithfulness, coverage).
- ~~Provider Integration: How do different providers affect routing?~~ Resolved: Routing is provider-agnostic. Strategies map to model roles (MAP, REDUCE), not specific providers. RFC-042 + RFC-044 handle provider/model resolution.
- GIL Extraction Routing: Should GIL extraction use the same routing rules as summarization, or separate rules? Proposal: shared profiling, separate strategy selection.
- KG Extraction Routing: Should KG use the same profile thresholds as GIL, or lighter/heavier passes by default per profile? Proposal: shared EpisodeProfile, separate route_kg_extraction() thresholds tuned against RFC-055 graph quality metrics.
- Content-Type Detection: When should auto-detection of content type be implemented? Proposal: Phase 4f (v2), after podcast routing is validated.
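For the topic-drift question, one inexpensive formulation is the mean cosine distance between consecutive segment embeddings. A sketch with a pluggable embed callable (in practice a sentence-transformers model from RFC-042); the function name and signature are assumptions:

```python
import math
from typing import Callable, List, Sequence

def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; 0 for identical directions, up to 2 for opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

def topic_drift(segments: List[str], embed: Callable[[str], List[float]]) -> float:
    """Mean cosine distance between consecutive segment embeddings.

    `embed` would be a sentence-transformers encoder in practice; any
    text-to-vector callable works for the computation itself.
    """
    if len(segments) < 2:
        return 0.0
    vectors = [embed(s) for s in segments]
    dists = [cosine_distance(u, v) for u, v in zip(vectors, vectors[1:])]
    return sum(dists) / len(dists)
```

A stable topic yields drift near 0; topic hopping pushes the mean toward 1, giving a single scalar to threshold against.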
References¶
- Related PRD: docs/prd/PRD-005-episode-summarization.md
- Related RFC: docs/rfc/RFC-012-episode-summarization.md
- Prerequisite: docs/rfc/RFC-042-hybrid-summarization-pipeline.md
- Prerequisite: docs/rfc/RFC-044-model-registry.md
- Related RFC: docs/rfc/RFC-049-grounded-insight-layer-core.md
- Related RFC: docs/rfc/RFC-055-knowledge-graph-layer-core.md
- Related PRD: docs/prd/PRD-019-knowledge-graph-layer.md
- Related RFC: docs/rfc/RFC-052-locally-hosted-llm-models-with-prompts.md
- Related ADR: docs/adr/ADR-010-hierarchical-summarization-pattern.md
- Source Code: podcast_scraper/workflow/stages/summarization_stage.py