RFC-019 · Science Overlay & Episode System — architecture, pipeline, cost analysis

Status: Draft v0.4 · 2026-05-16 (12 architectural decisions resolved) · Closes: PRD-016

Why this is an RFC. The architecture binds every future audio episode and every route's player surface for years: TTS provider abstraction (so "swap providers" is an env var, not a rewrite), bundle / hosting strategy that survives the planned VPS docker-compose migration at v1.0, async generation that runs identically on Marko's M-series Mac and in GH Actions, and a translation pipeline that keeps 12-locale parity without leaking executable code into mobile builds (App Store §2.5.2). These are interlocking decisions; one wrong cut early forces ugly retrofits later.

1 · Architecture overview

                ┌──────────────────────┐
                │  content/episodes/   │  ← human-written markdown + SSML, en-US source
                │   en-US/{id}.md      │
                └──────────┬───────────┘
                           │
                           ▼
              ┌────────────────────────┐         ┌──────────────────┐
              │  Pipeline 1 (i18n)     │────────▶│ content/episodes/│
              │  Claude API            │         │   {locale}/{id}.md│
              │  SSML-safe translate   │         └──────────────────┘
              └────────────────────────┘                  │
                                                          ▼
                                              ┌─────────────────────┐
                                              │  Pipeline 2 (TTS)   │
                                              │  TtsProvider iface  │
                                              │  hash-keyed cache   │
                                              └──────────┬──────────┘
                                                         │
                                                         ▼
                                              ┌─────────────────────┐
                                              │  static/audio/...   │
                                              │   {locale}/{persona}/{id}.mp3│
                                              │   {locale}/{persona}/{id}.vtt│
                                              │   {locale}/{persona}/{id}.txt│
                                              └──────────┬──────────┘
                                                         │
                            ┌────────────────────────────┴────────────────────────────┐
                            ▼                                                         ▼
            ┌────────────────────────┐                            ┌────────────────────────┐
            │  Web build (GH Pages)  │                            │  Mobile (Capacitor)    │
            │  ALL locales bundled   │                            │  USER LOCALE bundled   │
            │  PWA SW caches         │                            │  Other locales lazy    │
            └────────────────────────┘                            └────────────────────────┘
                            │                                                         │
                            └────────────────────┬────────────────────────────────────┘
                                                 ▼
                                      ┌──────────────────────┐
                                      │  Browser runtime     │
                                      │  EpisodePlayer       │
                                      │  Overlay component   │
                                      └──────────────────────┘

Source of truth: human-edited markdown scripts (with SSML markup) under content/episodes/en-US/. Two pipelines (translation, TTS) produce all derived assets. Both pipelines run identically locally and in GH Actions. The runtime never sees a TTS API key.

2 · Episode taxonomy (rebuilt for 11 routes)

2.1 · Three voice personas

Persona	Role	Register	Length sweet spot	Routes
Curator	Institutional, Sagan-register, "we / humanity / on this small blue dot". Frames why Orrery exists; opens and closes the Full Tour.	Slow, weighty, declarative. Mid-low pitch. Long pauses earned, not decorated.	60–120 s segments	`/` + the Full Tour orchestration only
Guide	Personal docent — "look here, see this, watch what happens." First-person engaged. Models the visitor's own attention.	Conversational, warm, builds to clarity. Mid pitch. Punctuation breathes.	5–8 min screen episodes	All 11 routes (one per route)
Enthusiast	Technical-emotional. Equations as instruments. "The signal takes 14.5 minutes. Think about that." Numbers earn emotion through precision.	Brisk, specific, curious. Mid-high pitch. Numbers spoken with their unit.	90 s – 3 min object episodes	Object-level: each planet on `/explore`, each mission on `/missions`, each landing site on `/earth`/`/moon`/`/mars`, each module on `/iss`/`/tiangong`, each chapter on `/science`, each spacecraft on `/fleet`

The personas are internal editorial tools, not user-facing labels (per PRD-016 §will-not-have). The user just hears the right voice for the moment.

2.2 · Episode count per route (full hierarchy target)

Route	Guide (screen)	Enthusiast (object)	Total
`/`	1	0	1
`/explore`	1	8 (planets)	9
`/missions`	1	6 (marquee missions)	7
`/fly`	1	2 (porkchop, cislunar)	3
`/earth`	1	4 (landing sites)	5
`/moon`	1	6 (Apollo + Chang'e + Luna sites)	7
`/mars`	1	5 (rover sites)	6
`/iss`	1	4 (modules)	5
`/tiangong`	1	3 (modules)	4
`/science`	1	5 (chapters)	6
`/fleet`	1	4 (spacecraft families)	5
Subtotal Guide+Enthusiast	11	47	58
Curator Full Tour segments	—	—	8 (intro, transitions, close)
Total per locale			66 episodes

× 12 locales = 792 audio assets at full hierarchy. v1 priority cut (per PRD-016 §goal): 33 English episodes (Guide + Enthusiast English-only) + 8 Curator segments × 12 locales = ~129 audio assets ≈ 91 MB.

2.3 · The eight Atmospheric Moves

Editorial anchor — these are the moments the audio system exists for. Every persona / route / locale must land them with weight:

Signal delay. "When Curiosity radios home, you wait 14 minutes for the answer. Not because the radio is slow — because the universe is large." (Enthusiast, /mars Curiosity object)
Porkchop plot. The C-shape isn't decorative. Reading the contour is reading a year-by-year argument with the solar system. (Enthusiast, /fly porkchop object)
Pale blue dot. The Voyager image, the Sagan reading, restated for an Orrery user looking at the same scene. (Curator, / opening tour segment)
14.5-minute one-way. Light-time. Why "real-time" control of a Mars rover is a category error. (Enthusiast, /mars Curiosity)
Capability ladder close. Apollo 11 to Artemis to Mars — what the missions list tells you about humans, not just spacecraft. (Curator, Full Tour close)
Cernan's last words. Apollo 17, the last footstep on the Moon, fifty-three years and counting. (Guide, /moon screen episode)
Far side. No human has ever stood on the far side, and only crews in lunar orbit have seen it directly. The probes that have landed there. (Guide, /moon)
Curiosity persistence. A robot driving slower than a baby crawls, alone on a planet, every day, for over a decade. (Enthusiast, /mars Curiosity)

These moves get authored first in en-US, reviewed before any other content, and used as voice-quality reference takes when the per-locale voice ID is being curated.

3 · TTS provider abstraction

3.1 · The interface

```typescript
// scripts/audio/tts/provider.ts
export interface TtsProvider {
  readonly name: 'elevenlabs' | 'openai' | 'google' | 'azure' | 'coqui-local';

  generate(input: {
    ssml: string;              // SSML or plain text per provider capability
    voiceId: string;           // looked up from voices.json by (provider, locale, persona)
    locale: string;            // BCP-47, e.g. 'en-US'
    persona: 'curator' | 'guide' | 'enthusiast';
  }): Promise<{
    audio: Buffer;             // raw mp3 bytes
    captions: string;          // WebVTT
    transcript: string;        // plain text
    chars: number;             // for cost ledger
    cost_usd: number;          // computed by the provider impl
  }>;
}
```

Each provider implementation lives in scripts/audio/tts/{provider}.ts. The pipeline (§5) imports TtsProvider and selects the implementation by process.env.TTS_PROVIDER. Default elevenlabs.
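The env-var selection described above could look roughly like the following. This is a minimal sketch, not the pipeline's actual code: `ProviderName`, `isProviderName`, and `selectProviderName` are hypothetical names, and failing loudly on an unknown value (rather than silently falling back to the default) is an assumption.

```typescript
// Hypothetical sketch; none of these names come from the repo.
type ProviderName = 'elevenlabs' | 'openai' | 'google' | 'azure' | 'coqui-local';

const PROVIDER_NAMES: readonly ProviderName[] = [
  'elevenlabs', 'openai', 'google', 'azure', 'coqui-local',
];

// Narrow an arbitrary string (e.g. from the environment) to a known provider name.
function isProviderName(value: string): value is ProviderName {
  return (PROVIDER_NAMES as readonly string[]).includes(value);
}

// Resolve TTS_PROVIDER, defaulting to 'elevenlabs' as stated in §3.1.
// Assumption: an unknown value is an error, not a silent fallback.
function selectProviderName(env: Record<string, string | undefined>): ProviderName {
  const raw = env['TTS_PROVIDER'] ?? 'elevenlabs';
  if (!isProviderName(raw)) {
    throw new Error(`Unknown TTS_PROVIDER "${raw}"; expected one of: ${PROVIDER_NAMES.join(', ')}`);
  }
  return raw;
}
```

In the pipeline this would be called as `selectProviderName(process.env)`, with the result used to import the matching file under scripts/audio/tts/.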

3.2 · `voices.json` shape

```jsonc
// static/data/audio/voices.json (committed, human-curated)
{
  "elevenlabs": {
    "en-US": {
      "curator":    { "voiceId": "EXAVITQu4vr4xnSDxMaL", "model": "eleven_multilingual_v2" },
      "guide":      { "voiceId": "VR6AewLTigWG4xSOukaG", "model": "eleven_multilingual_v2" },
      "enthusiast": { "voiceId": "pNInz6obpgDQGcFmaJgB", "model": "eleven_multilingual_v2" }
    },
    "es-ES": { /* ... */ },
    "de-DE": { /* ... */ }
  },
  "openai": {
    "en-US": {
      "curator":    { "voice": "onyx", "model": "tts-1-hd" },
      "guide":      { "voice": "nova", "model": "tts-1-hd" },
      "enthusiast": { "voice": "shimmer", "model": "tts-1-hd" }
    }
  },
  "google": { /* ... */ },
  "azure":  { /* ... */ },
  "coqui-local": { /* speaker reference WAVs paths */ }
}
```

Adding a provider: implement the interface, add the provider's voice IDs under its key in voices.json, set TTS_PROVIDER=newprovider, run the pipeline. Switching providers mid-corpus: the hash-keyed cache (§5) keys on (provider, voiceId, ssml) so a provider swap re-generates only audio that uses voices from the new provider.
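A sketch of the (provider, locale, persona) lookup against the voices.json shape above, including the mixed-provider case. The type names and `resolveVoice` are hypothetical, and the idea of passing an ordered provider preference list is an assumption; the RFC only states that a mixed-provider corpus is supported.

```typescript
// Hypothetical types mirroring the voices.json shape in §3.2.
type Persona = 'curator' | 'guide' | 'enthusiast';
type VoiceEntry = Record<string, string>; // e.g. { voiceId, model } or { voice, model }
type VoiceRegistry = Record<string, Record<string, Partial<Record<Persona, VoiceEntry>>>>;

// Try providers in preference order so a locale missing from the first
// provider falls through to the next one (assumed fallback strategy).
function resolveVoice(
  registry: VoiceRegistry,
  providers: readonly string[],
  locale: string,
  persona: Persona,
): { provider: string; entry: VoiceEntry } {
  for (const provider of providers) {
    const entry = registry[provider]?.[locale]?.[persona];
    if (entry) return { provider, entry };
  }
  throw new Error(`No voice for ${locale}/${persona} in [${providers.join(', ')}]`);
}
```

Because the resolved provider is returned alongside the voice entry, the caller can feed both into the cache key, which keeps a per-locale provider swap from touching unrelated audio.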

3.3 · What this earns us

Cost optionality. If ElevenLabs's billing surprises us, we re-run the priority cut on OpenAI (~$7–14 vs ~$140 on ElevenLabs, per §4.2) without touching pipeline code.
Locale optionality. ElevenLabs covers ~32 locales; if a future locale isn't supported, we fall back to Google Cloud TTS for that one locale only (mixed-provider corpus is fine — voices.json supports it).
Failure mode. If ElevenLabs has an outage during a pipeline run, retry with Google or Azure for the affected episodes. Pipeline restart is idempotent.
Self-hosted future. Coqui XTTS-v2 (local) is in the matrix; if v1.x privacy requirements push toward "no SaaS for narration," the Coqui provider is the destination.

4 · Cost analysis & provider selection

4.1 · Char-count assumptions

Average screen episode (Guide): 5–8 min ≈ 8000 chars (incl. SSML markup)
Average object episode (Enthusiast): 90 s – 3 min ≈ 3000 chars
Average Curator segment: 60–120 s ≈ 2500 chars

Full hierarchy per locale: 11 Guide + 47 Enthusiast + 8 Curator = 11 × 8000 + 47 × 3000 + 8 × 2500 = 88 000 + 141 000 + 20 000 = 249 000 chars per locale.

Full corpus, 12 locales: ≈ 3.0 M chars.

v1 priority cut (English full + Curator × 12): 249 000 (en) + 8 × 2500 × 11 other locales = 249 000 + 220 000 = 469 000 chars total.
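The arithmetic above can be reproduced with a few lines, which makes it easy to re-run when the episode counts or length assumptions change. The constant and function names are illustrative only; `flatCostUsd` applies a flat per-million rate and deliberately ignores free tiers and subscriptions.

```typescript
// Character-count assumptions from §4.1 (averages, including SSML markup).
const CHARS = { guide: 8000, enthusiast: 3000, curator: 2500 } as const;
// Episode counts from §2.2.
const COUNTS = { guide: 11, enthusiast: 47, curator: 8 } as const;
const LOCALES = 12;

// Full hierarchy for one locale.
const perLocale =
  COUNTS.guide * CHARS.guide +
  COUNTS.enthusiast * CHARS.enthusiast +
  COUNTS.curator * CHARS.curator;

// Full corpus across all locales.
const fullCorpus = perLocale * LOCALES;

// v1 priority cut: full English hierarchy plus Curator segments in the other locales.
const v1Cut = perLocale + COUNTS.curator * CHARS.curator * (LOCALES - 1);

// Flat list-price cost; no free tier or subscription quota applied.
function flatCostUsd(chars: number, usdPerMillionChars: number): number {
  return (chars / 1_000_000) * usdPerMillionChars;
}
```

With these inputs `perLocale` is 249 000, `fullCorpus` is 2 988 000 and `v1Cut` is 469 000, matching the figures above.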

4.2 · Provider matrix

Provider	Pricing	Free tier	Locale coverage	Voice quality	v1 cost	Full corpus cost	Strategic note
ElevenLabs	$0.30 / 1k chars on Pro tier ($99/mo, 500k chars included)	Starter: 10k chars/mo (no commercial use)	~32 locales incl. all 12 of ours	Best in class for prosody, emotion. Voice cloning available.	~$140 at $0.30/1k, or $99 if it fits in one Pro month (469k of the 500k included chars, ~94 % of the quota)	~$900 at $0.30/1k, or ~6 Pro months at $99 = $594 if amortised (3.0M ÷ 500k)	Anchor for v1. The "museum" target benchmark.
OpenAI TTS	$15 / 1M chars (`tts-1`); $30/1M (`tts-1-hd`)	$5 trial credit (one-time)	~50 locales	Decent, less expressive. 6 built-in voices. No cloning.	~$7 (`tts-1`) / $14 (`tts-1-hd`)	~$45 (`tts-1`) / $90 (`tts-1-hd`)	Cheapest at scale. Escape hatch if ElevenLabs costs spike.
Google Cloud TTS	$4/M (Standard), $16/M (WaveNet/Neural2)	1M chars/month free for WaveNet/Neural2 for 12 months; 4M/month free Standard	50+ locales	Strong (Studio voices excellent for narration).	$0 if run within 1 month (469k < 1M free)	~$48 at list price; ~$32 if run in one month (1M free); ~$0 if spread over 3 months (1M free × 3)	Best free-tier for v1 if we accept Google's voices.
Azure Neural TTS	$16/M (Neural)	0.5M chars/month free for first 12 months	60+ locales	Comparable to Google Neural2.	~$0 (469k < 0.5M free)	~$48 at list price; ~$40 if run in one month (0.5M free); ~$0 if spread over 6 months (0.5M free × 6)	Strong all-rounder. Good for the long tail of locales.
Coqui XTTS-v2 (local)	$0 marginal	unlimited (local compute)	~16 native, more via voice cloning	Mid-tier raw, excellent for cloning a signature voice.	$0 (M-series Mac runs it)	$0	Future signature-voice path. Slower iteration; weaker prosody than ElevenLabs today.

4.3 · Recommended provider sequencing (optionality-preserving)

v1 (priority cut, ~469k chars): start on the Google Cloud TTS Neural2 free tier (~$0). This costs nothing while we validate the rest of the pipeline (asset hosting, player UX, captions). Editorial quality is an estimated 80 % of ElevenLabs.
v1.0 ship: switch to ElevenLabs for the en-US Atmospheric Moves only. Re-generate the 8 anchor episodes that carry the museum-grade tone. ~50k chars at ElevenLabs ≈ $15. The remaining ~89 % of the corpus (~419k chars) stays on Google's free tier.
v1.x as cost permits: graduate more episodes to ElevenLabs by editorial priority. The voice ID registry is per-provider; a mixed corpus is the design.
v2 / signature voice: Coqui XTTS-v2 + ElevenLabs voice cloning if Marko wants to clone a specific voice as the project's "house voice."

This sequence keeps v1 cost effectively zero, ElevenLabs spend bounded to where it editorially matters, and the door open to swap entirely if any provider's pricing or terms change.

4.4 · Cost ledger

static/data/audio/cost-ledger.json records every TTS call:

```jsonc
{
  "version": 1,
  "entries": [
    {
      "ts": "2026-05-20T14:32:11Z",
      "provider": "elevenlabs",
      "locale": "en-US",
      "persona": "curator",
      "episode_id": "tour-open",
      "chars": 2412,
      "cost_usd": 0.7236,
      "voice_id": "EXAVITQu4vr4xnSDxMaL"
    }
  ],
  "monthly_totals": { "2026-05": { "elevenlabs": 12.45, "google": 0 } }
}
```

Pipeline appends; CI guards thresholds (PRD-016 OQ7: $50/mo soft warn, $200/mo hard halt, as resolved in §10; the initial recommendation was $25/$100).
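A sketch of the threshold guard CI could run against the ledger. `LedgerEntry` here is a trimmed-down, hypothetical view of the entries shown above, and the soft and hard limits are passed in as parameters rather than hard-coded, so the guard does not need changing when the limits do.

```typescript
// Hypothetical minimal view of a ledger entry (only the fields the guard needs).
interface LedgerEntry {
  ts: string;        // ISO-8601 UTC timestamp, e.g. "2026-05-20T14:32:11Z"
  provider: string;
  cost_usd: number;
}

type GuardResult = 'ok' | 'warn' | 'halt';

// Sum spend for one calendar month ("YYYY-MM") across all providers.
function monthlySpendUsd(entries: readonly LedgerEntry[], month: string): number {
  return entries
    .filter((e) => e.ts.startsWith(month))
    .reduce((sum, e) => sum + e.cost_usd, 0);
}

// Compare spend against the soft (warn) and hard (halt) limits.
function guard(spendUsd: number, softUsd: number, hardUsd: number): GuardResult {
  if (spendUsd >= hardUsd) return 'halt';
  if (spendUsd >= softUsd) return 'warn';
  return 'ok';
}
```

A CI step would then call `guard(monthlySpendUsd(entries, '2026-05'), 50, 200)` with the limits resolved in §10 and exit non-zero on `'halt'`.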

5 · Async generation pipeline

5.1 · Two pipelines, never mixed

Pipeline 1 — Translation. Reads content/episodes/en-US/{id}.md, calls Claude API to translate to each target locale preserving SSML markup, writes content/episodes/{locale}/{id}.md. Validated for SSML integrity, equation/number presence, ±20 % length tolerance. Translation is the slow + expensive step (Claude API costs); the cache-key is the SHA-256 of the en-US source.

Pipeline 2 — Audio generation. Reads content/episodes/{locale}/{id}.md, looks up (provider, locale, persona) → voiceId in voices.json, calls TtsProvider.generate(), writes static/audio/{locale}/{persona}/{id}.{hash8}.mp3 + .vtt + .txt. Cache-key is SHA-256(provider + voiceId + ssml); identical input never re-generates.
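The Pipeline 2 cache key and the hashed filename could be derived as follows. This is a sketch under two assumptions that the RFC does not spell out: the three fields are joined with a NUL separator so field boundaries cannot blur, and `{hash8}` is the first eight hex characters of the cache key. `cacheKey` and `assetPath` are hypothetical names.

```typescript
import { createHash } from 'node:crypto';

// SHA-256 over provider + voiceId + ssml, as described in §5.1.
// Assumption: fields are NUL-separated so "ab"+"c" and "a"+"bc" hash differently.
function cacheKey(provider: string, voiceId: string, ssml: string): string {
  return createHash('sha256')
    .update([provider, voiceId, ssml].join('\u0000'), 'utf8')
    .digest('hex');
}

// Assumption: {hash8} is the first 8 hex characters of the cache key.
function assetPath(
  locale: string,
  persona: string,
  id: string,
  key: string,
  ext: 'mp3' | 'vtt' | 'txt',
): string {
  return `static/audio/${locale}/${persona}/${id}.${key.slice(0, 8)}.${ext}`;
}
```

If the target path already exists, the pipeline can skip the TTS call entirely, which is what makes a full rebuild cheap when the cache is warm.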

5.2 · Local vs GH Actions — same script, different triggers

```bash
# Local: Marko iterates on a script
npm run audio:translate -- --episode tour-open                 # Pipeline 1, one episode, all locales
npm run audio:generate  -- --episode tour-open --locales en-US # Pipeline 2, one episode, one locale (fast)
npm run audio:generate  -- --episode tour-open                 # all locales

# Local: full rebuild
npm run audio:build                                            # both pipelines, full corpus, hash-cache hits skip

# CI: GitHub Actions on script-PR merge
# .github/workflows/audio.yml runs the same scripts; provider creds in repo secrets;
# diffed scripts only re-trigger affected episodes via the same hash-cache.
```

Cost split:

Local: Marko's machine, Claude API for translation (paid), TTS provider per voices.json (paid or free per §4).
GH Actions: 2000 free minutes/month on free tier; a full audio rebuild fits in ~30 min if cache is warm. Translation API + TTS API costs the same as local.

When to use which:

Editing one script + iterating: local, generate one locale, listen, iterate.
Adding a locale or graduating to a new provider: GH Actions, automated, one PR.
Translating one new episode across all 12 locales: either; local is faster start-to-finish.

5.3 · Re-translation triggers (PRD-016 OQ10)

When content/episodes/en-US/{id}.md changes:

Default behaviour: the Claude-based translator re-translates the entire episode for every target locale. Cost ≈ 8000 chars × 11 target locales = 88k chars of translated text per revision of a Guide-length episode (en-US is the source, so 11 of the 12 locales are targets).
Optional optimisation (v1.1): paragraph-level diff. Only paragraphs that changed get re-translated; unchanged paragraphs are reused from the cache. Implementation cost: paragraph IDs in SSML, paragraph-keyed cache. Defer until episode revision frequency justifies it.
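The deferred paragraph-level optimisation could be sketched as below. It simplifies the idea in two ways that are assumptions, not the RFC's design: paragraphs are split on blank lines, and the cache is keyed by the paragraph's source text rather than by paragraph IDs embedded in the SSML. `paragraphs` and `paragraphsToTranslate` are hypothetical names.

```typescript
// Split a script into paragraphs on blank lines (assumed paragraph boundary).
function paragraphs(script: string): string[] {
  return script
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);
}

// Return the paragraphs of the revised script that have no cached translation.
// The cache maps en-US paragraph text to its translation for one target locale.
function paragraphsToTranslate(
  revised: string,
  cache: ReadonlyMap<string, string>,
): string[] {
  return paragraphs(revised).filter((p) => !cache.has(p));
}
```

Only the returned paragraphs would be sent to the translator; everything else is stitched back together from the cache in source order.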

5.4 · Pipeline integration with `validate-data`

scripts/validate-data.ts (the fail-closed gate) gains two new checks at v0.9 ship:

For every script in content/episodes/en-US/, the matching static/audio/en-US/{persona}/{id}.*.mp3 must exist (otherwise fail).
For every shipped audio asset, a row in the image-provenance.json equivalent (audio-provenance.json) must record provider, voice ID, generation timestamp, and char count (per ADR-046/047 spirit, extended to audio).
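The first of the two checks could be expressed as a pure function over file listings, which keeps it easy to unit-test outside the repo. `ScriptRef` and `missingAudio` are hypothetical names, and how a script declares its persona (front matter, directory, or otherwise) is not specified in this RFC, so it is simply passed in here.

```typescript
// Hypothetical reference to one en-US script; the persona source is an assumption.
interface ScriptRef {
  persona: string;
  id: string;
}

// Return every script with no matching {id}.{hash8}.mp3 under
// static/audio/en-US/{persona}/. `audioPaths` is a flat list of repo-relative paths.
function missingAudio(
  scripts: readonly ScriptRef[],
  audioPaths: readonly string[],
): ScriptRef[] {
  return scripts.filter(({ persona, id }) => {
    const prefix = `static/audio/en-US/${persona}/${id}.`;
    return !audioPaths.some(
      (p) => p.startsWith(prefix) && /^[0-9a-f]{8}\.mp3$/.test(p.slice(prefix.length)),
    );
  });
}
```

validate-data would fail closed whenever the returned list is non-empty.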

6 · Audio asset hosting (host-agnostic)

6.1 · Path layout

static/audio/
├── en-US/
│   ├── curator/
│   │   ├── tour-open.a3f1c2d8.mp3
│   │   ├── tour-open.a3f1c2d8.vtt
│   │   └── tour-open.a3f1c2d8.txt
│   ├── guide/
│   │   ├── explore.{hash}.mp3
│   │   └── ...
│   └── enthusiast/
│       ├── explore-saturn.{hash}.mp3
│       └── ...
├── es-ES/
│   └── ...
└── audio-provenance.json

Hash in the filename ({episode-id}.{hash8}.mp3) enables aggressive caching (PRD-016 M13). Filename change on script revision invalidates the SW cache automatically.

6.2 · Web hosting

v1 (current): GH Pages serves static/audio/ alongside the rest of the build. ~91 MB v1 audio + 355 MB existing → ~446 MB repo size, well under the GH Pages 1 GB soft limit. Headroom for ~500 MB further audio growth.

v1.0 product milestone (Marko's planned VPS docker-compose migration): hosting moves to the VPS. The audio path layout is unchanged; the build's audio URLs are root-relative (/audio/{locale}/{persona}/{id}.{hash8}.mp3), so the move is a config change, not a code change. PWA SW cache rules don't change.

Trigger for this hosting decision is product-milestone-driven (Marko's v1.0 plan), NOT audio-corpus-size-driven. We do not need a CDN for this work.

6.3 · Mobile (Capacitor) hosting

Per PRD-015 / RFC-018 §4: the mobile build bundles only the user's primary locale of audio. Other locales lazy-fetch from chipi.github.io (or the future VPS) on locale switch and cache via Capacitor's native cache (NOT the PWA SW, which is disabled under Capacitor for v1.0 per RFC-018 §8).

```typescript
// scripts/build-mobile-audio.ts: runs as part of the MOBILE=1 build
// Determines target locale from CAPACITOR_LOCALE env (default: en-US)
// Copies only static/audio/{locale}/* into the Capacitor sync directory
// Other locales remain referenced by URL only; runtime fetches when needed
```

Mobile audio bundle add: ~67 MB (full hierarchy in user's locale). Total Capacitor install: ~85 + 67 = ~152 MB, slightly over PRD-018 M11 ceiling (150 MB) — accept the 2 MB overage with a comment, or shave 2 MB elsewhere (likely fleet-thumbnail format change).

7 · Player UX — overlay component

7.1 · Layout

Desktop (≥ 800 px): right-side panel, 360 px wide, slides in from edge. Contains: persona-implicit header (no persona label), waveform visualiser (CSS-only animated bars per RFC-019 OQ — no Web Audio API analysis), now-playing title + duration + scrubber, transport controls (play/pause, skip, speed), caption toggle, transcript download link, episode inventory (collapsible, "for this screen" / "all episodes" tabs).

Mobile (< 800 px): bottom-sheet, 60 % viewport height, drag-handle to expand to 90 %. Same content; transport controls promoted to top of sheet for thumb reach.

7.2 · Trigger

A waveform icon (〜 glyph) in Nav.svelte, between the home link and the locale switcher. Tap opens the overlay; while playing, the icon shows a discreet pulse (PRD-016 S3).

7.3 · Autoplay policy

No autoplay. Browser autoplay restrictions + the museum-grade goal both demand a user gesture. The first-visit-to-screen toast (PRD-016 S6) is a non-modal nudge, not autoplay.

7.4 · Locale-switch behaviour

While an episode is playing, a locale change (URL ?lang= swap or LocaleSwitcher click) restarts the current episode in the new locale at the proportionally-matched timestamp (PRD-016 US-5). Implementation: keep (episodeId, normalisedProgress) in the player; on locale change, look up the new locale's episode metadata, seek to normalisedProgress * newDuration. Best-effort match — translations vary in length.
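The proportional seek reduces to two small functions. This is a sketch with hypothetical names; clamping the progress to the range 0 to 1 is an added safety assumption.

```typescript
// Progress through the current episode as a fraction in [0, 1].
function normalisedProgress(currentTimeSeconds: number, durationSeconds: number): number {
  if (durationSeconds <= 0) return 0;
  return Math.min(1, Math.max(0, currentTimeSeconds / durationSeconds));
}

// Timestamp to seek to in the new locale's recording of the same episode.
function seekTargetSeconds(progress: number, newDurationSeconds: number): number {
  const clamped = Math.min(1, Math.max(0, progress));
  return clamped * newDurationSeconds;
}
```

For example, a listener 150 s into a 300 s en-US episode who switches to a 340 s translation resumes at 170 s.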

7.5 · Captions

WebVTT inline. Toggle defaults: ON if ANY of —

window.matchMedia('(prefers-reduced-motion: reduce)').matches, OR
screen-reader detected (navigator.userAgent heuristics + ARIA live region presence), OR
audioElement.muted === true at episode start, OR
navigator.connection?.effectiveType indicates < 1 Mbps (Chromium-only API; non-supporting browsers fall back to the other 3 signals).

User toggle persists for the session only (in-memory; no localStorage).
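The default-on rule above is an OR over four signals, which can be kept as a pure function with the browser reads injected. `CaptionSignals` and `captionsDefaultOn` are hypothetical names, and mapping "< 1 Mbps" onto the `effectiveType` values `slow-2g`, `2g` and `3g` is an assumption.

```typescript
// Signals gathered by the caller; how each is detected is out of scope here.
interface CaptionSignals {
  prefersReducedMotion: boolean;
  screenReaderDetected: boolean;
  mutedAtStart: boolean;
  effectiveType?: string; // navigator.connection?.effectiveType; undefined where unsupported
}

// Assumption: these effectiveType values stand in for "< 1 Mbps".
const SLOW_TYPES: ReadonlySet<string> = new Set(['slow-2g', '2g', '3g']);

// Captions default to ON if any one signal is present.
function captionsDefaultOn(s: CaptionSignals): boolean {
  const slowConnection = s.effectiveType !== undefined && SLOW_TYPES.has(s.effectiveType);
  return s.prefersReducedMotion || s.screenReaderDetected || s.mutedAtStart || slowConnection;
}
```

Leaving `effectiveType` undefined on browsers without the API gives the fall-back to the other three signals for free.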

7.6 · Heard-state

In-memory Set<episodeId>. Lost on reload. Used for:

Showing a discreet "✓" next to episodes the user has played to ≥ 80 % completion in the inventory list.
Surfacing "next unheard" in the Full Tour playlist.
v1 does NOT persist this. Per ADR-057 + PRD-016 M8.
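A minimal in-memory heard-state, sketched with a hypothetical `HeardState` class. Nothing here touches storage, so the state disappears on reload as required.

```typescript
// In-memory only: no localStorage, no cookies.
class HeardState {
  private readonly heard = new Set<string>();

  // Report playback position; an episode counts as heard at >= 80 % completion.
  report(episodeId: string, currentTimeSeconds: number, durationSeconds: number): void {
    if (durationSeconds > 0 && currentTimeSeconds / durationSeconds >= 0.8) {
      this.heard.add(episodeId);
    }
  }

  isHeard(episodeId: string): boolean {
    return this.heard.has(episodeId);
  }

  // First episode in playlist order that has not been heard yet, if any.
  nextUnheard(playlist: readonly string[]): string | undefined {
    return playlist.find((id) => !this.heard.has(id));
  }
}
```

The player would call `report` from its time-update handler and use `nextUnheard` when building the Full Tour queue.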

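7.7 · Share-link

?audio={episode-id} URL parameter. The +layout.svelte URL-effect (existing canonicalisation pattern) reads it, navigates to the episode's home route if not already there, opens the overlay, and requests playback of the episode. This is the one exception to §7.3: following a share-link is treated as the user's intent to listen. Browsers do not generally count a page load as a user gesture, so playback may still be blocked on a cold load; in that case the overlay should open with the episode cued and wait for a tap. Works on web; Capacitor's deep-link handler (orrery://?audio=tour-open, RFC-018 §7) routes through the same code path.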
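Reading the parameter safely might look like this. `episodeFromUrl` is a hypothetical helper, and the id pattern (lowercase letters, digits and single hyphens) is an assumption inferred from ids such as `tour-open` and `explore-saturn`; anything else is ignored rather than passed on to the player.

```typescript
// Extract a well-formed episode id from a URL's ?audio= parameter, or null.
// Assumption: ids look like "tour-open" (lowercase alphanumerics joined by hyphens).
function episodeFromUrl(href: string): string | null {
  const id = new URL(href).searchParams.get('audio');
  return id !== null && /^[a-z0-9]+(?:-[a-z0-9]+)*$/.test(id) ? id : null;
}
```

Validating the shape before lookup keeps a malformed share-link from turning into a fetch for an arbitrary path.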

8 · Internationalisation

8.1 · Catalog of supported locales

Inherits from ADR-031/032/033: 12 locales — en-US, en-GB, es-ES, fr-FR, de-DE, it-IT, pt-BR, ja-JP, zh-CN, ko-KR, ru-RU, sr-Cyrl, hi-IN. (List subject to the project's actual localeFromPage registry; verify against src/lib/locale.ts at implementation time.)

8.2 · Phase gating per locale

Per PRD-016 §goal phasing:

v1: en-US full hierarchy + Curator Full Tour in all 12 locales
v1.1: Guide-level in en, es, fr, de, it, pt, ja
v1.2: Enthusiast object-level in en, es, fr, de, it, pt, ja
v1.3: zh, ko, ru, sr-Cyrl, hi at all levels

A locale is "shipped" when 100 % of its phase-tier audio assets exist + have a curated voice ID + have passed the per-locale voice-quality review (one Atmospheric Moves segment listened to end-to-end by a fluent reviewer).

8.3 · Voice curation per locale per persona

Three voice IDs per locale × 12 locales = 36 voice IDs to curate. ElevenLabs's library is the starting point; if a locale's ElevenLabs voices don't carry the right register, that locale's voice mapping switches to Google Cloud Studio voices in voices.json (mixed-provider corpus, designed-for, §3.3).

8.4 · SSML safety in translation

The Claude-API translator (Pipeline 1) sees the SSML-augmented script as input. It must:

Preserve every <break>, <emphasis>, <say-as> tag verbatim (validated post-translation).
Translate the natural-language text inside tags but never the tag attributes.
Pass through equation placeholders (<say-as interpret-as="characters">14.5</say-as>) unchanged in source-language form (numbers stay as-is; the TTS provider voices them per locale).

Validation script in Pipeline 1 enforces these by AST-comparison of source vs translated SSML.
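A simplified stand-in for the AST comparison: it compares the ordered sequence of tags (attributes included) and the text inside `<say-as>` elements. This is a sketch, not the validator itself. It uses regular expressions rather than a real SSML parser, and it assumes translations keep tags in source order, which a real implementation may need to relax for languages with different word order. All function names are hypothetical.

```typescript
// Every tag in document order, attributes included, so attribute edits are caught.
function ssmlTags(ssml: string): string[] {
  return ssml.match(/<[^>]+>/g) ?? [];
}

// Text content of each <say-as> element (assumes no nested tags inside).
function sayAsPayloads(ssml: string): string[] {
  const re = /<say-as\b[^>]*>([^<]*)<\/say-as>/g;
  const out: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(ssml)) !== null) out.push(m[1] ?? '');
  return out;
}

function sameSequence(a: readonly string[], b: readonly string[]): boolean {
  return a.length === b.length && a.every((item, i) => item === b[i]);
}

// Returns a list of problems; an empty list means the translation passes.
function validateSsml(source: string, translated: string): string[] {
  const problems: string[] = [];
  if (!sameSequence(ssmlTags(source), ssmlTags(translated))) {
    problems.push('tag sequence or attributes differ');
  }
  if (!sameSequence(sayAsPayloads(source), sayAsPayloads(translated))) {
    problems.push('say-as payload changed');
  }
  return problems;
}
```

A non-empty result would trigger the auto-retry path listed under failure modes.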

9 · Failure modes + handling

Failure	Detection	Handling
TTS provider API outage during pipeline run	HTTP 5xx from `TtsProvider.generate()`	Retry with exponential backoff (3 attempts); on final failure, log + skip + mark in cost ledger as `failed`; pipeline continues for other episodes
TTS provider rate-limit hit	HTTP 429	Sleep per `Retry-After` header, resume; if no header, 60 s sleep
Free-tier quota exhausted	Provider returns specific error code	Halt pipeline for that provider; ledger flags the threshold breach; operator decides (switch provider for the rest, pay, or ship without remaining episodes)
Translation diverges in length > 20 % from source	Pipeline 1 validation step	Fail the translation; flag for manual review (script may have ambiguous phrasing)
Claude API translation produces invalid SSML	AST validation in Pipeline 1	Auto-retry with stricter prompt; second failure → manual review
Audio asset missing at `validate-data` time	Existence check in §5.4	Fail-closed: validate-data exits non-zero; commit blocks
User on mobile switches to a non-bundled locale, no network	`fetch()` reject in audio-load path	Player shows "audio not available offline in this language"; falls back to caption-only playback only if that locale's VTT is already on the device (per §10, VTT is bundled for the user's primary locale only)
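The retry rules in the first two rows can be captured in one delay function. This is a sketch with a hypothetical name; the 1 s backoff base is an assumption (the table fixes only the 3-attempt limit), and a `Retry-After` value given as an HTTP date rather than seconds is treated as absent.

```typescript
// Delay in milliseconds before the next attempt, or null to stop retrying.
// `attempt` is the 1-based number of the attempt that just failed.
function retryDelayMs(
  status: number,
  attempt: number,
  retryAfter?: string | null,
): number | null {
  if (status === 429) {
    // Rate limit: honour Retry-After (seconds) when usable, else sleep 60 s.
    const seconds = Number(retryAfter);
    return retryAfter && Number.isFinite(seconds) && seconds >= 0 ? seconds * 1000 : 60_000;
  }
  if (status >= 500 && status < 600) {
    // Outage: exponential backoff, 3 attempts in total (1 s base is an assumption).
    return attempt < 3 ? 1000 * 2 ** (attempt - 1) : null;
  }
  return null; // other statuses are not retried in this sketch
}
```

A `null` result on a 5xx corresponds to the "log + skip + mark as failed" branch; the pipeline then moves on to the next episode.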

10 · Resolved decisions + remaining follow-ups

All v1 architectural questions listed below were resolved on 2026-05-16:

TTS provider — RESOLVED: ElevenLabs anchor + TtsProvider abstraction so swap to OpenAI / Google / Azure / Coqui-local is an env var. §3 + §4.3.
v1 provider sequencing — RESOLVED: Hybrid. Google Cloud TTS (free tier) for the bulk of the corpus; ElevenLabs for the 8 Atmospheric Moves anchor episodes only. v1 cost ≈ $15 total. Mixed-provider via voices.json is the design.
Audio hosting — RESOLVED: GH Pages now (static/audio/). Migration to VPS docker-compose at v1.0 product milestone; hosting layer is host-agnostic so the move is config-only.
Mobile audio — RESOLVED: Bundle user's locale of audio + VTT captions only (~68 MB add). Other locales lazy-fetch.
Async generation — RESOLVED: Same script, same cache, runs locally + GH Actions. Local for iteration; CI for completeness.
localStorage for heard-state — RESOLVED: NO. In-memory only per ADR-057. Revisit single-cookie bitset in v1.x if data justifies.
VTT bundling on mobile — RESOLVED: YES. Captions bundled alongside the user's locale of audio (~1 MB add). Accessibility parity with web; covers Audio.muted + airplane-mode users too.
Cost-ledger thresholds — RESOLVED: $50/mo soft warn, $200/mo hard halt. Looser than initial $25/$100 recommendation; gives headroom for one-shot rebuilds during iteration.
Curator Full Tour ordering — RESOLVED: Documentary order (not nav order). Curator opens (pale-blue-dot register) → Solar System big picture → closer to home (Earth, Moon) → missions sent → people in space (ISS, Tiangong) → Mars + future → Curator close.
Per-locale voice review — RESOLVED: Defer non-en review until v1.1. v1 ships audio in all 12 locales; non-en locales carry a "beta" UI flag in the overlay header. Reviewers recruited in v1.1.
Voice persona surfacing in UI — RESOLVED: Implicit. No badge, no Curator/Guide/Enthusiast label. User just hears the right voice for the moment.
Re-translation strategy — RESOLVED: Full episode re-translate on source change (§5.3). Cost ≈ $0.50 / revision. Paragraph-diff optimisation deferred to v1.1.
Caption auto-on triggers — RESOLVED: all four signals are used, and any one of them turns captions on: prefers-reduced-motion, screen-reader detected, Audio.muted == true, or navigator.connection.effectiveType indicating < 1 Mbps. §7.5.

Remaining follow-ups (operational, not architectural):

"Beta" UI flag for non-en locales. Visual treatment + tooltip copy; polish at implementation. Suggested copy: "Voice quality reviewed in en-US only; other locales pending v1.1 review."
Music bed (v2 candidate). Not in v1. Re-open as a separate PRD if v1 ship surfaces "the silence between segments feels empty."
Slow-connection caption-on detection — navigator.connection browser support. Limited (Chromium-only as of 2026). Accept best-effort behaviour; non-supporting browsers fall back to the other 3 signals only.

RFC-019 · Orrery · Science Overlay & Episode System · Drafted 2026-05-16 · Closes-into-PRD-016
