PRD-016 · Science Overlay & Episode System (audio)

Status · Draft v0.5 (v0.7 shipped en-US; tour iteration v2 in flight in v0.7 per S7..S10 + ADR-075)
Date · 2026-06-03 (v0.5 amendment); 2026-05-16 (v0.4)
Owner · Marko
Closes into · RFC-019
Slice gate · v0.7 (PRD-014 surface-hotspots shipped; PRD-015 mobile wrapper follows in v0.8 and consumes this system, not the reverse)

Why this is a PRD. A persistent narrated audio layer turns Orrery from a visual reference into an editorial experience — closer to a museum than a documentation site. The decision touches every screen's UI, the build pipeline, the asset budget (~97 MB of v0.7 en-US audio in the GH Pages build; full 12-locale corpus would reach ~800 MB at v1.3), the runtime cost (paid TTS providers, free-tier accounting), and the per-locale content authoring effort (12 locales × 3 voice personas × 33 episodes = 1188 audio assets at full corpus). It needs a product gate before any audio asset, player component, or TTS API key lands.

§why

The original orrery — the 18th-century brass instrument the product is named after — was an object with sound. Brass gears, wood, the click of position. Orrery the web app has rebuilt the visual language of that instrument with planet textures, orbits, mission arcs, and 3D landing sites. The thing it does not have is a voice.

Museums solved this problem fifty years ago: the audio guide. A voice that doesn't replace the artefact but accompanies it — telling you what you're looking at, why it matters, what the silent visual leaves out. Done well (think the Hayden Planetarium, the Cité de l'espace, the Mudam wall texts), the audio doesn't compete with the visual; it deepens it. The visitor's attention stays on the object; the voice supplies the editorial layer.

Orrery has the editorial scope but not the editorial layer. A first-time visitor lands on /explore, sees Saturn rendered to scale, and reads HUD numbers. They learn nothing about scale, nothing about why Saturn matters, nothing about how we got there. A narrator could tell them — could spend ninety seconds on the absurd ratio of Saturn's volume to the spacecraft that crossed it, and turn the visit from "an interactive but" into a moment they remember.

This PRD scopes that layer. It is one feature, not eleven (one per screen): a single audio system, three voiced personalities, three depths of content, all 12 supported locales, generated at build time, served as static files, and presented through a single overlay component that is the same on every screen.

§audiences

Audience	Why audio helps them
Curious learner (most-served audience)	Doesn't read long-form. A 90-second narrated episode while watching the planets move is the format they already consume on YouTube and podcasts.
Space enthusiast	Already knows the visuals; comes for the editorial framing — the story around an artefact. Audio is the editorial layer.
First-time visitor	The landing-page autoplay-blocked silent video problem solves itself: there's a play button on the home screen that does something audible and tells them what they're looking at.
Educator / journalist	Cites a specific narration moment ("Orrery's Cernan-last-words segment frames Apollo 17 as…") as a reference point. Audio is shareable as a per-episode link.
Vision-impaired user	The narration is the alt-text-at-scale: every visual moment in `/explore`, `/fly`, `/earth`, `/moon`, `/mars`, `/iss`, `/tiangong` has a spoken counterpart. Not a substitute for proper accessibility (ADR-025), but a layer that helps.

§what's already shipped (audio-readiness inventory)

Before scoping the work, the parts that are already in place:

Capability	Status	Source
12-locale Paraglide message bundle	shipped (v0.4–v0.5)	`src/lib/paraglide/messages/*.json`, ADR-031/032/033
12 primary nav routes (incl. 7 with 3D scenes) — incl. `/plan` carrying the porkchop plot	shipped (v0.6)	`src/routes/+layout.svelte`, TA.md route inventory
PWA service worker (`registerType: 'autoUpdate'`)	shipped (v0.5.x)	`vite.config.ts`, ADR-029
Cookie-based locale persistence (`orrery_locale`)	shipped	ADR-057 (only persistent storage allowed; localStorage forbidden)
Capacitor mobile wrapper (Android-first)	planned (PRD-015 / RFC-018, v0.8)	—
Bundle-slimming machinery (`MOBILE=1` env, lazy locale chunks)	planned (RFC-018 §4, v0.8)	—
External-link click delegation (`src/routes/+layout.svelte`)	shipped (v0.6)	will be reused for the audio overlay's "more about this" links
Surface hotspots (4-tier LOD on /moon, /mars; orbital mosaics; ground panoramas)	shipped (v0.7)	PRD-014 / RFC-017 — stable visual targets for landing-site narration scripts

What this means: the infrastructure for shipping a 12-locale audio system on web is in place today (v0.7); the Capacitor mobile wrapper (v0.8) will lazy-load the user's primary locale and is a downstream consumer, not a prerequisite.

§goal

Ship a persistent audio episode system across all 12 routes, generated at build time from human-edited markdown scripts, narrated by 3 voice personas, and presented through one overlay component that is identical on every screen.

v0.7 ships en-US only. The translation pipeline (S2) and the 12-locale Curator Tour (S7) are deferred to v0.8, after the test ride validated that en-US delivers standalone editorial value.

Phasing (v1 = "first audio ship"; v1.x adds depth):

Phase	Audio scope	Audio MB added	Repo total
v1 (v0.7)	en-US only — 33 logical episodes × 2 providers (Google + ElevenLabs A/B) = 66 audio assets	~97 MB	~452 MB
v1.1 (v0.8)	+ Curator Full Tour × 12 locales (8 segments × 12 = 96 assets via Anthropic translation pipeline)	+~50 MB	~502 MB
v1.2	Guide-level episodes in priority locales (en, es, fr, de, it, pt, ja) — 12 routes × 7 locales	+~140 MB	~640 MB
v1.3	Enthusiast object-level episodes in priority locales	+~120 MB	~760 MB
v1.4	Long-tail locales (zh, ko, ru, sr-Cyrl, hi) at all levels	+~95 MB	~855 MB
v2	VPS docker-compose migration (planned for v1.0 of overall product per Marko 2026-05-16) absorbs whatever audio corpus has accumulated; CDN trigger no longer GH-Pages-bound	—	—

Mobile (Capacitor) ships only the user's locale of audio, ~67 MB add to the ~85 MB Capacitor budget per RFC-018 §4. Other locales lazy-fetch from chipi.github.io on locale switch.
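
The "Repo total" column in the phasing table is a running sum on top of the ~355 MB pre-audio baseline quoted in §success-criteria. A minimal sketch of that arithmetic follows. The names (`Phase`, `PHASES`, `runningTotals`) are hypothetical and the figures are copied from the table, which rounds the later rows.

```typescript
// Hypothetical names; MB figures are the approximate values from the phasing table.
interface Phase {
  id: string;
  audioMbAdded: number;
}

const BASELINE_REPO_MB = 355; // pre-audio repo size per §success-criteria

const PHASES: Phase[] = [
  { id: 'v1', audioMbAdded: 97 },
  { id: 'v1.1', audioMbAdded: 50 },
  { id: 'v1.2', audioMbAdded: 140 },
  { id: 'v1.3', audioMbAdded: 120 },
  { id: 'v1.4', audioMbAdded: 95 },
];

// Returns the cumulative repo size after each phase lands.
function runningTotals(
  baselineMb: number,
  phases: Phase[],
): Array<{ id: string; repoTotalMb: number }> {
  let total = baselineMb;
  return phases.map((p) => {
    total += p.audioMbAdded;
    return { id: p.id, repoTotalMb: total };
  });
}

const totals = runningTotals(BASELINE_REPO_MB, PHASES);
// v1 = 452, v1.1 = 502, v1.2 = 642, v1.3 = 762, v1.4 = 857
```

The exact sums for v1.2 onward sit about 2 MB above the table's rounded figures (~640, ~760, ~855).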

§user-stories

US-1 — Screen narration (Guide voice). Visitor opens /explore, taps the waveform icon in the nav, and a 5–8 minute episode plays explaining what they're looking at — the scale, the time, the planets in their orbits — while the scene continues to render. The narration does not interrupt camera controls or planet hover. The user can pause, scrub, change speed, and switch locale without losing position.

US-2 — Object narration (Enthusiast voice). Visitor selects Mars on /explore, taps "More about this object" inside the planet detail panel, and a 90-second episode plays specifically about Mars — its orbital eccentricity, why missions launch in 26-month windows, the 14.5-second one-way signal delay at maximum range. Numbers are spoken with their unit; equations are voiced as their physical meaning, not as letters.

US-3 — Full Tour (Curator playlist). Visitor on / taps "Take the tour" and gets a ~90 minute documentary-order playlist that walks them through the whole product. The Curator voice acts as a docent: introduces each section, hands off to Guide for screen-level narration, hands off to Enthusiast for object-level deep-dives, and closes with a Sagan-register epilogue. Resumable, scrubbable, exits cleanly to free navigation if interrupted.

US-4 — Episode inventory. From the audio overlay, the user can see "all the audio for this screen" plus "all episodes across the product, grouped by route" plus "what I've already heard". The heard-tracking is in-memory only (lost on reload; ADR-057 forbids localStorage). A future v1.1 may opt in to a single-cookie heard-bitset if usage data justifies the persistence cost.

US-5 — i18n parity. Every voiced episode that exists in en-US must exist in all 12 supported locales by the time that locale tier ships (per the phasing table). A locale-switch event (?lang=es URL change or LocaleSwitcher click) restarts the currently-playing episode in the new locale at the matching timestamp (best-effort match, not bit-perfect).

US-6 — Offline playback. On the web, the PWA service worker caches whatever audio the user has played (per ADR-029, autoUpdate semantics). On mobile (Capacitor), the user's primary locale is bundled at install time; airline / no-signal use cases play the full episode set without network. Switching to a non-bundled locale on mobile fetches that locale's assets and caches them for offline replay.

US-7 — Caption and transcript surface. Every audio episode has a synced caption track (WebVTT) and a downloadable plain-text transcript in the playing locale. Caption rendering happens inside the overlay; transcript downloads as .txt. Required for accessibility (ADR-025) and for users who prefer reading.
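
The best-effort timestamp match in US-5 can be sketched as a proportional mapping between the two renditions' durations. This is an illustration under assumptions, not the shipped player logic: `matchTimestamp` is a hypothetical name, and the durations in the example are made up.

```typescript
// Maps a playback position in one locale's rendition onto another locale's
// rendition by proportion of total duration. Translated prose rarely has the
// same length, so the result is approximate by design.
function matchTimestamp(
  positionSec: number,
  fromDurationSec: number,
  toDurationSec: number,
): number {
  if (fromDurationSec <= 0 || toDurationSec <= 0) return 0;
  // Clamp so an out-of-range position never seeks past either end.
  const fraction = Math.min(Math.max(positionSec / fromDurationSec, 0), 1);
  return fraction * toDurationSec;
}

// 60 s into a 300 s en-US episode lands about 66 s into a 330 s es rendition.
const resumedAt = matchTimestamp(60, 300, 330);
```

Clamping the fraction keeps the seek target inside the new file even if the stored position is stale or slightly past the end.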

§must-have requirements

ID	Requirement
M1	Single audio overlay component, presented as right-panel on desktop and bottom-sheet on mobile (`<800 px viewport`). Same component, two layouts. Triggered by waveform icon in `Nav.svelte`.
M2	Three voice personas (Curator / Guide / Enthusiast) implemented as separate voice IDs per locale, curated in `static/data/audio/voices.json`. Per-locale voice testing required before that locale ships.
M3	Episode taxonomy covers all 12 routes (not just 6 as in the original draft): `/`, `/explore`, `/missions`, `/fly`, `/plan`, `/earth`, `/moon`, `/mars`, `/iss`, `/tiangong`, `/science`, `/fleet`. Each route gets at least one Guide-level screen episode in en-US for v1; full per-locale rollout per phasing table. The porkchop plot lives on `/plan` (not `/fly`), so its Enthusiast episode anchors there.
M4	Build-time-only TTS generation. No runtime API calls to TTS providers from the browser. All audio is `.mp3` files served as static assets. Provider credentials never reach the client.
M5	TTS provider abstraction (`TtsProvider` interface in RFC-019 §provider-abstraction). v1 ships with ElevenLabs as the anchor; the system can swap to OpenAI / Google Cloud / Azure / Coqui-local with an env-var change + new voice IDs in `voices.json`. No pipeline rewrite.
M6	Audio asset packaging: MP3 96 kbps mono (Opus 32 kbps where browser support is universal — likely v1.1+). Target average episode size ~2 MB / 5–8 min for screen episodes, ~600 KB / 90 s for object episodes.
M7	All audio assets live under `static/audio/{locale}/{persona}/{episode-id}.mp3`. Web build serves them as static files; Capacitor sync includes only the user's locale (RFC-018 §4 lazy-loading pattern). Hosting is host-agnostic — same paths work on GH Pages today, on the planned VPS docker-compose at v1.0.
M8	Heard-state tracking is in-memory only, runtime-scoped (lost on reload). ADR-057 forbids localStorage; persistent storage allowed is `orrery_locale` cookie (ADR-057) and `orrery_tour` cookie for Curator Tour resume only (ADR-075). Per-episode heard-state is NOT persisted; only the active-tour resume point is. v1.x may opt in a single-cookie heard-bitset under its own ADR if data justifies it.
M9	Caption tracks (WebVTT) generated alongside every audio asset, served from `static/audio/{locale}/{persona}/{episode-id}.vtt`. Captions render inside the overlay; toggle defaults to on when ANY of: `prefers-reduced-motion` set, screen-reader detected, `Audio.muted == true`, OR effective bandwidth < 1 Mbps (`navigator.connection.effectiveType` heuristic, best-effort).
M10	Transcripts (plain text) downloadable from the overlay; lives at `static/audio/{locale}/{persona}/{episode-id}.txt`.
M11	Async generation pipeline runs both locally (Marko's machine, manual `npm run audio:build`) and in GitHub Actions (free-tier minutes, automated on script-PR merge). Same script, same outputs, same cache keys. RFC-019 §async-generation.
M12	Cost telemetry: every TTS API call records (provider, locale, persona, char count, $cost) into `static/data/audio/cost-ledger.json`. Free-tier accounting per provider is the ledger's job, not the pipeline's.
M13	Service-worker cache strategy: audio fetched once, cached forever. Cache invalidation by content hash in the asset URL (`{episode-id}.{hash8}.mp3`).
M14	The waveform icon's behaviour adapts to device input: on touch devices, opens the overlay as bottom-sheet; on desktop with keyboard focus, opens as right-panel and traps focus inside.
M15	Episode-share-link: `/?audio=explore-guide-en-US` deep-links to "open `/explore`, autoplay this episode" — works on web and through Capacitor's deep-link handler (RFC-018 §7).
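
M7, M9, M10 and M13 together define the asset naming scheme: one folder per locale and persona, sibling `.mp3` / `.vtt` / `.txt` files, and an 8-character content hash in the audio URL for cache invalidation. A minimal sketch of path builders under those rules is below. The function names, the lowercase persona folder names, and the assumption that the hash is a hex digest are all illustrative; the PRD does not specify them.

```typescript
// Persona folder casing is an assumption; the PRD only names the three personas.
type Persona = 'curator' | 'guide' | 'enthusiast';
type AssetKind = 'mp3' | 'vtt' | 'txt';

// Unhashed path per M7 / M9 / M10:
// static/audio/{locale}/{persona}/{episode-id}.{ext}
function assetPath(
  locale: string,
  persona: Persona,
  episodeId: string,
  kind: AssetKind,
): string {
  return `static/audio/${locale}/${persona}/${episodeId}.${kind}`;
}

// Cache-busted audio path per M13: {episode-id}.{hash8}.mp3
// `contentHashHex` is assumed to be a hex digest of the MP3 bytes; which hash
// function produces it is not stated in the PRD.
function hashedAudioPath(
  locale: string,
  persona: Persona,
  episodeId: string,
  contentHashHex: string,
): string {
  if (!/^[0-9a-f]{8,}$/.test(contentHashHex)) {
    throw new Error(`expected at least 8 lowercase hex chars, got "${contentHashHex}"`);
  }
  const hash8 = contentHashHex.slice(0, 8);
  return `static/audio/${locale}/${persona}/${episodeId}.${hash8}.mp3`;
}

const captions = assetPath('en-US', 'guide', 'guide-moon', 'vtt');
const audio = hashedAudioPath('en-US', 'guide', 'guide-moon', '3f9a1c07be42d6aa');
```

Because the hash changes whenever the audio bytes change, a "cache forever" service-worker policy stays safe: a regenerated episode gets a new URL instead of overwriting a cached one.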

§should-have requirements

ID	Requirement
S1	Speed control (0.75x / 1x / 1.25x / 1.5x) inside the overlay.
S2	"Continue where I left off" — purely runtime, not persisted; if the user reloads, position resets.
S3	Visual cue on the screen during playback: the route's HUD shows a discreet pulse-bar so the user remembers the audio is on.
S4	Locale-switch mid-playback restarts the current episode in the new locale at the proportionally-matched timestamp (best-effort).
S5	"Skip to next episode" + "skip to previous episode" within Full Tour playlist.
S6	Per-screen autoplay-prompt: first time the user lands on `/explore` (or `/fly`, etc.) the overlay shows a non-modal "1-tap to listen to this screen's episode" toast. Dismissed → never shown again that session.
S7	Tour timer. Tour bar surfaces total tour duration + elapsed + remaining, as `mm:ss / h:mm:ss · mm:ss left`. Total = sum of `durationSec` across `tourSequence` ids in the registry. Elapsed = sum of durations for episodes `0..tourIndex-1` + current `positionSec`. Pure derived state; no storage. Visible only while `tourActive`.
S8	Compact tour mode. Overlay header gains a minimize toggle (▭ → __) next to close (×). Compact state collapses the overlay to a thin pill: desktop ~64 × 280 px bottom-right; mobile ~56 px full-width bottom. Contents: episode title (truncated, 1 line) · play-pause · `N/21` position · elapsed clock (S7) · ↑ expand · × stop. Captions banner (if CC on) floats above the bar at viewport level. State: `audio.compact = $state(false)`. Compact flag included in the resume payload (S9).
S9	Cross-session tour resume. `orrery_tour` cookie per ADR-075. Throttled write every ~5 s during playback + explicit writes on pause, advance, overlay close, compact toggle. On next visit: cookie present + valid → show "Resume tour?" prompt in overlay header (one click, not autoplay — explicit gesture also satisfies browser audio-policy). Stop-tour button + tour-end clear the cookie. Schema-validation failure on read silently clears the cookie.
S10	Transcripts index page at `/library/episodes`. Reads `audio-provenance.json`; renders one row per episode with title · persona · anchored route · duration · Read transcript link (existing `.txt` file, no new asset needed) · Sources column. Sources fed by a new sidecar `content/episodes/sources.json` (schema mirrors `text-sources.json` pattern from Pipeline 5; avoids touching the SSML frontmatter parser). Sidecar starts empty; editorial fill happens incrementally per-episode under a follow-up sub-issue. Renders publisher logos via existing `source-logos.json` like `/credits` already does.
S11	Stage authoring — convert text-only cues to a proper guided tour. The `EPISODE_STAGES` infrastructure already supports `cue` / `flash` / `scroll-to` / `click` / `open-tab`; the v0.7 corpus uses 100 % `cue` (83 of 83 stages). S11 expands the 21 tour episodes to mixed-action sequences so the listener sees what the narrator is talking about — sections scroll into view, target elements pulse, illustrative panels open. Requires stable DOM hooks via `data-audio-stage="..."` attributes across the relevant route pages. Ships per-episode incrementally: pilot one episode end-to-end (showroom + visual-anchor approval per `feedback_visual_anchor_before_ux_commit`), then schedule the corpus fill. Per RFC-019 §12.
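
The S7 tour timer is pure derived state, so it can be sketched as a single function over the registry. The identifiers `durationSec`, `tourSequence`, `tourIndex` and `positionSec` come from S7; the `EpisodeMeta` and `TourClock` shapes, the helper names, and the choice to let `mm` exceed 59 on a long tour are assumptions.

```typescript
interface EpisodeMeta {
  durationSec: number;
}

interface TourClock {
  totalSec: number;
  elapsedSec: number;
  remainingSec: number;
  label: string;
}

const pad2 = (n: number): string => String(n).padStart(2, '0');

// mm:ss; minutes are allowed to exceed 59 (assumption, S7 does not say).
function mmss(sec: number): string {
  const s = Math.max(0, Math.floor(sec));
  return `${pad2(Math.floor(s / 60))}:${pad2(s % 60)}`;
}

// h:mm:ss for the total tour duration.
function hmmss(sec: number): string {
  const s = Math.max(0, Math.floor(sec));
  return `${Math.floor(s / 3600)}:${pad2(Math.floor((s % 3600) / 60))}:${pad2(s % 60)}`;
}

function tourClock(
  registry: Record<string, EpisodeMeta>,
  tourSequence: string[],
  tourIndex: number,
  positionSec: number,
): TourClock {
  // Unknown ids contribute 0 s rather than throwing.
  const durations = tourSequence.map((id) => registry[id]?.durationSec ?? 0);
  const totalSec = durations.reduce((a, b) => a + b, 0);
  // Elapsed = finished episodes 0..tourIndex-1 plus position in the current one.
  const finished = durations.slice(0, tourIndex).reduce((a, b) => a + b, 0);
  const elapsedSec = Math.min(finished + positionSec, totalSec);
  const remainingSec = totalSec - elapsedSec;
  const label = `${mmss(elapsedSec)} / ${hmmss(totalSec)} · ${mmss(remainingSec)} left`;
  return { totalSec, elapsedSec, remainingSec, label };
}

// Illustrative three-episode tour; ids and durations are made up.
const demoRegistry: Record<string, EpisodeMeta> = {
  a: { durationSec: 300 },
  b: { durationSec: 90 },
  c: { durationSec: 420 },
};
const clock = tourClock(demoRegistry, ['a', 'b', 'c'], 1, 30);
// clock.label === '05:30 / 0:13:30 · 08:00 left'
```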

§will-not-have (v1)

Real-time TTS. No browser-side voice generation. All audio is pre-generated.
User-recorded audio / community contributions. Editorial control stays in-house.
Music bed / ambient layer. A v2 candidate; out of v1 scope to keep ship date honest.
Push notifications for "new episode". PWA push is out of project scope (CLAUDE.md).
Persistent heard-state (per-episode). Per ADR-057 + ADR-075 storage rules; only Curator Tour active-position resume is persisted (S9); per-episode heard-state stays runtime-only. Revisit in v1.x under its own ADR if usage data justifies it.
Voice persona names surfaced in UI. The personas (Curator/Guide/Enthusiast) are an internal editorial tool, not a user-facing taxonomy. The user just hears "the right voice for this moment."
iOS-only voice cloning of a "signature Orrery voice". Optional v2 candidate via Coqui-local + ElevenLabs voice cloning; out of v1.

§success-criteria

Editorial:

1. A first-time visitor on / who taps "Take the tour" stays for ≥ 10 minutes (proxy for the museum-grade atmosphere goal).
2. Each of the 8 "Atmospheric Moves" identified in RFC-019 (signal-delay, porkchop, pale blue dot, 14.5-second delay, capability ladder, Cernan's last words, far side, Curiosity persistence) lands as a recognisable beat, verified by Marko + 3 reviewer listens before the locale ships.

Technical:

3. v0.7 web build stays under 500 MB total repo size (current ~355 MB + 97 MB en-US audio + headroom). v0.8 adds the 12-locale Curator Tour (~50 MB) on top.
4. v1 mobile (Capacitor) bundle stays under ~150 MB installed (RFC-018 M11 ceiling), with the user's locale of audio bundled.
5. Cold-start to first-audible-narration < 1.5 s on 4G (target 800 ms cached).
6. PWA service-worker hit-rate for audio replays > 90 % (audio plays once, cached forever).

Operational:

7. Async generation pipeline completes a full English-locale rebuild (~33 episodes ≈ 264k chars) in < 15 min on Marko's M-series Mac, < 30 min in GH Actions.
8. TTS cost for v1 priority cut (~1.4 M chars total: full English + Curator Tour × 12 locales) lands under $50 one-shot (provider-dependent; see RFC-019 §cost-analysis).
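
Criteria 7 and 8 imply two planning numbers: about 8,000 characters per English episode, and a blended ceiling of roughly $35.71 per million characters if 1.4 M characters must stay under $50. The sketch below derives both and totals a ledger shaped like the one M12 describes. The `LedgerEntry` key names are assumptions (M12 lists the recorded values, not the JSON keys), and the sample entries are illustrative figures, not real billing data.

```typescript
// Key names are hypothetical; M12 specifies the values recorded, not the schema.
interface LedgerEntry {
  provider: string;
  locale: string;
  persona: string;
  chars: number;
  costUsd: number;
}

const EN_REBUILD_CHARS = 264_000; // criterion 7
const EN_EPISODES = 33;
const V1_PRIORITY_CHARS = 1_400_000; // criterion 8
const V1_BUDGET_USD = 50;

// 264,000 / 33 = 8,000 characters per episode on average.
const charsPerEpisode = EN_REBUILD_CHARS / EN_EPISODES;

// 50 / 1.4 ≈ 35.71 USD per million characters, blended across providers.
const maxUsdPerMillionChars = V1_BUDGET_USD / (V1_PRIORITY_CHARS / 1_000_000);

function totalCostUsd(ledger: LedgerEntry[]): number {
  return ledger.reduce((sum, entry) => sum + entry.costUsd, 0);
}

function withinOneShotBudget(ledger: LedgerEntry[], budgetUsd: number = V1_BUDGET_USD): boolean {
  return totalCostUsd(ledger) < budgetUsd;
}

// Illustrative ledger: free-tier bulk plus a paid anchor batch.
const sampleLedger: LedgerEntry[] = [
  { provider: 'google', locale: 'en-US', persona: 'guide', chars: 200_000, costUsd: 0 },
  { provider: 'elevenlabs', locale: 'en-US', persona: 'curator', chars: 64_000, costUsd: 15 },
];
const ok = withinOneShotBudget(sampleLedger);
```

A per-character ceiling like this gives a quick way to sanity-check a provider's published rate against the budget before running a full rebuild.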

§transparency (AI involvement disclosed by default)

The audio narration system has machines doing things humans used to do: drafting prose, translating across locales, voicing the result. Orrery's existing principle is attribution is design (PA §principles); that principle extends to AI involvement, not just imagery and outbound links.

Layer	What touches it	How it's disclosed
Source scripts (en-US)	Drafted by Claude (Anthropic) in v0.7; subject to human editorial review before any voicing. Future scripts may be human-authored end-to-end.	`text_authorship` field on every `audio-provenance.json` entry: `claude-drafted` / `claude-translated` / `human-authored` / `human-edited-claude-draft`. Plus `text_author_model` (e.g. `claude-opus-4-7`).
Translations	Claude API translates en-US source to the other 11 locales (Pipeline 1, RFC-019 §5.1).	Per-locale entry: `text_authorship: claude-translated`.
Voice	Google Cloud TTS (Neural2 / WaveNet) for the bulk; ElevenLabs for the 8 Atmospheric Moves anchor takes (RFC-019 §4.3).	`provider` + `voice_id` + `tts_model` fields. Provider logos surfaced on `/credits` like the existing publisher logos.

Where the disclosure surfaces:

/credits — new "Audio narration" table (after the existing image-provenance and text-sources tables), one row per asset with text-author / TTS-provider columns.
AudioOverlay footer — compact one-liner: Voices · Google Cloud TTS · Scripts · drafted by Claude (Anthropic), with a link to /credits for per-episode detail. Visible whenever the overlay is open (empty state included).
audio-provenance.json — machine-readable manifest. Validated by validate-data as a fail-closed gate alongside image-provenance.json.

What this rules out: stripping AI authorship from the visible record after the fact; presenting AI-drafted prose as human-authored; presenting a Google/ElevenLabs voice as a hand-recorded human reader. If a piece is genuinely human-recorded later, it carries that attribution honestly too.
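
A fail-closed check over `audio-provenance.json` could look like the sketch below. The field names (`text_authorship`, `text_author_model`, `provider`, `voice_id`, `tts_model`) and the four authorship values come from the table above. The function names, the assumption that the manifest is an array of entries, and the exact error messages are hypothetical; this is not the existing `validate-data` implementation.

```typescript
const TEXT_AUTHORSHIP = new Set<string>([
  'claude-drafted',
  'claude-translated',
  'human-authored',
  'human-edited-claude-draft',
]);

const REQUIRED_STRINGS = ['provider', 'voice_id', 'tts_model'];

// Returns a list of problems; an empty list means the entry passes.
function validateEntry(raw: unknown): string[] {
  if (typeof raw !== 'object' || raw === null) return ['entry is not an object'];
  const entry = raw as Record<string, unknown>;
  const errors: string[] = [];

  const authorship = entry['text_authorship'];
  if (typeof authorship !== 'string' || !TEXT_AUTHORSHIP.has(authorship)) {
    errors.push(`text_authorship: unknown value ${JSON.stringify(authorship)}`);
  }
  for (const key of REQUIRED_STRINGS) {
    const value = entry[key];
    if (typeof value !== 'string' || value.length === 0) {
      errors.push(`${key}: missing or empty`);
    }
  }
  const model = entry['text_author_model'];
  if (model !== undefined && typeof model !== 'string') {
    errors.push('text_author_model: must be a string when present');
  }
  return errors;
}

// Fail closed: any problem in any entry rejects the whole manifest.
function assertManifestValid(entries: unknown[]): void {
  const failures: string[] = [];
  entries.forEach((entry, i) => {
    for (const msg of validateEntry(entry)) failures.push(`entry ${i}: ${msg}`);
  });
  if (failures.length > 0) {
    throw new Error(`audio-provenance.json invalid:\n${failures.join('\n')}`);
  }
}
```

Treating an unknown `text_authorship` value as an error, rather than defaulting it, is what makes the gate fail closed: a new authorship category has to be added to the allowed set deliberately.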

§dependencies

PRD-014 (surface hotspots) — shipped v0.7; landing-site narration scripts (S6 Atmospheric Moves, S8 Guide episodes) now have a stable visual model to write against.
ADR-029 (PWA service worker) — already in place.
ADR-057 (no localStorage) — constrains heard-state design.
ADR-031 / ADR-032 / ADR-033 (i18n strategy) — defines the 12-locale catalog.
PRD-015 / RFC-018 (mobile wrapper) — downstream consumer in v0.8, not a prerequisite. The audio system ships web-only in v0.7; RFC-019 §6.3 is forward-spec for the v0.8 Capacitor sync layer.

§resolved decisions

Resolved 2026-05-16 in conversation with Marko.

v1 ambition — RESOLVED: Full hierarchy × 12 locales as the editorial goal; phased ship per the §goal table. v1.0 = English full hierarchy + 12-locale Curator Full Tour; v1.x adds depth.
TTS provider — RESOLVED: ElevenLabs as anchor, with provider abstraction (M5) so swap to OpenAI / Google / Azure / Coqui-local is an env-var change. No upfront work to integrate alternates; the abstraction earns the optionality.
v1 provider sequencing — RESOLVED: Hybrid. Google Cloud TTS (free tier) generates the bulk of the corpus at $0; ElevenLabs voices the 8 Atmospheric Moves anchor episodes only (~$15 total). Mixed-provider via voices.json. RFC-019 §4.3.
Mobile audio v1 — RESOLVED: Bundle the user's locale only (~67 MB add to the ~85 MB Capacitor budget). Other locales lazy-fetch on switch. Locale's WebVTT captions bundled too (~1 MB add).
Audio asset hosting — RESOLVED: GH Pages site (static/audio/) for v1. Re-evaluation triggered by Marko's planned v1.0 VPS docker-compose migration, not by an audio-size threshold; design keeps audio host-agnostic so the migration is a config change, not a rewrite.
Async generation — RESOLVED: Both local AND GH Actions, same pipeline. Marko triggers locally for iteration; GH Actions runs on script-PR merge for completeness.
Voice persona surfacing in UI — RESOLVED: Implicit. No badge, no Curator/Guide/Enthusiast label in the overlay. The user just hears the right voice for the moment. Keeps focus on content; matches museum-audio-guide UX. Lets us re-cast personas later without UI churn.
Curator Full Tour ordering — RESOLVED: Documentary order. Curator opens (pale-blue-dot register) → Solar System big picture → closer to home (Earth, Moon) → missions sent → people in space (ISS, Tiangong) → Mars + future → Curator close. Optimised for ~90-min listen-through. NOT nav order.
Re-translation strategy — RESOLVED: Full episode re-translate on source change. Cost ≈ $0.50 per episode revision (Claude API). Paragraph-diff optimisation deferred to v1.1 if revision frequency justifies it.
Caption auto-on triggers — RESOLVED: ALL FOUR. prefers-reduced-motion set, screen-reader detected, Audio.muted == true, OR effective bandwidth < 1 Mbps. M9 covers this.
Cost-budget thresholds — RESOLVED: $50/mo soft warn, $200/mo hard halt. Looser than initial recommendation; gives headroom for one-shot rebuilds during iteration.
Mobile VTT bundling — RESOLVED: Yes. Captions bundled alongside the user's locale of audio (~1 MB total per locale). Accessibility parity with web; deaf / hard-of-hearing + Audio.muted + airplane-mode users all covered.
Per-locale voice-quality review — RESOLVED: Defer non-en review until v1.1. Audio ships in all 12 locales for v1 with a "beta" UI flag on non-en locales (small chip on the overlay header for affected locales). Reviewers recruited in v1.1; non-en flagged as beta until reviewed. Faster ship; honest about the quality gap.
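
The four caption auto-on triggers (M9, resolved above) reduce to a single OR over four signals. The sketch below keeps the decision as a pure function so it can be unit-tested outside the browser. `CaptionSignals` and `captionsDefaultOn` are hypothetical names, and treating every `effectiveType` other than `'4g'` as "under 1 Mbps" is an assumption based on the Network Information API's published bands, where `'3g'` tops out around 700 kbps.

```typescript
interface CaptionSignals {
  prefersReducedMotion: boolean;
  screenReaderDetected: boolean; // supplied by the caller; detection method is out of scope here
  audioMuted: boolean;
  // Value of navigator.connection.effectiveType, or undefined where the API is unavailable.
  effectiveType?: string;
}

// Assumption: 'slow-2g', '2g' and '3g' all sit below the 1 Mbps threshold.
function isLowBandwidth(effectiveType: string | undefined): boolean {
  if (effectiveType === undefined) return false; // no signal, no trigger (best-effort)
  return effectiveType !== '4g';
}

// Captions default to on when ANY of the four triggers fires.
function captionsDefaultOn(signals: CaptionSignals): boolean {
  return (
    signals.prefersReducedMotion ||
    signals.screenReaderDetected ||
    signals.audioMuted ||
    isLowBandwidth(signals.effectiveType)
  );
}

const onSlowLink = captionsDefaultOn({
  prefersReducedMotion: false,
  screenReaderDetected: false,
  audioMuted: false,
  effectiveType: '3g',
});
```

In the browser, the caller would fill `prefersReducedMotion` from `window.matchMedia('(prefers-reduced-motion: reduce)').matches`. Because `navigator.connection` is not available in every browser, the missing-value branch matters in practice.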

§open questions

All v1 architectural questions resolved. Operational follow-ups:

"Beta" UI flag visual treatment for non-en locales. Small chip + tooltip ("Voice quality reviewed in en-US only; other locales pending v1.1 review") in the overlay header. Treatment + copy can be polished at implementation; not blocking.
Music bed — v2 candidate, not v1. Re-open as a separate PRD if v1 surfaces "the silence between Curator segments feels empty."

§status — S11 shipped (2026-06-15, #342)

S11 ("Stage authoring — convert text-only cues to a proper guided tour") was the should-have item that anticipated a phased corpus rollout. #342 shipped it as a single 15-phase arc instead, taking the v0.7 corpus from "100 % cue (83 of 83 stages)" to fully wired across all 31 staged episodes in CURATOR_FULL_TOUR and CURATOR_EXTENDED_TOUR. Detailed architecture in RFC-019 §13.

What's now live on top of S11's spec:

Action vocabulary expanded from 5 to 8: flash, scroll-to, click, open-tab, cue, drag, zoom, navigate. drag and zoom dispatch CustomEvents the canvas routes listen for; navigate calls SvelteKit goto for URL-bound state demos (/missions?q=apollo).
Panorama descents on three surface episodes: cernan-last-words (Apollo 17 Taurus-Littrow), curiosity-persistence (Curiosity Gale Crater), guide-moon (Apollo 11 Tranquillity Base). Two of the three pair with PanoramaAutoTour auto-pans through authored annotations.
ISS assembly playback integrated into zarya-first-module (a 50 s visualization runs while the narration walks through the Endeavour-STS-88 / Unity / 12-year assembly arc).
Tour-resume race fix (Phase 10): cookie-restored position landing inside a panorama or assembly window now sequences catch-up clicks via await tick() so DOM mutations flush before the next selector lookup. Without this fix, panorama descents would silently no-op on reload.
Stage-authoring tool (scripts/audio/suggest-stages.ts) — reads SSML + VTT, emits suggested stages. Use for v0.8 episode wiring.
Analytics schema expanded to cover audio-stage-fire + nav-flow + item-click + gallery-image-load events.

S11's "ships per-episode incrementally; pilot one episode end-to-end before scheduling corpus fill" rollout plan was compressed by the structural-pass approach — every commit went through audio-tour.test.ts and Marko reviewed in-chat per Orrery's branchless workflow.

What stayed out (unchanged from S11):

Stage authoring for the remaining object-level deep-dives can opt in incrementally as they're authored.
Per-locale stage timing re-tune — v0.8 work item; selectors don't change but at_sec will drift with translated prose duration.
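
The eight-action vocabulary and the resume catch-up described above can be sketched as a union type plus a selector for the stages that need replaying after a cookie-restored seek. The action names and `at_sec` come from this document. The `Stage` shape, the `target` field, and the choice of which actions count as state-changing are assumptions, not the shipped `EPISODE_STAGES` schema.

```typescript
type StageAction =
  | 'flash'
  | 'scroll-to'
  | 'click'
  | 'open-tab'
  | 'cue'
  | 'drag'
  | 'zoom'
  | 'navigate';

interface Stage {
  at_sec: number;
  action: StageAction;
  target?: string; // hypothetical: a data-audio-stage hook, or a URL for 'navigate'
}

// Assumption: only these actions leave lasting UI state that a resume must rebuild.
const STATE_CHANGING = new Set<StageAction>(['click', 'open-tab', 'navigate']);

// Stages at or before the restored position whose effects must be replayed,
// in timeline order.
function catchUpStages(stages: Stage[], positionSec: number): Stage[] {
  return stages
    .filter((s) => s.at_sec <= positionSec && STATE_CHANGING.has(s.action))
    .sort((a, b) => a.at_sec - b.at_sec);
}

// Illustrative timeline; selectors and times are made up.
const timeline: Stage[] = [
  { at_sec: 40, action: 'open-tab', target: 'gallery' },
  { at_sec: 5, action: 'cue' },
  { at_sec: 10, action: 'click', target: 'site-marker' },
  { at_sec: 90, action: 'click', target: 'close-panel' },
];
const replay = catchUpStages(timeline, 50);
// replay holds the click at 10 s, then the open-tab at 40 s
```

Per the Phase 10 fix, the caller would apply each returned stage in order and `await tick()` (from `svelte`) between them, so the DOM mutation from one click is flushed before the next selector lookup.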

PRD-016 · Orrery · Science Overlay & Episode System · Drafted 2026-05-16 · S11 shipped 2026-06-15 (#342) · Closes-into-RFC-019
