ADR-054 — Fleet i18n strategy: locale overlay parity at 137 × 14
Status · Accepted (retrospective; shipped v0.6.0)
Date · 2026-05-15
Closes · RFC-016 OQ-5
TA anchor · §components/fleet/i18n · §contracts/fleet-overlay
Related ADRs · ADR-017 (Paraglide-js i18n + locale overlay architecture), ADR-031 (i18n language list + rollout waves), ADR-032 (font + script strategy Wave 1), ADR-033 (translation workflow: LLM-only first-pass), ADR-043 (sr-Cyrl font gate), ADR-044 (CJK fonts Wave 2), ADR-045 (RTL Arabic), ADR-057 (locale override cookie)
Context
Fleet ships 137 entries × 14 locales = 1,918 overlay files at full parity. That is roughly 3.7 times the locale-overlay surface of the entire pre-v0.6 missions corpus (37 missions × 14 = 518 files). The naive plan, authoring every overlay by hand, would have blocked the v0.6 ship by months. The alternative, shipping en-US only and letting the UI fall back, would have violated the "i18n from the start" constraint (TA §constraints) and made fleet a second-class citizen on the 13 non-English locales.
Three sub-questions had to be resolved:
- What gets translated — every editorial field, or just the headline name + a short blurb?
- What pipeline — same Tiangong rollout (ADR-033 LLM-first-pass + argos-translate fallback + manual review) or something fleet-specific?
- Locale fallback — when a locale overlay is missing, fall back to what?
Decision
Overlay scope (closes OQ-5)
The fleet overlay carries exactly five editorial fields per entry, matching the user-visible surface of the panel:
{
  "id": "saturn-v",
  "name": "Saturn V",
  "best_known_for": "Carried every crewed Apollo lunar mission.",
  "specs_labels": { "height_m": "Height (m)", "payload_leo_kg": "LEO payload (kg)", "stages": "Stages" },
  "flights": [
    { "mission_id": "apollo11", "flight_designation_display": "AS-506 · Apollo 11", "notes": "First crewed lunar landing." }
  ]
}
Locked by static/data/schemas/fleet-overlay.schema.json. The base file (static/data/fleet/<category>/<id>.json per ADR-052) holds everything language-neutral: id, category, agency, country, manufacturer, dates, status, era, epoch, specs values, linked_missions, linked_sites, flights structure (mission_ids + patch paths + crew names + crew roles + crew countries), credit, links.
What is NOT translated:
- specs values (units are universal; "110.6 m" reads identically in every locale).
- agency / country / manufacturer: these surface as flag chips + agency badges, not free-text fields.
- crew[].name: proper nouns; we do not transliterate names (Neil Armstrong is "Neil Armstrong" on /fleet?id=apollo-csm-block-ii&locale=ja).
- crew[].role: partially translated. Roles like "Commander" / "Pilot" / "Mission Specialist" come from the Paraglide-js UI strings catalogue (per ADR-017), not from per-entry overlays.
- links[] URLs: link label strings translate via ADR-051's locale fallback chain, not via the fleet overlay.
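As an illustration of the overlay surface described above, the following is a minimal TypeScript sketch that mirrors the example entry. The type names, the assumption that every field is required, and the unexpectedOverlayKeys helper are hypothetical; the authoritative contract remains static/data/schemas/fleet-overlay.schema.json.

```typescript
// Hypothetical TypeScript mirror of the overlay example above.
// Assumption: all fields are required; the real JSON Schema may differ.
interface FleetOverlayFlight {
  mission_id: string; // language-neutral join key back to the base file
  flight_designation_display: string; // editorial, translated
  notes: string; // editorial, translated
}

interface FleetOverlay {
  id: string; // language-neutral entry key
  name: string;
  best_known_for: string;
  specs_labels: Record<string, string>; // labels only; spec values stay in the base file
  flights: FleetOverlayFlight[];
}

// Top-level keys an overlay may carry, taken from the example entry.
const OVERLAY_KEYS = new Set<string>([
  "id",
  "name",
  "best_known_for",
  "specs_labels",
  "flights",
]);

// Returns any top-level keys that belong in the base file rather than the overlay
// (for example agency, country, or manufacturer).
function unexpectedOverlayKeys(candidate: Record<string, unknown>): string[] {
  return Object.keys(candidate).filter((key) => !OVERLAY_KEYS.has(key));
}
```

A check like this could flag an overlay that accidentally carries a language-neutral field, although in practice the JSON Schema validation would normally cover that case.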
Pipeline (closes OQ-5)
Same as Tiangong rollout per ADR-033 — three phases:
- en-US authored by hand. The 137 en-US overlays ship as the canonical editorial truth. Every other locale derives from this.
- Wave 1 + 2 locales (es, fr, de, pt-BR, it, nl, zh-CN, ja, ko, hi, ar, ru) translated by argos-translate offline NMT in batch per ADR-033. argos-translate is the explicit fallback when an LLM round-trip is unavailable or undesirable: it ships free, runs locally, and produces consistent technical translations for the specs-heavy fleet vocabulary. Output written by scripts/wave23/apply-translations.ts.
- sr-Cyrl authored manually per ADR-043: argos-translate does not ship a Cyrillic Serbian model, and the Latin → Cyrillic transliteration is not mechanical for the technical vocabulary. The 137 sr-Cyrl overlays are hand-authored against the en-US source. Same pattern as Tiangong sr-Cyrl rollout.
scripts/wave23/ toolchain (catalog → maps → apply-translations) is reused unchanged from Tiangong; fleet just passed a different content surface through the same pipe.
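The three phases above amount to a routing rule from locale to authoring method. The sketch below shows that rule under stated assumptions: the translationMethod function and the TranslationMethod type are hypothetical names for illustration and are not part of the scripts/wave23/ toolchain.

```typescript
// Hypothetical routing of each shipped locale to its authoring method.
type TranslationMethod = "hand-authored" | "argos-batch";

// en-US is the canonical source; sr-Cyrl is hand-authored per ADR-043.
const HAND_AUTHORED_LOCALES = new Set<string>(["en-US", "sr-Cyrl"]);

// Wave 1 + 2 locales, translated in batch by argos-translate per ADR-033.
const ARGOS_BATCH_LOCALES = new Set<string>([
  "es", "fr", "de", "pt-BR", "it", "nl",
  "zh-CN", "ja", "ko", "hi", "ar", "ru",
]);

function translationMethod(locale: string): TranslationMethod {
  if (HAND_AUTHORED_LOCALES.has(locale)) return "hand-authored";
  if (ARGOS_BATCH_LOCALES.has(locale)) return "argos-batch";
  // Anything else is outside the 14-locale list.
  throw new Error(`Unknown locale: ${locale}`);
}
```

Keeping the two sets explicit makes the sr-Cyrl exception visible: adding a locale forces a decision about which path it takes rather than silently defaulting to batch MT.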
Locale fallback (closes OQ-5)
The data client (src/lib/data.ts) applies the standard ADR-017 shallow-merge:
- Fetch base file static/data/fleet/<category>/<id>.json (always present).
- Fetch overlay static/data/i18n/<locale>/fleet/<category>/<id>.json if present.
- Shallow-merge overlay over base: overlay wins for every field it carries.
- If a non-en-US overlay is missing, fall back to en-US (not to the base file) so the entry still shows translated UI strings around it.
This chain means the corpus degrades gracefully: missing a Korean overlay shows English text on a Korean-UI page, which is strictly better than rendering nothing or rendering a Romanised stub.
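The fallback chain above can be sketched as a small pure function. This is an illustration, not the code in src/lib/data.ts: the Loader type, the synchronous loading model, and the function names are assumptions made so the example is self-contained, and the real client presumably fetches asynchronously.

```typescript
type JsonObject = Record<string, unknown>;

// Hypothetical loader: returns parsed JSON, or undefined when the file is absent.
type Loader = (path: string) => JsonObject | undefined;

const FALLBACK_LOCALE = "en-US";

function basePath(category: string, id: string): string {
  return `static/data/fleet/${category}/${id}.json`;
}

function overlayPath(locale: string, category: string, id: string): string {
  return `static/data/i18n/${locale}/fleet/${category}/${id}.json`;
}

function resolveFleetEntry(
  load: Loader,
  locale: string,
  category: string,
  id: string,
): JsonObject {
  // Step 1: the base file is always expected to exist.
  const base = load(basePath(category, id));
  if (base === undefined) {
    throw new Error(`Missing base file for ${category}/${id}`);
  }
  // Steps 2 and 4: requested locale first, then the en-US overlay.
  const overlay =
    load(overlayPath(locale, category, id)) ??
    load(overlayPath(FALLBACK_LOCALE, category, id)) ??
    {};
  // Step 3: shallow merge; the overlay wins for every top-level field it carries.
  return { ...base, ...overlay };
}
```

Because the merge is shallow, a nested field such as flights is replaced wholesale when the overlay carries it. How the real client reconciles per-flight base data with overlay text is not covered by this sketch.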
What ships at v0.6.0
- en-US: 137/137 (100 %) — full parity, hand-authored.
- Wave 1 (es, fr, de, pt-BR, it, nl): 137/137 × 6 = 822 files — argos-translate batch, no manual review pass yet.
- Wave 2 (zh-CN, ja, ko, hi, ar, ru): 137/137 × 6 = 822 files — argos-translate batch.
- sr-Cyrl: 137/137 — manual authoring (per ADR-043).
Total: 1,918 overlay files committed to source. Quality varies by locale: en-US is the editorial truth; Wave 1 + 2 are post-edit pending; sr-Cyrl is hand-checked.
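The per-tier file counts above follow directly from 137 entries multiplied by the number of locales in each tier. A short sketch of that arithmetic, with hypothetical constant and function names:

```typescript
const ENTRY_COUNT = 137;

// Locale tiers as listed in this ADR.
const LOCALE_TIERS: Record<string, string[]> = {
  "en-US": ["en-US"],
  "Wave 1": ["es", "fr", "de", "pt-BR", "it", "nl"],
  "Wave 2": ["zh-CN", "ja", "ko", "hi", "ar", "ru"],
  "sr-Cyrl": ["sr-Cyrl"],
};

// One overlay file per entry per locale.
function overlayFileCounts(
  entryCount: number,
  tiers: Record<string, string[]>,
): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const tier of Object.keys(tiers)) {
    counts[tier] = entryCount * tiers[tier].length;
  }
  return counts;
}

function totalOverlayFiles(counts: Record<string, number>): number {
  return Object.keys(counts).reduce((sum, tier) => sum + counts[tier], 0);
}

const shippedCounts = overlayFileCounts(ENTRY_COUNT, LOCALE_TIERS);
// shippedCounts: en-US 137, Wave 1 822, Wave 2 822, sr-Cyrl 137
// totalOverlayFiles(shippedCounts) is 1918
```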
Locale switching honours the orrery_locale cookie per ADR-057.
Rationale
- Five-field overlay surface keeps the per-entry translation cost minimal (~150 words per entry) while still translating everything the user actually reads. Names, dates, specs, and IDs need no translation.
- argos-translate over LLM for batch translation: free, deterministic, offline, and ADR-033 already committed to this fallback. LLMs for spot-fix only.
- Hand-authored sr-Cyrl is the only path; pretending argos handles Cyrillic Serbian would produce mis-script output (Latin transliteration on a Cyrillic page).
- en-US fallback (not base file) keeps the panel readable: the base file holds only language-neutral data, so a Korean user whose Korean overlay is missing sees the en-US name and blurb inside Korean UI strings, whereas falling back to the base file alone would leave the editorial fields empty.
- No crew-name translation is a deliberate honesty rule: Wernher von Braun, Yuri Gagarin, and Liu Yang are not transliteration targets; the user sees the name the historical record uses. Roles are translated because they are role labels, not personal identifiers.
Alternatives considered
- One overlay file for all entries per locale (rejected) — would have made PRs ugly and made per-entry translation review impossible to scope. Per-entry files match the base-file shape (ADR-017 standard).
- Crowd-sourced translations (rejected for v0.6) — quality control overhead exceeds the budget for a one-person curator; reconsider post-1.0 if community contribution lands.
- Ship en-US only and rely on browser auto-translate (rejected) — violates the "i18n from the start" constraint; browser MT is worse than argos for technical vocabulary; defeats the purpose of having a translated UI shell.
- LLM round-trip for every overlay (rejected for batch) — non-deterministic, costly at 1,918 × ~150 words, and ADR-033 already locks argos as the batch tool.
Consequences
Positive
- 100 % overlay parity across 14 locales at v0.6.0 ship — fleet does not lag any other route on i18n coverage.
- argos-translate pipeline is reusable: any future content surface of comparable size (fleet expansion, science encyclopedia, surface hotspot LOD content per RFC-017) routes through the same toolchain.
- Locale fallback is graceful: a missing overlay never shows as a broken or empty panel.
Negative
- Wave 1 + 2 argos output has not had a manual review pass; translation quality is "MT-first-pass good": adequate for technical specs, occasionally awkward for editorial prose like best_known_for. Post-edit is tracked in docs/wip/fleet-translation-review.md.
- sr-Cyrl is the bottleneck for any future fleet expansion: every new entry needs hand-authored Cyrillic, no shortcut.
- 1,918 files in git are visible in git status after every locale-overlay rebuild; cleanup tooling lives in scripts/wave23/ but the noise is real.
- The "names not translated" rule occasionally produces a mixed-script line in CJK locales when a Russian name is romanised in en-US but appears next to Hiragana/Hangul body text. Acceptable per the editorial-honesty rationale, but visually unusual.
Implementation notes
- Per-entry overlay path: static/data/i18n/<locale>/fleet/<category>/<id>.json. Schema: static/data/schemas/fleet-overlay.schema.json.
- Batch pipeline: scripts/wave23/catalog.ts → scripts/wave23/maps.ts → scripts/wave23/apply-translations.ts (catalog → per-locale maps → apply translations in JSON overlays). Tracked locally in the maintainer's Claude Code memory for repeatable invocation across releases.
- Locale-override cookie: orrery_locale (ADR-057). UI exposes locale switcher in the top nav.
- Fallback chain validation: handled by scripts/validate-data.ts. Every base entry must have an en-US overlay; non-English overlays are optional but warned-on-missing.
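The validation rule in the last note (missing en-US overlay is an error, missing non-English overlay is a warning) can be sketched as follows. This is not the code in scripts/validate-data.ts: the ValidationReport shape, the validateOverlayCoverage function, and the overlayExists callback are hypothetical and stand in for whatever file-system checks the real script performs.

```typescript
interface ValidationReport {
  errors: string[]; // missing en-US overlays: the fallback chain would break
  warnings: string[]; // missing non-English overlays: allowed, but reported
}

function validateOverlayCoverage(
  entryKeys: string[], // "<category>/<id>" for every base entry
  locales: string[],
  overlayExists: (locale: string, entryKey: string) => boolean,
): ValidationReport {
  const report: ValidationReport = { errors: [], warnings: [] };
  for (const entryKey of entryKeys) {
    for (const locale of locales) {
      if (overlayExists(locale, entryKey)) continue;
      const path = `static/data/i18n/${locale}/fleet/${entryKey}.json`;
      if (locale === "en-US") {
        report.errors.push(`Missing required en-US overlay: ${path}`);
      } else {
        report.warnings.push(`Missing optional overlay: ${path}`);
      }
    }
  }
  return report;
}
```

Passing the existence check as a callback keeps the rule testable without touching the file system; a real script would presumably exit non-zero only when errors is non-empty.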
Related
- ADR-017 — Paraglide-js i18n + locale overlay architecture (the parent pattern).
- ADR-031 — i18n language list and rollout waves.
- ADR-033 — Translation workflow: LLM-only first-pass (argos-translate is the explicit batch fallback this ADR exercises).
- ADR-043 — Serbian Cyrillic font gate for sr-Cyrl.
- ADR-044 — CJK font strategy for Wave 2 locales.
- ADR-045 — RTL strategy for Arabic locale.
- ADR-052 — Fleet schema + bidirectional cross-reference contract.
- ADR-053 — Fleet imagery sourcing.
- ADR-057 — Narrow exception to "no client storage": one functional cookie for locale override.
- RFC-016 — Spaceflight Fleet · architecture, schema, and dataset boundaries.
- PRD-012 — Spaceflight Fleet product spec.