
ADR-054 — Fleet i18n strategy: locale overlay parity at 137 × 14

Status · Accepted (retrospective; shipped v0.6.0)
Date · 2026-05-15
Closes · RFC-016 OQ-5
TA anchor · §components/fleet/i18n · §contracts/fleet-overlay
Related ADRs · ADR-017 (Paraglide-js i18n + locale overlay architecture), ADR-031 (i18n language list + rollout waves), ADR-032 (font + script strategy Wave 1), ADR-033 (translation workflow: LLM-only first-pass), ADR-043 (sr-Cyrl font gate), ADR-044 (CJK fonts Wave 2), ADR-045 (RTL Arabic), ADR-057 (locale override cookie)

Context

Fleet ships 137 entries × 14 locales = 1,918 overlay files at full parity. That is roughly 3.7 times the locale-overlay surface of the entire pre-v0.6 missions corpus (37 missions × 14 = 518 files). The naive plan, authoring every overlay by hand, would have delayed the v0.6 ship by months. The alternative, shipping en-US only and letting the UI fall back, would have violated the "i18n from the start" constraint (TA §constraints) and made fleet a second-class citizen on the 13 non-English locales.

Three sub-questions had to be resolved:

  1. What gets translated — every editorial field, or just the headline name + a short blurb?
  2. What pipeline — same Tiangong rollout (ADR-033 LLM-first-pass + argos-translate fallback + manual review) or something fleet-specific?
  3. Locale fallback — when a locale overlay is missing, fall back to what?

Decision

Overlay scope (closes OQ-5)

The fleet overlay carries exactly five editorial fields per entry, matching the user-visible surface of the panel:

json
{
  "id": "saturn-v",
  "name": "Saturn V",
  "best_known_for": "Carried every crewed Apollo lunar mission.",
  "specs_labels": { "height_m": "Height (m)", "payload_leo_kg": "LEO payload (kg)", "stages": "Stages" },
  "flights": [
    { "mission_id": "apollo11", "flight_designation_display": "AS-506 · Apollo 11", "notes": "First crewed lunar landing." }
  ]
}

Locked by static/data/schemas/fleet-overlay.schema.json. The base file (static/data/fleet/<category>/<id>.json per ADR-052) holds everything language-neutral: id, category, agency, country, manufacturer, dates, status, era, epoch, specs values, linked_missions, linked_sites, flights structure (mission_ids + patch paths + crew names + crew roles + crew countries), credit, links.
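
The overlay shape above can be mirrored in TypeScript for use in the data client or in tests. This is a minimal sketch, not the project's actual types: the interface names and the `isFleetOverlay` guard are hypothetical, and it assumes all five fields are required and no extra keys are allowed. The authoritative contract remains fleet-overlay.schema.json.

```typescript
// Hypothetical mirror of the overlay example; the JSON schema is authoritative.
interface FleetOverlayFlight {
  mission_id: string;
  flight_designation_display: string;
  notes: string;
}

interface FleetOverlay {
  id: string;
  name: string;
  best_known_for: string;
  specs_labels: Record<string, string>;
  flights: FleetOverlayFlight[];
}

const OVERLAY_KEYS = ["id", "name", "best_known_for", "specs_labels", "flights"] as const;

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Assumption: exactly the five overlay keys, each with the type shown in the example.
function isFleetOverlay(value: unknown): value is FleetOverlay {
  if (!isRecord(value)) return false;
  if (Object.keys(value).length !== OVERLAY_KEYS.length) return false;
  if (!OVERLAY_KEYS.every((key) => key in value)) return false;
  if (typeof value.id !== "string" || typeof value.name !== "string") return false;
  if (typeof value.best_known_for !== "string") return false;
  if (!isRecord(value.specs_labels)) return false;
  if (!Object.keys(value.specs_labels).every((k) => typeof (value.specs_labels as Record<string, unknown>)[k] === "string")) return false;
  if (!Array.isArray(value.flights)) return false;
  return value.flights.every(
    (flight) =>
      isRecord(flight) &&
      typeof flight.mission_id === "string" &&
      typeof flight.flight_designation_display === "string" &&
      typeof flight.notes === "string",
  );
}
```

A guard like this rejects an overlay that smuggles in a base-file field such as agency, which keeps language-neutral data out of the translated surface.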

What is NOT translated:

  • specs values (units are universal; "110.6 m" reads identically in every locale).
  • agency / country / manufacturer — these surface as flag chips + agency badges, not free-text fields.
  • crew[].name — proper nouns; we do not transliterate names (Neil Armstrong is "Neil Armstrong" on /fleet?id=apollo-csm-block-ii&locale=ja).
  • crew[].role — partially translated: roles like "Commander" / "Pilot" / "Mission Specialist" come from the Paraglide-js UI strings catalogue (per ADR-017), not from per-entry overlays.
  • links[] URLs — link label strings translate via ADR-051's locale fallback chain, not via the fleet overlay.

Pipeline (closes OQ-5)

Same as Tiangong rollout per ADR-033 — three phases:

  1. en-US authored by hand. The 137 en-US overlays ship as the canonical editorial truth. Every other locale derives from this.
  2. Wave 1 + 2 locales (es, fr, de, pt-BR, it, nl, zh-CN, ja, ko, hi, ar, ru) translated by argos-translate offline NMT in batch per ADR-033. argos-translate is the explicit fallback when an LLM round-trip is unavailable or undesirable — it ships free, runs locally, and produces consistent technical translations for the specs-heavy fleet vocabulary. Output written by scripts/wave23/apply-translations.ts.
  3. sr-Cyrl authored manually per ADR-043 — argos-translate does not ship a Cyrillic Serbian model, and the Latin → Cyrillic transliteration is not mechanical for the technical vocabulary. The 137 sr-Cyrl overlays are hand-authored against the en-US source. Same pattern as Tiangong sr-Cyrl rollout.

scripts/wave23/ toolchain (catalog → maps → apply-translations) is reused unchanged from Tiangong; fleet just passed a different content surface through the same pipe.
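
The catalog → maps → apply-translations flow can be sketched as three pure functions. This is an illustration under stated assumptions, not the code in scripts/wave23/: the function names, the `FleetOverlay` type, and the injected `translate` callback (standing in for the argos-translate batch run) are all hypothetical.

```typescript
// Hypothetical sketch of the three pipeline stages; not the real scripts/wave23 code.
type FleetOverlay = {
  id: string;
  name: string;
  best_known_for: string;
  specs_labels: Record<string, string>;
  flights: { mission_id: string; flight_designation_display: string; notes: string }[];
};

// Stage 1 (catalog): collect the unique translatable strings from the en-US overlays.
function buildCatalog(overlays: FleetOverlay[]): string[] {
  const seen = new Set<string>();
  for (const overlay of overlays) {
    seen.add(overlay.name);
    seen.add(overlay.best_known_for);
    for (const key of Object.keys(overlay.specs_labels)) seen.add(overlay.specs_labels[key] as string);
    for (const flight of overlay.flights) {
      seen.add(flight.flight_designation_display);
      seen.add(flight.notes);
    }
  }
  return Array.from(seen).sort();
}

// Stage 2 (maps): one source-to-target map per locale. `translate` is an injected
// stand-in for the offline MT step.
function buildMap(catalog: string[], translate: (source: string) => string): Map<string, string> {
  const map = new Map<string, string>();
  for (const source of catalog) map.set(source, translate(source));
  return map;
}

// Stage 3 (apply): rewrite the editorial strings, leaving id and mission_id untouched.
function applyTranslations(source: FleetOverlay, map: Map<string, string>): FleetOverlay {
  const t = (text: string): string => map.get(text) ?? text; // unmapped strings stay en-US
  const specs_labels: Record<string, string> = {};
  for (const key of Object.keys(source.specs_labels)) specs_labels[key] = t(source.specs_labels[key] as string);
  return {
    id: source.id,
    name: t(source.name),
    best_known_for: t(source.best_known_for),
    specs_labels,
    flights: source.flights.map((flight) => ({
      mission_id: flight.mission_id,
      flight_designation_display: t(flight.flight_designation_display),
      notes: t(flight.notes),
    })),
  };
}
```

Keeping the translator behind a callback is one way the same pipe can serve any content surface: only the catalog stage needs to know which fields are editorial.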

Locale fallback (closes OQ-5)

The data client (src/lib/data.ts) applies the standard ADR-017 shallow-merge:

  1. Fetch base file static/data/fleet/<category>/<id>.json (always present).
  2. Fetch overlay static/data/i18n/<locale>/fleet/<category>/<id>.json if present.
  3. Shallow-merge overlay over base — overlay wins for every field it carries.
  4. If a non-en-US overlay is missing, fall back to en-US (not to the base file) so the entry still shows translated UI strings around it.

This chain means the corpus degrades gracefully: missing a Korean overlay shows English text on a Korean-UI page, which is strictly better than rendering nothing or rendering a Romanised stub.
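
The four steps above can be sketched as a single resolver. This is a minimal sketch, not the code in src/lib/data.ts: the function names are hypothetical, and the file loader is injected and synchronous so the example stays self-contained (the real client presumably fetches asynchronously). Only the two path templates come from this ADR.

```typescript
// Hypothetical sketch of the fallback chain; the loader returns undefined for a missing file.
type JsonObject = Record<string, unknown>;
type Loader = (path: string) => JsonObject | undefined;

function basePath(category: string, id: string): string {
  return `static/data/fleet/${category}/${id}.json`;
}

function overlayPath(locale: string, category: string, id: string): string {
  return `static/data/i18n/${locale}/fleet/${category}/${id}.json`;
}

function resolveFleetEntry(load: Loader, locale: string, category: string, id: string): JsonObject {
  // Step 1: the base file is always expected to exist.
  const base = load(basePath(category, id));
  if (base === undefined) throw new Error(`missing base file for ${category}/${id}`);

  // Steps 2 and 4: use the requested locale's overlay, else fall back to en-US.
  const overlay =
    load(overlayPath(locale, category, id)) ??
    load(overlayPath("en-US", category, id)) ??
    {};

  // Step 3: shallow merge, overlay wins for every top-level field it carries.
  return { ...base, ...overlay };
}
```

Note that an object spread replaces nested values such as flights wholesale. How the real client reconciles per-flight fields between base and overlay is not covered by this sketch.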

What ships at v0.6.0

  • en-US: 137/137 (100 %) — full parity, hand-authored.
  • Wave 1 (es, fr, de, pt-BR, it, nl): 137/137 × 6 = 822 files — argos-translate batch, no manual review pass yet.
  • Wave 2 (zh-CN, ja, ko, hi, ar, ru): 137/137 × 6 = 822 files — argos-translate batch.
  • sr-Cyrl: 137/137 — manual authoring (per ADR-043).

Total: 1,918 overlay files committed to source. Quality varies by locale: en-US is the editorial truth; Wave 1 + 2 are post-edit pending; sr-Cyrl is hand-checked.
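
The totals can be re-derived from the entry count and the locale lists given in this ADR. The snippet below is only an arithmetic check; the constant and function names are hypothetical.

```typescript
// Arithmetic check of the v0.6.0 overlay counts; names are illustrative only.
const FLEET_ENTRIES = 137;

const LOCALE_GROUPS: Record<string, string[]> = {
  "en-US": ["en-US"],
  "Wave 1": ["es", "fr", "de", "pt-BR", "it", "nl"],
  "Wave 2": ["zh-CN", "ja", "ko", "hi", "ar", "ru"],
  "sr-Cyrl": ["sr-Cyrl"],
};

// Files per group = entries × locales in the group.
function overlayCounts(entries: number, groups: Record<string, string[]>): Record<string, number> {
  const counts: Record<string, number> = {};
  let total = 0;
  for (const group of Object.keys(groups)) {
    const files = entries * (groups[group] as string[]).length;
    counts[group] = files;
    total += files;
  }
  counts["total"] = total;
  return counts;
}

const fleetCounts = overlayCounts(FLEET_ENTRIES, LOCALE_GROUPS);
// fleetCounts: en-US 137, Wave 1 822, Wave 2 822, sr-Cyrl 137, total 1918
```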

Locale switching honours the orrery_locale cookie per ADR-057.

Rationale

  • Five-field overlay surface keeps the per-entry translation cost minimal (~150 words per entry) while still translating everything the user actually reads. Names, dates, specs, and IDs need no translation.
  • argos-translate over LLM for batch translation: free, deterministic, offline, and ADR-033 already committed to this fallback. LLMs for spot-fix only.
  • Hand-authored sr-Cyrl is the only path; pretending argos handles Cyrillic Serbian would produce mis-script output (Latin transliteration on a Cyrillic page).
  • en-US fallback (not base file) keeps the panel populated: a Korean user whose Korean overlay is missing sees the en-US editorial text with Korean UI strings around the panel. The base file holds only language-neutral data, so falling back to it alone would leave the editorial fields (name, best_known_for, notes) with nothing to render.
  • No crew-name translation is a deliberate honesty rule: Wernher von Braun, Yuri Gagarin, and Liu Yang are not transliteration targets; the user sees the name the historical record uses. Roles are translated because they are role labels, not personal identifiers.

Alternatives considered

  • One overlay file for all entries per locale (rejected) — would have made PRs ugly and made per-entry translation review impossible to scope. Per-entry files match the base-file shape (ADR-017 standard).
  • Crowd-sourced translations (rejected for v0.6) — quality control overhead exceeds the budget for a one-person curator; reconsider post-1.0 if community contribution lands.
  • Ship en-US only and rely on browser auto-translate (rejected) — violates the "i18n from the start" constraint; browser MT is worse than argos for technical vocabulary; defeats the purpose of having a translated UI shell.
  • LLM round-trip for every overlay (rejected for batch) — non-deterministic, costly at 1,918 × ~150 words, and ADR-033 already locks argos as the batch tool.

Consequences

Positive

  • 100 % overlay parity across 14 locales at v0.6.0 ship — fleet does not lag any other route on i18n coverage.
  • argos-translate pipeline is reusable: any future content surface of comparable size (fleet expansion, science encyclopedia, surface hotspot LOD content per RFC-017) routes through the same toolchain.
  • Locale fallback is graceful: a missing overlay never shows as a broken or empty panel.

Negative

  • Wave 1 + 2 argos output has not had a manual review pass; translation quality is "MT-first-pass good" — adequate for technical specs, occasionally awkward for editorial prose like best_known_for. Post-edit is tracked in docs/wip/fleet-translation-review.md.
  • sr-Cyrl is the bottleneck for any future fleet expansion — every new entry needs hand-authored Cyrillic, no shortcut.
  • 1,918 files in git are visible in git status after every locale-overlay rebuild; cleanup tooling lives in scripts/wave23/ but the noise is real.
  • The "names not translated" rule occasionally produces a mixed-script line in CJK locales when a Russian name is romanised in en-US but appears next to Hiragana/Hangul body text. Acceptable per the editorial-honesty rationale, but visually unusual.

Implementation notes

  • Per-entry overlay path: static/data/i18n/<locale>/fleet/<category>/<id>.json. Schema: static/data/schemas/fleet-overlay.schema.json.
  • Batch pipeline: scripts/wave23/catalog.ts → scripts/wave23/maps.ts → scripts/wave23/apply-translations.ts (catalog → per-locale maps → apply translations to the JSON overlays). Tracked locally in the maintainer's Claude Code memory for repeatable invocation across releases.
  • Locale-override cookie: orrery_locale (ADR-057). UI exposes locale switcher in the top nav.
  • Fallback chain validation: handled by scripts/validate-data.ts — every base entry must have an en-US overlay; non-English overlays are optional but warned-on-missing.
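
The coverage rule in the last note (a missing en-US overlay is an error, a missing non-English overlay is a warning) can be sketched as follows. This is an illustration under assumptions, not the logic in scripts/validate-data.ts: the function name, input shapes, and message format are hypothetical.

```typescript
// Hypothetical sketch of the overlay coverage check; not the real validate-data.ts.
type CoverageReport = { errors: string[]; warnings: string[] };

function validateOverlayCoverage(
  baseIds: string[],
  overlayIdsByLocale: Record<string, string[]>,
): CoverageReport {
  const report: CoverageReport = { errors: [], warnings: [] };

  // Always check en-US, even when no en-US overlays were found at all.
  const locales = ["en-US"];
  for (const locale of Object.keys(overlayIdsByLocale)) {
    if (locale !== "en-US") locales.push(locale);
  }

  for (const locale of locales) {
    const present = new Set<string>(overlayIdsByLocale[locale] ?? []);
    for (const id of baseIds) {
      if (present.has(id)) continue;
      const message = `${locale}: missing overlay for ${id}`;
      // en-US is the fallback target, so a gap there is fatal; other locales degrade gracefully.
      if (locale === "en-US") report.errors.push(message);
      else report.warnings.push(message);
    }
  }
  return report;
}
```

Treating en-US gaps as errors follows from the fallback chain: en-US is the last editorial source before the panel has no text to show.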

References

  • ADR-017 — Paraglide-js i18n + locale overlay architecture (the parent pattern).
  • ADR-031 — i18n language list and rollout waves.
  • ADR-033 — Translation workflow: LLM-only first-pass (argos-translate is the explicit batch fallback this ADR exercises).
  • ADR-043 — Serbian Cyrillic font gate for sr-Cyrl.
  • ADR-044 — CJK font strategy for Wave 2 locales.
  • ADR-045 — RTL strategy for Arabic locale.
  • ADR-052 — Fleet schema + bidirectional cross-reference contract.
  • ADR-053 — Fleet imagery sourcing.
  • ADR-057 — Narrow exception to "no client storage": one functional cookie for locale override.
  • RFC-016 — Spaceflight Fleet · architecture, schema, and dataset boundaries.
  • PRD-012 — Spaceflight Fleet product spec.

Orrery — architecture documentation · MIT · No tracking