Skip to content

ADR-033 — Translation workflow: LLM-only first-pass

Status · Accepted Date · 2026-05-04 RFC anchor · docs/rfc/RFC-010.md (closed by ADR-031 + ADR-032 + this ADR) Re-evaluation triggers · documented below

Context

RFC-010 OQ-3 explored four candidate translation workflows: professional agency ($200–400 / language), community PR model, AI-assisted with native-speaker review, and a hybrid by language tier. The decision was deferred to maintainer judgement. This ADR closes that question for v0.3.x and the immediate roadmap.

Decision

LLM-only first-pass translation. No native-speaker review pass for v0.3.x.

Concretely:

  • An LLM (the same Claude that maintains this codebase) produces the first pass for every locale-overlay file (static/data/i18n/<code>/...) and the Paraglide message bundle (messages/<code>.json).
  • The physics-terminology glossary at docs/guides/i18n-style-guide.md (per ADR-017's content rules; authored under #34) is the LLM's binding brief — terms in that glossary are non-negotiable.
  • Mission and agency proper nouns (Curiosity, Perseverance, JAXA, ROSCOSMOS, etc.) stay in original language per PRD-007 §translation-guidance.
  • No human native-speaker review gate before merge. Quality risk is accepted.

Rationale

The maintainer is a single individual operating a side-project; there is no translation budget, no translator network, and no ongoing review capacity. The realistic choice is between "LLM-only and shipped" vs "human-reviewed and never shipped". The PRD's reach goal (1.5B+ speakers excluded by English-only state) outweighs the quality risk of imperfect translations, provided graceful degradation exists.

Graceful degradation is already in place via ADR-017's en-US fallback chain: any missing or broken <code> overlay falls back to en-US per-key, so a bad translation never breaks the user's session — at worst it shows English fallback. This means the cost of a translation error is bounded (English fallback) rather than user-blocking (broken UI).

The glossary (#34) is the load-bearing quality control: it constrains the high-stakes physics terminology (Δv, Hohmann transfer, sphere of influence, etc.) where literal translation would be actively wrong. Outside the glossary, descriptive prose tolerates more variation.

Alternatives considered

  • Professional agency for tier-1 languages, LLM for the rest — proper hybrid; rejected on budget grounds (no budget). Re-evaluation trigger if budget materialises.
  • Community PR model — open translations/ and let contributors submit per-language PRs; rejected because cold-start needs a populated first language and a glossary, and quality review still falls on the maintainer. Re-evaluation trigger if a community contributor offers ongoing native-speaker review.
  • AI-assisted with mandatory native review — gold standard; rejected because there is no review capacity and "mandatory review" would mean "never shipped".
  • No translations — keeps quality at "perfect English" but excludes ~3B speakers; rejected on PRD grounds.

Consequences

Positive:

  • Spanish (and future locales) ship at maintainer pace, not blocked on external review capacity.
  • Cost = $0 per language; scales to all eleven without budget approval.
  • Fallback chain bounds the user-visible cost of any translation error.

Negative:

  • Physics terminology can drift if the glossary is incomplete or the LLM ignores it. Mitigation: spot-check 5 random missions per locale before commit; treat glossary maintenance as load-bearing.
  • Idiomatic naturalness will be uneven. Spanish from a tuned LLM is probably good; Hindi or Arabic from the same LLM is less verifiable without a native speaker.
  • Cultural adaptation (regional idioms, formal-vs-informal register choice) happens by LLM judgement rather than informed editorial choice.
  • This ADR is explicitly opinionated about quality vs reach; future maintainers may disagree and trigger re-evaluation.

Re-evaluation triggers

Supersede or extend this ADR when any of the following happens:

  1. First user-reported translation issue — particularly a physics-terminology error or a culturally inappropriate phrasing. Don't wait for accumulation; one credible report should trigger glossary update + LLM re-translation of affected files.
  2. A native-speaker reviewer becomes available for any specific language (paid or volunteer). At that point, that language moves to "AI-assisted + native review" and this ADR is superseded for that locale only.
  3. Budget materialises for professional translation of a tier-1 language. At that point, hybrid-by-tier becomes feasible and this ADR is superseded.
  4. An LLM regression in translation quality is observed across multiple locales. At that point, the LLM-only pillar weakens and the workflow is re-opened.

Implementation notes

  • The glossary at docs/guides/i18n-style-guide.md is part of the translator brief (whether LLM or human); changes to the glossary should ideally trigger re-translation of affected files.
  • Per-language tickets (e.g. #17 for Spanish) reference this ADR in their workflow section and link the glossary as a binding input.
  • This ADR does not prevent a per-language ticket from layering review on top of the LLM pass — it documents the baseline, not the ceiling.

Orrery — architecture documentation · MIT · No tracking