ADR-033 — Translation workflow: LLM-only first-pass

Status · Accepted Date · 2026-05-04 RFC anchor · docs/rfc/RFC-010.md (closed by ADR-031 + ADR-032 + this ADR) Re-evaluation triggers · documented below

Context

RFC-010 OQ-3 explored four candidate translation workflows: professional agency ($200–400 / language), community PR model, AI-assisted with native-speaker review, and a hybrid by language tier. The decision was deferred to maintainer judgement. This ADR closes that question for v0.3.x and the immediate roadmap.

Decision

LLM-only first-pass translation. No native-speaker review pass for v0.3.x.

Concretely:

An LLM (the same Claude that maintains this codebase) produces the first pass for every locale-overlay file (static/data/i18n/<code>/...) and the Paraglide message bundle (messages/<code>.json).
The physics-terminology glossary at docs/guides/i18n-style-guide.md (per ADR-017's content rules; authored under #34) is the LLM's binding brief — terms in that glossary are non-negotiable.
Mission and agency proper nouns (Curiosity, Perseverance, JAXA, ROSCOSMOS, etc.) stay in original language per PRD-007 §translation-guidance.
No human native-speaker review gate before merge. Quality risk is accepted.

Rationale

The maintainer is a single individual operating a side-project; there is no translation budget, no translator network, and no ongoing review capacity. The realistic choice is between "LLM-only and shipped" vs "human-reviewed and never shipped". The PRD's reach goal (1.5B+ speakers excluded by English-only state) outweighs the quality risk of imperfect translations, provided graceful degradation exists.

Graceful degradation is already in place via ADR-017's en-US fallback chain: any missing or broken <code> overlay falls back to en-US per-key, so a bad translation never breaks the user's session — at worst it shows English fallback. This means the cost of a translation error is bounded (English fallback) rather than user-blocking (broken UI).

The glossary (#34) is the load-bearing quality control: it constrains the high-stakes physics terminology (Δv, Hohmann transfer, sphere of influence, etc.) where literal translation would be actively wrong. Outside the glossary, descriptive prose tolerates more variation.

Alternatives considered

Professional agency for tier-1 languages, LLM for the rest — proper hybrid; rejected on budget grounds (no budget). Re-evaluation trigger if budget materialises.
Community PR model — open translations/ and let contributors submit per-language PRs; rejected because cold-start needs a populated first language and a glossary, and quality review still falls on the maintainer. Re-evaluation trigger if a community contributor offers ongoing native-speaker review.
AI-assisted with mandatory native review — gold standard; rejected because there is no review capacity and "mandatory review" would mean "never shipped".
No translations — keeps quality at "perfect English" but excludes ~3B speakers; rejected on PRD grounds.

Consequences

Positive:

Spanish (and future locales) ship at maintainer pace, not blocked on external review capacity.
Cost = $0 per language; scales to all eleven without budget approval.
Fallback chain bounds the user-visible cost of any translation error.

Negative:

Physics terminology can drift if the glossary is incomplete or the LLM ignores it. Mitigation: spot-check 5 random missions per locale before commit; treat glossary maintenance as load-bearing.
Idiomatic naturalness will be uneven. Spanish from a tuned LLM is probably good; Hindi or Arabic from the same LLM is less verifiable without a native speaker.
Cultural adaptation (regional idioms, formal-vs-informal register choice) happens by LLM judgement rather than informed editorial choice.
This ADR is explicitly opinionated about quality vs reach; future maintainers may disagree and trigger re-evaluation.

Re-evaluation triggers

Supersede or extend this ADR when any of the following happens:

First user-reported translation issue — particularly a physics-terminology error or a culturally inappropriate phrasing. Don't wait for accumulation; one credible report should trigger glossary update + LLM re-translation of affected files.
A native-speaker reviewer becomes available for any specific language (paid or volunteer). At that point, that language moves to "AI-assisted + native review" and this ADR is superseded for that locale only.
Budget materialises for professional translation of a tier-1 language. At that point, hybrid-by-tier becomes feasible and this ADR is superseded.
An LLM regression in translation quality is observed across multiple locales. At that point, the LLM-only pillar weakens and the workflow is re-opened.

Implementation notes

The glossary at docs/guides/i18n-style-guide.md is part of the translator brief (whether LLM or human); changes to the glossary should ideally trigger re-translation of affected files.
Per-language tickets (e.g. #17 for Spanish) reference this ADR in their workflow section and link the glossary as a binding input.
This ADR does not prevent a per-language ticket from layering review on top of the LLM pass — it documents the baseline, not the ceiling.

ADR-033 — Translation workflow: LLM-only first-pass ​

Context ​

Decision ​

Rationale ​

Alternatives considered ​

Consequences ​

Re-evaluation triggers ​

Implementation notes ​