ADR-051 — Outbound learn-link stewardship
Status · Accepted
Date · 2026-05-07
Extends · docs/adr/ADR-046.md (agency-first imagery sourcing) and docs/adr/ADR-047.md (provenance manifests + license stewardship)
Closes into · RFC-015 (LEARN-link rollout)
Scope · static/data/link-provenance.json, static/data/source-logos.json, static/data/schemas/link-provenance.schema.json, scripts/build-link-provenance.ts, scripts/validate-data.ts, scripts/check-learn-links.ts, src/lib/components/LinkCredit.svelte, src/lib/data.ts, src/lib/library-grouping.ts, src/routes/library/+page.svelte, src/routes/+layout.svelte
Context
ADR-046 locked the source priority for build-time imagery and ADR-047 layered provenance manifests + a fail-closed pipeline on top. Together they cover every shipped image and every paraphrased editorial fragment.
LEARN-tab outbound links — the single largest editorial surface in the app (docs/research/learn-link-audit.md inventories 340 of them, deduping to 296 unique (entity_id, url) pairs) — were left out of that contract. The audit shows the same drift pattern that the imagery contract corrected:
- ~82 % of outbound links go to NASA-domain hosts (~28 %) and Wikipedia (~54 %).
- Only ~10 % go to non-US agency sites; zero go to native-language Roscosmos / CNSA / JAXA-jp / ISRO-hi pages despite the obvious primary sources.
- Several non-US missions (Chang'e 5, Chang'e 6, Chandrayaan-3, Tianwen-1) link to nothing but Wikipedia.
- Six of seven ISS Russian-segment modules miss roscosmos.ru entirely.
- No per-link provenance, no hreflang, no rel="external", no link-checker, no public bill-of-links.
A user who lands on a non-US mission detail panel today finds a LEARN tab pointing at Wikipedia, even though the operating agency's own page exists. That is editorial laziness in a project that redistributes other people's work, and it is disrespectful to the agencies that made the missions happen.
ADR-051 brings outbound-link discipline to parity with ADR-046 / ADR-047.
Decision
Adopt per-link provenance with a fail-closed pipeline and a publishing contract that routes users to operator pages first, in their native language when possible, with a public bill-of-links page.
Manifest
static/data/link-provenance.json — one entry per (entity_id, url) pair. Required fields:
- id: stable id of the form entity_id__hash(url)
- entity_id: id of the mission / module / planet / etc. that the link belongs to
- route: app route where the link is shown (/missions, /iss, /moon, /mars, /explore, /earth)
- category: data-file source (mission, iss-module, iss-visitor, earth-object, moon-site, mars-site, planet, sun, rocket, small-body)
- url: canonicalised target (no ?utm_*, no ?fbclid, no AMP suffix, no m. mobile prefix)
- label: display label (English; locale overlays may translate it)
- tier: intro | core | deep
- source_id: must resolve in static/data/source-logos.json
- language: BCP-47 code (en, ru, zh, ja, hi, es, …); * for multi-lingual landing pages
- kind: agency-official | mission-microsite | science-publisher | encyclopedic | educational | community | vendor-official
- fair_use_rationale: short string; "external reference; rel=noopener noreferrer external" is the default
- last_verified: ISO date; updated by the link-checker
Optional fields:
- replaced_with: set when a link is removed because of a 404 or because a better source surfaced; records the new URL so the change is not silent
- notes: author notes (e.g. "press kit redirects to a partial paywall after 30 s; deep tier appropriate")
Schema: static/data/schemas/link-provenance.schema.json.
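A minimal TypeScript sketch of one manifest row and its id derivation may help reviewers read the field list as a type. The ADR fixes only the entity_id__hash(url) shape. The choice of SHA-256 truncated to 12 hex characters, and the names LinkProvenance and makeLinkId, are assumptions for illustration. This is not the contents of scripts/build-link-provenance.ts.

```typescript
import { createHash } from "node:crypto";

// Enumerations transcribed from the manifest field list.
type Tier = "intro" | "core" | "deep";
type Kind =
  | "agency-official"
  | "mission-microsite"
  | "science-publisher"
  | "encyclopedic"
  | "educational"
  | "community"
  | "vendor-official";
type Category =
  | "mission" | "iss-module" | "iss-visitor" | "earth-object" | "moon-site"
  | "mars-site" | "planet" | "sun" | "rocket" | "small-body";

// Hypothetical in-memory shape of one static/data/link-provenance.json row.
interface LinkProvenance {
  id: string;
  entity_id: string;
  route: "/missions" | "/iss" | "/moon" | "/mars" | "/explore" | "/earth";
  category: Category;
  url: string; // already canonicalised
  label: string; // English; locale overlays may translate it
  tier: Tier;
  source_id: string; // must resolve in static/data/source-logos.json
  language: string; // BCP-47 tag, or "*" for multi-lingual landing pages
  kind: Kind;
  fair_use_rationale: string;
  last_verified: string; // ISO date, e.g. "2026-05-07"
  replaced_with?: string; // optional: new URL when this link was retired
  notes?: string; // optional author notes
}

// Assumed hash: first 12 hex characters of SHA-256 over the canonical URL.
function makeLinkId(entityId: string, canonicalUrl: string): string {
  const digest = createHash("sha256").update(canonicalUrl, "utf8").digest("hex");
  return `${entityId}__${digest.slice(0, 12)}`;
}
```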
Source diversity targets
- Every non-US entity has at least one agency-official intro link to its operating agency. The first intro link on a non-US entity must be the agency portal.
- No entity has Wikipedia as its sole source.
- The non-US share of the inventory rises from ~4 % to ≥ 15 % across the corpus.
- 100 % first-link agency coverage on non-US entities.
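The targets above are mechanical enough to check in code. The sketch below is hypothetical and rests on several assumptions, none of them named in the ADR:

- each entity carries a nonUs flag and lists its links in authoring order;
- kind === "agency-official" serves as a proxy for "agency portal";
- "non-US share" means the fraction of all links whose source_id is in a caller-supplied set of non-US sources;
- Wikipedia's source_id is "wikipedia".

```typescript
type Tier = "intro" | "core" | "deep";

interface EntityLinks {
  entityId: string;
  nonUs: boolean; // hypothetical flag; the ADR does not say where operator nationality is stored
  links: { tier: Tier; kind: string; source_id: string }[]; // authoring order
}

interface DiversityReport {
  problems: string[];
  nonUsShare: number; // fraction of all links whose source is a non-US source
}

function checkDiversity(
  entities: EntityLinks[],
  nonUsSourceIds: Set<string>,
  wikipediaSourceId = "wikipedia", // hypothetical source_id
  minNonUsShare = 0.15,
): DiversityReport {
  const problems: string[] = [];
  let total = 0;
  let nonUsLinks = 0;

  for (const e of entities) {
    total += e.links.length;
    nonUsLinks += e.links.filter((l) => nonUsSourceIds.has(l.source_id)).length;

    // The first intro link on a non-US entity must be the agency portal
    // (approximated here by kind === "agency-official").
    const firstIntro = e.links.find((l) => l.tier === "intro");
    if (e.nonUs && firstIntro?.kind !== "agency-official") {
      problems.push(`${e.entityId}: first intro link is not agency-official`);
    }
    if (e.links.length > 0 && e.links.every((l) => l.source_id === wikipediaSourceId)) {
      problems.push(`${e.entityId}: Wikipedia is the sole source`);
    }
  }

  const nonUsShare = total === 0 ? 0 : nonUsLinks / total;
  if (nonUsShare < minNonUsShare) {
    const pct = (nonUsShare * 100).toFixed(1);
    problems.push(`non-US share ${pct} % is below the ${(minNonUsShare * 100).toFixed(0)} % target`);
  }
  return { problems, nonUsShare };
}
```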
Locale fallback chain
When the UI renders a LEARN link list, the data layer orders links as follows. Within each step, ties are broken by the most recent last_verified first:
- Links whose language matches the active UI locale.
- Links whose language matches the operator's native language (Roscosmos → ru, CNSA → zh, JAXA → ja, ISRO → hi or en, MBRSC → ar).
- English (en).
- Multi-lingual landing pages (*).
The fallback chain is additive — a UI locale never hides a non-matching link, it only re-orders. Native-language agency pages stay visible to every user.
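A minimal sketch of the fallback chain as a sort over a copy of the list. Because nothing is filtered out, the sketch preserves the additive property. It rests on two assumptions:

- Locales match on the primary language subtag only, so pt-BR matches pt.
- Links in a language outside all four steps sort last. The ADR does not say where such links go.

The function and type names are hypothetical.

```typescript
interface LearnLink {
  url: string;
  language: string; // BCP-47 tag or "*"
  last_verified: string; // ISO date
}

// Primary language subtag, lower-cased: "pt-BR" -> "pt", "*" -> "*".
function primary(tag: string): string {
  return tag.toLowerCase().split("-")[0];
}

// Position in the fallback chain; a lower rank sorts first.
function rank(link: LearnLink, uiLocale: string, nativeLanguages: string[]): number {
  const lang = primary(link.language);
  if (lang === primary(uiLocale)) return 0;
  if (nativeLanguages.some((n) => primary(n) === lang)) return 1;
  if (lang === "en") return 2;
  if (lang === "*") return 3;
  return 4; // assumption: languages outside the chain go last
}

function orderLearnLinks(links: LearnLink[], uiLocale: string, nativeLanguages: string[]): LearnLink[] {
  return [...links].sort((a, b) => {
    const byRank = rank(a, uiLocale, nativeLanguages) - rank(b, uiLocale, nativeLanguages);
    if (byRank !== 0) return byRank;
    // Tie-break: most recent last_verified first (ISO dates compare correctly as strings).
    if (a.last_verified === b.last_verified) return 0;
    return a.last_verified > b.last_verified ? -1 : 1;
  });
}
```

For a Russian-segment module viewed with a German UI, the call would be orderLearnLinks(links, "de", ["ru"]).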
Rendering rules
Every outbound LEARN link renders as:
```svelte
<a
  href={link.url}
  target="_blank"
  rel="noopener noreferrer external"
  hreflang={link.language}
>
  {link.label} ↗
</a>
<LinkCredit link={link} />
```

<LinkCredit /> renders below the anchor and shows the source name, language, and last-verified date. It appears in every panel that shows LEARN links: MissionPanel, IssModulePanel, PlanetPanel, SunPanel, SmallBodyPanel, plus the route components for /moon, /mars, and /earth that render their own LEARN sections.
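One detail is worth checking. HTML requires a present hreflang attribute to hold a valid BCP 47 language tag, and the manifest's "*" marker for multi-lingual landing pages is not one. A hypothetical helper, not part of the ADR, could build the anchor attributes and leave hreflang off in that case. Because the key is simply absent, spreading the result onto the anchor (<a {...anchorAttrs(link)}>) would emit no hreflang at all. The credit-line wording is likewise an assumption.

```typescript
interface RenderableLink {
  url: string;
  language: string; // BCP-47 tag or "*"
}

interface AnchorAttrs {
  href: string;
  target: "_blank";
  rel: string;
  hreflang?: string;
}

function anchorAttrs(link: RenderableLink): AnchorAttrs {
  const attrs: AnchorAttrs = {
    href: link.url,
    target: "_blank",
    rel: "noopener noreferrer external",
  };
  // "*" is a manifest convention, not a BCP 47 tag, so omit hreflang for it.
  if (link.language !== "*") attrs.hreflang = link.language;
  return attrs;
}

// Hypothetical text for <LinkCredit />: "Source · language · verified date".
function creditLine(sourceName: string, language: string, lastVerified: string): string {
  const lang = language === "*" ? "multi-lingual" : language;
  return `${sourceName} · ${lang} · verified ${lastVerified}`;
}
```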
Public disclosure
A new route /library lists every outbound link in the manifest, grouped by source, sorted newest verified first within each source. Same architecture as /credits (ADR-047): top intro, table of contents, per-source section with logo + license summary + entry list.
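A minimal sketch of the grouping /library needs, of the kind src/lib/library-grouping.ts might hold. The function and type names are hypothetical. Ordering the sections alphabetically by source_id is an assumption: the ADR specifies only newest-verified-first order within each source.

```typescript
interface LibraryEntry {
  source_id: string;
  label: string;
  url: string;
  last_verified: string; // ISO date
}

interface SourceGroup {
  source_id: string;
  entries: LibraryEntry[];
}

// Newest last_verified first; ISO dates order correctly as strings.
function newestFirst(a: LibraryEntry, b: LibraryEntry): number {
  if (a.last_verified === b.last_verified) return 0;
  return a.last_verified > b.last_verified ? -1 : 1;
}

function groupBySource(entries: LibraryEntry[]): SourceGroup[] {
  const bySource = new Map<string, LibraryEntry[]>();
  for (const entry of entries) {
    const bucket = bySource.get(entry.source_id);
    if (bucket) bucket.push(entry);
    else bySource.set(entry.source_id, [entry]);
  }
  // Assumption: sections appear in source_id order.
  return [...bySource.keys()]
    .sort()
    .map((source_id) => ({ source_id, entries: [...bySource.get(source_id)!].sort(newestFirst) }));
}
```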
The public footer (src/routes/+layout.svelte) gains a second link so /credits and /library sit together in the bottom-trailing strip.
The Mission Library label on /missions is renamed to Mission Catalog, so that "library" refers only to the new /library page.
Fail-closed pipeline
scripts/build-link-provenance.ts is the single source of truth for the manifest. It performs these steps in order:

- walks every links[] array and the wiki strings on small bodies;
- normalises each URL (strips utm_*, fbclid, AMP suffixes, and the mobile m. prefix);
- classifies the host against source-logos.json;
- infers the language from the URL, or flags the row for human review;
- writes the manifest in deterministic order.
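A minimal sketch of the normalisation and ordering steps, using the WHATWG URL API. The exact rules are assumptions:

- "AMP suffix" is read as a trailing /amp path segment.
- The mobile prefix is read as either a leading m. label (m.example.org) or a Wikipedia-style second m. label (en.m.wikipedia.org).
- Sorting by entity_id, then url, is one possible deterministic order.

The function names are hypothetical.

```typescript
// Treated as trackers: any utm_* parameter plus fbclid (case-insensitive).
function isTrackerParam(name: string): boolean {
  const n = name.toLowerCase();
  return n.startsWith("utm_") || n === "fbclid";
}

function normalizeUrl(raw: string): string {
  const u = new URL(raw);

  // Copy the keys first so deletions do not disturb the live iterator.
  for (const name of [...u.searchParams.keys()]) {
    if (isTrackerParam(name)) u.searchParams.delete(name);
  }
  // Drop an empty "?" left behind. Note that editing searchParams re-serialises
  // the remaining query in application/x-www-form-urlencoded form.
  if ([...u.searchParams.keys()].length === 0) u.search = "";

  // Mobile host prefix: m.example.org -> example.org, en.m.wikipedia.org -> en.wikipedia.org.
  const labels = u.hostname.split(".");
  if (labels.length > 2 && labels[0] === "m") labels.splice(0, 1);
  else if (labels.length > 3 && labels[1] === "m") labels.splice(1, 1);
  u.hostname = labels.join(".");

  // AMP suffix: /news/story/amp (or /amp/) -> /news/story.
  u.pathname = u.pathname.replace(/\/amp\/?$/i, "");

  return u.toString();
}

interface ManifestKey {
  entity_id: string;
  url: string;
}

// Deterministic manifest order: entity_id, then url, by UTF-16 code unit.
function compareRows(a: ManifestKey, b: ManifestKey): number {
  if (a.entity_id !== b.entity_id) return a.entity_id < b.entity_id ? -1 : 1;
  if (a.url === b.url) return 0;
  return a.url < b.url ? -1 : 1;
}
```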
scripts/check-learn-links.ts is the live-verification step. It HEAD-probes every URL with retries and falls back to GET on a 405. It honours each host's robots.txt. It records redirect chains, expired TLS certificates, and slow responses (> 5 s). It writes its report to docs/provenance/last-link-check.md.
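A minimal sketch of the probe logic for one URL. It covers:

- HEAD first, GET on a 405;
- manual redirect following, so the chain can be recorded;
- retries on 5xx and network errors;
- a slow flag above 5 s.

The fetcher is injected so the logic can be tested without network. Node's built-in fetch is expected to fit this shape and, with redirect: "manual", to hand back the 3xx response itself. robots.txt handling and TLS-expiry reporting are deliberately left out. The retry count, the 15 s timeout, the redirect cap, and all names are hypothetical.

```typescript
interface ProbeResponse {
  status: number;
  headers: { get(name: string): string | null };
}

type Fetcher = (
  url: string,
  init: { method: "HEAD" | "GET"; redirect: "manual"; signal: AbortSignal },
) => Promise<ProbeResponse>;

interface ProbeResult {
  url: string;
  status: number | null; // null when every attempt threw (DNS, TLS, timeout)
  chain: string[]; // every URL visited, starting with the input
  slow: boolean; // the returned attempt, redirects included, took longer than SLOW_MS
  error?: string;
}

const SLOW_MS = 5_000;
const TIMEOUT_MS = 15_000; // hypothetical hard cap per request
const MAX_REDIRECTS = 10; // hypothetical

// HEAD first; some servers reject HEAD with 405, so repeat that request with GET.
async function requestOnce(fetcher: Fetcher, url: string): Promise<ProbeResponse> {
  const head = await fetcher(url, { method: "HEAD", redirect: "manual", signal: AbortSignal.timeout(TIMEOUT_MS) });
  if (head.status !== 405) return head;
  return fetcher(url, { method: "GET", redirect: "manual", signal: AbortSignal.timeout(TIMEOUT_MS) });
}

async function probe(url: string, fetcher: Fetcher, retries = 2): Promise<ProbeResult> {
  let lastError = "unknown error";
  for (let attempt = 0; attempt <= retries; attempt++) {
    const chain = [url];
    const started = Date.now();
    try {
      let res = await requestOnce(fetcher, url);
      // Follow redirects by hand so every hop lands in the chain.
      while (res.status >= 300 && res.status < 400 && chain.length <= MAX_REDIRECTS) {
        const location = res.headers.get("location");
        if (location === null) break;
        const next = new URL(location, chain[chain.length - 1]).toString();
        chain.push(next);
        res = await requestOnce(fetcher, next);
      }
      // 5xx may be transient, so retry it; any other status is the answer.
      if (res.status < 500 || attempt === retries) {
        return { url, status: res.status, chain, slow: Date.now() - started > SLOW_MS };
      }
    } catch (err) {
      lastError = err instanceof Error ? err.message : String(err);
    }
    // A real checker would back off here before the next attempt.
  }
  return { url, status: null, chain: [url], slow: false, error: lastError };
}
```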
scripts/validate-data.ts enforces schema + integrity:
- Every source_id resolves in source-logos.json.
- Every language is a BCP-47 tag (or *).
- No duplicate (entity_id, url) pairs.
- No tracker query parameters survive normalisation.
- Every intro/core link returned 2xx in the most recent last-link-check.md (when present); a 4xx/5xx fails closed. A 4xx/5xx on a deep link warns but does not fail.
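A minimal sketch of these integrity rules as a pure function. It makes three assumptions:

- last-link-check.md has already been parsed into a map from URL to final HTTP status (the parsing is not shown);
- a simplified shape check stands in for a full RFC 5646 parser;
- any non-2xx result fails an intro/core link.

The names are hypothetical.

```typescript
type Tier = "intro" | "core" | "deep";

interface Row {
  entity_id: string;
  url: string;
  tier: Tier;
  source_id: string;
  language: string;
}

// Simplified BCP-47 shape (e.g. "en", "zh-Hans", "pt-BR"); not a full RFC 5646 parser.
const BCP47_LIKE = /^[A-Za-z]{2,3}(-[A-Za-z0-9]{1,8})*$/;

function validateLinks(
  rows: Row[],
  sourceIds: Set<string>,
  lastCheck?: Map<string, number>, // url -> final HTTP status, when a report exists
): { errors: string[]; warnings: string[] } {
  const errors: string[] = [];
  const warnings: string[] = [];
  const seen = new Set<string>();

  for (const r of rows) {
    const where = `${r.entity_id} ${r.url}`;
    if (!sourceIds.has(r.source_id)) errors.push(`${where}: unknown source_id "${r.source_id}"`);
    if (r.language !== "*" && !BCP47_LIKE.test(r.language)) errors.push(`${where}: invalid language "${r.language}"`);

    const key = JSON.stringify([r.entity_id, r.url]);
    if (seen.has(key)) errors.push(`${where}: duplicate (entity_id, url) pair`);
    seen.add(key);

    try {
      const names = [...new URL(r.url).searchParams.keys()].map((n) => n.toLowerCase());
      if (names.some((n) => n.startsWith("utm_") || n === "fbclid")) {
        errors.push(`${where}: tracker parameter survived normalisation`);
      }
    } catch {
      errors.push(`${where}: not a valid URL`);
    }

    const status = lastCheck?.get(r.url);
    if (status !== undefined && (status < 200 || status >= 300)) {
      // intro/core links fail closed; deep links only warn.
      (r.tier === "deep" ? warnings : errors).push(`${where}: last check returned ${status}`);
    }
  }
  return { errors, warnings };
}
```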
npm run fetch chains fetch-assets && build-image-provenance && build-link-provenance && check-learn-links && validate-data. A refetch therefore re-derives every manifest and re-validates outbound links in one run, and the first failing step stops the chain.
npm run build keeps only validate-data as its pre-hook (no live network in CI); at build time, the most recent link-check report is the source of truth.
Authoring rule
When adding a new entity to static/data/** or a new link to an existing entity:
- The first intro link on a non-US entity is the operating agency portal.
- Add the same link in the operator's native language when the operator publishes one.
- Run npm run build-link-provenance to register the link in the manifest; if classification falls back to a ? source, edit the row by hand to set a real source_id.
- Run npm run validate-data and npm run check-learn-links; both must pass.
- The link then appears on /library automatically.
Consequences
Positive
- Non-US missions get the editorial weight they deserve: their operator's own page is the first thing a user sees, in the operator's language when one exists.
- Provenance is machine-checked: every outbound URL has a recorded source, language, and last-verified date.
- Drift is fail-closed: a 4xx on an intro link breaks the build until the manifest is updated.
- The public /library page is honest disclosure: every external link we send users to is enumerated, grouped, and dated.
Negative
- ~25 entities need editorial enrichment in Milestone L-C.
- The link-checker adds live network access to npm run fetch. Build CI sticks to validate-data only; link-checking happens at refetch time, not on every PR.
- Native-language pages are likely to be less stable than their English counterparts and may break first. The replaced_with audit trail mitigates this: every replacement is recorded rather than silent.
Out of scope (intentional, tracked separately)
- Deep-link permission letters to non-Western agencies for content beyond what their public sites license — stays in #46.
- NASA-partnership-credit imagery enrichment — stays in #45.
- JSON-LD sameAs block on entity pages for SEO discoverability of source diversity, tracked in docs/wip/learn-link-backlog.md.
- Translation of the editorial intro on /library beyond the existing 14 locales, which follows the existing i18n process.
Related
- ADR-046 — Agency-first build-time imagery sourcing.
- ADR-047 — Provenance manifests + license stewardship.
- RFC-015 — LEARN-link rollout (multi-milestone plan).
- Epic #51 — LEARN-link stewardship rollout.