ADR-051 — Outbound learn-link stewardship

Status · Accepted
Date · 2026-05-07
Extends · docs/adr/ADR-046.md (agency-first imagery sourcing) and docs/adr/ADR-047.md (provenance manifests + license stewardship)
Closes into · RFC-015 (LEARN-link rollout)
Scope · static/data/link-provenance.json, static/data/source-logos.json, static/data/schemas/link-provenance.schema.json, scripts/build-link-provenance.ts, scripts/validate-data.ts, scripts/check-learn-links.ts, src/lib/components/LinkCredit.svelte, src/lib/data.ts, src/lib/library-grouping.ts, src/routes/library/+page.svelte, src/routes/+layout.svelte

Context

ADR-046 locked the source priority for build-time imagery and ADR-047 layered provenance manifests + a fail-closed pipeline on top. Together they cover every shipped image and every paraphrased editorial fragment.

LEARN-tab outbound links — the single largest editorial surface in the app (docs/research/learn-link-audit.md inventories 340 of them, deduping to 296 unique (entity_id, url) pairs) — were left out of that contract. The audit shows the same drift pattern that the imagery contract corrected:

  • ~82 % of outbound links go to NASA-domain hosts (~28 %) and Wikipedia (~54 %).
  • Only ~10 % go to non-US agency sites; zero go to native-language Roscosmos / CNSA / JAXA-jp / ISRO-hi pages despite the obvious primary sources.
  • Several non-US missions (Chang'e 5, Chang'e 6, Chandrayaan-3, Tianwen-1) link to nothing but Wikipedia.
  • Six of seven ISS Russian-segment modules miss roscosmos.ru entirely.
  • No per-link provenance, no hreflang, no rel="external", no link-checker, no public bill-of-links.

A user lands on a non-US mission detail panel today and the LEARN tab points them at Wikipedia, when the operating agency's own page exists. That is editorial laziness in a project that re-distributes other people's work — and it is disrespectful to the agencies that made the missions happen.

ADR-051 brings outbound-link discipline to parity with ADR-046 / ADR-047.

Decision

Adopt per-link provenance with a fail-closed pipeline and a publishing contract that routes users to operator pages first, in their native language when possible, with a public bill-of-links page.

Manifest

static/data/link-provenance.json — one entry per (entity_id, url) pair. Required fields:

  • id — stable id of the form entity_id__hash(url)
  • entity_id — id of the mission / module / planet / etc. that the link belongs to
  • route — app route where the link is shown (/missions, /iss, /moon, /mars, /explore, /earth)
  • category — data-file source (mission, iss-module, iss-visitor, earth-object, moon-site, mars-site, planet, sun, rocket, small-body)
  • url — canonicalised target (no ?utm_*, ?fbclid, no AMP suffix, no m. mobile prefix)
  • label — display label (English; locale overlays may translate it)
  • tier — intro | core | deep
  • source_id — must resolve in static/data/source-logos.json
  • language — BCP-47 code (en, ru, zh, ja, hi, es, …); * for multi-lingual landing pages
  • kind — agency-official | mission-microsite | science-publisher | encyclopedic | educational | community | vendor-official
  • fair_use_rationale — short string; "external reference; rel=noopener noreferrer external" is the default
  • last_verified — ISO date; updated by the link-checker

Optional fields:

  • replaced_with — when a link is removed because of a 404 or because a better source surfaced; records the new URL so the change is not silent
  • notes — author notes (e.g. "press kit redirects to a partial paywall after 30 s; deep tier appropriate")

Schema: static/data/schemas/link-provenance.schema.json.
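As an illustration of the manifest shape, the sketch below types one entry and derives its id. The field names follow the list above; the hash function is not specified in this ADR, so the 32-bit FNV-1a used here is an assumption, and the sample entity_id, source_id, and label values are hypothetical.

```typescript
// Field names mirror the ADR; everything marked "hypothetical" is illustrative only.
type Tier = "intro" | "core" | "deep";
type Kind =
  | "agency-official"
  | "mission-microsite"
  | "science-publisher"
  | "encyclopedic"
  | "educational"
  | "community"
  | "vendor-official";

interface LinkProvenanceEntry {
  id: string;
  entity_id: string;
  route: string;
  category: string;
  url: string;
  label: string;
  tier: Tier;
  source_id: string;
  language: string; // BCP-47 code, or "*" for multi-lingual landing pages
  kind: Kind;
  fair_use_rationale: string;
  last_verified: string; // ISO date
  replaced_with?: string;
  notes?: string;
}

// Assumed hash: 32-bit FNV-1a rendered as 8 hex characters.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// Builds the stable id in the entity_id__hash(url) form.
function makeLinkId(entityId: string, url: string): string {
  return `${entityId}__${fnv1a(url)}`;
}

// Hypothetical sample row.
const sampleEntry: LinkProvenanceEntry = {
  id: makeLinkId("change-5", "https://www.cnsa.gov.cn/"),
  entity_id: "change-5",
  route: "/missions",
  category: "mission",
  url: "https://www.cnsa.gov.cn/",
  label: "CNSA (official site)",
  tier: "intro",
  source_id: "cnsa",
  language: "zh",
  kind: "agency-official",
  fair_use_rationale: "external reference; rel=noopener noreferrer external",
  last_verified: "2026-05-07",
};
```

Because the id depends only on entity_id and url, rebuilding the manifest yields the same id for the same pair, which keeps diffs small.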

Source diversity targets

  • Every non-US entity has at least one agency-official intro link to its operating agency. The first intro link on a non-US entity must be the agency portal.
  • No entity has Wikipedia as its sole source.
  • The non-US share of the inventory rises from ~4 % to ≥ 15 % across the corpus.
  • 100 % first-link agency coverage on non-US entities.
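The per-entity targets above could be checked mechanically. The sketch below covers two of them: no entity relies on Wikipedia alone, and the first intro link on a non-US entity is agency-official. The Entity shape and its nonUs flag are hypothetical, since this ADR does not say how the operator's country is recorded.

```typescript
interface EntityLink {
  url: string;
  tier: "intro" | "core" | "deep";
  kind: string;
}

interface Entity {
  id: string;
  nonUs: boolean; // hypothetical flag; the real data files may encode this differently
  links: EntityLink[]; // in display order
}

function isWikipedia(url: string): boolean {
  const host = new URL(url).hostname;
  return host === "wikipedia.org" || host.endsWith(".wikipedia.org");
}

// Returns one message per violated target; an empty array means the targets hold.
function diversityViolations(entities: Entity[]): string[] {
  const violations: string[] = [];
  for (const entity of entities) {
    const onlyWikipedia =
      entity.links.length > 0 && entity.links.every((link) => isWikipedia(link.url));
    if (onlyWikipedia) {
      violations.push(`${entity.id}: Wikipedia is the sole source`);
    }
    if (entity.nonUs) {
      const firstIntro = entity.links.find((link) => link.tier === "intro");
      if (!firstIntro || firstIntro.kind !== "agency-official") {
        violations.push(`${entity.id}: first intro link is not agency-official`);
      }
    }
  }
  return violations;
}
```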

Locale fallback chain

When the UI is rendering a LEARN link list, the data layer picks links in this order, breaking ties by last_verified desc:

  1. Links whose language matches the active UI locale.
  2. Links whose language matches the operator's native language (Roscosmos → ru, CNSA → zh, JAXA → ja, ISRO → hi or en, MBRSC → ar).
  3. English (en).
  4. Multi-lingual landing pages (*).

The fallback chain is additive: a UI locale never hides a non-matching link; it only re-orders the list. Native-language agency pages stay visible to every user.
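A minimal sketch of that ordering, assuming exact string matches on language codes and treating any language not named in the chain as a final bucket (both are assumptions, as are the function names):

```typescript
interface LearnLink {
  url: string;
  language: string; // BCP-47 code or "*"
  last_verified: string; // ISO date, so string comparison matches date order
}

// Lower rank sorts first; mirrors steps 1 to 4 of the chain.
function fallbackRank(link: LearnLink, uiLocale: string, operatorLanguages: string[]): number {
  if (link.language === uiLocale) return 0;
  if (operatorLanguages.includes(link.language)) return 1;
  if (link.language === "en") return 2;
  if (link.language === "*") return 3;
  return 4; // assumption: other languages are kept and listed last
}

// Re-orders without dropping anything, breaking ties by last_verified descending.
function orderLearnLinks(
  links: LearnLink[],
  uiLocale: string,
  operatorLanguages: string[],
): LearnLink[] {
  return [...links].sort((a, b) => {
    const byRank =
      fallbackRank(a, uiLocale, operatorLanguages) -
      fallbackRank(b, uiLocale, operatorLanguages);
    if (byRank !== 0) return byRank;
    return b.last_verified.localeCompare(a.last_verified);
  });
}
```

For a Japanese UI showing a Roscosmos module, `orderLearnLinks(links, "ja", ["ru"])` would list ja links first, then ru, then en, then `*`, and the output always has the same length as the input.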

Rendering rules

Every outbound LEARN link renders as:

```html
<a
  href={link.url}
  target="_blank"
  rel="noopener noreferrer external"
  hreflang={link.language}
>
  {link.label} ↗
</a>
<LinkCredit link={link} />
```

<LinkCredit /> renders below the anchor as Source name · language · last verified. It appears in every panel that shows LEARN links: MissionPanel, IssModulePanel, PlanetPanel, SunPanel, SmallBodyPanel, plus the route components for /moon, /mars, /earth that render their own LEARN sections.

Public disclosure

A new route /library lists every outbound link in the manifest, grouped by source, sorted newest verified first within each source. Same architecture as /credits (ADR-047): top intro, table of contents, per-source section with logo + license summary + entry list.
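The grouping behind /library might look like the sketch below. It is not the contents of src/lib/library-grouping.ts; the function name is hypothetical, and ordering sources alphabetically by source_id is an assumption, since the ADR only fixes the order within each source.

```typescript
interface LibraryLink {
  source_id: string;
  label: string;
  url: string;
  last_verified: string; // ISO date
}

// Groups links by source and sorts each group newest verified first.
function groupLinksBySource(links: LibraryLink[]): Map<string, LibraryLink[]> {
  const groups = new Map<string, LibraryLink[]>();
  for (const link of links) {
    const bucket = groups.get(link.source_id);
    if (bucket) bucket.push(link);
    else groups.set(link.source_id, [link]);
  }
  // Assumption: sources are listed alphabetically by source_id.
  const sorted = new Map<string, LibraryLink[]>();
  for (const sourceId of Array.from(groups.keys()).sort()) {
    const entries = groups.get(sourceId) ?? [];
    entries.sort((a, b) => b.last_verified.localeCompare(a.last_verified));
    sorted.set(sourceId, entries);
  }
  return sorted;
}
```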

The public footer (src/routes/+layout.svelte) gains a second link so /credits and /library sit together in the bottom-trailing strip.

The Mission Library label on /missions is renamed to Mission Catalog so that, for users, the word "library" refers unambiguously to the new page.

Fail-closed pipeline

scripts/build-link-provenance.ts is the single source of truth for the manifest. It walks every links[] array and the wiki strings on small bodies, normalises the URL (strips utm_*, fbclid, AMP suffixes, mobile m-prefix), classifies the host against source-logos.json, infers the language from URL or marks for human review, then writes the manifest in deterministic order.
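The normalisation step can be sketched with the standard URL class. The exact rules in scripts/build-link-provenance.ts are not shown in this ADR, so the AMP handling (a trailing /amp path segment) and the mobile handling (a leading m. host label only) are assumptions, and the function name is hypothetical.

```typescript
// Strips tracker parameters, a leading "m." host label, and a trailing "/amp" segment.
function canonicaliseUrl(raw: string): string {
  const url = new URL(raw);

  // Collect keys first, then delete, to avoid mutating while iterating.
  const trackerKeys: string[] = [];
  url.searchParams.forEach((_value, key) => {
    if (key.startsWith("utm_") || key === "fbclid") trackerKeys.push(key);
  });
  for (const key of trackerKeys) url.searchParams.delete(key);

  // Assumption: only a leading "m." label counts as the mobile prefix.
  if (url.hostname.startsWith("m.")) url.hostname = url.hostname.slice(2);

  // Assumption: the AMP suffix is a trailing "/amp" or "/amp/" path segment.
  url.pathname = url.pathname.replace(/\/amp\/?$/, "");

  return url.toString();
}
```

Hosts such as en.m.wikipedia.org, where the mobile label is not the first one, would need an extra rule that this sketch leaves out.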

scripts/check-learn-links.ts is the live-verification step. It HEAD-probes every URL with retry, falls back to GET on 405, honours each host's robots.txt, and records redirect chains, expired TLS certificates, and slow responses (> 5 s). It writes docs/provenance/last-link-check.md.
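The probe loop could be shaped roughly as follows. This is a sketch, not the actual script: it covers only retry, the HEAD-to-GET fallback on 405, and the 5 s slow threshold, and leaves out robots.txt, redirect chains, and TLS checks. The Fetcher type, the retry count, and the choice to retry 5xx responses and network errors are assumptions; the network call is injected so the logic can be exercised offline.

```typescript
type ProbeMethod = "HEAD" | "GET";
// Hypothetical injection point; a real run could wrap the global fetch.
type Fetcher = (url: string, method: ProbeMethod) => Promise<{ status: number }>;

interface ProbeResult {
  url: string;
  status: number | null; // null when every attempt failed at the network level
  method: ProbeMethod;
  attempts: number;
  elapsedMs: number; // duration of the final attempt
  slow: boolean;
}

const SLOW_MS = 5000;

async function probeLink(url: string, fetcher: Fetcher, maxAttempts = 3): Promise<ProbeResult> {
  let status: number | null = null;
  let method: ProbeMethod = "HEAD";
  let attempts = 0;
  let elapsedMs = 0;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    attempts = attempt;
    const started = Date.now();
    try {
      method = "HEAD";
      let response = await fetcher(url, "HEAD");
      if (response.status === 405) {
        // Host rejects HEAD; retry the same attempt with GET.
        method = "GET";
        response = await fetcher(url, "GET");
      }
      status = response.status;
    } catch {
      status = null; // network or TLS failure; retried below
    } finally {
      elapsedMs = Date.now() - started;
    }
    // Assumption: 2xx to 4xx are definitive, 5xx and network errors are retried.
    if (status !== null && status < 500) break;
  }

  return { url, status, method, attempts, elapsedMs, slow: elapsedMs > SLOW_MS };
}
```

With a runtime that provides a global fetch, a call might look like `probeLink(url, (u, method) => fetch(u, { method }))`.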

scripts/validate-data.ts enforces schema + integrity:

  • Every source_id resolves in source-logos.json.
  • Every language is BCP-47 (or *).
  • No duplicate (entity_id, url) pairs.
  • No tracker query parameters survived normalisation.
  • Every intro / core link returned 2xx in the most recent last-link-check.md (when present); 4xx/5xx fail closed.
  • deep 4xx/5xx warn but do not fail.
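The integrity rules above can be expressed as a single pass over the manifest. The sketch below is an illustration rather than the contents of scripts/validate-data.ts: the BCP-47 test is a simplified pattern, and the link-check report is assumed to be already parsed into a map from URL to HTTP status, since the format of last-link-check.md is not described here.

```typescript
interface ManifestRow {
  entity_id: string;
  url: string;
  tier: "intro" | "core" | "deep";
  source_id: string;
  language: string;
}

interface ValidationReport {
  errors: string[]; // fail the build
  warnings: string[]; // reported only
}

// Simplified approximation of BCP-47, not a full implementation.
const BCP47_LIKE = /^[a-z]{2,3}(?:-[A-Za-z0-9]{2,8})*$/;
const TRACKER_PARAM = /[?&](?:utm_\w+|fbclid)=/;

function validateManifest(
  rows: ManifestRow[],
  knownSourceIds: Set<string>,
  lastCheck?: Map<string, number>, // assumed shape: url -> last HTTP status
): ValidationReport {
  const errors: string[] = [];
  const warnings: string[] = [];
  const seenPairs = new Set<string>();

  for (const row of rows) {
    const where = `${row.entity_id} -> ${row.url}`;
    if (!knownSourceIds.has(row.source_id)) errors.push(`${where}: unknown source_id`);
    if (row.language !== "*" && !BCP47_LIKE.test(row.language)) {
      errors.push(`${where}: language is not BCP-47`);
    }
    const pairKey = `${row.entity_id}\u0000${row.url}`;
    if (seenPairs.has(pairKey)) errors.push(`${where}: duplicate (entity_id, url) pair`);
    seenPairs.add(pairKey);
    if (TRACKER_PARAM.test(row.url)) errors.push(`${where}: tracker parameter survived`);

    // Only applied when a report is present and lists this URL.
    const status = lastCheck?.get(row.url);
    if (status !== undefined && status >= 400) {
      const message = `${where}: last check returned ${status}`;
      if (row.tier === "deep") warnings.push(message);
      else errors.push(message);
    }
  }
  return { errors, warnings };
}
```

A caller would fail the build when `errors` is non-empty and print `warnings` without failing.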

npm run fetch chains fetch-assets && build-image-provenance && build-link-provenance && check-learn-links && validate-data so a refetch re-derives every manifest and re-validates outbound links atomically.

npm run build keeps validate-data only as the pre-hook (no live network in CI); the link-check report is the source of truth at build time.

Authoring rule

When adding a new entity to static/data/** or a new link to an existing entity:

  1. The first intro link on a non-US entity is the operating agency portal.
  2. Add the same link in the operator's native language when the operator publishes one.
  3. Run npm run build-link-provenance to register the link in the manifest; if classification falls back to a ? source, edit the row by hand to set a real source_id.
  4. Run npm run validate-data and npm run check-learn-links; both must pass.
  5. The link will appear on /library automatically.

Consequences

Positive

  • Non-US missions get the editorial weight they deserve: their operator's own page is the first thing a user sees, in the operator's language when one exists.
  • Provenance is machine-checked: every outbound URL has a recorded source, language, and last-verified date.
  • Drift is fail-closed: a 4xx on an intro link breaks the build until the manifest is updated.
  • The public /library page is honest disclosure — every external link we send users to is enumerated, grouped, and dated.

Negative

  • ~25 entities need editorial enrichment in Milestone L-C.
  • The link-checker adds live network to npm run fetch. Build CI sticks to validate-data only; link-checking happens at refetch time, not on every PR.
  • Native-language pages may break first, since they are less stable than the English versions. The replaced_with audit trail is the answer.

Out of scope (intentional, tracked separately)

  • Deep-link permission letters to non-Western agencies for content beyond what their public sites license — stays in #46.
  • NASA-partnership-credit imagery enrichment — stays in #45.
  • JSON-LD sameAs block on entity pages for SEO discoverability of source diversity — docs/wip/learn-link-backlog.md.
  • Translation of the editorial intro on /library beyond the existing 14 locales — follows the existing i18n process.

Related

  • ADR-046 — Agency-first build-time imagery sourcing.
  • ADR-047 — Provenance manifests + license stewardship.
  • RFC-015 — LEARN-link rollout (multi-milestone plan).
  • Epic #51 — LEARN-link stewardship rollout.

Orrery — architecture documentation · MIT · No tracking