
ADR-047 — Provenance manifests + license stewardship

Status · Accepted
Date · 2026-05-07
Extends · docs/adr/ADR-046.md (agency-first imagery sourcing)
Scope · static/data/image-provenance.json, static/data/text-sources.json, static/data/source-logos.json, static/data/license-waivers.json, scripts/license-allowlist.ts, scripts/build-image-provenance.ts, scripts/validate-data.ts, src/lib/components/ImageCredit.svelte, src/routes/credits/+page.svelte, src/routes/+layout.svelte

Context

ADR-046 locks the source priority for build-time imagery (operating agency, then partner archives, then Wikimedia Commons, then NASA Images API). It does not describe how attribution survives from fetch to UI render, and does not define a stewardship contract for the editorial text we paraphrase from primary sources.

Without that contract:

  • Per-image gallery credits drift back to a generic "Imagery: NASA / Wikimedia" string the next time someone touches the panel template.
  • License rationales are not auditable on a per-file basis.
  • A new image fetched into static/images/** can ship without a recorded source URL or license, and reviewers have no machine-checkable way to flag it.
  • Editorial text that was paraphrased from primary sources (NASA press kits, Wikipedia, ESA mission pages) carries no attribution at all.

Orrery deploys offline, runs entirely in the browser, and redistributes hundreds of images and tens of editorial fragments. We are downstream of the world's space agencies and of Wikimedia. The "highest standard" requested for the project means two things. Every image and every reused text fragment must be traceable to its origin in machine-checked form. The public credits page must render that same data.

Decision

Adopt a provenance-manifest model with a fail-closed pipeline.

Manifests

  1. static/data/image-provenance.json — one entry per shipped image. Required fields: id, path, source_type, title, author, agency, source_url, license_short, license_url, license_rationale, modifications[], plus optional revid / pageid (Wikimedia) and nasa_id (NASA Images API). Generated at build time from existing curated maps + live Wikimedia imageinfo calls. Schema: static/data/schemas/image-provenance.schema.json.

  2. static/data/text-sources.json — one entry per reused or paraphrased editorial fragment. Required fields: id, location { file, json_path?, i18n_key? }, category, relationship (original | paraphrased-from | quoted-from | translated-from | adapted-from), license_short, license_rationale. Optional: snippet, source_url, source_publisher, source_author, license_url, translation_status, translation_reviewer. Schema: static/data/schemas/text-sources.schema.json.

  3. static/data/source-logos.json — masthead manifest for the public /credits page. Required fields: id, name, kind, url, license_summary. Optional logo_path. Schema: static/data/schemas/source-logos.schema.json.

  4. static/data/license-waivers.json — narrow exceptions to the license allowlist. Each waiver records license_short, scope, justification, reviewer, decided_at, optional expires_at. Schema: static/data/schemas/license-waivers.schema.json.
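For orientation, the four manifests can be read as TypeScript shapes. This is a sketch, not the project's schema. Field names come from the list above. The concrete types are assumptions: numeric revid/pageid and string[] for modifications[]. So is the rule that an empty string counts as a missing field. The JSON schemas under static/data/schemas/ remain authoritative.

```typescript
// Hypothetical shapes for the four manifests. Field names follow the ADR;
// concrete types (numbers vs strings, modifications as string[]) are assumptions.

interface ImageProvenanceEntry {
  id: string;
  path: string;
  source_type: string; // assumed free-form, e.g. "wikimedia" | "nasa-images" | "curated"
  title: string;
  author: string;
  agency: string;
  source_url: string;
  license_short: string;
  license_url: string;
  license_rationale: string;
  modifications: string[];
  revid?: number;   // Wikimedia only
  pageid?: number;  // Wikimedia only
  nasa_id?: string; // NASA Images API only
}

type Relationship =
  | "original"
  | "paraphrased-from"
  | "quoted-from"
  | "translated-from"
  | "adapted-from";

interface TextSourceEntry {
  id: string;
  location: { file: string; json_path?: string; i18n_key?: string };
  category: string;
  relationship: Relationship;
  license_short: string;
  license_rationale: string;
  snippet?: string;
  source_url?: string;
  source_publisher?: string;
  source_author?: string;
  license_url?: string;
  translation_status?: string;
  translation_reviewer?: string;
}

interface SourceLogoEntry {
  id: string;
  name: string;
  kind: string;
  url: string;
  license_summary: string;
  logo_path?: string;
}

interface LicenseWaiver {
  license_short: string;
  scope: string;
  justification: string;
  reviewer: string;
  decided_at: string;
  expires_at?: string;
}

const REQUIRED_IMAGE_FIELDS = [
  "id", "path", "source_type", "title", "author", "agency", "source_url",
  "license_short", "license_url", "license_rationale", "modifications",
] as const;

// Returns required fields that are absent (or, by assumption, empty strings).
// A non-empty result is the condition under which a builder would refuse to write.
function missingImageFields(entry: Partial<ImageProvenanceEntry>): string[] {
  return REQUIRED_IMAGE_FIELDS.filter((k) => {
    const v = entry[k];
    if (k === "modifications") return !Array.isArray(v);
    return typeof v !== "string" || v.trim() === "";
  });
}
```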

License allowlist

scripts/license-allowlist.ts is the canonical list of acceptable licenses (PD-NASA, PD-USGov, PD-Russia, PD-Old, PD-self, PD-trivial, CC0, CC-BY 1.0–4.0, CC-BY-IGO 3.0/4.0, CC-BY-SA 1.0–4.0, CC-BY-SA IGO 3.0/4.0, plus the project-internal Orrery-Original for original prose). The file also exports normaliseLicenseShortName(), which maps the loosely formatted Wikimedia extmetadata.LicenseShortName strings onto allowlist entries.
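Here is a minimal sketch of how an allowlist and a normaliser could fit together. It is loosely modelled on the normaliseLicenseShortName() export. Several details are assumptions, not the contents of scripts/license-allowlist.ts: the canonical spellings (CC-BY-SA-4.0, CC-BY-3.0-IGO), the exact CC version list, and the helper names normaliseLicense and isAllowedLicense.

```typescript
// Assumed canonical spellings; the real allowlist file may differ.
const CC_VERSIONS = ["1.0", "2.0", "2.5", "3.0", "4.0"]; // assumed members of "1.0–4.0"
const IGO_VERSIONS = ["3.0", "4.0"];

const ALLOWED_LICENSES: ReadonlySet<string> = new Set<string>([
  "PD-NASA", "PD-USGov", "PD-Russia", "PD-Old", "PD-self", "PD-trivial",
  "CC0", "Orrery-Original",
  ...CC_VERSIONS.flatMap((v) => [`CC-BY-${v}`, `CC-BY-SA-${v}`]),
  ...IGO_VERSIONS.flatMap((v) => [`CC-BY-${v}-IGO`, `CC-BY-SA-${v}-IGO`]),
]);

// Matches Commons-style strings such as "CC BY 2.0", "CC BY-SA 4.0", "cc-by-sa-3.0 igo".
const CC_PATTERN = /^CC[\s-]+BY(-SA)?[\s-]+(\d\.\d)(?:[\s-]+(IGO))?$/i;

function normaliseLicense(raw: string): string {
  const s = raw.trim().replace(/\s+/g, " ");
  const cc = CC_PATTERN.exec(s);
  if (cc) {
    const [, sa, version, igo] = cc;
    return `CC-BY${sa ? "-SA" : ""}-${version}${igo ? "-IGO" : ""}`;
  }
  // Case-insensitive exact match against the allowlist, e.g. "cc0" -> "CC0".
  const lower = s.toLowerCase();
  const hit = Array.from(ALLOWED_LICENSES).find((id) => id.toLowerCase() === lower);
  // Unknown strings pass through unchanged, so the allowlist check rejects them.
  return hit ?? s;
}

function isAllowedLicense(raw: string): boolean {
  return ALLOWED_LICENSES.has(normaliseLicense(raw));
}
```

Jurisdiction-ported strings such as "CC BY-SA 2.0 de" fall through unchanged and are rejected. This keeps the normaliser fail-closed: an unrecognised spelling becomes a review item rather than a silent pass.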

A license outside the allowlist must be covered by a matching waiver row; otherwise validate-data and build-image-provenance both fail closed.
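One way to express the waiver rule is sketched here. It assumes scope is a path prefix and dates are ISO YYYY-MM-DD strings; the ADR does not pin down either. The LicenseWaiver fields follow the waiver manifest described above. findWaiver and isLicenseAccepted are hypothetical helpers.

```typescript
interface LicenseWaiver {
  license_short: string;
  scope: string; // assumed: a path prefix such as "images/missions/"
  justification: string;
  reviewer: string;
  decided_at: string; // assumed ISO "YYYY-MM-DD"
  expires_at?: string; // assumed ISO "YYYY-MM-DD", inclusive
}

// Finds a live waiver for this license and path. ISO dates compare correctly as strings.
function findWaiver(
  license: string,
  path: string,
  waivers: readonly LicenseWaiver[],
  today: string,
): LicenseWaiver | undefined {
  return waivers.find(
    (w) =>
      w.license_short === license &&
      path.startsWith(w.scope) &&
      (w.expires_at === undefined || w.expires_at >= today),
  );
}

// Fail-closed acceptance: allowlisted, or covered by a live waiver; anything else is rejected.
function isLicenseAccepted(
  license: string,
  path: string,
  allowed: ReadonlySet<string>,
  waivers: readonly LicenseWaiver[],
  today: string,
): boolean {
  return allowed.has(license) || findWaiver(license, path, waivers, today) !== undefined;
}
```

Treating an expired waiver as absent means an expiry date becomes a failing check on the first run after it passes. The exception is forced back into review instead of quietly persisting.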

Fail-closed pipeline

  • scripts/build-image-provenance.ts — walks static/images/**, static/textures/, static/logos/. For Wikimedia Commons titles (taken from the curated maps in scripts/fetch-assets.ts), it queries the Commons API for imageinfo/extmetadata and records Artist, LicenseShortName, LicenseUrl, revision id, and the Commons file-page URL. For NASA Images API entries it records the search URL and the PD-NASA rationale. For curated assets (rocket reference, agency logos, lunar discs, Solar System Scope textures) it records the curated TASL string. Writes the manifest plus a diff report at docs/provenance/last-fetch-diff.md. Refuses to write the manifest when any required field is missing or a license is neither allowed nor waived.

  • scripts/validate-data.ts — runs the JSON-schema check on every manifest, then four additional invariants. Every image-provenance.json license must be allowed or waived. Every entry.path must resolve to a file under static/. No two entries may share an entry.path. text-sources.json and source-logos.json get the same allowlist and uniqueness checks, plus an on-disk existence check for each logo_path.

  • npm run build — runs validate-data before the SvelteKit build (pre-build hook). A malformed manifest cannot ship.

  • npm run fetch — runs npm run fetch-assets, npm run build-image-provenance, and npm run validate-data in sequence. The diff report at docs/provenance/last-fetch-diff.md is the rolling review surface for new content; a run counts as complete only when validate-data passes.
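The image-provenance invariants in validate-data can be sketched as a pure function that collects every violation instead of stopping at the first. Three assumptions apply: entry.path is relative to static/, existingFiles was gathered from disk beforehand, and the license check is passed in (for example an allowlist-plus-waiver predicate). The names checkImageManifest and ManifestEntry are hypothetical. The text-sources and source-logos checks would follow the same pattern.

```typescript
interface ManifestEntry {
  id: string;
  path: string; // assumed relative to static/, e.g. "images/mars.jpg"
  license_short: string;
}

// Returns one message per violation; an empty array means the manifest passes.
function checkImageManifest(
  entries: readonly ManifestEntry[],
  existingFiles: ReadonlySet<string>,
  isLicenseOk: (license: string, path: string) => boolean,
): string[] {
  const errors: string[] = [];
  const seen = new Map<string, string>(); // path -> id of the first entry using it
  for (const e of entries) {
    if (!isLicenseOk(e.license_short, e.path)) {
      errors.push(`${e.id}: license "${e.license_short}" is neither allowed nor waived`);
    }
    if (!existingFiles.has(e.path)) {
      errors.push(`${e.id}: path "${e.path}" does not resolve to a file under static/`);
    }
    const prior = seen.get(e.path);
    if (prior !== undefined) {
      errors.push(`${e.id}: duplicate path "${e.path}" (already used by ${prior})`);
    } else {
      seen.set(e.path, e.id);
    }
  }
  return errors;
}
```

A CLI wrapper would print the returned errors and exit non-zero when the list is non-empty. That exit status is what lets the pre-build hook stop npm run build.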

Runtime surface

  • src/lib/data.ts exposes getImageProvenanceManifest(), getImageProvenance(path), getSourceLogos(), getTextSources(). Per ADR-006, UI code never fetches JSON directly.

  • src/lib/components/ImageCredit.svelte renders the per-image TASL line in every gallery lightbox: title, author/agency, source link, license short name + license URL, modification disclosure, and the no_endorsement_disclaimer string.

  • src/routes/credits/+page.svelte renders the public bill of materials, grouped by source. The route is reachable from a small footer link in src/routes/+layout.svelte and is intentionally not in primary nav.
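The read side can be sketched as follows, assuming the manifest has already been loaded through the data layer. makeProvenanceLookup, CreditEntry and formatCreditLine are hypothetical; the real getImageProvenance(path) and <ImageCredit> may differ. The sketch shows the return-null-when-missing behaviour the runtime relies on before the first fetch. It also builds a plain-text TASL (title, author, source, license) line.

```typescript
interface CreditEntry {
  path: string;
  title: string;
  author: string;
  agency: string;
  source_url: string;
  license_short: string;
  license_url: string;
  modifications: string[];
}

// Indexes the manifest by path. A null manifest (fresh checkout, no fetch yet)
// yields an empty index, so every lookup returns null instead of throwing.
function makeProvenanceLookup(manifest: readonly CreditEntry[] | null) {
  const byPath = new Map<string, CreditEntry>();
  for (const e of manifest ?? []) byPath.set(e.path, e);
  return (path: string): CreditEntry | null => byPath.get(path) ?? null;
}

// Plain-text TASL line; the real component renders links instead.
function formatCreditLine(e: CreditEntry): string {
  const who = e.author && e.author !== e.agency ? `${e.author} / ${e.agency}` : e.agency;
  const mods = e.modifications.length > 0 ? ` Modified: ${e.modifications.join(", ")}.` : "";
  return `"${e.title}" by ${who} (${e.source_url}), ${e.license_short} (${e.license_url}).${mods}`;
}
```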

Authoring rule

Any new mission/planet/site copy with external provenance must ship in the same PR as its manifest row, and so must any new image under static/images/**, static/textures/**, or static/logos/**, and any new source logo. Copy goes in text-sources.json, images in image-provenance.json, and logos in source-logos.json. image-provenance.json is auto-generated, so rerun build-image-provenance rather than editing it. New Wikimedia entries must include their Commons filename in the appropriate curated map in scripts/fetch-assets.ts so build-image-provenance.ts can resolve them.
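The authoring rule can also be checked from the opposite direction to validate-data's path check: start from the files a PR adds and look for missing manifest rows. Here is a hypothetical sketch. It assumes addedFiles are repo-relative paths taken from the PR diff and manifest paths are relative to static/. The image-extension list is illustrative only.

```typescript
const ASSET_ROOTS = ["static/images/", "static/textures/", "static/logos/"];
const IMAGE_EXT = /\.(png|jpe?g|webp|avif|gif|svg)$/i; // illustrative, not exhaustive
const STATIC_PREFIX = "static/";

// Returns newly added asset files that have no image-provenance row, sorted for stable output.
function findUnattributedAssets(
  addedFiles: readonly string[],
  manifestPaths: ReadonlySet<string>, // assumed relative to static/
): string[] {
  return addedFiles
    .filter((f) => ASSET_ROOTS.some((root) => f.startsWith(root)) && IMAGE_EXT.test(f))
    .filter((f) => !manifestPaths.has(f.slice(STATIC_PREFIX.length)))
    .sort();
}
```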

Ongoing review hooks

  • npm run fetch always emits the diff report and runs validate-data. Stdout shows the digest.
  • npm run build chains validate-data; CI does the same on every PR.
  • The Milestone D backlog at docs/wip/provenance-backlog.md captures cadence ideas that are not yet wired (weekly link-check GH Action, PR-comment hook, Sentry breadcrumb on lightbox-without-provenance, Playwright visual regression on /credits, per-locale staleness checker, periodic Wikimedia category sweep). New review-cadence ideas land there.
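The diff report and stdout digest imply a comparison between the previous manifest and the freshly built one. The sketch below keys that comparison by id and assumes both manifests are available in memory. diffManifests, digest and the digest wording are hypothetical. A real report would likely also compare source_url and author.

```typescript
interface DiffEntry {
  id: string;
  license_short: string;
}

interface ManifestDiff {
  added: string[];
  removed: string[];
  licenseChanged: { id: string; from: string; to: string }[];
}

function diffManifests(prev: readonly DiffEntry[], next: readonly DiffEntry[]): ManifestDiff {
  const before = new Map<string, DiffEntry>();
  for (const e of prev) before.set(e.id, e);
  const after = new Map<string, DiffEntry>();
  for (const e of next) after.set(e.id, e);

  const diff: ManifestDiff = { added: [], removed: [], licenseChanged: [] };
  after.forEach((e, id) => {
    const old = before.get(id);
    if (old === undefined) {
      diff.added.push(id);
    } else if (old.license_short !== e.license_short) {
      // e.g. a Commons file relicensed between fetches
      diff.licenseChanged.push({ id, from: old.license_short, to: e.license_short });
    }
  });
  before.forEach((_e, id) => {
    if (!after.has(id)) diff.removed.push(id);
  });
  return diff;
}

// One-line summary suitable for stdout.
function digest(d: ManifestDiff): string {
  return `provenance: +${d.added.length} added, -${d.removed.length} removed, ${d.licenseChanged.length} license change(s)`;
}
```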

Rationale

  • Pulls every attribution claim into machine-checkable artefacts, so a missing field fails CI rather than slipping past human review.
  • Keeps the bottom-sheet panel footers stable (Milestone B contextual labels) while exposing exact per-image attribution in the lightbox.
  • Provides the public credits page with structured data instead of ad-hoc HTML — the same data drives /credits and the in-panel <ImageCredit>.
  • Locks the license allowlist + waiver model so unfamiliar licenses don't quietly enter the build.

Alternatives considered

  • Per-component inline credits. Already what we had. Drifts. Rejected.
  • Inline <credit> JSON in a per-image .json sidecar under static/images/**. Adds many small files; harder to query for /credits. Rejected.
  • Runtime fetch from Commons / NASA on demand. Violates ADR-016 (build-time only). Rejected.
  • Skip the text manifest, treat all text as "original". Honest for UI strings but dishonest for editorial copy paraphrased from primary sources. Rejected.

Consequences

Positive:

  • /credits is a complete, auditable bill of materials.
  • Reviewers can grep for license / publisher / source URL.
  • Drift is loud — a Wikimedia license change surfaces in the next fetch's diff report.

Negative:

  • More moving parts in validate-data and build-image-provenance.
  • Editorial copy now requires an entry in text-sources.json when paraphrased; minor authoring overhead.
  • Wikimedia API enrichment runs at fetch time; rate-limited.
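Because Commons enrichment is rate-limited, batching helps. The MediaWiki Action API accepts up to 50 titles per query for ordinary clients. A single prop=imageinfo|revisions request can return the file-page URL, extmetadata (Artist, LicenseShortName, LicenseUrl), and the latest revision id. The sketch below only builds URLs and parses a formatversion=2 response; it makes no network call. The helper names, the response typing, and the naive HTML stripping of Artist are assumptions.

```typescript
const COMMONS_API = "https://commons.wikimedia.org/w/api.php";
const MAX_TITLES = 50; // per-request titles limit for non-bot clients

function chunk<T>(items: readonly T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

function imageInfoUrl(titles: readonly string[]): string {
  const params: Record<string, string> = {
    action: "query",
    format: "json",
    formatversion: "2",
    prop: "imageinfo|revisions",
    iiprop: "url|extmetadata",
    rvprop: "ids",
    titles: titles.join("|"),
  };
  const qs = Object.entries(params)
    .map(([k, v]) => `${encodeURIComponent(k)}=${encodeURIComponent(v)}`)
    .join("&");
  return `${COMMONS_API}?${qs}`;
}

interface MetaField { value: string }
interface CommonsPage {
  pageid?: number;
  title: string;
  missing?: boolean;
  imageinfo?: { descriptionurl: string; extmetadata?: Record<string, MetaField | undefined> }[];
  revisions?: { revid: number }[];
}
interface CommonsFields {
  title: string;
  pageid: number | undefined;
  revid: number | undefined;
  source_url: string | undefined;
  author: string | undefined;
  license_short: string | undefined;
  license_url: string | undefined;
}

// Artist arrives as HTML; this tag strip is illustrative, not a sanitiser.
const stripTags = (html: string): string => html.replace(/<[^>]*>/g, "").trim();

function parsePages(pages: readonly CommonsPage[]): CommonsFields[] {
  return pages.map((p) => {
    const info = p.imageinfo?.[0];
    const meta: Record<string, MetaField | undefined> = info?.extmetadata ?? {};
    const artist = meta["Artist"];
    return {
      title: p.title,
      pageid: p.pageid,
      revid: p.revisions?.[0]?.revid,
      source_url: info?.descriptionurl,
      author: artist ? stripTags(artist.value) : undefined,
      license_short: meta["LicenseShortName"]?.value,
      license_url: meta["LicenseUrl"]?.value,
    };
  });
}
```

A real fetch loop would also send a descriptive User-Agent, as Wikimedia's User-Agent policy requires, and pause between batches. Pages returned with missing: true leave every field undefined, so a required-field check downstream fails closed on them.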

Implementation notes

  • scripts/build-image-provenance.ts — supports --offline for local runs that should skip Commons enrichment and use curated fallbacks.
  • scripts/fetch-assets.ts — the main() runner is now guarded by an entry-point check that compares import.meta.url with the file URL of process.argv[1]. The file can therefore be imported (its curated maps are reused by the provenance builder) without re-firing the network fetch.
  • static/data/license-waivers.json ships empty ({ schema_version: 1, waivers: [] }); the schema is enforced even with an empty list so the next waiver follows the structure.
  • The runtime pipeline gracefully degrades: when image-provenance.json is absent (fresh checkout pre-fetch), getImageProvenance(path) returns null and <ImageCredit> renders nothing — the contextual gallery footer from Milestone B is still shown. This avoids hard-failing the dev experience while keeping the contract strict in CI.

Related

  • ADR-046 — Agency-first build-time imagery sourcing.
  • ADR-016 — External assets resolved at build time (transport).
  • ADR-006 — Mission data via static/data/ (data layer rule).
  • ADR-019 — ajv schema validation on PR.
  • Backlog — docs/wip/provenance-backlog.md.

Orrery — architecture documentation · MIT · No tracking