ADR-047 — Provenance manifests + license stewardship
Status · Accepted
Date · 2026-05-07
Extends · docs/adr/ADR-046.md (agency-first imagery sourcing)
Scope · static/data/image-provenance.json, static/data/text-sources.json, static/data/source-logos.json, static/data/license-waivers.json, scripts/license-allowlist.ts, scripts/build-image-provenance.ts, scripts/validate-data.ts, src/lib/components/ImageCredit.svelte, src/routes/credits/+page.svelte, src/routes/+layout.svelte
Context
ADR-046 locks the source priority for build-time imagery (operating agency, then partner archives, then Wikimedia Commons, then NASA Images API). It does not describe how attribution survives from fetch to UI render, and does not define a stewardship contract for the editorial text we paraphrase from primary sources.
Without that contract:
- Per-image gallery credits drift back to a generic "Imagery: NASA / Wikimedia" string the next time someone touches the panel template.
- License rationales are not auditable on a per-file basis.
- A new image fetched into static/images/** can ship without a recorded source URL or license, and reviewers have no machine-checkable way to flag it.
- Editorial text that was paraphrased from primary sources (NASA press kits, Wikipedia, ESA mission pages) carries no attribution at all.
Orrery deploys offline, runs entirely in the browser, and re-distributes hundreds of images and tens of editorial fragments. We are downstream of the world's space agencies and Wikimedia. The "highest standard" the user asked for is: every image and every reused text fragment should be traceable to its origin, in machine-checked form, with the public credits page rendering that data.
Decision
Adopt a provenance-manifest model with a fail-closed pipeline.
Manifests
- static/data/image-provenance.json: one entry per shipped image. Required fields: id, path, source_type, title, author, agency, source_url, license_short, license_url, license_rationale, modifications[], plus optional revid / pageid (Wikimedia) and nasa_id (NASA Images API). Generated at build time from existing curated maps + live Wikimedia imageinfo calls. Schema: static/data/schemas/image-provenance.schema.json.
- static/data/text-sources.json: one entry per reused or paraphrased editorial fragment. Required fields: id, location { file, json_path?, i18n_key? }, category, relationship (original | paraphrased-from | quoted-from | translated-from | adapted-from), license_short, license_rationale. Optional: snippet, source_url, source_publisher, source_author, license_url, translation_status, translation_reviewer. Schema: static/data/schemas/text-sources.schema.json.
- static/data/source-logos.json: masthead manifest for the public /credits page. Required fields: id, name, kind, url, license_summary. Optional logo_path. Schema: static/data/schemas/source-logos.schema.json.
- static/data/license-waivers.json: narrow exceptions to the license allowlist. Each waiver records license_short, scope, justification, reviewer, decided_at, optional expires_at. Schema: static/data/schemas/license-waivers.schema.json.
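As an illustration of the image manifest's required-field list, the entry shape could be typed roughly as below. This is a sketch, not the project's schema: the SourceType members, the missingFields helper, and the choice to treat empty strings as missing are all assumptions made for the example.

```typescript
// Field names follow the required-field list above. The SourceType members
// are hypothetical; the schema's actual enum is not shown in this ADR.
type SourceType = "wikimedia" | "nasa-images-api" | "curated";

interface ImageProvenanceEntry {
  id: string;
  path: string;
  source_type: SourceType;
  title: string;
  author: string;
  agency: string;
  source_url: string;
  license_short: string;
  license_url: string;
  license_rationale: string;
  modifications: string[];
  revid?: number; // Wikimedia only
  pageid?: number; // Wikimedia only
  nasa_id?: string; // NASA Images API only
}

const REQUIRED_FIELDS = [
  "id", "path", "source_type", "title", "author", "agency",
  "source_url", "license_short", "license_url", "license_rationale",
  "modifications",
] as const;

// Returns the names of required fields that are absent or blank.
// Assumption: a blank string counts as missing; an empty modifications[] does not.
function missingFields(entry: Partial<ImageProvenanceEntry>): string[] {
  return REQUIRED_FIELDS.filter((field) => {
    const value = entry[field];
    if (Array.isArray(value)) return false;
    return typeof value !== "string" || value.trim() === "";
  });
}
```

A builder that fails closed would refuse to write the manifest whenever missingFields returns a non-empty list for any entry.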
License allowlist
scripts/license-allowlist.ts is the canonical list of acceptable licenses (PD-NASA, PD-USGov, PD-Russia, PD-Old, PD-self, PD-trivial, CC0, CC-BY 1.0–4.0, CC-BY-IGO 3.0/4.0, CC-BY-SA 1.0–4.0, CC-BY-SA IGO 3.0/4.0, plus the project-internal Orrery-Original for original prose). The file also exports normaliseLicenseShortName() for the loose Wikimedia extmetadata.LicenseShortName strings.
A license outside the allowlist must be covered by a matching waiver row, or validate-data and build-image-provenance fail closed.
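The allow-or-waive rule can be sketched as a single decision function. The allowlist below is a small illustrative subset with assumed spellings (the canonical list lives in scripts/license-allowlist.ts), checkLicense is a hypothetical name, and matching scope as a path prefix is an assumption, since this ADR does not define scope semantics.

```typescript
// Illustrative subset only; spellings are assumed, not copied from the real file.
const ALLOWED_LICENSES = new Set<string>([
  "PD-NASA", "PD-USGov", "CC0", "CC-BY 4.0", "CC-BY-SA 4.0", "Orrery-Original",
]);

interface LicenseWaiver {
  license_short: string;
  scope: string;
  justification: string;
  reviewer: string;
  decided_at: string; // ISO 8601 date
  expires_at?: string; // ISO 8601 date, optional
}

type Verdict = "allowed" | "waived" | "rejected";

// Assumption: a waiver applies when its scope is a prefix of the asset path
// and it has not expired at the time of the check.
function checkLicense(
  licenseShort: string,
  path: string,
  waivers: LicenseWaiver[],
  now: Date = new Date(),
): Verdict {
  if (ALLOWED_LICENSES.has(licenseShort)) return "allowed";
  const waived = waivers.some(
    (w) =>
      w.license_short === licenseShort &&
      path.startsWith(w.scope) &&
      (w.expires_at === undefined || new Date(w.expires_at).getTime() > now.getTime()),
  );
  return waived ? "waived" : "rejected";
}
```

Under this reading, a "rejected" verdict is what makes validate-data and build-image-provenance fail closed.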
Fail-closed pipeline
- scripts/build-image-provenance.ts: walks static/images/**, static/textures/, static/logos/. For Wikimedia Commons titles (taken from the curated maps in scripts/fetch-assets.ts), it queries the Commons API for imageinfo / extmetadata and records Artist, LicenseShortName, LicenseUrl, revision id, and the Commons file-page URL. For NASA Images API entries it records the search URL and PD-NASA rationale. For curated assets (rocket reference, agency logos, lunar discs, Solar System Scope textures) it records the curated TASL string. Writes the manifest plus a diff report at docs/provenance/last-fetch-diff.md. Refuses to write the manifest when any required field is missing or a license is not allowed/waived.
- scripts/validate-data.ts: runs the JSON-schema check on every manifest, then four runtime invariants: every image-provenance.json license must be allowed/waived, every entry.path must resolve to a file under static/, no duplicate entry.path values, and the same allowlist + uniqueness checks for text-sources.json and source-logos.json (with logo_path files-on-disk).
- npm run build: runs validate-data before the SvelteKit build (pre-build hook). A malformed manifest cannot ship.
- npm run fetch: wraps npm run fetch-assets then npm run build-image-provenance then npm run validate-data. The diff report at docs/provenance/last-fetch-diff.md is the rolling review surface for new content; the run is considered complete only when validate-data passes.
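The first three runtime invariants listed for validate-data could look roughly like the following. This is a sketch under stated assumptions: checkInvariants is a hypothetical name, and the file-existence and license checks are injected as callbacks so the example does not depend on the file system or on the real allowlist.

```typescript
interface ManifestEntry {
  path: string;
  license_short: string;
}

// Collects every violation rather than stopping at the first, so one run
// reports all problems. fileExists and licenseOk are hypothetical hooks.
function checkInvariants(
  entries: ManifestEntry[],
  fileExists: (path: string) => boolean,
  licenseOk: (licenseShort: string) => boolean,
): string[] {
  const errors: string[] = [];
  const seen = new Set<string>();
  for (const entry of entries) {
    if (!licenseOk(entry.license_short)) {
      errors.push(`license not allowed or waived: ${entry.license_short} (${entry.path})`);
    }
    if (!fileExists(entry.path)) {
      errors.push(`missing file: ${entry.path}`);
    }
    if (seen.has(entry.path)) {
      errors.push(`duplicate path: ${entry.path}`);
    }
    seen.add(entry.path);
  }
  return errors;
}
```

A caller that fails closed would print the returned errors and exit with a non-zero status whenever the list is non-empty, which is what stops the pre-build hook.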
Runtime surface
- src/lib/data.ts exposes getImageProvenanceManifest(), getImageProvenance(path), getSourceLogos(), getTextSources(). UI never fetches JSON directly per ADR-006.
- src/lib/components/ImageCredit.svelte renders the per-image TASL line in every gallery lightbox: title, author/agency, source link, license short name + license URL, modification disclosure, and the no_endorsement_disclaimer string.
- src/routes/credits/+page.svelte renders the public bill of materials, grouped by source. The route is reachable from a small footer link in src/routes/+layout.svelte and is intentionally not in primary nav.
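To make the TASL line concrete, here is one way the fields could be assembled into plain text. It is only a sketch: formatTasl is a hypothetical helper, the separator and ordering are assumptions, and the real component renders links and the no_endorsement_disclaimer string, which this example leaves out.

```typescript
interface CreditFields {
  title: string;
  author: string;
  agency: string;
  source_url: string;
  license_short: string;
  license_url: string;
  modifications: string[];
}

// Builds a plain-text Title / Author / Source / License line.
// Blank author or agency values are skipped; modifications are appended
// only when at least one is recorded.
function formatTasl(entry: CreditFields): string {
  const byline = [entry.author, entry.agency]
    .filter((s) => s.trim() !== "")
    .join(" / ");
  const parts = [
    `"${entry.title}"`,
    byline,
    entry.source_url,
    `${entry.license_short} (${entry.license_url})`,
  ].filter((s) => s !== "");
  if (entry.modifications.length > 0) {
    parts.push(`Modified: ${entry.modifications.join(", ")}`);
  }
  return parts.join(" · ");
}
```

Keeping the formatting in one function means the lightbox and the /credits page could share the same string-building logic, in line with the goal of a single data source for both.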
Authoring rule
Any new mission/planet/site copy with external provenance, any new image under static/images/** or static/textures/** or static/logos/**, and any new source logo must add the corresponding row in text-sources.json, image-provenance.json (auto-generated), or source-logos.json in the same PR. New Wikimedia entries must include their Commons filename in the appropriate curated map in scripts/fetch-assets.ts so build-image-provenance.ts can resolve them.
Ongoing review hooks
- npm run fetch always emits the diff report and runs validate-data. Stdout shows the digest.
- npm run build chains validate-data; CI does the same on every PR.
- The Milestone D backlog at docs/wip/provenance-backlog.md captures cadence ideas that are not yet wired (weekly link-check GH Action, PR-comment hook, Sentry breadcrumb on lightbox-without-provenance, Playwright visual regression on /credits, per-locale staleness checker, periodic Wikimedia category sweep). New review-cadence ideas land there.
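The diff report that npm run fetch emits can be pictured as a comparison between the previous and the newly built image manifest. The sketch below shows one possible shape for that comparison; diffManifests, the ManifestDiff fields, and the decision to track only paths and licenses are assumptions, not the project's actual report format.

```typescript
interface DiffEntry {
  path: string;
  license_short: string;
}

interface ManifestDiff {
  added: string[];
  removed: string[];
  licenseChanged: string[];
}

// Indexes entries by path so the two manifests can be compared key by key.
function indexByPath(entries: DiffEntry[]): Map<string, string> {
  const index = new Map<string, string>();
  for (const entry of entries) index.set(entry.path, entry.license_short);
  return index;
}

// Reports new paths, dropped paths, and paths whose license string changed.
function diffManifests(previous: DiffEntry[], next: DiffEntry[]): ManifestDiff {
  const before = indexByPath(previous);
  const after = indexByPath(next);
  const added = next.map((e) => e.path).filter((p) => !before.has(p));
  const removed = previous.map((e) => e.path).filter((p) => !after.has(p));
  const licenseChanged = next
    .filter((e) => before.has(e.path) && before.get(e.path) !== e.license_short)
    .map((e) => `${e.path}: ${before.get(e.path)} -> ${e.license_short}`);
  return { added, removed, licenseChanged };
}
```

A license change upstream would then show up under licenseChanged in the next fetch, which is the kind of drift the review hooks are meant to surface.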
Rationale
- Pulls every attribution claim into machine-checkable artefacts, so a missing field fails CI rather than slipping past human review.
- Keeps the bottom-sheet panel footers stable (Milestone B contextual labels) while exposing exact per-image attribution in the lightbox.
- Provides the public credits page with structured data instead of ad-hoc HTML; the same data drives /credits and the in-panel <ImageCredit>.
- Locks the license allowlist + waiver model so unfamiliar licenses don't quietly enter the build.
Alternatives considered
- Per-component inline credits. Already what we had. Drifts. Rejected.
- Inline <credit> JSON in each static/images/**/.json sidecar. Adds many small files; harder to query for /credits. Rejected.
- Runtime fetch from Commons / NASA on demand. Violates ADR-016 (build-time only). Rejected.
- Skip the text manifest, treat all text as "original". Honest for UI strings but dishonest for editorial copy paraphrased from primary sources. Rejected.
Consequences
Positive:
- /credits is a complete, auditable bill of materials.
- Reviewers can grep for license / publisher / source URL.
- Drift is loud — a Wikimedia license change surfaces in the next fetch's diff report.
Negative:
- More moving parts in validate-data and build-image-provenance.
- Editorial copy now requires an entry in text-sources.json when paraphrased; minor authoring overhead.
- Wikimedia API enrichment runs at fetch time; rate-limited.
Implementation notes
- scripts/build-image-provenance.ts: supports --offline for local runs that should skip Commons enrichment and use curated fallbacks.
- scripts/fetch-assets.ts: the main() runner is now guarded by import.meta.url === process.argv[1] so the file can be imported (its curated maps are reused by the provenance builder) without re-firing the network fetch.
- static/data/license-waivers.json ships empty ({ schema_version: 1, waivers: [] }); the schema is enforced even with an empty list so the next waiver follows the structure.
- The runtime pipeline gracefully degrades: when image-provenance.json is absent (fresh checkout pre-fetch), getImageProvenance(path) returns null and <ImageCredit> renders nothing; the contextual gallery footer from Milestone B is still shown. This avoids hard-failing the dev experience while keeping the contract strict in CI.
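The graceful-degradation behaviour described in the last note can be sketched as a null-tolerant lookup. The names lookupProvenance and creditText are hypothetical stand-ins for the real data-layer and component code, and wrapping the rows in an entries array is an assumption about the manifest shape.

```typescript
interface ProvenanceRecord {
  path: string;
  title: string;
  license_short: string;
}

// Assumption: rows live under an `entries` key; the real shape is set by the schema.
interface ProvenanceManifest {
  entries: ProvenanceRecord[];
}

// Returns null both when the manifest has not been generated yet
// and when the path has no recorded provenance.
function lookupProvenance(
  manifest: ProvenanceManifest | null,
  path: string,
): ProvenanceRecord | null {
  if (manifest === null) return null;
  const match = manifest.entries.find((e) => e.path === path);
  return match === undefined ? null : match;
}

// Mirrors the "render nothing" rule: no record means an empty credit string.
function creditText(manifest: ProvenanceManifest | null, path: string): string {
  const record = lookupProvenance(manifest, path);
  return record === null ? "" : `${record.title} (${record.license_short})`;
}
```

Because the lookup never throws, a fresh checkout can run the dev server before the first fetch, while CI still enforces the manifest through validate-data.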
Related
- ADR-046: Agency-first build-time imagery sourcing.
- ADR-016: External assets resolved at build time (transport).
- ADR-006: Mission data via static/data/ (data layer rule).
- ADR-019: ajv schema validation on PR.
- Backlog: docs/wip/provenance-backlog.md.