PRD-018 · Image Pipeline v2 — vision-model scoring + smart cropping

Status · Draft v0.4 (all 4 v2.0 architectural decisions resolved 2026-05-16)
Date · 2026-05-16
Owner · Marko
Closes into · RFC-022
Slice gate · v1.x (after PRD-015 mobile + PRD-016 audio + PRD-017 sensory)

Why this is a PRD. The decision to add vision-model scoring + smart cropping to the asset pipeline changes the cost model (paid Anthropic Vision API calls per build, ~$67 first-build / ~$0 cached), the build duration (sharp pre-cropping at 3 aspect ratios for ~1100 curated images = real CI minutes), the editorial trust model (model-picked vs human-picked imagery), and the mobile bundle (smart-crop variants shave ~30 MB off the fleet-gallery bucket). It is purely additive — the existing image-provenance.json (ADR-047, 1345 entries) stays untouched, the existing curation flow is preserved, and the v2 layer is a new sidecar that can be backed out without breaking what ships today.


§why

The current image pipeline (scripts/fetch-assets.ts + ADR-016 + ADR-046 + ADR-047) ships 1345 provenance-tracked images. It works — every image has a license, a credit, a last-verified date, and is enforced by validate-data.ts as a fail-closed gate. The system is healthy.

What it does not do: tell us anything about the visual content of the images. Two specific failure modes recur in PR review:

  1. Wrong image picked. A NASA Images API result returns a press conference photo when we wanted Curiosity on Mars. The metadata search ranks by tags, not by what's in the frame. We catch these by hand; we miss some.
  2. Right image, wrong crop. A clean wide-shot of a spacecraft has the spacecraft in the right third. CSS object-fit: cover with default centred object-position chops the spacecraft off. Subject lost.

PRD-018 / RFC-022 fix both at build time, without touching the provenance pipeline that already works. The v2 layer adds:

  • Per-image vision-API scoring (Claude Sonnet 4.6) — each image gets a 1-10 score, a one-sentence subject description, a category (one of nine), and a focal-point coordinate (x, y as 0.0–1.0).
  • Smart-crop variants — at build time, sharp generates 1:1 + 4:3 + 16:9 pre-cropped variants using the focal point as the crop anchor. Three variants per source image; mobile build picks 1:1 for fleet galleries, web picks 16:9 for hero, 4:3 for cards.
  • A new sidecar manifest: static/data/image-vision.json, keyed by image path. Joins to image-provenance.json (ADR-047) at runtime; never modifies it. Backing out v2 = deleting the sidecar; the existing image pipeline keeps working.
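
A minimal sketch of the sidecar join, assuming the manifest fields named in this PRD. The ProvenanceEntry field names (license, credit, lastVerified) and the joinManifests helper are hypothetical; the real ADR-047 schema may differ.

```typescript
// One image-vision.json entry, using the field names from this PRD.
type Category =
  | "spacecraft" | "surface" | "launch" | "orbital" | "hardware"
  | "people" | "diagram" | "render" | "other";

interface VisionEntry {
  score: number;                           // 1-10
  subject: string;                         // one-sentence description
  category: Category;
  focal_point: { x: number; y: number };   // each 0.0-1.0
  variants: { "1x1": string; "4x3": string; "16x9": string };
}

// Hypothetical provenance shape: the real ADR-047 field names may differ.
interface ProvenanceEntry {
  license: string;
  credit: string;
  lastVerified: string;
}

type JoinedEntry = ProvenanceEntry & { vision?: VisionEntry };

// Left join by image path: every provenance entry survives,
// vision data is attached only where the sidecar has a record.
function joinManifests(
  provenance: Record<string, ProvenanceEntry>,
  vision: Record<string, VisionEntry>,
): Record<string, JoinedEntry> {
  const out: Record<string, JoinedEntry> = {};
  for (const [path, entry] of Object.entries(provenance)) {
    const v = vision[path];
    out[path] = v ? { ...entry, vision: v } : { ...entry };
  }
  return out;
}
```

Calling joinManifests(provenance, {}) returns the provenance entries unchanged, which is the back-out property: with the sidecar deleted, nothing else shifts.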

This is evolution, not replacement. Per Marko's directive: "evolution of what we have now, not to change anything unless we're evolving, so a new layer on." ADR-047 stays. The sidecar is the new layer.


§audiences

Audience | Why this helps them
Curious learner | Mission galleries actually show what they came to see. The hero crop on /missions/curiosity shows the rover, not a press photo of an admin office.
Educator / journalist | Reliable visual-quality bar across the corpus. Cite an Orrery page knowing the imagery isn't accidental.
Editor / curator (Marko) | Fewer manual review iterations. Audit-report HTML surfaces every candidate the model considered + the score that picked the winner. Easy to spot a bad pick and tune the prompt.
Mobile audience (post PRD-015) | ~30 MB lighter Capacitor bundle thanks to 1:1 pre-cropped fleet-gallery variants. Fits inside the ~85 MB target with headroom.

§what's already shipped (image-pipeline-readiness inventory)

Capability | Status | Source
scripts/fetch-assets.ts (NASA Images API + Wikimedia + agency portals fetch) | shipped | ADR-016 (build-time asset resolution)
static/data/image-provenance.json (1345 entries, license + credit + last-verified) | shipped | ADR-046 + ADR-047
validate-data.ts enforces image provenance integrity (fail-closed gate) | shipped | ADR-047 Milestone C
Asset-size cap (8 MiB per image, workbox precache cap) | shipped | validate-data.ts
Mission galleries render via runtime fetch from NASA Images API | shipped (but a v2 candidate to remove; see M10) | gallery components
Hero photos use CSS object-fit: cover with default centred object-position | shipped (but the failure mode v2 fixes; see M4) | hero components
MOBILE=1 build env (lazy locales + thumbnail tier per RFC-018 §4) | planned (v0.8) | RFC-018 §4

§goal

Ship a vision-pipeline v2 layer that runs at build time, joins to the existing provenance manifest by image path, and produces per-image scoring + focal-point + smart-crop variants for every image in the corpus (~1345 entries). Frontend renders use the new manifest for object-position + variant selection; v2 layer can be backed out by deleting the sidecar without breaking anything else.
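
As an illustration of how a frontend component might turn a manifest focal point into a CSS object-position value, here is a small sketch. The helper name toObjectPosition is hypothetical, and the clamping and rounding to whole percentages are assumptions rather than anything this PRD specifies.

```typescript
interface FocalPoint { x: number; y: number } // each 0.0-1.0, as in the manifest

// Converts a focal point into a CSS object-position string such as "80% 50%".
// Out-of-range values are clamped so a bad manifest entry cannot push the
// image outside its container.
function toObjectPosition(fp: FocalPoint): string {
  const pct = (v: number): string =>
    `${Math.round(Math.min(Math.max(v, 0), 1) * 100)}%`;
  return `${pct(fp.x)} ${pct(fp.y)}`;
}
```

For a spacecraft sitting in the right third of a wide shot, a focal point of { x: 0.8, y: 0.5 } yields "80% 50%" in place of the centred default.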

v2.0 ship gate = scoring + focal-point + sharp pre-cropping (1:1 + 4:3 + 16:9) + frontend integration + audit report HTML.

v2.1 = multi-source candidate pool (extend fetch-assets.ts to ESA + JAXA + ROSCOSMOS portals beyond NASA + Wikimedia).

v2.2 = manual override UI (operator-tuned scoring overrides for edge cases the model gets wrong).


§user stories

US-1 — Editorial trust on hero images. A visitor opens /missions/curiosity and the hero image is unmistakably Curiosity on Mars. Not a press conference; not a diagram. The vision pipeline rejected the press photo at score < 5.
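
The selection rules behind this story (threshold and status-aware render handling from M2, hero and gallery picking from M3, fallback from M13) could be sketched roughly as follows. This is one reading under stated assumptions: the "5-7 acceptable" note for renders is treated as "renders pass the normal threshold on PLANNED missions", the per-category cap of 4 is assumed to count the hero, and all type and function names are hypothetical.

```typescript
type Category =
  | "spacecraft" | "surface" | "launch" | "orbital" | "hardware"
  | "people" | "diagram" | "render" | "other";
type MissionStatus = "PLANNED" | "FLOWN" | "ACTIVE";

interface Candidate { path: string; score: number; category: Category }
interface Selection { hero: Candidate | null; gallery: Candidate[]; fallback: boolean }

const THRESHOLD = 5;        // M2; calibration is still an open question
const GALLERY_SIZE = 8;     // M3: next 8 after the hero
const MAX_PER_CATEGORY = 4; // M3; assumed to include the hero

function isAcceptable(c: Candidate, status: MissionStatus): boolean {
  if (c.score < THRESHOLD) return false;
  if (c.category === "people" || c.category === "diagram") return false;
  if (c.category === "render") return status === "PLANNED";
  return true;
}

function selectImages(candidates: Candidate[], status: MissionStatus): Selection {
  const ranked = [...candidates].sort((a, b) => b.score - a.score);
  const accepted = ranked.filter((c) => isAcceptable(c, status));

  // Hero: best spacecraft/surface shot, else best accepted candidate.
  let hero: Candidate | null =
    accepted.find((c) => c.category === "spacecraft" || c.category === "surface") ??
    accepted[0] ??
    null;

  // M13 fallback: nothing acceptable, so take the top scorer and flag it.
  let fallback = false;
  if (hero === null && ranked.length > 0) {
    hero = ranked[0];
    fallback = true;
  }

  const perCategory = new Map<Category, number>();
  if (hero !== null) perCategory.set(hero.category, 1);

  const gallery: Candidate[] = [];
  for (const c of accepted) {
    if (c === hero || gallery.length >= GALLERY_SIZE) continue;
    const used = perCategory.get(c.category) ?? 0;
    if (used >= MAX_PER_CATEGORY) continue; // category diversity cap
    perCategory.set(c.category, used + 1);
    gallery.push(c);
  }
  return { hero, gallery, fallback };
}
```

With a press-conference photo scored 3 (people) and a rover photo scored 9 (surface), the rover becomes the hero and the press photo never reaches the gallery.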

US-2 — Subject preserved across aspect ratios. A wide-shot of Saturn V is centred on the rocket, not on empty sky to the left. Hero (16:9) shows the rocket in the centre third; gallery card (4:3) shows it framed; mobile thumbnail (1:1) is tightly cropped on the rocket's middle stage. All three crops use the same focal point from the vision API.
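
To make the "same focal point, three crops" idea concrete, here is a sketch of the crop-rectangle arithmetic. It assumes the largest rectangle of the target aspect ratio is centred on the focal point and then clamped to the image bounds; the cropRect helper is hypothetical. The returned shape (left, top, width, height) matches the region object that sharp's extract operation accepts, though how the real pipeline calls sharp is not specified here.

```typescript
interface FocalPoint { x: number; y: number } // each 0.0-1.0
interface CropRect { left: number; top: number; width: number; height: number }

// Largest crop of aspect aspectW:aspectH that fits in the source,
// centred on the focal point, then shifted to stay inside the image.
function cropRect(
  srcWidth: number,
  srcHeight: number,
  focal: FocalPoint,
  aspectW: number,
  aspectH: number,
): CropRect {
  const target = aspectW / aspectH;
  const source = srcWidth / srcHeight;

  let width: number;
  let height: number;
  if (source > target) {
    // Source is wider than the target: keep full height, trim the sides.
    height = srcHeight;
    width = Math.min(srcWidth, Math.round(srcHeight * target));
  } else {
    // Source is taller (or equal): keep full width, trim top and bottom.
    width = srcWidth;
    height = Math.min(srcHeight, Math.round(srcWidth / target));
  }

  const clamp = (v: number, lo: number, hi: number): number =>
    Math.min(Math.max(v, lo), hi);
  const left = clamp(Math.round(focal.x * srcWidth - width / 2), 0, srcWidth - width);
  const top = clamp(Math.round(focal.y * srcHeight - height / 2), 0, srcHeight - height);
  return { left, top, width, height };
}
```

For a 4000 × 2000 source with the subject at x = 0.8, the 1:1 crop lands on the right half of the frame (left = 2000) instead of the centre, which is the behaviour this story asks for.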

US-3 — Mobile bundle savings. The Capacitor build (PRD-015 / RFC-018) bundles the 1:1 pre-cropped variants of fleet-gallery images at ~256 px instead of the full hero quality. ~30 MB saved off the existing 120 MB fleet-gallery bucket on mobile.

US-4 — Editor audit loop. Marko runs npm run images:audit-report, opens audit-report.html (gitignored), and sees every candidate per mission with: thumbnail, score, category, subject, focal-point crosshair, selection status, reject reason if rejected. Spots a bad pick → adjusts the gallery_query for that mission → re-runs the pipeline → cache hits everywhere except the changed query.

US-5 — Build-time only, zero runtime API exposure. Anthropic Vision API key never reaches the browser. All scoring happens at build. Frontend reads the static image-vision.json sidecar.

US-6 — "This image is bad" — human-feedback curation loop. From the audit report, Marko clicks a "flag" button on any image, types a reason ("subject is occluded by hardware caption", "wrong rover", "looks like a render"), and the image is added to a curation deny-list (static/data/image-curation.json). Subsequent pipeline runs treat that image as score: 0, rejected_by: "human". The flag + reason feed back into the scoring prompt as an example ("avoid this kind of result, see deny-list reason X") so the model learns Marko's editorial preferences over time without retraining.
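
A rough sketch of how the deny-list might override model scores and feed the prompt. The DenyEntry shape (path, reason, flaggedAt), the ScoreRecord shape, and both helper names are hypothetical; this PRD only fixes the score: 0 / rejected_by: "human" outcome and the idea of recent reasons appearing as in-context examples.

```typescript
// Hypothetical shapes: the real image-curation.json schema is not defined here.
interface DenyEntry { path: string; reason: string; flaggedAt: string } // ISO date
interface ScoreRecord {
  score: number;
  reject_reason: string | null;
  rejected_by?: "human";
}

// Human flags always win over model output.
function applyDenyList(
  scores: Record<string, ScoreRecord>,
  deny: DenyEntry[],
): Record<string, ScoreRecord> {
  const reasons = new Map<string, string>();
  for (const d of deny) reasons.set(d.path, d.reason);

  const out: Record<string, ScoreRecord> = {};
  for (const [path, record] of Object.entries(scores)) {
    const reason = reasons.get(path);
    out[path] = reason === undefined
      ? record
      : { ...record, score: 0, reject_reason: reason, rejected_by: "human" };
  }
  return out;
}

// Most recent reasons first, capped so the prompt stays small.
function denyListPromptLines(deny: DenyEntry[], limit = 5): string[] {
  return [...deny]
    .sort((a, b) => (a.flaggedAt < b.flaggedAt ? 1 : a.flaggedAt > b.flaggedAt ? -1 : 0))
    .slice(0, limit)
    .map((d) => `avoid: ${d.reason}`);
}
```

Sorting on ISO date strings keeps the example dependency-free; a real implementation might store and compare timestamps differently.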

US-7 — Granular pipeline scoping. Marko runs the pipeline against subsets, not always the full corpus: --mission curiosity, --agency NASA, --source nasa-images-api, --fleet-asset crew-portraits, --segment fleet-galleries. Each subset reuses the cache for everything outside the subset; inside the subset, scoring + cropping re-runs (or stays cached if hash-unchanged). Full rebuild (--all) is the catch-all but never the default. Cost-per-iteration drops from $67 → $0.50 when iterating on a single mission.
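
The scoping arithmetic can be illustrated with a small sketch. It models only three of the flags (--mission, --agency, --source), assumes multiple flags combine with AND, and treats an empty scope as the incremental default described in M19. The Entry shape and both helpers are hypothetical, and costs are kept in integer cents (5 cents per image, from the ~$0.05 estimate) to avoid floating-point drift. The figure is a worst case: cache hits inside the scope would cost nothing.

```typescript
// Hypothetical corpus entry; field names are assumptions for illustration.
interface Entry {
  path: string;
  mission: string;
  agency: string;
  source: string;
  hasVisionRecord: boolean;
}
interface Scope { mission?: string; agency?: string; source?: string; all?: boolean }

const COST_PER_IMAGE_CENTS = 5; // ~$0.05 per image

function inScope(entry: Entry, scope: Scope): boolean {
  if (scope.all) return true;
  const checks: Array<[string | undefined, string]> = [
    [scope.mission, entry.mission],
    [scope.agency, entry.agency],
    [scope.source, entry.source],
  ];
  const active = checks.filter(([wanted]) => wanted !== undefined);
  // No flags: incremental default, only entries without a manifest record.
  if (active.length === 0) return !entry.hasVisionRecord;
  return active.every(([wanted, actual]) => wanted === actual);
}

// Upper bound on API spend for a run, ignoring cache hits inside the scope.
function worstCaseCostCents(entries: Entry[], scope: Scope): number {
  return entries.filter((e) => inScope(e, scope)).length * COST_PER_IMAGE_CENTS;
}
```

Under these assumptions a 15-image mission costs at most 75 cents to re-score, while the full 1345-entry corpus comes to 6725 cents (~$67).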


§must-have requirements

ID | Requirement
M1 | Vision-API scoring at build time. Claude Sonnet 4.6 (claude-sonnet-4-6) scores every image candidate. Returns: score (1-10), subject (one sentence), category (one of: spacecraft / surface / launch / orbital / hardware / people / diagram / render / other), focal_point ({ x: 0.0-1.0, y: 0.0-1.0 }), reject_reason (string or null).
M2 | Selection threshold: score >= 5 AND category not in {people, diagram} for general use. Status-aware: PLANNED missions accept render (5-7 acceptable). FLOWN/ACTIVE missions reject render outright.
M3 | Hero + gallery selection algorithm: hero = highest-scoring spacecraft or surface candidate (or fallback to highest non-rejected). Gallery = next 8 by score, with category diversity (max 4 per category in 9-image gallery).
M4 | Smart-crop variants generated at build time via sharp. Three variants per source image: 1:1 (square, mobile thumbnails), 4:3 (gallery cards), 16:9 (hero). Crop anchor = focal_point from vision API. Each variant is a separate file: {base}.1x1.jpg, {base}.4x3.jpg, {base}.16x9.jpg. Source image kept (used for full-quality lightbox).
M5 | New sidecar manifest: static/data/image-vision.json. Schema: { "<image-path>": { score, subject, category, focal_point, variants: { "1x1": "...", "4x3": "...", "16x9": "..." } } }. Joins to image-provenance.json by image-path key. Does not modify image-provenance.json.
M6 | validate-data.ts gains a NEW OPTIONAL check: if a v2-scored image lacks a manifest entry, log a warning (not a fail). Fail-closed only if image-vision.json exists but is malformed. The existing fail-closed image-provenance gate stays unchanged.
M7 | Hash-based cache. Per-image cache key = SHA-256 of (source-image-bytes + scoring-prompt-version). Per-variant cache key = SHA-256 of (source-image-bytes + crop-spec). Cache lives at .image-cache/ (gitignored). Unchanged source + unchanged prompt = no API call, no re-crop.
M8 | Build-time budget: full cache-cold rebuild < 30 minutes wall clock on Marko's M-series Mac. Cache-warm rebuild < 60 seconds. Per-image cost ~$0.05 (Sonnet) → ~$67 first build for 1345 images, ~$0 cached builds.
M9 | Frontend reads image-vision.json at build time (Vite static import). Hero / card / thumbnail components select the correct variant by container aspect ratio + viewport. Mobile (PRD-015 wrapper) picks 1:1 for fleet galleries; desktop picks 4:3 for cards + 16:9 for hero.
M10 | Runtime NASA Images API calls eliminated for any view served by image-vision.json. Gallery fully offline from manifest + bundled images. Removes the "LIVE" indicator from the gallery component.
M11 | ANTHROPIC_API_KEY setup is a documented v2.0 prerequisite. Anthropic API is NOT covered by Claude Code subscription (announced 2026-05), so v2 needs its own paid API access. Setup: GitHub Actions secret + local ~/.zshrc export or .env.local. Same key as PRD-016 audio pipeline (Anthropic billing is per-account). Documented in docs/guides/image-pipeline-v2.md. Auth failures don't fail the build (M13 fallback applies).
M12 | Audit report HTML auto-generated on every scoring run: static/audit-report.html (gitignored, dev-only). Shows every candidate per image with: thumbnail at 192 px, score, category, subject, focal-point crosshair, selection status, reject reason if rejected. Marko opens locally to spot tuning opportunities.
M13 | Fallback behaviour: zero-acceptable-images for an image slot → use highest-scoring candidate regardless of threshold + flag with "fallback": true in manifest entry. Build never fails closed because of a vision-API call result. API outage during build → fall back to last cached scores; build continues.
M14 | Per-image processing is idempotent. Running the pipeline twice with no source changes produces a byte-identical image-vision.json and byte-identical variant files.
M15 | Human curation deny-list. A new sidecar static/data/image-curation.json (committed to the repo) lists images Marko has flagged as bad, each with a one-line reason. Pipeline reads on every run; flagged images are scored as score: 0, rejected_by: "human" regardless of model output. The deny-list survives cache rebuilds.
M16 | Curation feedback loop in scoring prompt. The vision-API prompt includes the most recent ~5 deny-list entries as in-context examples ("avoid: subject occluded by caption / wrong rover / looks like a render"). This carries Marko's editorial bar into the model's context over time without re-training the model. Deny-list size capped at 100 entries; older entries rotate out of the prompt context but stay in the deny-list.
M17 | Audit report flag UI. The audit-report HTML has a "🚩 Flag this image" button on each candidate. Click → opens a small form to enter a reason → copies the flag payload to the clipboard. The audit report is HTML-only (no server), so nothing is sent over the network; the operator runs the tiny scripts/flag-image.ts helper (node scripts/flag-image.ts), which reads the clipboard payload and appends to image-curation.json. Operator workflow: click flag → reason copied to clipboard → run helper → commit.
M18 | Granular pipeline scoping flags. CLI supports: --mission <id>, --agency <name> (NASA / ESA / JAXA / ROSCOSMOS / CNSA / ISRO), --source <name> (nasa-images-api / wikimedia-commons / agency-portal-* / curated-url), --fleet-asset <type> (heroes / patches / portraits / galleries), --segment <name> (mission-galleries / fleet / agency-logos / science-diagrams / planet-textures), --new-only (process only entries that don't yet have a manifest record OR whose hash inputs changed), --changed-since <git-ref> (process only entries whose source files were modified since the given git ref), --all (catch-all, full corpus). Subsets reuse cache outside their scope; inside their scope, scoring + cropping re-runs unless hash-unchanged.
M19 | Default CLI behaviour = incremental. Running npm run images:score with no flags is equivalent to --new-only: scans image-provenance.json for entries missing from image-vision.json (or whose hash inputs changed) and processes only those. Routine workflow ("I added new images, run the pipeline") costs $0–$5 per run, never $67. Cost discipline is preserved: --all is now the explicit opt-in for "re-score the whole corpus" (used after a prompt-rubric change or model swap).
M20 | Architectural guarantee: never reprocess unchanged entries. Cache invalidation triggers are EXPLICIT and finite: (a) source image bytes change, (b) scoring_prompt_version constant bumps, (c) vision_model config string changes, (d) image is added to image-curation.json, (e) sharp major version upgrades (variant cache only). Time-based invalidation (e.g. "re-score everything monthly") is explicitly NOT a trigger. Routine builds must be effectively free for unchanged entries.
M21 | Single-mission iteration cost: --mission curiosity scores ≤ 15 images at ~$0.05 each = ≤ $0.75 per iteration. Single-agency iteration: --agency JAXA scores ~80 images = ~$4. Whole corpus (--all): ~$67 first build, ~$0 cached. Incremental default (--new-only implicit): $0 if nothing changed, $0.05–$5 typical when a few entries land.
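
The cache keys from M7 and the invalidation triggers from M20 might be derived along these lines. This is a sketch, not the pipeline's actual key format: the function names, the NUL separator, and the choice to fold the model string and sharp major version into the hash are assumptions made so that each M20 trigger changes the key.

```typescript
import { createHash } from "node:crypto";

// Scoring cache key: changes when the image bytes, the prompt version,
// or the vision model string changes (M20 triggers a, b, c).
function scoringCacheKey(
  imageBytes: Uint8Array,
  scoringPromptVersion: string,
  visionModel: string,
): string {
  return createHash("sha256")
    .update(imageBytes)
    .update(`\0${scoringPromptVersion}\0${visionModel}`)
    .digest("hex");
}

// Variant cache key: changes when the image bytes, the crop spec
// (aspect ratio plus focal point), or the sharp major version changes
// (M20 triggers a and e).
function variantCacheKey(
  imageBytes: Uint8Array,
  cropSpec: string,
  sharpMajorVersion: number,
): string {
  return createHash("sha256")
    .update(imageBytes)
    .update(`\0${cropSpec}\0sharp@${sharpMajorVersion}`)
    .digest("hex");
}
```

Because nothing time-based enters either hash, an unchanged image with an unchanged prompt, model, and crop spec always maps to the same key, so a repeat build can skip both the API call and the re-crop.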

§should-have requirements

ID | Requirement
S1 | --mission <id> flag on the pipeline CLI to score / re-crop a single mission's images only (faster iteration loop).
S2 | --force-score flag to invalidate the scoring cache (re-run vision API even on cached entries; useful when prompt is updated).
S3 | --skip-scoring flag to run only the variant-cropping path on existing scores (useful when only the crop logic changed).
S4 | Scoring telemetry: every API call logged to static/data/image-vision-cost-ledger.json (similar shape to PRD-016 audio cost ledger). Tracks per-build $ spend; threshold soft-warn at $50/build, hard-halt at $200/build.
S5 | Audit report shows per-image cost ledger entries inline so Marko can see "this build spent $0.45 on 9 candidates for this mission."
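
The S4 thresholds could be checked with something like the following sketch. The LedgerEntry shape and helper names are hypothetical, the real ledger schema follows PRD-016 and is not reproduced here, and treating the thresholds as inclusive (at or above $50 warns, at or above $200 halts) is an assumption.

```typescript
// Hypothetical ledger entry; the real cost-ledger schema may differ.
interface LedgerEntry { imagePath: string; costUsd: number }

type BudgetStatus = "ok" | "soft-warn" | "hard-halt";

const SOFT_WARN_CENTS = 5_000;  // $50 per build
const HARD_HALT_CENTS = 20_000; // $200 per build

// Sum in integer cents so repeated additions of 0.05 do not drift.
function buildSpendCents(entries: LedgerEntry[]): number {
  return entries.reduce((sum, e) => sum + Math.round(e.costUsd * 100), 0);
}

function budgetStatus(totalCents: number): BudgetStatus {
  if (totalCents >= HARD_HALT_CENTS) return "hard-halt";
  if (totalCents >= SOFT_WARN_CENTS) return "soft-warn";
  return "ok";
}
```

Nine candidates at $0.05 each sum to 45 cents, matching the S5 example. A cache-cold full build at ~$67 would land in the soft-warn band under these thresholds.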

§will-not-have (v2.0)

  • Manual override UI. Deferred to v2.2.
  • Multi-source candidate pool beyond NASA + Wikimedia. ESA / JAXA / ROSCOSMOS portal scrapers deferred to v2.1.
  • Video scoring. Out of scope; videos remain human-curated.
  • Texture / logo / agency-asset scoring. These are static, human-picked, and don't benefit from vision scoring.
  • Per-user image-quality preferences. No runtime user knobs; the build picks the variant by container aspect ratio.
  • Cropping the source images in-place. Variants are NEW files; the source survives untouched (lightbox + future re-crop).
  • Modifying image-provenance.json schema. Per Marko's "evolution, new layer on" directive. ADR-047 stays untouched.
  • Replacing validate-data.ts's existing fail-closed image-provenance check. That gate stays as-is; v2 adds an OPTIONAL check.

§success-criteria

Editorial:

  1. ≥ 90 % of mission hero images post-v2 show the relevant spacecraft/surface (verified by Marko + 1 reviewer pass on all 30 missions).
  2. ≤ 5 % of gallery images post-v2 are categorised people or diagram (manifest-level audit).
  3. ≥ 85 % of mission hero images render with the focal subject inside the visible crop area at the desktop hero aspect ratio (16:9).

Technical:

  4. First-build cache-cold rebuild < 30 min; cached rebuild < 60 s.
  5. First-build cost ≤ $80 (sized for ~1345 entries on Sonnet 4.6 with 5 % overhead).
  6. Mobile (Capacitor) fleet-gallery bucket drops from ~120 MB → ~90 MB (target ~30 MB savings via 1:1 pre-cropped variants).
  7. No regression in PRD-015 M11 ceiling (~150 MB Capacitor install).
  8. validate-data.ts continues to enforce ADR-047 image-provenance integrity unchanged.

Operational:

  9. Marko can run npm run images:audit-report and open the HTML in a browser within 10 s.
  10. Single-mission re-score (--mission curiosity --force-score) completes in < 2 minutes, including API calls + sharp variant generation.


§dependencies

  • PRD-015 / RFC-018 (mobile wrapper) must exist for the mobile-bundle savings (M9 + success criterion #6) to materialise. v2.0 ships independently; mobile benefit lands when the Capacitor build picks 1:1 variants.
  • ADR-016 (build-time asset resolution) — v2.0 strictly respects: zero runtime third-party API calls.
  • ADR-046 (asset pipeline) + ADR-047 (image-provenance manifest) — v2.0 leaves untouched. Sidecar joins by image-path key.
  • Anthropic API SDK in build chain: @anthropic-ai/sdk added as devDependency.
  • sharp — added as devDependency for build-time variant generation.

§resolved decisions

Resolved 2026-05-16 in conversation with Marko.

  1. Manifest model — RESOLVED: New sidecar layer (static/data/image-vision.json) joining to image-provenance.json by image-path key. ADR-047 untouched. Per Marko's "evolution, new layer on" directive. Vision pipeline can be backed out by deleting the sidecar without breaking anything else.
  2. Vision model — RESOLVED: Claude Sonnet 4.6 (claude-sonnet-4-6). Most accurate at subject/category/focal-point. ~$0.05/image; ~$67 first build for 1345 entries; ~$0 cached.
  3. Corpus scope — RESOLVED: Whole corpus (1345 image-provenance entries). Maximum editorial coverage. First-build cost accepted.
  4. Smart-crop variants — RESOLVED: Pull into v2.0 (1:1 + 4:3 + 16:9 via sharp at build time). Mobile bundle savings land immediately (~30 MB off fleet-gallery bucket). v2.0 is bigger, but lands the architectural value.
  5. Human curation feedback loop — RESOLVED: New sidecar image-curation.json (committed) + scoring prompt includes recent deny-list entries as in-context examples. Marko flags via audit-report → reason captured → next pipeline run treats image as score: 0, rejected_by: "human" AND the model sees Marko's editorial bar in the prompt. Closes US-6.
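  6. Granular pipeline scoping — RESOLVED: Explicit CLI scope flags (--mission, --agency, --source, --fleet-asset, --segment, --new-only, --changed-since, --all; see M18). With no flags the run is incremental (equivalent to --new-only, per M19); --all is the explicit opt-in for a full re-score. Iteration cost drops from $67 (full) to roughly $0.50–$0.75 (single mission, per M21). Closes US-7.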

§open questions

  1. Score threshold calibration — confirm >= 5? Original draft proposed it. After first scoring pass, Marko reviews the distribution + adjusts. Implementation-time decision.
  2. PRD-016 audio asset hosting parallel — does the vision sidecar live in static/data/ (alongside image-provenance.json) or in a new static/data/vision/ subdirectory? Recommend static/data/image-vision.json flat (matches existing convention).
  3. @anthropic-ai/sdk version + token-counting accuracy — Sonnet 4.6 is current as of 2026-05; verify SDK version supports it (likely >=0.30.0). Implementation-time check.
  4. CI cost-cap policy. What's the per-CI-run hard cap? PRD-016 baked $50/$200; this v2 inherits the same cost ledger pattern. Confirm same thresholds apply or set tighter for image-vision specifically (since first build can hit $67 alone).
  5. Audit report retention. v2.0 auto-generates static/audit-report.html per scoring run. Should we retain a history (last 10 builds for diff) or always overwrite? Recommend overwrite for v2.0, history in v2.1.

PRD-018 · Orrery · Image Pipeline v2 · Drafted 2026-05-16 · Closes-into-RFC-022
