PRD-018 · Image Pipeline v2 — vision-model scoring + smart cropping
Status · Draft v0.4 (all 4 v2.0 architectural decisions resolved 2026-05-16)
Date · 2026-05-16
Owner · Marko
Closes into · RFC-022
Slice gate · v1.x (after PRD-015 mobile + PRD-016 audio + PRD-017 sensory)
Why this is a PRD. The decision to add vision-model scoring + smart cropping to the asset pipeline changes the cost model (paid Anthropic Vision API calls per build, ~$67 first build / ~$0 cached), the build duration (sharp pre-cropping at 3 aspect ratios for ~1100 curated images = real CI minutes), the editorial trust model (model-picked vs human-picked imagery), and the mobile bundle (smart-crop variants shave ~30 MB off the fleet-gallery bucket). It is purely additive: the existing image-provenance.json (ADR-047, 1345 entries) stays untouched, the existing curation flow is preserved, and the v2 layer is a new sidecar that can be backed out without breaking what ships today.
§why
The current image pipeline (scripts/fetch-assets.ts + ADR-016 + ADR-046 + ADR-047) ships 1345 provenance-tracked images. It works — every image has a license, a credit, a last-verified date, and is enforced by validate-data.ts as a fail-closed gate. The system is healthy.
What it does not do: tell us anything about the visual content of the images. Two specific failure modes recur in PR review:
- Wrong image picked. A NASA Images API result returns a press conference photo when we wanted Curiosity on Mars. The metadata search ranks by tags, not by what's in the frame. We catch these by hand; we miss some.
- Right image, wrong crop. A clean wide-shot of a spacecraft has the spacecraft in the right third. CSS object-fit: cover with the default centred object-position chops the spacecraft off. Subject lost.
PRD-018 / RFC-022 fix both at build time, without touching the provenance pipeline that already works. The v2 layer adds:
- Per-image vision-API scoring (Claude Sonnet 4.6): each image gets a 1-10 score, a one-sentence subject description, a category (one of nine), and a focal-point coordinate (x, y as 0.0–1.0).
- Smart-crop variants: at build time, sharp generates 1:1 + 4:3 + 16:9 pre-cropped variants using the focal point as the crop anchor. Three variants per source image; the mobile build picks 1:1 for fleet galleries, the web build picks 16:9 for heroes and 4:3 for cards.
- A new sidecar manifest: static/data/image-vision.json, keyed by image path. It joins to image-provenance.json (ADR-047) at runtime and never modifies it. Backing out v2 = deleting the sidecar; the existing image pipeline keeps working.
This is evolution, not replacement. Per Marko's directive: "evolution of what we have now, not to change anything unless we're evolving, so a new layer on." ADR-047 stays. The sidecar is the new layer.
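A minimal TypeScript sketch of that read-only join, assuming the sidecar shape given in M5. The provenance fields (license, credit, lastVerified) and the names joinManifests / ImageView are hypothetical placeholders; ADR-047 defines the real provenance schema. Because the join only reads both manifests and defaults vision to null, a deleted sidecar degrades every image to today's provenance-only view.

```typescript
// Sketch only: field names on ProvenanceEntry are hypothetical placeholders.
type VariantKey = "1x1" | "4x3" | "16x9";

interface VisionEntry {
  score: number;
  subject: string;
  category: string;
  focal_point: { x: number; y: number };
  variants: Record<VariantKey, string>;
  fallback?: boolean;
}

interface ProvenanceEntry {
  license: string;
  credit: string;
  lastVerified: string;
}

interface ImageView extends ProvenanceEntry {
  path: string;
  vision: VisionEntry | null; // null = not scored yet, or the sidecar was deleted
}

// Provenance drives the iteration; the sidecar is only looked up by path and
// never written to, so removing it leaves every provenance entry intact.
function joinManifests(
  provenance: Readonly<Record<string, ProvenanceEntry>>,
  vision?: Readonly<Record<string, VisionEntry>>,
): ImageView[] {
  return Object.entries(provenance).map(([path, entry]) => ({
    ...entry,
    path,
    vision: vision?.[path] ?? null,
  }));
}
```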
§audiences
| Audience | Why this helps them |
|---|---|
| Curious learner | Mission galleries actually show what they came to see. The hero crop on /missions/curiosity shows the rover, not a press photo of an admin office. |
| Educator / journalist | Reliable visual-quality bar across the corpus. Cite an Orrery page knowing the imagery isn't accidental. |
| Editor / curator (Marko) | Fewer manual review iterations. Audit-report HTML surfaces every candidate the model considered + the score that picked the winner. Easy to spot a bad pick and tune the prompt. |
| Mobile audience (post PRD-015) | ~30 MB lighter Capacitor bundle thanks to 1:1 pre-cropped fleet-gallery variants. Fits inside the ~85 MB target with headroom. |
§what's already shipped (image-pipeline-readiness inventory)
| Capability | Status | Source |
|---|---|---|
| scripts/fetch-assets.ts (NASA Images API + Wikimedia + agency portals fetch) | shipped | ADR-016 (build-time asset resolution) |
| static/data/image-provenance.json (1345 entries, license + credit + last-verified) | shipped | ADR-046 + ADR-047 |
| validate-data.ts enforces image provenance integrity (fail-closed gate) | shipped | ADR-047 Milestone C |
| Asset-size cap (8 MiB per image, workbox precache cap) | shipped | validate-data.ts |
| Mission galleries render via runtime fetch from NASA Images API | shipped (but a v2 candidate to remove; see M10) | gallery components |
| Hero photos use CSS object-fit: cover with default centred object-position | shipped (but the failure mode v2 fixes; see M4) | hero components |
| MOBILE=1 build env (lazy locales + thumbnail tier per RFC-018 §4) | planned (v0.8) | RFC-018 §4 |
§goal
Ship a vision-pipeline v2 layer that runs at build time, joins to the existing provenance manifest by image path, and produces per-image scoring + focal-point + smart-crop variants for every image in the corpus (~1345 entries). Frontend renders use the new manifest for object-position + variant selection; v2 layer can be backed out by deleting the sidecar without breaking anything else.
v2.0 ship gate = scoring + focal-point + sharp pre-cropping (1:1 + 4:3 + 16:9) + frontend integration + audit report HTML.
v2.1 = multi-source candidate pool (extend fetch-assets.ts to ESA + JAXA + ROSCOSMOS portals beyond NASA + Wikimedia).
v2.2 = manual override UI (operator-tuned scoring overrides for edge cases the model gets wrong).
§user stories
US-1 — Editorial trust on hero images. A visitor opens /missions/curiosity and the hero image is unmistakably Curiosity on Mars. Not a press conference; not a diagram. The vision pipeline rejected the press photo at score < 5.
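The selection rules behind that rejection (M2, M3 and the M13 fallback) fit in one pure function. This is a minimal TypeScript sketch with two labelled assumptions. First, PLANNED missions accept render candidates at the same >= 5 bar, reading M2's "5-7 acceptable" as guidance rather than a cap. Second, the hero counts toward M3's per-category cap of 4 across the 9-image set. Candidate and selectImages are hypothetical names.

```typescript
type Category =
  | "spacecraft" | "surface" | "launch" | "orbital" | "hardware"
  | "people" | "diagram" | "render" | "other";
type MissionStatus = "PLANNED" | "ACTIVE" | "FLOWN";

interface Candidate {
  path: string;
  score: number; // 1-10 from the vision model (0 if human-rejected)
  category: Category;
}

interface Selection {
  hero: Candidate | null;
  gallery: Candidate[];
  fallback: boolean; // M13: true when nothing met the threshold
}

const MIN_SCORE = 5;
const GALLERY_SIZE = 8;
const MAX_PER_CATEGORY = 4; // counted across hero + gallery (assumption)

function isAcceptable(c: Candidate, status: MissionStatus): boolean {
  if (c.score < MIN_SCORE) return false;
  if (c.category === "people" || c.category === "diagram") return false;
  // Assumption: renders pass only for PLANNED missions, at the same bar.
  if (c.category === "render") return status === "PLANNED";
  return true;
}

function selectImages(candidates: Candidate[], status: MissionStatus): Selection {
  // Highest score first; tie-break on path so re-runs stay deterministic.
  const ranked = [...candidates].sort(
    (a, b) => b.score - a.score || (a.path < b.path ? -1 : a.path > b.path ? 1 : 0),
  );
  const accepted = ranked.filter((c) => isAcceptable(c, status));

  if (accepted.length === 0) {
    // M13: never fail closed; use the best candidate and flag it.
    return { hero: ranked[0] ?? null, gallery: [], fallback: ranked.length > 0 };
  }

  const hero =
    accepted.find((c) => c.category === "spacecraft" || c.category === "surface") ??
    accepted[0];

  const perCategory = new Map<Category, number>([[hero.category, 1]]);
  const gallery: Candidate[] = [];
  for (const c of accepted) {
    if (gallery.length === GALLERY_SIZE) break;
    if (c === hero) continue;
    const used = perCategory.get(c.category) ?? 0;
    if (used >= MAX_PER_CATEGORY) continue; // category-diversity cap
    perCategory.set(c.category, used + 1);
    gallery.push(c);
  }
  return { hero, gallery, fallback: false };
}
```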
US-2 — Subject preserved across aspect ratios. A wide-shot of Saturn V is centred on the rocket, not on empty sky to the left. Hero (16:9) shows the rocket in the centre third; gallery card (4:3) shows it framed; mobile thumbnail (1:1) is tightly cropped on the rocket's middle stage. All three crops use the same focal point from the vision API.
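Deriving all three crops from one focal point is a small geometry step. Here is a minimal sketch, assuming the rule is: take the largest box of the target aspect ratio that fits inside the source, centre it on the focal point, then slide it back inside the frame. focalCrop and CropBox are hypothetical names. The returned left/top/width/height fields match the region object accepted by sharp's extract(), so a pipeline could pass the box to sharp(src).extract(box) before resizing.

```typescript
interface CropBox {
  left: number;
  top: number;
  width: number;
  height: number;
}

function focalCrop(
  srcWidth: number,
  srcHeight: number,
  focal: { x: number; y: number }, // 0.0-1.0 on each axis, from the manifest
  aspectW: number,
  aspectH: number,
): CropBox {
  const target = aspectW / aspectH;
  // Largest box of the target aspect ratio that fits inside the source.
  let width = srcWidth;
  let height = Math.round(srcWidth / target);
  if (height > srcHeight) {
    height = srcHeight;
    width = Math.round(srcHeight * target);
  }
  const clamp = (v: number, max: number): number => Math.min(Math.max(v, 0), max);
  // Centre on the focal point, then clamp so the box never leaves the frame.
  const left = clamp(Math.round(focal.x * srcWidth - width / 2), srcWidth - width);
  const top = clamp(Math.round(focal.y * srcHeight - height / 2), srcHeight - height);
  return { left, top, width, height };
}
```

For a 1600×900 source with the subject at x = 0.8, the 1:1 box is 900×900 pushed hard right (left = 700), the 4:3 box is 1200×900 (left = 400), and the 16:9 box is the whole frame. The subject stays in view in all three.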
US-3 — Mobile bundle savings. The Capacitor build (PRD-015 / RFC-018) bundles the 1:1 pre-cropped variants of fleet-gallery images at ~256 px instead of the full hero quality. ~30 MB saved off the existing 120 MB fleet-gallery bucket on mobile.
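M9 does not pin down the exact selection rule. One simple reading is to pick the variant whose aspect ratio is nearest the container's, then derive CSS object-position from the focal point for whatever fitting remains. The sketch below assumes that reading; pickVariant and objectPosition are hypothetical helpers. A percentage object-position aligns the focal point's relative position in the image with the same relative position in the container. Under object-fit: cover, the subject therefore stays inside the visible box, though it is not necessarily centred.

```typescript
type VariantKey = "1x1" | "4x3" | "16x9";

const VARIANT_RATIOS: Record<VariantKey, number> = {
  "1x1": 1,
  "4x3": 4 / 3,
  "16x9": 16 / 9,
};

// Nearest aspect ratio, measured in log space so that a container twice as
// wide as a variant counts as far off as one twice as tall.
function pickVariant(containerWidth: number, containerHeight: number): VariantKey {
  const ratio = containerWidth / containerHeight;
  let best: VariantKey = "16x9";
  let bestDistance = Number.POSITIVE_INFINITY;
  for (const key of Object.keys(VARIANT_RATIOS) as VariantKey[]) {
    const distance = Math.abs(Math.log(ratio / VARIANT_RATIOS[key]));
    if (distance < bestDistance) {
      best = key;
      bestDistance = distance;
    }
  }
  return best;
}

// CSS object-position from the manifest focal point (0.0-1.0 on each axis).
function objectPosition(focal: { x: number; y: number }): string {
  return `${(focal.x * 100).toFixed(1)}% ${(focal.y * 100).toFixed(1)}%`;
}
```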
US-4 — Editor audit loop. Marko runs npm run images:audit-report, opens audit-report.html (gitignored), and sees every candidate per mission with: thumbnail, score, category, subject, focal-point crosshair, selection status, reject reason if rejected. Spots a bad pick → adjusts the gallery_query for that mission → re-runs the pipeline → cache hits everywhere except the changed query.
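Each candidate in audit-report.html needs the M12 fields plus a crosshair at the focal point. The sketch below builds one candidate as an HTML string, assuming inline styles and a hypothetical AuditCandidate shape. All text is escaped, because subjects and reject reasons come straight from model output.

```typescript
interface AuditCandidate {
  thumbUrl: string; // 192 px thumbnail (M12)
  score: number;
  category: string;
  subject: string;
  focal: { x: number; y: number };
  selected: boolean;
  rejectReason: string | null;
}

const HTML_ESCAPES: Record<string, string> = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

const escapeHtml = (s: string): string => s.replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch] ?? ch);

function auditRow(c: AuditCandidate): string {
  // Percentages place the crosshair relative to the 192 px thumbnail box.
  const left = (c.focal.x * 100).toFixed(1);
  const top = (c.focal.y * 100).toFixed(1);
  const status = c.selected ? "selected" : `rejected: ${c.rejectReason ?? "no reason given"}`;
  return [
    `<figure class="candidate">`,
    `  <div style="position:relative;width:192px">`,
    `    <img src="${escapeHtml(c.thumbUrl)}" width="192" style="display:block" alt="${escapeHtml(c.subject)}">`,
    `    <span style="position:absolute;left:${left}%;top:${top}%;transform:translate(-50%,-50%)">+</span>`,
    `  </div>`,
    `  <figcaption>${c.score}/10 · ${escapeHtml(c.category)} · ${escapeHtml(c.subject)} · ${escapeHtml(status)}</figcaption>`,
    `</figure>`,
  ].join("\n");
}
```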
US-5 — Build-time only, zero runtime API exposure. Anthropic Vision API key never reaches the browser. All scoring happens at build. Frontend reads the static image-vision.json sidecar.
US-6 — "This image is bad" — human-feedback curation loop. From the audit report, Marko clicks a "flag" button on any image, types a reason ("subject is occluded by hardware caption", "wrong rover", "looks like a render"), and the image is added to a curation deny-list (static/data/image-curation.json). Subsequent pipeline runs treat that image as score: 0, rejected_by: "human". The flag + reason feed back into the scoring prompt as an example ("avoid this kind of result, see deny-list reason X") so the model learns Marko's editorial preferences over time without retraining.
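The sketch below shows how the deny-list could be applied after scoring and turned into prompt context (M15, M16). The curation entry fields (path, reason, flagged_at) and the prompt wording are assumptions. Only two rules come from this PRD: the score: 0 / rejected_by: "human" override, and the "most recent ~5 entries" limit.

```typescript
// Hypothetical shape of one image-curation.json entry.
interface CurationEntry {
  path: string;
  reason: string;
  flagged_at: string; // same-format ISO-8601 date, so string order is chronological
}

interface ScoredImage {
  path: string;
  score: number;
  reject_reason: string | null;
  rejected_by?: "human";
}

const PROMPT_EXAMPLES = 5;

// M15: a human flag overrides whatever the model returned.
function applyDenyList(scored: ScoredImage[], denyList: CurationEntry[]): ScoredImage[] {
  const reasons = new Map(denyList.map((e) => [e.path, e.reason] as const));
  return scored.map((img): ScoredImage => {
    const reason = reasons.get(img.path);
    return reason === undefined
      ? img
      : { ...img, score: 0, reject_reason: reason, rejected_by: "human" };
  });
}

// M16: the newest flags become in-context "avoid" examples for the scorer.
function denyListPromptBlock(denyList: CurationEntry[]): string {
  const recent = [...denyList]
    .sort((a, b) => (a.flagged_at < b.flagged_at ? 1 : a.flagged_at > b.flagged_at ? -1 : 0))
    .slice(0, PROMPT_EXAMPLES);
  if (recent.length === 0) return "";
  return [
    "Editor-rejected examples (avoid results like these):",
    ...recent.map((e) => `- avoid: ${e.reason}`),
  ].join("\n");
}
```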
US-7 — Granular pipeline scoping. Marko runs the pipeline against subsets, not always the full corpus: --mission curiosity, --agency NASA, --source nasa-images-api, --fleet-asset crew-portraits, --segment fleet-galleries. Each subset reuses the cache for everything outside the subset; inside the subset, scoring + cropping re-runs (or stays cached if hash-unchanged). Full rebuild (--all) is the catch-all but never the default. Cost-per-iteration drops from $67 → $0.50 when iterating on a single mission.
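The scoping rules reduce to two steps: filter by scope, then skip anything whose cache inputs are unchanged. The sketch below assumes each corpus entry carries mission/agency/source/segment tags and a precomputed inputHash (all hypothetical field names). With no scope flags, every entry is in scope, so the hash check alone reproduces the M19 default of --new-only. The force option models S2's --force-score. --fleet-asset and --changed-since are omitted for brevity.

```typescript
interface CorpusEntry {
  path: string;
  mission?: string;
  agency: string;
  source: string;
  segment: string;
  inputHash: string; // cache key over the M20 invalidation inputs
}

interface Scope {
  mission?: string;
  agency?: string;
  source?: string;
  segment?: string;
  all?: boolean;
  force?: boolean; // --force-score: ignore the hash check inside the scope
}

function inScope(e: CorpusEntry, s: Scope): boolean {
  if (s.all) return true;
  return (
    (s.mission === undefined || e.mission === s.mission) &&
    (s.agency === undefined || e.agency === s.agency) &&
    (s.source === undefined || e.source === s.source) &&
    (s.segment === undefined || e.segment === s.segment)
  );
}

// recorded: image path -> inputHash stored alongside its image-vision.json entry.
function entriesToProcess(
  corpus: CorpusEntry[],
  scope: Scope,
  recorded: Record<string, string>,
): CorpusEntry[] {
  return corpus
    .filter((e) => inScope(e, scope))
    .filter((e) => scope.force === true || recorded[e.path] !== e.inputHash);
}
```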
§must-have requirements
| ID | Requirement |
|---|---|
| M1 | Vision-API scoring at build time. Claude Sonnet 4.6 (claude-sonnet-4-6) scores every image candidate. Returns: score (1-10), subject (one sentence), category (one of: spacecraft / surface / launch / orbital / hardware / people / diagram / render / other), focal_point ({ x: 0.0-1.0, y: 0.0-1.0 }), reject_reason (string or null). |
| M2 | Selection threshold: score >= 5 AND category not in {people, diagram} for general use. Status-aware: PLANNED missions accept render (5-7 acceptable). FLOWN/ACTIVE missions reject render outright. |
| M3 | Hero + gallery selection algorithm: hero = highest-scoring spacecraft or surface candidate (or fallback to highest non-rejected). Gallery = next 8 by score, with category diversity (max 4 per category in 9-image gallery). |
| M4 | Smart-crop variants generated at build time via sharp. Three variants per source image: 1:1 (square, mobile thumbnails), 4:3 (gallery cards), 16:9 (hero). Crop anchor = focal_point from vision API. Each variant is a separate file: {base}.1x1.jpg, {base}.4x3.jpg, {base}.16x9.jpg. Source image kept (used for full-quality lightbox). |
| M5 | New sidecar manifest: static/data/image-vision.json. Schema: { "<image-path>": { score, subject, category, focal_point, variants: { "1x1": "...", "4x3": "...", "16x9": "..." } } }. Joins to image-provenance.json by image-path key. Does not modify image-provenance.json. |
| M6 | validate-data.ts gains a NEW OPTIONAL check: if a v2-scored image lacks a manifest entry, log a warning (not a fail). Fail-closed only if image-vision.json exists but is malformed. The existing fail-closed image-provenance gate stays unchanged. |
| M7 | Hash-based cache. Per-image cache key = SHA-256 of (source-image-bytes + scoring-prompt-version). Per-variant cache key = SHA-256 of (source-image-bytes + crop-spec). Cache lives at .image-cache/ (gitignored). Unchanged source + unchanged prompt = no API call, no re-crop. |
| M8 | Build-time budget: full cache-cold rebuild < 30 minutes wall clock on Marko's M-series Mac. Cache-warm rebuild < 60 seconds. Per-image cost ~$0.05 (Sonnet) → ~$67 first build for 1345 images, ~$0 cached builds. |
| M9 | Frontend reads image-vision.json at build time (Vite static import). Hero / card / thumbnail components select the correct variant by container aspect ratio + viewport. Mobile (PRD-015 wrapper) picks 1:1 for fleet galleries; desktop picks 4:3 for cards + 16:9 for hero. |
| M10 | Runtime NASA Images API calls eliminated for any view served by image-vision.json. Gallery fully offline from manifest + bundled images. Removes the "LIVE" indicator from the gallery component. |
| M11 | ANTHROPIC_API_KEY setup is a documented v2.0 prerequisite. Anthropic API is NOT covered by Claude Code subscription (announced 2026-05) — v2 needs its own paid API access. Setup: GitHub Actions secret + local ~/.zshrc export or .env.local. Same key as PRD-016 audio pipeline (Anthropic billing is per-account). Documented in docs/guides/image-pipeline-v2.md. Auth failures don't fail the build (M13 fallback applies). |
| M12 | Audit report HTML auto-generated on every scoring run: static/audit-report.html (gitignored — dev-only). Shows every candidate per image with: thumbnail at 192 px, score, category, subject, focal-point crosshair, selection status, reject reason if rejected. Marko opens locally to spot tuning opportunities. |
| M13 | Fallback behaviour: zero-acceptable-images for an image slot → use highest-scoring candidate regardless of threshold + flag with "fallback": true in manifest entry. Build never fails closed because of a vision-API call result. API outage during build → fall back to last cached scores; build continues. |
| M14 | Per-image processing is idempotent. Running the pipeline twice with no source changes produces a byte-identical image-vision.json and byte-identical variant files. |
| M15 | Human curation deny-list. A new sidecar static/data/image-curation.json (committed to the repo) lists images Marko has flagged as bad, each with a one-line reason. Pipeline reads on every run; flagged images are scored as score: 0, rejected_by: "human" regardless of model output. The deny-list survives cache rebuilds. |
| M16 | Curation feedback loop in scoring prompt. The vision-API prompt includes the most recent ~5 deny-list entries as in-context examples ("avoid: subject occluded by caption / wrong rover / looks like a render"). Updates Marko's editorial bar over time without re-training the model. Deny-list size capped at 100 entries — older entries rotate out of the prompt context but stay in the deny-list. |
| M17 | Audit report flag UI. The audit-report HTML has a "🚩 Flag this image" button on each candidate. Click → opens a small form to enter a reason → copies a flag payload (image path + reason) to the clipboard. The audit report is HTML-only (no server), so nothing is POSTed; a tiny scripts/flag-image.ts helper, run as node scripts/flag-image.ts, reads the clipboard payload and appends it to image-curation.json. Operator workflow: click flag → reason copied to clipboard → run helper → commit. |
| M18 | Granular pipeline scoping flags. CLI supports: --mission <id>, --agency <name> (NASA / ESA / JAXA / ROSCOSMOS / CNSA / ISRO), --source <name> (nasa-images-api / wikimedia-commons / agency-portal-* / curated-url), --fleet-asset <type> (heroes / patches / portraits / galleries), --segment <name> (mission-galleries / fleet / agency-logos / science-diagrams / planet-textures), --new-only (process only entries that don't yet have a manifest record OR whose hash inputs changed), --changed-since <git-ref> (process only entries whose source files were modified since the given git ref), --all (catch-all, full corpus). Subsets reuse cache outside their scope; inside their scope, scoring + cropping re-runs unless hash-unchanged. |
| M19 | Default CLI behaviour = incremental. Running npm run images:score with no flags is equivalent to --new-only: scans image-provenance.json for entries missing from image-vision.json (or whose hash inputs changed) and processes only those. Routine workflow ("I added new images, run the pipeline") costs $0–$5 per run, never $67. Cost discipline is preserved — --all is now the explicit opt-in for "re-score the whole corpus" (used after a prompt-rubric change or model swap). |
| M20 | Architectural guarantee — never reprocess unchanged entries. Cache invalidation triggers are EXPLICIT and finite: (a) source image bytes change, (b) scoring_prompt_version constant bumps, (c) vision_model config string changes, (d) image is added to image-curation.json, (e) sharp major version upgrades (variant cache only). Time-based invalidation (e.g. "re-score everything monthly") is explicitly NOT a trigger. Routine builds must be effectively free for unchanged entries. |
| M21 | Single-mission iteration cost: --mission curiosity scores ≤ 15 images at ~$0.05 each = ≤ $0.75 per iteration. Single-agency iteration: --agency JAXA scores ~80 images = ~$4. Whole corpus (--all): ~$67 first build, ~$0 cached. Incremental default (--new-only implicit): $0 if nothing changed, $0.05–$5 typical when a few entries land. |
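The caching and idempotency rules (M7, M14, M20) can be sketched with node:crypto. Several details are assumptions. The inputs are combined by hashing the source bytes first and then hashing a fixed-order JSON array, so field boundaries stay unambiguous. The curation reason is folded into the scoring key to cover trigger (d). Only sharp's major version enters the variant key, per trigger (e). The function names are hypothetical. stableStringify sorts object keys so that identical data always serialises to identical bytes.

```typescript
import { createHash } from "node:crypto";

const sha256 = (data: string | Uint8Array): string =>
  createHash("sha256").update(data).digest("hex");

interface ScoreKeyInputs {
  sourceBytes: Uint8Array;       // trigger (a)
  scoringPromptVersion: string;  // trigger (b)
  visionModel: string;           // trigger (c), e.g. "claude-sonnet-4-6"
  curationReason: string | null; // trigger (d): null when not on the deny-list
}

// M7 + M20: no clocks or timestamps, so only the explicit triggers invalidate.
function scoreCacheKey(i: ScoreKeyInputs): string {
  return sha256(
    JSON.stringify(["score", sha256(i.sourceBytes), i.scoringPromptVersion, i.visionModel, i.curationReason]),
  );
}

interface CropSpec {
  aspect: "1x1" | "4x3" | "16x9";
  focal: { x: number; y: number };
}

function variantCacheKey(sourceBytes: Uint8Array, crop: CropSpec, sharpVersion: string): string {
  const sharpMajor = sharpVersion.split(".")[0]; // trigger (e): major version only
  return sha256(
    JSON.stringify(["variant", sha256(sourceBytes), crop.aspect, crop.focal.x, crop.focal.y, sharpMajor]),
  );
}

// M14: sort keys at every level so re-runs write byte-identical JSON.
function stableStringify(value: unknown): string {
  const sortKeys = (_key: string, v: unknown): unknown =>
    v !== null && typeof v === "object" && !Array.isArray(v)
      ? Object.fromEntries(
          Object.entries(v as Record<string, unknown>).sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0)),
        )
      : v;
  return JSON.stringify(value, sortKeys, 2) + "\n";
}
```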
§should-have requirements
| ID | Requirement |
|---|---|
| S1 | --mission <id> flag on the pipeline CLI to score / re-crop a single mission's images only (faster iteration loop). |
| S2 | --force-score flag to invalidate the scoring cache (re-run vision API even on cached entries — useful when prompt is updated). |
| S3 | --skip-scoring flag to run only the variant-cropping path on existing scores (useful when only the crop logic changed). |
| S4 | Scoring telemetry: every API call logged to static/data/image-vision-cost-ledger.json (similar shape to PRD-016 audio cost ledger). Tracks per-build $ spend; threshold soft-warn at $50/build, hard-halt at $200/build. |
| S5 | Audit report shows per-image cost ledger entries inline so Marko can see "this build spent $0.45 on 9 candidates for this mission." |
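S4's thresholds and the M21 estimates are simple arithmetic. The sketch below assumes the ~$0.05/image planning figure from M8, which is an estimate rather than a quoted price. It uses a hypothetical in-memory ledger that reports its budget state after each recorded call.

```typescript
interface LedgerEntry {
  path: string;
  model: string;
  usd: number;
}

type BudgetState = "ok" | "warn" | "halt";

const SOFT_WARN_USD = 50;
const HARD_HALT_USD = 200;
const EST_USD_PER_IMAGE = 0.05; // planning figure from M8/M21, not a measured price

function budgetState(spentUsd: number): BudgetState {
  if (spentUsd >= HARD_HALT_USD) return "halt";
  if (spentUsd >= SOFT_WARN_USD) return "warn";
  return "ok";
}

// Rounded to cents: 15 images -> 0.75, 80 images -> 4, 1345 images -> 67.25.
function estimateUsd(imageCount: number): number {
  return Math.round(imageCount * EST_USD_PER_IMAGE * 100) / 100;
}

class CostLedger {
  private readonly entries: LedgerEntry[] = [];

  // Record one vision-API call and report where the build now stands.
  record(entry: LedgerEntry): BudgetState {
    this.entries.push(entry);
    return budgetState(this.totalUsd());
  }

  totalUsd(): number {
    return this.entries.reduce((sum, e) => sum + e.usd, 0);
  }
}
```

With these thresholds, a cold --all build (estimateUsd(1345) = $67.25) crosses the $50 soft warning but not the $200 halt. This is the tension raised in the CI cost-cap open question.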
§will-not-have (v2.0)
- Manual override UI. Deferred to v2.2.
- Multi-source candidate pool beyond NASA + Wikimedia. ESA / JAXA / ROSCOSMOS portal scrapers deferred to v2.1.
- Video scoring. Out of scope; videos remain human-curated.
- Texture / logo / agency-asset scoring. These are static, human-picked, and don't benefit from vision scoring.
- Per-user image-quality preferences. No runtime user knobs; the build picks the variant by container aspect ratio.
- Cropping the source images in-place. Variants are NEW files; the source survives untouched (lightbox + future re-crop).
- Modifying the image-provenance.json schema. Per Marko's "evolution, new layer on" directive, ADR-047 stays untouched.
- Replacing validate-data.ts's existing fail-closed image-provenance check. That gate stays as-is; v2 adds an OPTIONAL check.
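This is a minimal sketch of the OPTIONAL M6 check as it could sit beside the existing gate in validate-data.ts. It makes three interpretive choices. It reads M6's warning case as "a provenance entry with no sidecar entry". It treats a missing sidecar as v2 having been backed out. It accepts scores 0-10 so that human-rejected entries (M15) pass. validateVisionSidecar and its return shape are hypothetical.

```typescript
interface SidecarReport {
  errors: string[];   // fail-closed: the sidecar exists but is malformed
  warnings: string[]; // log only: e.g. provenance entries not yet scored
}

const inUnitRange = (v: unknown): boolean => typeof v === "number" && v >= 0 && v <= 1;

function validateVisionSidecar(provenancePaths: string[], sidecarText: string | null): SidecarReport {
  const report: SidecarReport = { errors: [], warnings: [] };
  if (sidecarText === null) return report; // no sidecar: v2 backed out, nothing to check

  let parsed: unknown;
  try {
    parsed = JSON.parse(sidecarText);
  } catch (err) {
    report.errors.push(`image-vision.json is not valid JSON: ${(err as Error).message}`);
    return report;
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    report.errors.push("image-vision.json must be an object keyed by image path");
    return report;
  }

  const entries = parsed as Record<string, unknown>;
  for (const [path, raw] of Object.entries(entries)) {
    const entry = (raw ?? {}) as Record<string, unknown>;
    const score = entry.score;
    if (typeof score !== "number" || score < 0 || score > 10) {
      report.errors.push(`${path}: score must be a number from 0 to 10`);
    }
    const focal = (entry.focal_point ?? {}) as Record<string, unknown>;
    if (!inUnitRange(focal.x) || !inUnitRange(focal.y)) {
      report.errors.push(`${path}: focal_point.x and .y must be within 0.0-1.0`);
    }
  }
  for (const path of provenancePaths) {
    if (!Object.prototype.hasOwnProperty.call(entries, path)) {
      report.warnings.push(`${path}: no image-vision.json entry yet`);
    }
  }
  return report;
}
```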
§success-criteria
Editorial:
1. ≥ 90 % of mission hero images post-v2 show the relevant spacecraft/surface (verified by Marko + 1 reviewer pass on all 30 missions).
2. ≤ 5 % of gallery images post-v2 are categorised people or diagram (manifest-level audit).
3. ≥ 85 % of mission hero images render with the focal subject inside the visible crop area at the desktop hero aspect ratio (16:9).
Technical:
4. First-build cache-cold rebuild < 30 min; cached rebuild < 60 s.
5. First-build cost ≤ $80 (sized for ~1345 entries on Sonnet 4.6 with 5 % overhead).
6. Mobile (Capacitor) fleet-gallery bucket drops from ~120 MB → ~90 MB (target ~30 MB savings via 1:1 pre-cropped variants).
7. No regression in PRD-015 M11 ceiling (~150 MB Capacitor install).
8. validate-data.ts continues to enforce ADR-047 image-provenance integrity unchanged.
Operational:
9. Marko can run npm run images:audit-report and open the HTML in a browser within 10 s.
10. A single-mission re-score (--mission curiosity --force-score) completes in < 2 minutes, including API calls + sharp variant generation.
§dependencies
- PRD-015 / RFC-018 (mobile wrapper) must exist for the mobile-bundle savings (M9 + success criterion #6) to materialise. v2.0 ships independently; mobile benefit lands when the Capacitor build picks 1:1 variants.
- ADR-016 (build-time asset resolution) — v2.0 strictly respects: zero runtime third-party API calls.
- ADR-046 (asset pipeline) + ADR-047 (image-provenance manifest) — v2.0 leaves untouched. Sidecar joins by image-path key.
- Anthropic API SDK in build chain: @anthropic-ai/sdk added as a devDependency.
- sharp: added as a devDependency for build-time variant generation.
§resolved decisions
Resolved 2026-05-16 in conversation with Marko.
- Manifest model (RESOLVED): New sidecar layer (static/data/image-vision.json) joining to image-provenance.json by image-path key. ADR-047 untouched, per Marko's "evolution, new layer on" directive. The vision pipeline can be backed out by deleting the sidecar without breaking anything else.
- Vision model (RESOLVED): Claude Sonnet 4.6 (claude-sonnet-4-6). Most accurate at subject/category/focal-point. ~$0.05/image; ~$67 first build for 1345 entries; ~$0 cached.
- Corpus scope (RESOLVED): Whole corpus (1345 image-provenance entries). Maximum editorial coverage. First-build cost accepted.
- Smart-crop variants (RESOLVED): Pull into v2.0 (1:1 + 4:3 + 16:9 via sharp at build time). Mobile bundle savings land immediately (~30 MB off the fleet-gallery bucket). v2.0 is bigger, but lands the architectural value.
- Human curation feedback loop (RESOLVED): New sidecar image-curation.json (committed) + the scoring prompt includes recent deny-list entries as in-context examples. Marko flags via the audit report → reason captured → the next pipeline run treats the image as score: 0, rejected_by: "human" AND the model sees Marko's editorial bar in the prompt. Closes US-6.
- Granular pipeline scoping (RESOLVED): 6 explicit CLI flags (--mission, --agency, --source, --fleet-asset, --segment, --all). No implicit default; operator picks scope. Iteration cost drops from $67 (full) to $0.50 (single mission). Closes US-7.
§open questions
- Score threshold calibration: confirm >= 5? The original draft proposed it. After the first scoring pass, Marko reviews the distribution and adjusts. Implementation-time decision.
- PRD-016 audio asset hosting parallel: does the vision sidecar live in static/data/ (alongside image-provenance.json) or in a new static/data/vision/ subdirectory? Recommend static/data/image-vision.json flat (matches existing convention).
- @anthropic-ai/sdk version + token-counting accuracy: Sonnet 4.6 is current as of 2026-05; verify the SDK version supports it (likely >=0.30.0). Implementation-time check.
- CI cost-cap policy. What's the per-CI-run hard cap? PRD-016 baked $50/$200; this v2 inherits the same cost ledger pattern. Confirm the same thresholds apply, or set tighter ones for image-vision specifically (since the first build can hit $67 alone).
- Audit report retention. v2.0 auto-generates static/audit-report.html per scoring run. Should we retain a history (last 10 builds for diff) or always overwrite? Recommend overwrite for v1, history in v2.1.
PRD-018 · Orrery · Image Pipeline v2 · Drafted 2026-05-16 · Closes-into-RFC-022