PRD-018 · Image Pipeline v2 — vision-model scoring + smart cropping

Status · Draft v0.4 (all 4 v2.0 architectural decisions resolved 2026-05-16)
Date · 2026-05-16
Owner · Marko
Closes into · RFC-022
Slice gate · v1.x (after PRD-015 mobile + PRD-016 audio + PRD-017 sensory)

Why this is a PRD. The decision to add vision-model scoring + smart cropping to the asset pipeline changes the cost model (paid Anthropic Vision API calls per build, ~$67 first-build / ~$0 cached), the build duration (sharp pre-cropping at 3 aspect ratios for ~1100 curated images = real CI minutes), the editorial trust model (model-picked vs human-picked imagery), and the mobile bundle (smart-crop variants shave ~30 MB off the fleet-gallery bucket). It is purely additive — the existing image-provenance.json (ADR-047, 1345 entries) stays untouched, the existing curation flow is preserved, and the v2 layer is a new sidecar that can be backed out without breaking what ships today.

§why

The current image pipeline (scripts/fetch-assets.ts + ADR-016 + ADR-046 + ADR-047) ships 1345 provenance-tracked images. It works: every image has a license, a credit, and a last-verified date, and validate-data.ts enforces that as a fail-closed gate. The system is healthy.

What it does not do: tell us anything about the visual content of the images. Two specific failure modes recur in PR review:

Wrong image picked. A NASA Images API result returns a press conference photo when we wanted Curiosity on Mars. The metadata search ranks by tags, not by what's in the frame. We catch these by hand; we miss some.
Right image, wrong crop. A clean wide-shot of a spacecraft has the spacecraft in the right third. CSS object-fit: cover with default centred object-position chops the spacecraft off. Subject lost.

PRD-018 / RFC-022 fix both at build time, without touching the provenance pipeline that already works. The v2 layer adds:

Per-image vision-API scoring (Claude Sonnet 4.6) — each image gets a 1-10 score, a one-sentence subject description, a category (one of nine), and a focal-point coordinate (x, y as 0.0–1.0).
Smart-crop variants — at build time, sharp generates 1:1 + 4:3 + 16:9 pre-cropped variants using the focal point as the crop anchor. Three variants per source image; mobile build picks 1:1 for fleet galleries, web picks 16:9 for hero, 4:3 for cards.
A new sidecar manifest — static/data/image-vision.json keyed by image path. Joins to image-provenance.json (ADR-047) at runtime; never modifies it. Backing out v2 = deleting the sidecar; the existing image pipeline keeps working.
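
A minimal sketch of the sidecar join described above, assuming both manifests are objects keyed by image path. The `VisionEntry` fields follow M5; the `ProvenanceEntry` fields and the `joinManifests` helper are hypothetical names for illustration, not the shipped loader.

```typescript
// Field names for VisionEntry follow M5. The provenance fields and the
// path-keyed provenance layout are assumptions for illustration only.
interface VisionEntry {
  score: number;
  subject: string;
  category: string;
  focal_point: { x: number; y: number };
  variants: { "1x1": string; "4x3": string; "16x9": string };
}

interface ProvenanceEntry {
  license: string;
  credit: string;
}

interface JoinedImage {
  path: string;
  provenance: ProvenanceEntry;
  vision: VisionEntry | null; // null when the sidecar has no entry for this path
}

// Left join: every provenance entry survives, with or without vision data.
// The provenance manifest is read, never written.
function joinManifests(
  provenance: Record<string, ProvenanceEntry>,
  vision: Record<string, VisionEntry>,
): JoinedImage[] {
  return Object.entries(provenance)
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([path, entry]) => ({
      path,
      provenance: entry,
      vision: vision[path] ?? null,
    }));
}
```

Passing an empty object as the vision manifest models the back-out case: every row keeps its provenance and simply carries `vision: null`.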

This is evolution, not replacement. Per Marko's directive: "evolution of what we have now, not to change anything unless we're evolving, so a new layer on." ADR-047 stays. The sidecar is the new layer.

§audiences

Audience	Why this helps them
Curious learner	Mission galleries actually show what they came to see. The hero crop on `/missions/curiosity` shows the rover, not a press photo of an admin office.
Educator / journalist	Reliable visual-quality bar across the corpus. Cite an Orrery page knowing the imagery isn't accidental.
Editor / curator (Marko)	Fewer manual review iterations. Audit-report HTML surfaces every candidate the model considered + the score that picked the winner. Easy to spot a bad pick and tune the prompt.
Mobile audience (post PRD-015)	~30 MB lighter Capacitor bundle thanks to 1:1 pre-cropped fleet-gallery variants. Fits inside the ~85 MB target with headroom.

§what's already shipped (image-pipeline-readiness inventory)

Capability	Status	Source
`scripts/fetch-assets.ts` (NASA Images API + Wikimedia + agency portals fetch)	shipped	ADR-016 (build-time asset resolution)
`static/data/image-provenance.json` (1345 entries, license + credit + last-verified)	shipped	ADR-046 + ADR-047
`validate-data.ts` enforces image provenance integrity (fail-closed gate)	shipped	ADR-047 Milestone C
Asset-size cap (8 MiB per image, workbox precache cap)	shipped	validate-data.ts
Mission galleries render via runtime fetch from NASA Images API	shipped (but a v2 candidate to remove — see M4)	gallery components
Hero photos use CSS `object-fit: cover` with default centred `object-position`	shipped (but the failure mode v2 fixes — see M2)	hero components
MOBILE=1 build env (lazy locales + thumbnail tier per RFC-018 §4)	planned (v0.8)	RFC-018 §4

§goal

Ship a vision-pipeline v2 layer that runs at build time, joins to the existing provenance manifest by image path, and produces per-image scoring + focal-point + smart-crop variants for every image in the corpus (~1345 entries). Frontend renders use the new manifest for object-position + variant selection; v2 layer can be backed out by deleting the sidecar without breaking anything else.
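
As a rough illustration of how a component might consume the manifest for object-position and variant selection, the sketch below maps a focal point to a CSS `object-position` value and picks a variant key per surface. The `Surface` names, the `pickVariant` and `objectPosition` helpers, and the choice of 16:9 for a mobile hero are assumptions, not part of this PRD.

```typescript
type VariantKey = "1x1" | "4x3" | "16x9";

// Hypothetical surface names; the PRD only names hero, cards and fleet galleries.
type Surface = "hero" | "card" | "fleet-gallery";

// Convert a 0.0-1.0 focal point into a CSS object-position value.
// Out-of-range inputs are clamped so a bad manifest entry cannot push the
// image fully out of frame.
function objectPosition(focal: { x: number; y: number }): string {
  const pct = (v: number): string =>
    `${Math.round(Math.min(1, Math.max(0, v)) * 100)}%`;
  return `${pct(focal.x)} ${pct(focal.y)}`;
}

// Variant choice per the PRD: 16:9 for hero, 4:3 for cards, 1:1 for fleet
// galleries on mobile. Everything else falls back to 4:3 (an assumption).
function pickVariant(surface: Surface, isMobile: boolean): VariantKey {
  if (surface === "hero") return "16x9";
  if (surface === "fleet-gallery" && isMobile) return "1x1";
  return "4x3";
}
```

With `object-fit: cover`, a percentage `object-position` aligns that point of the image with the same point of the box, so the focal subject stays inside the frame even though it is not necessarily centred.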

v2.0 ship gate = scoring + focal-point + sharp pre-cropping (1:1 + 4:3 + 16:9) + frontend integration + audit report HTML.

v2.1 = multi-source candidate pool (extend fetch-assets.ts to ESA + JAXA + ROSCOSMOS portals beyond NASA + Wikimedia).

v2.2 = manual override UI (operator-tuned scoring overrides for edge cases the model gets wrong).

§user stories

US-1 — Editorial trust on hero images. A visitor opens /missions/curiosity and the hero image is unmistakably Curiosity on Mars. Not a press conference; not a diagram. The vision pipeline rejected the press photo at score < 5.

US-2 — Subject preserved across aspect ratios. A wide-shot of Saturn V is centred on the rocket, not on empty sky to the left. Hero (16:9) shows the rocket in the centre third; gallery card (4:3) shows it framed; mobile thumbnail (1:1) is tightly cropped on the rocket's middle stage. All three crops use the same focal point from the vision API.
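
One way to derive all three crops from a single focal point is sketched below: take the largest rectangle of the target aspect ratio that fits the source, centre it on the focal point, then clamp it to the image bounds. The `cropRect` helper is hypothetical; the returned `{ left, top, width, height }` shape is chosen to match the region object that sharp's `extract` operation accepts.

```typescript
interface CropRect {
  left: number;
  top: number;
  width: number;
  height: number;
}

// Compute a focal-point-anchored crop in source pixels.
// aspectW:aspectH is the target ratio, e.g. 16:9, 4:3 or 1:1.
function cropRect(
  srcW: number,
  srcH: number,
  aspectW: number,
  aspectH: number,
  focal: { x: number; y: number },
): CropRect {
  const target = aspectW / aspectH;

  // Largest rectangle with the target ratio that fits inside the source.
  let width = srcW;
  let height = Math.round(srcW / target);
  if (height > srcH) {
    height = srcH;
    width = Math.round(srcH * target);
  }

  // Centre the rectangle on the focal point, then clamp to the image edges.
  const clamp = (v: number, lo: number, hi: number): number =>
    Math.min(hi, Math.max(lo, v));
  const left = clamp(Math.round(focal.x * srcW - width / 2), 0, srcW - width);
  const top = clamp(Math.round(focal.y * srcH - height / 2), 0, srcH - height);

  return { left, top, width, height };
}
```

For a 4000 x 2000 source with the subject in the right third (focal point 0.8, 0.5), the 1:1 crop clamps against the right edge and keeps the subject, where a centred crop would start at x = 1000 and cut it off.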

US-3 — Mobile bundle savings. The Capacitor build (PRD-015 / RFC-018) bundles the 1:1 pre-cropped variants of fleet-gallery images at ~256 px instead of the full hero quality. ~30 MB saved off the existing 120 MB fleet-gallery bucket on mobile.

US-4 — Editor audit loop. Marko runs npm run images:audit-report, opens audit-report.html (gitignored), and sees every candidate per mission with: thumbnail, score, category, subject, focal-point crosshair, selection status, reject reason if rejected. Spots a bad pick → adjusts the gallery_query for that mission → re-runs the pipeline → cache hits everywhere except the changed query.

US-5 — Build-time only, zero runtime API exposure. Anthropic Vision API key never reaches the browser. All scoring happens at build. Frontend reads the static image-vision.json sidecar.

US-6 — "This image is bad" — human-feedback curation loop. From the audit report, Marko clicks a "flag" button on any image, types a reason ("subject is occluded by hardware caption", "wrong rover", "looks like a render"), and the image is added to a curation deny-list (static/data/image-curation.json). Subsequent pipeline runs treat that image as score: 0, rejected_by: "human". The flag + reason feed back into the scoring prompt as an example ("avoid this kind of result, see deny-list reason X") so the model learns Marko's editorial preferences over time without retraining.
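
The deny-list behaviour in US-6 could look roughly like the sketch below. The `CurationEntry` shape, the helper names, and the assumption that the file is append-ordered (so the last entries are the most recent) are all illustrative; only `score: 0` and `rejected_by: "human"` come from this PRD.

```typescript
interface ModelScore {
  score: number;
  reject_reason: string | null;
}

// Hypothetical deny-list entry: one image path plus Marko's one-line reason.
interface CurationEntry {
  path: string;
  reason: string;
}

interface FinalScore extends ModelScore {
  rejected_by?: "human";
}

// A human flag always wins over the model output.
function applyDenyList(
  path: string,
  model: ModelScore,
  denyList: CurationEntry[],
): FinalScore {
  const flagged = denyList.find((entry) => entry.path === path);
  if (!flagged) return { ...model };
  return { score: 0, reject_reason: flagged.reason, rejected_by: "human" };
}

// Turn the most recent flags into short in-context hints for the prompt.
// Assumes entries are appended in chronological order.
function promptExamples(denyList: CurationEntry[], limit = 5): string[] {
  return denyList.slice(-limit).map((entry) => `avoid: ${entry.reason}`);
}
```

Keeping the override outside the scoring call means a flag takes effect even on a cache hit, since the model score is replaced after it is read.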

US-7 — Granular pipeline scoping. Marko runs the pipeline against subsets, not always the full corpus: --mission curiosity, --agency NASA, --source nasa-images-api, --fleet-asset crew-portraits, --segment fleet-galleries. Each subset reuses the cache for everything outside the subset; inside the subset, scoring + cropping re-runs (or stays cached if hash-unchanged). Full rebuild (--all) is the catch-all but never the default. Cost-per-iteration drops from $67 → $0.50 when iterating on a single mission.
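
The "hash-unchanged" test that makes subset runs cheap rests on the cache keys defined in M7. A minimal sketch using Node's built-in `crypto` module follows; the `cacheKey` helper, the kind prefix, and the NUL separators are assumptions added to keep the two key spaces apart, not details taken from the pipeline.

```typescript
import { createHash } from "node:crypto";

type CacheKind = "score" | "variant";

// SHA-256 over the source bytes plus a spec string. For "score" the spec is
// the scoring-prompt version; for "variant" it is the crop spec (M7).
// The kind prefix and NUL separators are illustrative additions.
function cacheKey(kind: CacheKind, sourceBytes: Uint8Array, spec: string): string {
  return createHash("sha256")
    .update(kind)
    .update("\0")
    .update(sourceBytes)
    .update("\0")
    .update(spec)
    .digest("hex");
}

function scoringCacheKey(sourceBytes: Uint8Array, promptVersion: string): string {
  return cacheKey("score", sourceBytes, promptVersion);
}

function variantCacheKey(sourceBytes: Uint8Array, cropSpec: string): string {
  return cacheKey("variant", sourceBytes, cropSpec);
}
```

Because the key depends only on the bytes and the spec string, an unchanged image under an unchanged prompt version always maps to the same cache entry, whatever scope flag selected it.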

§must-have requirements

ID	Requirement
M1	Vision-API scoring at build time. Claude Sonnet 4.6 (`claude-sonnet-4-6`) scores every image candidate. Returns: `score` (1-10), `subject` (one sentence), `category` (one of: `spacecraft` / `surface` / `launch` / `orbital` / `hardware` / `people` / `diagram` / `render` / `other`), `focal_point` ({ x: 0.0-1.0, y: 0.0-1.0 }), `reject_reason` (string or null).
M2	Selection threshold: `score >= 5` AND category not in {`people`, `diagram`} for general use. Status-aware: PLANNED missions accept `render` (5-7 acceptable). FLOWN/ACTIVE missions reject `render` outright.
M3	Hero + gallery selection algorithm: hero = highest-scoring `spacecraft` or `surface` candidate (or fallback to highest non-rejected). Gallery = next 8 by score, with category diversity (max 4 per category in 9-image gallery).
M4	Smart-crop variants generated at build time via `sharp`. Three variants per source image: `1:1` (square, mobile thumbnails), `4:3` (gallery cards), `16:9` (hero). Crop anchor = `focal_point` from vision API. Each variant is a separate file: `{base}.1x1.jpg`, `{base}.4x3.jpg`, `{base}.16x9.jpg`. Source image kept (used for full-quality lightbox).
M5	New sidecar manifest: `static/data/image-vision.json`. Schema: `{ "<image-path>": { score, subject, category, focal_point, variants: { "1x1": "...", "4x3": "...", "16x9": "..." } } }`. Joins to `image-provenance.json` by image-path key. Does not modify image-provenance.json.
M6	`validate-data.ts` gains a NEW OPTIONAL check: if a v2-scored image lacks a manifest entry, log a warning (not a fail). Fail-closed only if `image-vision.json` exists but is malformed. The existing fail-closed image-provenance gate stays unchanged.
M7	Hash-based cache. Per-image cache key = SHA-256 of (source-image-bytes + scoring-prompt-version). Per-variant cache key = SHA-256 of (source-image-bytes + crop-spec). Cache lives at `.image-cache/` (gitignored). Unchanged source + unchanged prompt = no API call, no re-crop.
M8	Build-time budget: full cache-cold rebuild < 30 minutes wall clock on Marko's M-series Mac. Cache-warm rebuild < 60 seconds. Per-image cost ~$0.05 (Sonnet) → ~$67 first build for 1345 images, ~$0 cached builds.
M9	Frontend reads `image-vision.json` at build time (Vite static import). Hero / card / thumbnail components select the correct variant by container aspect ratio + viewport. Mobile (PRD-015 wrapper) picks 1:1 for fleet galleries; desktop picks 4:3 for cards + 16:9 for hero.
M10	Runtime NASA Images API calls eliminated for any view served by `image-vision.json`. Gallery fully offline from manifest + bundled images. Removes the "LIVE" indicator from the gallery component.
M11	`ANTHROPIC_API_KEY` setup is a documented v2.0 prerequisite. Anthropic API is NOT covered by Claude Code subscription (announced 2026-05) — v2 needs its own paid API access. Setup: GitHub Actions secret + local `~/.zshrc` export or `.env.local`. Same key as PRD-016 audio pipeline (Anthropic billing is per-account). Documented in `docs/guides/image-pipeline-v2.md`. Auth failures don't fail the build (M13 fallback applies).
M12	Audit report HTML auto-generated on every scoring run: `static/audit-report.html` (gitignored — dev-only). Shows every candidate per image with: thumbnail at 192 px, score, category, subject, focal-point crosshair, selection status, reject reason if rejected. Marko opens locally to spot tuning opportunities.
M13	Fallback behaviour: zero-acceptable-images for an image slot → use highest-scoring candidate regardless of threshold + flag with `"fallback": true` in manifest entry. Build never fails closed because of a vision-API call result. API outage during build → fall back to last cached scores; build continues.
M14	Per-image processing is idempotent. Running the pipeline twice with no source changes produces a byte-identical `image-vision.json` and byte-identical variant files.
M15	Human curation deny-list. A new sidecar `static/data/image-curation.json` (committed to the repo) lists images Marko has flagged as bad, each with a one-line reason. Pipeline reads on every run; flagged images are scored as `score: 0, rejected_by: "human"` regardless of model output. The deny-list survives cache rebuilds.
M16	Curation feedback loop in scoring prompt. The vision-API prompt includes the most recent ~5 deny-list entries as in-context examples ("avoid: subject occluded by caption / wrong rover / looks like a render"). Updates Marko's editorial bar over time without re-training the model. Deny-list size capped at 100 entries — older entries rotate out of the prompt context but stay in the deny-list.
M17	Audit report flag UI. The audit-report HTML has a "🚩 Flag this image" button on each candidate. Click → opens a small form to enter a reason → the flag payload is copied to the clipboard. The audit report is static HTML with no server, so nothing is POSTed; the operator passes the clipboard payload to a tiny helper, run as `node scripts/flag-image.ts`, which appends to `image-curation.json`. Operator workflow: click flag → enter reason → payload copied to clipboard → run helper → commit.
M18	Granular pipeline scoping flags. CLI supports: `--mission <id>`, `--agency <name>` (NASA / ESA / JAXA / ROSCOSMOS / CNSA / ISRO), `--source <name>` (nasa-images-api / wikimedia-commons / agency-portal-* / curated-url), `--fleet-asset <type>` (heroes / patches / portraits / galleries), `--segment <name>` (mission-galleries / fleet / agency-logos / science-diagrams / planet-textures), `--new-only` (process only entries that don't yet have a manifest record OR whose hash inputs changed), `--changed-since <git-ref>` (process only entries whose source files were modified since the given git ref), `--all` (catch-all, full corpus). Subsets reuse cache outside their scope; inside their scope, scoring + cropping re-runs unless hash-unchanged.
M19	Default CLI behaviour = incremental. Running `npm run images:score` with no flags is equivalent to `--new-only`: scans `image-provenance.json` for entries missing from `image-vision.json` (or whose hash inputs changed) and processes only those. Routine workflow ("I added new images, run the pipeline") costs $0–$5 per run, never $67. Cost discipline is preserved — `--all` is now the explicit opt-in for "re-score the whole corpus" (used after a prompt-rubric change or model swap).
M20	Architectural guarantee — never reprocess unchanged entries. Cache invalidation triggers are EXPLICIT and finite: (a) source image bytes change, (b) `scoring_prompt_version` constant bumps, (c) `vision_model` config string changes, (d) image is added to `image-curation.json`, (e) `sharp` major version upgrades (variant cache only). Time-based invalidation (e.g. "re-score everything monthly") is explicitly NOT a trigger. Routine builds must be effectively free for unchanged entries.
M21	Single-mission iteration cost: `--mission curiosity` scores ≤ 15 images at ~$0.05 each = ≤ $0.75 per iteration. Single-agency iteration: `--agency JAXA` scores ~80 images = ~$4. Whole corpus (`--all`): ~$67 first build, ~$0 cached. Incremental default (`--new-only` implicit): $0 if nothing changed, $0.05–$5 typical when a few entries land.
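
The selection rules in M2, M3 and M13 can be read as the sketch below. It is one interpretation under stated assumptions: renders are accepted for PLANNED missions at any score of 5 or more, the hero counts toward the per-category cap of 4, and ties break on path for determinism. The `select` and `isAcceptable` names are hypothetical.

```typescript
type Category =
  | "spacecraft" | "surface" | "launch" | "orbital" | "hardware"
  | "people" | "diagram" | "render" | "other";
type MissionStatus = "PLANNED" | "FLOWN" | "ACTIVE";

interface Candidate {
  path: string;
  score: number;
  category: Category;
}

interface Selection {
  hero: Candidate | null;
  gallery: Candidate[];
  fallback: boolean;
}

// M2: threshold plus category and status rules.
function isAcceptable(c: Candidate, status: MissionStatus): boolean {
  if (c.score < 5) return false;
  if (c.category === "people" || c.category === "diagram") return false;
  if (c.category === "render" && status !== "PLANNED") return false;
  return true;
}

function select(candidates: Candidate[], status: MissionStatus): Selection {
  const byScore = [...candidates].sort(
    (a, b) => b.score - a.score || (a.path < b.path ? -1 : 1),
  );
  const accepted = byScore.filter((c) => isAcceptable(c, status));

  // M13: nothing acceptable, so use the top candidate and mark the fallback.
  if (accepted.length === 0) {
    return { hero: byScore[0] ?? null, gallery: [], fallback: byScore.length > 0 };
  }

  // M3: prefer spacecraft or surface for the hero, else the best accepted.
  const hero =
    accepted.find((c) => c.category === "spacecraft" || c.category === "surface") ??
    accepted[0] ??
    null;

  // M3: next 8 by score, at most 4 per category across hero + gallery.
  const perCategory = new Map<Category, number>();
  if (hero) perCategory.set(hero.category, 1);
  const gallery: Candidate[] = [];
  for (const c of accepted) {
    if (c === hero || gallery.length >= 8) continue;
    const used = perCategory.get(c.category) ?? 0;
    if (used >= 4) continue;
    perCategory.set(c.category, used + 1);
    gallery.push(c);
  }
  return { hero, gallery, fallback: false };
}
```

Sorting once with a stable tie-break also supports the idempotency requirement in M14, since the same inputs always produce the same ordering.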

§should-have requirements

ID	Requirement
S1	`--mission <id>` flag on the pipeline CLI to score / re-crop a single mission's images only (faster iteration loop).
S2	`--force-score` flag to invalidate the scoring cache (re-run vision API even on cached entries — useful when prompt is updated).
S3	`--skip-scoring` flag to run only the variant-cropping path on existing scores (useful when only the crop logic changed).
S4	Scoring telemetry: every API call logged to `static/data/image-vision-cost-ledger.json` (similar shape to PRD-016 audio cost ledger). Tracks per-build $ spend; threshold soft-warn at $50/build, hard-halt at $200/build.
S5	Audit report shows per-image cost ledger entries inline so Marko can see "this build spent $0.45 on 9 candidates for this mission."
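
The threshold logic in S4 amounts to summing the ledger and comparing against two limits. The sketch below assumes a flat list of per-call entries with a `usd` field; the entry shape and the `budgetStatus` helper are hypothetical, and the ledger file format itself is not specified here.

```typescript
// Hypothetical ledger entry: one record per vision-API call.
interface LedgerEntry {
  image: string;
  usd: number;
}

type BudgetStatus = "ok" | "soft-warn" | "hard-halt";

// Sum the spend for a build and classify it against the S4 thresholds
// ($50 soft warning, $200 hard halt by default).
function budgetStatus(
  entries: LedgerEntry[],
  softUsd = 50,
  hardUsd = 200,
): { totalUsd: number; status: BudgetStatus } {
  const totalUsd = entries.reduce((sum, entry) => sum + entry.usd, 0);
  const status: BudgetStatus =
    totalUsd >= hardUsd ? "hard-halt" : totalUsd >= softUsd ? "soft-warn" : "ok";
  return { totalUsd, status };
}
```

At the estimated ~$0.05 per image, a cold 1345-image run lands around $67, which this check would classify as a soft warning rather than a halt.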

§will-not-have (v2.0)

Manual override UI. Deferred to v2.2.
Multi-source candidate pool beyond NASA + Wikimedia. ESA / JAXA / ROSCOSMOS portal scrapers deferred to v2.1.
Video scoring. Out of scope; videos remain human-curated.
Texture / logo / agency-asset scoring. These are static, human-picked, and don't benefit from vision scoring.
Per-user image-quality preferences. No runtime user knobs; the build picks the variant by container aspect ratio.
Cropping the source images in-place. Variants are NEW files; the source survives untouched (lightbox + future re-crop).
Modifying image-provenance.json schema. Per Marko's "evolution, new layer on" directive. ADR-047 stays untouched.
Replacing validate-data.ts's existing fail-closed image-provenance check. That gate stays as-is; v2 adds an OPTIONAL check.

§success-criteria

Editorial:

1. ≥ 90 % of mission hero images post-v2 show the relevant spacecraft/surface (verified by Marko + 1 reviewer pass on all 30 missions).
2. ≤ 5 % of gallery images post-v2 are categorised people or diagram (manifest-level audit).
3. ≥ 85 % of mission hero images render with the focal subject inside the visible crop area at the desktop hero aspect ratio (16:9).

Technical:

4. First-build cache-cold rebuild < 30 min; cached rebuild < 60 s.
5. First-build cost ≤ $80 (sized for ~1345 entries on Sonnet 4.6 with 5 % overhead).
6. Mobile (Capacitor) fleet-gallery bucket drops from ~120 MB → ~90 MB (target ~30 MB savings via 1:1 pre-cropped variants).
7. No regression in PRD-015 M11 ceiling (~150 MB Capacitor install).
8. validate-data.ts continues to enforce ADR-047 image-provenance integrity unchanged.

Operational:

9. Marko can npm run images:audit-report + open the HTML in browser within 10 s.
10. Single-mission re-score --mission curiosity --force-score completes in < 2 minutes including API calls + sharp variant generation.

§dependencies

PRD-015 / RFC-018 (mobile wrapper) must exist for the mobile-bundle savings (M9 + success criterion #6) to materialise. v2.0 ships independently; mobile benefit lands when the Capacitor build picks 1:1 variants.
ADR-016 (build-time asset resolution) — v2.0 strictly respects: zero runtime third-party API calls.
ADR-046 (asset pipeline) + ADR-047 (image-provenance manifest) — v2.0 leaves untouched. Sidecar joins by image-path key.
Anthropic API SDK in build chain — @anthropic-ai/sdk added as devDependency.
sharp — added as devDependency for build-time variant generation.

§resolved decisions

Resolved 2026-05-16 in conversation with Marko.

Manifest model — RESOLVED: New sidecar layer (static/data/image-vision.json) joining to image-provenance.json by image-path key. ADR-047 untouched. Per Marko's "evolution, new layer on" directive. Vision pipeline can be backed out by deleting the sidecar without breaking anything else.
Vision model — RESOLVED: Claude Sonnet 4.6 (claude-sonnet-4-6). Most accurate at subject/category/focal-point. ~$0.05/image; ~$67 first build for 1345 entries; ~$0 cached.
Corpus scope — RESOLVED: Whole corpus (1345 image-provenance entries). Maximum editorial coverage. First-build cost accepted.
Smart-crop variants — RESOLVED: Pull into v2.0 (1:1 + 4:3 + 16:9 via sharp at build time). Mobile bundle savings land immediately (~30 MB off fleet-gallery bucket). v2.0 is bigger, but lands the architectural value.
Human curation feedback loop — RESOLVED: New sidecar image-curation.json (committed) + scoring prompt includes recent deny-list entries as in-context examples. Marko flags via audit-report → reason captured → next pipeline run treats image as score: 0, rejected_by: "human" AND the model sees Marko's editorial bar in the prompt. Closes US-6.
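Granular pipeline scoping — RESOLVED: explicit CLI scope flags (--mission, --agency, --source, --fleet-asset, --segment, --all), plus --new-only and --changed-since per M18. Running with no flags defaults to incremental (--new-only, per M19); --all is the explicit opt-in for a full re-score. Iteration cost drops from $67 (full) to $0.50 (single mission). Closes US-7.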
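
The incremental scope described in M18 and M19 (`--new-only`) reduces to a set difference between what the provenance manifest lists and what the vision manifest already covers. The sketch below assumes each vision record stores the hash it was scored under in a hypothetical `input_hash` field; neither that field nor the `entriesToProcess` helper is confirmed by this PRD.

```typescript
// Hypothetical: each vision record remembers the input hash it was built from.
interface ManifestRecord {
  input_hash: string;
}

// Return the image paths that need scoring: those with no manifest record,
// or whose current input hash differs from the recorded one.
function entriesToProcess(
  currentHashes: Record<string, string>,
  manifest: Record<string, ManifestRecord>,
): string[] {
  return Object.entries(currentHashes)
    .filter(([path, hash]) => manifest[path]?.input_hash !== hash)
    .map(([path]) => path)
    .sort();
}
```

An empty result means the run makes no API calls at all, which is the $0 case for routine builds.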

§open questions

Score threshold calibration — confirm >= 5? Original draft proposed it. After first scoring pass, Marko reviews the distribution + adjusts. Implementation-time decision.
5a. Adopt the same featured + demoted override shape PRD-020 / RFC-023 lock for launches-curation.json. Per Marko's 2026-05-19 directive: the image-curation file should follow the same heuristic-tier + curated-override pattern (rather than the current deny-list-only shape). Concretely: image-curation.json gains a featured list (force-include candidates the model under-scored) in addition to the existing deny-list (which becomes the demoted semantics). Pipeline reads both, applies after scoring, before selection. v2.0 candidate; not a blocker.
PRD-016 audio asset hosting parallel — does the vision sidecar live in static/data/ (alongside image-provenance.json) or in a new static/data/vision/ subdirectory? Recommend static/data/image-vision.json flat (matches existing convention).
@anthropic-ai/sdk version + token-counting accuracy — Sonnet 4.6 is current as of 2026-05; verify SDK version supports it (likely >=0.30.0). Implementation-time check.
CI cost-cap policy. What's the per-CI-run hard cap? PRD-016 baked $50/$200; this v2 inherits the same cost ledger pattern. Confirm same thresholds apply or set tighter for image-vision specifically (since first build can hit $67 alone).
Audit report retention. v2.0 auto-generates static/audit-report.html per scoring run. Should we retain a history (last 10 builds for diff) or always overwrite? Recommend overwrite for v2.0, history in v2.1.

PRD-018 · Orrery · Image Pipeline v2 · Drafted 2026-05-16 · Closes-into-RFC-022

§close — v0.7.0 (2026-05-24)

Of the 12 sub-slices of epic #148 tracked below, 11 shipped in v0.7.0; S13 (release gate) was deferred:

Slice	Status	Artefact
S1 VisionProvider + Anthropic Sonnet 4.6	DONE	`scripts/vision/{provider,anthropic}.ts`
S2 Per-image cache + scoring prompt + 9-category schema	DONE	`scripts/vision/{cache,prompt}.ts`
S4 `image-vision.json` sidecar + frontend loader	DONE	`static/data/image-vision.json` + `src/lib/image-vision.ts`
S5 Granular CLI scope flags (`--segment`, `--new-only`, etc.)	DONE	`scripts/score-images.ts`
S6 `image-curation.json` + `flag-image.ts` + `audit-report.html`	DONE	NEW `scripts/flag-image.ts` + `scripts/build-audit-report.ts` + `static/data/image-curation.json`
S7 Curation feedback in scoring prompt (top-5 deny reasons → in-context bias)	DONE	`scripts/score-images.ts` threads to `scripts/vision/prompt.ts`
S8 `validate-data.ts` schema sanity for v2 manifests	DONE	`scripts/validate-data.ts` (image-vision + image-curation structural checks)
S9 MOBILE=1 picks 1:1 variants for fleet galleries	DONE	`src/lib/image-vision.ts` `isMobile` selector
S10 Cost ledger ($50 soft / $200 hard 30-day)	DONE	NEW `src/lib/cost-ledger.ts` + `static/data/cost-ledger.json`
S11 Whole-corpus scoring run	DONE	Full 1414-entry corpus scored; ledger updated
S12 ANTHROPIC_API_KEY operator + CI setup docs	DONE	`docs/guides/image-pipeline-v2.md` §Prerequisite
S13 v0.7 release gate	DEFER	Subsumed into Step 5 release-gates work

Whole-corpus cost actual (S11): ~$6 — well under the $80 ceiling in success-criterion #5 and well under the $50 soft threshold. Cache-cold for the 813 entries that hadn't been scored before; the existing 601 entries cache-hit at $0.

Success criteria status: technical #4-5 satisfied; #6-7 land with the Capacitor build (PRD-015 dependency). Editorial #1-3 require Marko's reviewer pass against the now-complete audit-report.html — operator pass deferred to post-tag review.

v0.7.0 wrap-up note: this PRD closes structurally. Open follow-ups (Open Q #5a featured + demoted shape; Open Q #9 audit-report retention/history) deferred to v2.1.
