RFC-022 · Image Pipeline v2 — vision-model scoring, smart cropping, curation loop, granular scoping
Status: Draft v0.4 · 2026-05-16 · Closes: PRD-018
Why this is an RFC. The architecture binds every future asset fetch in the corpus (1345 provenance entries today, growing): the vision provider abstraction (so swapping Claude Sonnet → another vision model is config, not a rewrite), the cache-key shape that determines per-build cost ($67 cold vs $0 warm), the smart-crop variant layout that ripples through every hero/card/thumbnail component, the join model with image-provenance.json (ADR-047 stays untouched per Marko's "evolution, new layer on" directive), the human-curation feedback loop (image-curation.json deny-list + in-context model bias), and the granular CLI scope flags (--mission, --agency, --source, --fleet-asset, --segment, --all) that determine whether iteration costs $0.50 or $67. These are interlocking commitments; one wrong cut early forces ugly retrofits later.
1 · Architecture overview
┌──────────────────────────────────────────────────┐
│ scripts/fetch-assets.ts │ EXISTING (ADR-016 / ADR-046)
│ Phase 1 — fetch candidates from sources │ ← unchanged
│ Phase 2 — download to local cache │ ← unchanged
└──────────────────┬───────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────┐
│ scripts/score-images.ts │ NEW (Phase 3)
│ • read image-curation.json deny-list │
│ • for each candidate: cache-key check │
│ • cache-miss → Anthropic Vision API call │
│ • write per-image score / focal / category │
│ to .image-cache/{hash}.json │
│ • subset of corpus per CLI scope flags │
└──────────────────┬───────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────┐
│ scripts/crop-variants.ts │ NEW (Phase 4)
│ • for each scored image: read focal_point │
│ • sharp() crops to 1:1 + 4:3 + 16:9 │
│ • cache-key = SHA-256(source + focal + ratio) │
│ • output: {base}.{ratio}.jpg next to source │
└──────────────────┬───────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────┐
│ scripts/build-image-vision-manifest.ts │ NEW (Phase 5)
│ • merge per-image cache files │
│ • write static/data/image-vision.json │
│ • write static/data/audit-report.html │
│ • write static/data/image-vision-cost-ledger.json │
└──────────────────┬───────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────┐
│ Frontend (Vite static import) │
│ • imports image-vision.json + image-provenance │
│ • joins by image-path key │
│ • picks variant by container aspect + viewport │
│ • applies object-position from focal_point │
└──────────────────────────────────────────────────┘
Five new scripts. Zero modifications to image-provenance.json (ADR-047 stays). Zero runtime API calls (ADR-016 stays). All side-effects are file I/O + HTTPS fetches at build time.
2 · Two new sidecar files (the v2 layer)
2.1 · static/data/image-vision.json
Generated by build-image-vision-manifest.ts. Keyed by image-path string (matches the keys in image-provenance.json for the join):
{
  "version": 1,
  "generated_at": "2026-05-16T14:32:11Z",
  "vision_provider": "anthropic",
  "vision_model": "claude-sonnet-4-6",
  "prompt_version": "v1.0.0",
  "entries": {
    "static/images/missions/curiosity-hero.jpg": {
      "score": 9,
      "subject": "Curiosity rover wheel + Mars surface, sol 2370",
      "category": "surface",
      "focal_point": { "x": 0.42, "y": 0.58 },
      "variants": {
        "1x1": "static/images/missions/curiosity-hero.1x1.jpg",
        "4x3": "static/images/missions/curiosity-hero.4x3.jpg",
        "16x9": "static/images/missions/curiosity-hero.16x9.jpg"
      },
      "rejected_by": null,
      "fallback": false,
      "scored_at": "2026-05-16T14:30:02Z",
      "scoring_cost_usd": 0.0512
    },
    "static/images/missions/saturn-v-launch.jpg": { /* ... */ }
  }
}
Key invariants:
- The keys are EXACTLY the same paths as image-provenance.json (the join is by string equality).
- variants paths point to actual files on disk (validated by the new validate-data check, M6).
- rejected_by is null (model accepted) OR a string ("score-below-threshold", "category-people", "category-diagram", "human"). When rejected_by is set, the entry is still in the manifest; the frontend uses this to skip the image AND to surface "rejected" in the audit report.
- fallback: true when zero acceptable candidates existed and the highest-scoring was used despite threshold.
2.2 · static/data/image-curation.json
Manually maintained (committed to the repo) — the human-feedback deny-list:
{
  "version": 1,
  "deny_list": [
    {
      "image_path": "static/images/missions/some-bad-pick.jpg",
      "reason": "Image is a press conference photo; subject is not the spacecraft",
      "flagged_by": "marko",
      "flagged_at": "2026-05-16T14:45:00Z"
    },
    {
      "image_path": "static/images/fleet/wrong-rover-thumbnail.jpg",
      "reason": "Shows Curiosity but caption claims Perseverance — wrong identification",
      "flagged_by": "marko",
      "flagged_at": "2026-05-15T09:12:00Z"
    }
  ]
}
Pipeline reads on every run; flagged images are scored as score: 0, rejected_by: "human" regardless of model output. Recent ~5 entries are injected into the scoring prompt as in-context "avoid this" examples (M16). Older entries stay in the deny-list but rotate out of the prompt context once the list grows past 100.
The audit-report HTML's "🚩 Flag" button doesn't write to disk directly (no server). It generates a clipboard payload (one JSON object) that the operator paste-runs through node scripts/flag-image.ts < /path/to/clipboard to append to the deny-list. Workflow: click flag → fill reason → text copied to clipboard → run helper → commit.
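The deny-list behaviour described above (forced score: 0 for flagged paths, plus the most recent reasons fed into the prompt) can be sketched as two small pure functions. This is a minimal sketch, not the pipeline's actual code: the names `applyDenyList` and `recentReasons` and the `Score` shape are hypothetical, and "recent" is assumed to mean ordered by `flagged_at`.

```typescript
// Hypothetical shapes; field names follow the image-curation.json example above.
interface DenyEntry {
  image_path: string;
  reason: string;
  flagged_by: string;
  flagged_at: string; // ISO-8601 timestamp
}

interface Score {
  score: number;
  rejected_by: string | null;
}

// Force score 0 / rejected_by "human" for a deny-listed path, ignoring model output.
export function applyDenyList(path: string, model: Score, denyList: DenyEntry[]): Score {
  const flagged = denyList.some((e) => e.image_path === path);
  return flagged ? { score: 0, rejected_by: 'human' } : model;
}

// Reasons of the n most recently flagged entries, newest first
// (assumption: recency is decided by flagged_at, not file order).
export function recentReasons(denyList: DenyEntry[], n = 5): string[] {
  return [...denyList]
    .sort((a, b) => Date.parse(b.flagged_at) - Date.parse(a.flagged_at))
    .slice(0, n)
    .map((e) => e.reason);
}
```

Sorting a copy keeps the committed file order untouched, so the helper can be called on the parsed deny-list without side effects.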
2.3 · Why two sidecars, not one big file
- image-vision.json is machine-generated, large (~1345 entries), changes on every scoring run, deterministic.
- image-curation.json is human-authored, small (~10–100 entries over time), changes rarely, intentional.
Mixing them in one file means every model-rerun rewrites the file Marko hand-authored. Separating them keeps the human-edited file diff-friendly and source-controlled cleanly.
3 · Vision provider abstraction
Same pattern as PRD-016/RFC-019 TtsProvider — a thin interface so a future model swap (or escape to OpenAI Vision / Google Vision) is config, not a rewrite.
// scripts/vision/provider.ts
export interface VisionProvider {
  readonly name: 'anthropic' | 'openai' | 'google';
  readonly model: string;
  score(input: {
    imageBytes: Buffer;
    imagePath: string;
    contextHint?: string; // e.g. "Mars rover mission, NASA, ACTIVE"
    denyListExamples: string[]; // last ~5 deny-list reasons as in-context bias
  }): Promise<{
    score: number; // 1–10
    subject: string;
    category: 'spacecraft' | 'surface' | 'launch' | 'orbital'
      | 'hardware' | 'people' | 'diagram' | 'render' | 'other';
    focal_point: { x: number; y: number };
    reject_reason: string | null;
    cost_usd: number;
  }>;
}
v2.0 ships with Anthropic Sonnet 4.6 (scripts/vision/anthropic.ts). Adding a new provider is implementing the interface + config in a future voices.json-style mapping.
4 · Scoring prompt (vision-API call shape)
Each call sends:
- The image bytes (sent as a base64-encoded image content block via the Anthropic SDK)
- A structured system prompt describing the scoring rubric
- A user message with the per-image context hint + deny-list examples
System prompt outline:
You are an editorial image-quality scorer for an interactive space-history product (Orrery).
Your job: rate each image on visual quality + subject relevance for the use case.
Return STRICT JSON ONLY:
{
"score": <1-10>,
"subject": "<one sentence describing what's in the frame>",
"category": "<one of: spacecraft | surface | launch | orbital | hardware | people | diagram | render | other>",
"focal_point": { "x": <0.0-1.0>, "y": <0.0-1.0> },
"reject_reason": null OR "<short reason>"
}
SCORING RUBRIC:
9-10: Iconic, museum-quality. Pristine subject, clean composition, high resolution.
7-8: Strong editorial pick. Subject clear and centered or compositional, good resolution.
5-6: Acceptable. Subject visible, decent quality, some compositional weakness.
3-4: Marginal. Subject hard to read, low resolution, or composition flawed.
1-2: Reject. Wrong subject, bad quality, or category we don't surface.
CATEGORY RULES:
- "people": reject for general use (we surface space hardware, not press conferences)
- "diagram": reject for general use (we have hand-authored diagrams; raw infographics are noise)
- "render": ACCEPT for PLANNED missions (artist concepts), REJECT for FLOWN/ACTIVE
- all others: accept by score
FOCAL POINT:
Locate the visual center of the subject (rover, rocket, planet, instrument).
Express as { x: 0.0-1.0 horizontal, y: 0.0-1.0 vertical } from top-left.
This will be used as the crop anchor for 1:1, 4:3, 16:9 variants.
CONTEXT (mission / asset / agency):
<context_hint from caller>
EDITORIAL DENY-LIST (recent operator feedback — avoid producing similar results):
<last 5 deny-list reasons, one per line>
User message:
Image: <inline base64 or attached>
Path: <imagePath>
Returns the JSON shape above. Pipeline parses + writes to per-image cache file.
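Because the model is asked for strict JSON, the parse step is where malformed replies and the category rules get enforced. The following is a minimal sketch under stated assumptions: `parseModelScore` and `rejectedBy` are hypothetical names, the `"category-render"` value is not in the RFC's list of rejected_by strings and is invented here for illustration, and the default threshold of 5 is taken from PRD M2 as quoted in §11.

```typescript
type Category = 'spacecraft' | 'surface' | 'launch' | 'orbital'
  | 'hardware' | 'people' | 'diagram' | 'render' | 'other';

interface ModelScore {
  score: number;
  subject: string;
  category: Category;
  focal_point: { x: number; y: number };
  reject_reason: string | null;
}

const CATEGORIES: readonly string[] = [
  'spacecraft', 'surface', 'launch', 'orbital', 'hardware', 'people', 'diagram', 'render', 'other',
];

// Parse the model reply and range-check every field; throws on anything malformed.
export function parseModelScore(raw: string): ModelScore {
  const v = JSON.parse(raw) as Partial<ModelScore>;
  const inUnit = (n: unknown): boolean => typeof n === 'number' && n >= 0 && n <= 1;
  if (typeof v.score !== 'number' || v.score < 1 || v.score > 10) throw new Error('score out of range');
  if (typeof v.subject !== 'string') throw new Error('subject missing');
  if (typeof v.category !== 'string' || !CATEGORIES.includes(v.category)) throw new Error('unknown category');
  if (!v.focal_point || !inUnit(v.focal_point.x) || !inUnit(v.focal_point.y)) throw new Error('focal_point out of range');
  return {
    score: v.score,
    subject: v.subject,
    category: v.category,
    focal_point: v.focal_point,
    reject_reason: v.reject_reason ?? null,
  };
}

// Map a parsed score to a rejected_by value using the CATEGORY RULES and the score threshold.
export function rejectedBy(
  s: ModelScore,
  missionStatus: 'PLANNED' | 'FLOWN' | 'ACTIVE',
  threshold = 5,
): string | null {
  if (s.category === 'people') return 'category-people';
  if (s.category === 'diagram') return 'category-diagram';
  // Renders are only acceptable for PLANNED missions ("category-render" is an assumed label).
  if (s.category === 'render' && missionStatus !== 'PLANNED') return 'category-render';
  if (s.score < threshold) return 'score-below-threshold';
  return null;
}
```

Keeping the rules in code rather than trusting reject_reason from the model means a rubric change in the prompt cannot silently loosen what the frontend surfaces.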
5 · Cache strategy
5.1 · Per-image cache key
SHA-256(
  imageBytes // changes when source image changes
  + scoringPromptVersion // changes when prompt rubric is updated
  + visionModel // changes when model is swapped
)
Cache file: .image-cache/scores/{hash16}.json (16-hex-char prefix of SHA-256). Contents = exact JSON returned by VisionProvider.score() plus a timestamp.
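One way to compute this key with Node's built-in crypto module is sketched below. The function names are hypothetical, and the NUL separators between fields are an assumption added here so that adjacent strings cannot run together; the RFC itself only specifies plain concatenation.

```typescript
import { createHash } from 'node:crypto';

// Hash image bytes + prompt version + model id, then keep the first 16 hex chars.
// The '\0' separators are an assumption of this sketch, not part of the RFC.
export function scoreCacheKey(imageBytes: Buffer, promptVersion: string, visionModel: string): string {
  return createHash('sha256')
    .update(imageBytes)
    .update('\0' + promptVersion + '\0' + visionModel)
    .digest('hex')
    .slice(0, 16);
}

// Location of the per-image score cache file for a given key.
export function scoreCachePath(key: string): string {
  return `.image-cache/scores/${key}.json`;
}
```

A cache hit is then a plain file-existence check on `scoreCachePath(scoreCacheKey(bytes, 'v1.0.0', 'claude-sonnet-4-6'))`; bumping either string changes every key without touching the cached files themselves.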
5.2 · Per-variant cache key
SHA-256(
  imageBytes // changes when source changes
  + focalPoint // changes when scoring re-rates the focal point
  + targetAspect // 1x1 | 4x3 | 16x9
  + sharpVersion // changes when sharp dependency upgrades — invalidates all crops
)
Cache file: .image-cache/variants/{hash16}.{ratio}.jpg (the cropped output, ready to copy to static/).
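The variant key and the crop geometry it caches can be sketched together. This is an illustrative sketch, not the pipeline's implementation: `variantCacheKey` and `cropRect` are hypothetical names, the field separators are an assumption, and the "largest box of the target aspect, centred on the focal point, clamped to the image" strategy is one reasonable reading of "crop anchor". The returned rectangle has the left/top/width/height shape that sharp's `extract()` accepts.

```typescript
import { createHash } from 'node:crypto';

type Ratio = '1x1' | '4x3' | '16x9';
interface Focal { x: number; y: number } // normalised 0..1 from top-left
interface Rect { left: number; top: number; width: number; height: number }

// Integer width:height pairs avoid floating-point drift in the size arithmetic.
const RATIOS: Record<Ratio, { w: number; h: number }> = {
  '1x1': { w: 1, h: 1 },
  '4x3': { w: 4, h: 3 },
  '16x9': { w: 16, h: 9 },
};

// Per-variant cache key: source bytes + focal point + aspect + sharp version.
export function variantCacheKey(imageBytes: Buffer, focal: Focal, ratio: Ratio, sharpVersion: string): string {
  return createHash('sha256')
    .update(imageBytes)
    .update(`\0${focal.x},${focal.y}\0${ratio}\0${sharpVersion}`)
    .digest('hex')
    .slice(0, 16);
}

// Largest rectangle of the target aspect that fits the image, centred on the
// focal point and shifted back inside the image bounds where necessary.
export function cropRect(imgW: number, imgH: number, focal: Focal, ratio: Ratio): Rect {
  const { w, h } = RATIOS[ratio];
  const width = Math.min(imgW, Math.floor((imgH * w) / h));
  const height = Math.min(imgH, Math.floor((width * h) / w));
  const clamp = (v: number, max: number): number => Math.max(0, Math.min(max, v));
  const left = clamp(Math.round(focal.x * imgW - width / 2), imgW - width);
  const top = clamp(Math.round(focal.y * imgH - height / 2), imgH - height);
  return { left, top, width, height };
}
```

For a 1600×900 source with the focal point from the manifest example (0.42, 0.58), the 1:1 crop is a 900×900 square starting 222 px from the left edge, while the 16:9 crop is the whole frame.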
5.3 · Curation deny-list cache invalidation
When image-curation.json changes (file mtime newer than the per-image cache), the SCORING cache for the deny-listed image is invalidated AND the recent-5-examples in the prompt change → next run re-scores any images that landed within the prompt-context window. This is intentional — a new "avoid this" example should ripple through similar image scoring.
5.4 · Architectural guarantee — never reprocess unchanged entries
Cache invalidation triggers are EXPLICIT and finite:
| Trigger | Effect |
|---|---|
| Source image bytes change (file edited / replaced) | Per-image score cache invalidated; per-variant cache invalidated for that image |
| scoring_prompt_version constant bumped (in scripts/vision/prompt.ts) | All score caches invalidated; variant caches unaffected |
| vision_model config string changes (e.g. claude-sonnet-4-6 → claude-opus-5-0) | All score caches invalidated; variant caches unaffected |
| Image added to image-curation.json deny-list | That image's score cache invalidated + the ~5 nearest in the prompt-context window are re-scored |
| sharp major-version upgrade | All variant caches invalidated; score caches unaffected |
Time-based invalidation (e.g. "re-score everything monthly", "re-process if older than X") is explicitly NOT a trigger. Routine builds must be effectively free for unchanged entries. The --all --force-score command bypasses the cache entirely; that's the only way to forcibly reprocess unchanged entries, and it's an explicit operator gesture.
5.5 · Cost-per-iteration table
| Operation | Vision API calls | Sharp crops | Wall clock | Cost |
|---|---|---|---|---|
| Default (no flags) — incremental, nothing changed | 0 | 0 | ~30 s | $0 |
| Default (no flags) — 5 new images added | 5 | 15 | ~20 s | ~$0.25 |
| Default (no flags) — prompt-version bump | ~1345 | 0 | ~22 min | ~$67 (rare; explicit prompt edit triggered it) |
| --all cold (true first build) | ~1345 | ~4035 | ~25 min | ~$67 |
| --all cached (no source changes) | 0 | 0 | ~30 s | $0 |
| --all --force-score | ~1345 | ~4035 | ~25 min | ~$67 |
| --mission curiosity cold | ~15 | ~45 | ~30 s | ~$0.75 |
| --mission curiosity cached | 0 | 0 | ~5 s | $0 |
| --agency NASA cold | ~600 | ~1800 | ~12 min | ~$30 |
| --source nasa-images-api cold | ~800 | ~2400 | ~16 min | ~$40 |
| --fleet-asset patches cold | ~60 | ~180 | ~3 min | ~$3 |
| --segment fleet-galleries cold | ~440 | ~1320 | ~9 min | ~$22 |
| --changed-since HEAD~5 (typical PR diff) | 5–20 | 15–60 | ~30–60 s | ~$0.25–$1 |
| Single image flagged (deny-list updated) | 1 + ~5 nearby (prompt context shift) | ~18 | ~10 s | ~$0.30 |
The default CLI behaviour is incremental (--new-only implicit). --all exists for the rare full-rebuild case (prompt change, model swap, true first build). Routine builds are effectively free.
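The cold-run rows in the table follow from two constants: roughly $0.05 per scored image and three crops per image. A small estimator makes that arithmetic explicit. It is only a sketch: `estimateColdRun` is a hypothetical name, and it covers cold runs where every cache miss also needs its three variants (it does not model the prompt-bump row, where crops stay cached).

```typescript
const COST_PER_IMAGE_USD = 0.05; // approximate per-image figure quoted in this RFC
const VARIANTS_PER_IMAGE = 3;    // 1:1, 4:3, 16:9

// Estimate API calls, crops, and spend for a cold run over `cacheMisses` images.
export function estimateColdRun(cacheMisses: number): { visionCalls: number; sharpCrops: number; costUsd: number } {
  return {
    visionCalls: cacheMisses,
    sharpCrops: cacheMisses * VARIANTS_PER_IMAGE,
    // Round to whole cents to keep floating-point noise out of the ledger.
    costUsd: Math.round(cacheMisses * COST_PER_IMAGE_USD * 100) / 100,
  };
}
```

For the full corpus, `estimateColdRun(1345)` gives 1345 calls, 4035 crops, and $67.25, which the table rounds to ~$67.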
6 · Granular CLI scope flags + incremental default
# DEFAULT — incremental. Routine workflow. $0–$5 typical, $0 if nothing changed.
npm run images:score # implicit --new-only
# Equivalent explicit form
npm run images:score -- --new-only
# Git-aware incremental — process anything modified since a ref
npm run images:score -- --changed-since HEAD~5
npm run images:score -- --changed-since main # vs. the last release branch
# Per mission — fastest iteration loop when you know exactly what changed
npm run images:score -- --mission curiosity
npm run images:score -- --mission curiosity --force-score # bypass scoring cache
# Per agency
npm run images:score -- --agency JAXA
npm run images:score -- --agency NASA --skip-crops # only re-score, no variant regen
# Per source
npm run images:score -- --source nasa-images-api
npm run images:score -- --source wikimedia-commons
# Per fleet-asset type
npm run images:score -- --fleet-asset heroes
npm run images:score -- --fleet-asset patches
# Per content segment
npm run images:score -- --segment mission-galleries
# Catch-all — full corpus (~$67 cold; operator must opt in explicitly)
npm run images:score -- --all
npm run images:score -- --all --force-score # nuke cache + full rebuild
6.1 · Default = --new-only (incremental)
Running with no flags processes ONLY entries that need processing:
- Images present in image-provenance.json but absent from image-vision.json (newly added).
- Images whose source bytes changed since the last cache entry.
- Images whose scoring cache was invalidated by a prompt-version bump, model swap, or curation deny-list update (per §5.4).
If nothing changed, the run completes in ~30 seconds doing zero API calls and zero sharp work.
This is the routine workflow. --all still exists, but it is no longer the only catch-all; operators reach for it only after a prompt-rubric or model change.
6.2 · --changed-since <git-ref>
Resolves to the set of files modified between <git-ref> and HEAD via git diff --name-only <ref>...HEAD, intersected with the keys in image-provenance.json. Useful for CI workflows that want to score only what a PR touched.
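The resolution step can be split into a pure intersection and a thin git wrapper, which keeps the filtering testable without a repository. A minimal sketch, with hypothetical function names; it assumes the keys in image-provenance.json are repo-relative paths in the same form that git prints.

```typescript
import { execFileSync } from 'node:child_process';

// Pure step: keep only changed paths that are keys in image-provenance.json.
export function intersectChanged(diffOutput: string, provenanceKeys: Iterable<string>): string[] {
  const known = new Set(provenanceKeys);
  return diffOutput
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line !== '' && known.has(line));
}

// Impure step: ask git which files changed between <ref> and HEAD.
// Must run inside a git checkout; throws if the ref cannot be resolved.
export function changedSince(ref: string, provenanceKeys: Iterable<string>): string[] {
  const out = execFileSync('git', ['diff', '--name-only', `${ref}...HEAD`], { encoding: 'utf8' });
  return intersectChanged(out, provenanceKeys);
}
```

Using `execFileSync` with an argument array (rather than a shell string) means a ref containing spaces or shell metacharacters is passed to git verbatim.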
# In a GH Action
npm run images:score -- --changed-since "${{ github.event.pull_request.base.sha }}"
6.3 · CLI implementation notes
- Each scope flag resolves to a list of image paths by reading image-provenance.json and filtering on the matching field (subject_id, agency, source, asset_type, segment).
- The image-provenance.json schema must already carry these fields for filtering to work; verify at v2.0 implementation start. If any field is missing, that's a v2.0 prerequisite (an add-only schema bump in the existing manifest, not a structural change, so it falls within "evolution" rather than "change").
- Combinations are AND-joined: --agency NASA --fleet-asset patches scores NASA mission patches only.
- --new-only is also AND-joinable: --agency NASA --new-only processes NEW NASA images only (skip any NASA images already in the manifest).
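The AND-join described in the notes above can be sketched as a single filter over the provenance manifest. Everything here is illustrative: `resolveScope` and the `ScopeFlags` shape are hypothetical, and the flag-to-field mapping (for example --mission → subject_id, --fleet-asset → asset_type) is inferred from the order in which the RFC lists flags and fields, so it should be checked against the real schema.

```typescript
interface ProvenanceEntry {
  subject_id?: string;
  agency?: string;
  source?: string;
  asset_type?: string;
  segment?: string;
}

interface ScopeFlags {
  mission?: string;
  agency?: string;
  source?: string;
  fleetAsset?: string;
  segment?: string;
}

// Assumed mapping from CLI flag to image-provenance.json field.
const FLAG_TO_FIELD = {
  mission: 'subject_id',
  agency: 'agency',
  source: 'source',
  fleetAsset: 'asset_type',
  segment: 'segment',
} as const;

type Flag = keyof typeof FLAG_TO_FIELD;

// Return the image paths whose provenance entry matches EVERY supplied flag.
// With no flags set, every path is in scope (equivalent to --all).
export function resolveScope(provenance: Record<string, ProvenanceEntry>, flags: ScopeFlags): string[] {
  const active = (Object.keys(FLAG_TO_FIELD) as Flag[]).filter((f) => flags[f] !== undefined);
  return Object.entries(provenance)
    .filter(([, entry]) => active.every((f) => entry[FLAG_TO_FIELD[f]] === flags[f]))
    .map(([path]) => path);
}
```

--new-only would then be one more filter applied to the result (drop paths that already have a valid cache entry), which is what makes it AND-joinable with any scope flag.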
7 · Frontend integration
7.1 · Manifest import
// src/lib/image-vision.ts
import vision from '$lib/../static/data/image-vision.json';
import provenance from '$lib/../static/data/image-provenance.json';

// Join the vision sidecar and the provenance manifest by image-path key.
export function getImage(path: string) {
  const v = vision.entries[path];
  const p = provenance[path];
  return {
    src: path, // original (lightbox)
    variant_1x1: v?.variants['1x1'], // mobile thumbnail
    variant_4x3: v?.variants['4x3'], // gallery card
    variant_16x9: v?.variants['16x9'], // hero
    focal_point: v?.focal_point, // CSS object-position
    score: v?.score,
    subject: v?.subject, // alt text
    // Unscored images (no vision entry) count as not rejected; a bare
    // `v?.rejected_by !== null` would evaluate to true when v is undefined.
    rejected: (v?.rejected_by ?? null) !== null,
    license: p?.license,
    credit: p?.credit,
  };
}
7.2 · Component selection
| Use | Variant |
|---|---|
| Hero (desktop) | variant_16x9 |
| Hero (mobile) | variant_4x3 (better for portrait viewport) |
| Gallery card (desktop + mobile) | variant_4x3 |
| Mobile thumbnail / fleet gallery row | variant_1x1 |
| Lightbox / full-screen | original (no variant) |
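Components apply object-position: {focal_point.x * 100}% {focal_point.y * 100}% when using object-fit: cover. The browser then does the cropping at render time, keeping the focal point in view: object-position aligns that point of the image with the same relative point of the box, so the crop is biased toward the focal point rather than exactly centred on it.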
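The table and the object-position rule above translate into two small helpers. This is a sketch under assumptions: the function names and the `Use` keys are hypothetical, and falling back to the centre when an image has no focal point is a choice made here, not something the RFC specifies.

```typescript
// Variant per use, mirroring the component-selection table (null = original file).
const VARIANT_BY_USE = {
  'hero-desktop': '16x9',
  'hero-mobile': '4x3',
  'gallery-card': '4x3',
  'thumbnail': '1x1',
  'lightbox': null,
} as const;

type Use = keyof typeof VARIANT_BY_USE;

export function variantFor(use: Use): '16x9' | '4x3' | '1x1' | null {
  return VARIANT_BY_USE[use];
}

// Build the CSS object-position value from a normalised focal point.
// Assumed fallback: centre the image when no focal point is available.
export function objectPosition(focal?: { x: number; y: number }): string {
  if (!focal) return '50% 50%';
  const pct = (n: number): string => `${Math.round(Math.min(1, Math.max(0, n)) * 100)}%`;
  return `${pct(focal.x)} ${pct(focal.y)}`;
}
```

A component would set `style="object-fit: cover; object-position: ${objectPosition(img.focal_point)}"` on the `<img>` element; with the manifest example's focal point this yields `42% 58%`.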
7.3 · Mobile build picks 1:1 for fleet galleries
vite.config.ts MOBILE=1 branch (RFC-018 §4):
- Fleet gallery components use variant_1x1 instead of source.
- Hero components use variant_4x3 instead of variant_16x9.
- Net: fleet-gallery bucket drops from ~120 MB → ~90 MB on mobile (~30 MB win, success criterion #6 in PRD-018).
7.4 · Runtime NASA API removal (M10)
Components today call fetch('https://images-api.nasa.gov/...') for runtime gallery population. Post-v2.0:
- All gallery imagery is scored + cropped + bundled at build time.
- Runtime fetch removed entirely from gallery components.
- "LIVE" indicator (which signalled the runtime fetch) removed.
- Loss: galleries no longer auto-update with newly-released NASA imagery between deploys. Acceptable trade-off — mission galleries change rarely.
8 · Audit report HTML
Generated by build-image-vision-manifest.ts as static/data/audit-report.html (gitignored — dev-only artefact).
Structure: one section per image-provenance-key, showing all candidates considered (the API may return multiple per slot when variants exist), with score / category / focal-point crosshair overlay / selection status / reject reason / per-image cost / Flag button.
Marko opens locally (open static/data/audit-report.html after a build) to:
- Spot bad picks → click 🚩 → enter reason → paste-run node scripts/flag-image.ts
- Sanity-check focal-point placement on tricky compositions
- Watch the cost ledger build up over a multi-iteration session
The Flag button:
- Opens a small modal overlaid on the audit-report page.
- Pre-fills image_path from the candidate row.
- User types reason + clicks Submit.
- Generates a JSON payload + copies it to the clipboard.
- Operator runs node scripts/flag-image.ts (which reads stdin); the helper appends to image-curation.json.
- Operator commits the deny-list update.
No server. No write API. Clipboard is the bridge. Keeps the audit report a static file, deployable nowhere, leaks no secrets.
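The append step of the helper can be sketched as a pure function over the parsed deny-list, leaving stdin and file I/O to a thin wrapper. This is an illustration only: `appendFlag` is a hypothetical name, and replacing an existing entry for the same image_path (so that re-flagging is idempotent) is an assumption of this sketch rather than documented behaviour.

```typescript
interface DenyEntry {
  image_path: string;
  reason: string;
  flagged_by: string;
  flagged_at: string;
}

interface Curation {
  version: number;
  deny_list: DenyEntry[];
}

const REQUIRED = ['image_path', 'reason', 'flagged_by', 'flagged_at'] as const;

// Validate one clipboard payload and return a new curation object with it appended.
export function appendFlag(curation: Curation, payloadJson: string): Curation {
  const p = JSON.parse(payloadJson) as Partial<DenyEntry>;
  for (const k of REQUIRED) {
    if (typeof p[k] !== 'string' || p[k] === '') throw new Error(`flag payload missing "${k}"`);
  }
  const entry = p as DenyEntry;
  // Assumption: a repeated flag for the same image replaces the earlier entry.
  const rest = curation.deny_list.filter((e) => e.image_path !== entry.image_path);
  return { ...curation, deny_list: [...rest, entry] };
}
```

A wrapper script could read the payload with `readFileSync(0, 'utf8')` (file descriptor 0 is stdin), call `appendFlag`, and write the result back to static/data/image-curation.json.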
9 · Validate-data integration
scripts/validate-data.ts gains THREE new optional checks (not fail-closed by default; v2 is purely additive):
- Manifest existence + schema check. If static/data/image-vision.json exists, validate it against an ajv schema (scripts/schemas/image-vision.schema.json). Malformed manifest = fail. Missing manifest = warn (v2 not deployed yet).
- Variant file existence check. For every entry in the manifest, the three variant paths must exist on disk. Missing variant = fail (manifest references a file that wasn't generated).
- Curation deny-list schema check. If static/data/image-curation.json exists, validate it. Malformed = fail. Missing = warn.
The EXISTING image-provenance check (the fail-closed gate from ADR-047) stays unchanged. v2's checks are NEW additions, not replacements.
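The variant-existence check is simple enough to sketch directly. The function name and the injectable `exists` parameter are hypothetical (the parameter is there so the check can be tested without touching disk). One assumption is flagged in the code: entries with `variants: null`, which §10 describes for unreadable images, are skipped rather than reported.

```typescript
import { existsSync } from 'node:fs';

interface VisionEntry {
  variants: Record<string, string> | null;
}

interface VisionManifest {
  entries: Record<string, VisionEntry>;
}

const RATIOS = ['1x1', '4x3', '16x9'];

// List every "<image path> [ratio]" whose variant file is absent from disk.
export function missingVariants(
  manifest: VisionManifest,
  exists: (p: string) => boolean = existsSync,
): string[] {
  const missing: string[] = [];
  for (const [path, entry] of Object.entries(manifest.entries)) {
    // Assumption: variants === null marks a known crop failure and is not re-reported here.
    if (entry.variants === null) continue;
    for (const ratio of RATIOS) {
      const file = entry.variants[ratio];
      if (!file || !exists(file)) missing.push(`${path} [${ratio}]`);
    }
  }
  return missing;
}
```

validate-data.ts would fail the build when the returned list is non-empty and print each item, which points the operator at the exact variant to regenerate.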
10 · Failure modes + handling
| Failure | Detection | Handling |
|---|---|---|
| Anthropic API outage during scoring | HTTP 5xx from VisionProvider.score() | Retry with exponential backoff (3 attempts). Final failure → log + skip that image (cache file marked failed: true). Build continues; failed images get re-tried on next pipeline run. |
| API rate-limit | HTTP 429 | Sleep per Retry-After header, resume. |
| Cost ledger threshold breached during run | post-call check against image-vision-cost-ledger.json | Soft-warn at $50/build (continues); hard-halt at $200/build (pipeline exits non-zero, cache for completed images preserved, operator restarts after investigation). |
| Image bytes corrupt or unreadable | sharp exception during variant generation | Log + skip variant generation for that image (manifest entry retains variants: null); image still gets a score. |
| Invalid JSON returned by vision API | JSON parse fails | Retry with stricter prompt; second failure → mark failed: true; manual review. |
| Source image deleted between fetch and score | fs check before score call | Drop from manifest; warn in audit report. |
| Curation deny-list malformed | ajv validation fails | Pipeline exits with clear error pointing at the bad entry; fix the deny-list, re-run. |
Build never fails closed because of API result quality (only because of structural validation failures or hard cost-cap breach).
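Two of the rows above, the retry policy and the cost cap, reduce to a few lines each. The sketch below is illustrative: the function names, the 500 ms base delay, and the injectable `sleep` are assumptions, and treating a total exactly equal to a threshold as a breach is a choice made here.

```typescript
export type CapState = 'ok' | 'warn' | 'halt';

// Compare the running build total against the soft ($50) and hard ($200) caps.
export function checkCostCap(totalUsd: number, softUsd = 50, hardUsd = 200): CapState {
  if (totalUsd >= hardUsd) return 'halt';
  if (totalUsd >= softUsd) return 'warn';
  return 'ok';
}

const defaultSleep = (ms: number): Promise<void> => new Promise((resolve) => setTimeout(resolve, ms));

// Run fn up to `attempts` times with exponential backoff (base, 2x base, 4x base, ...).
// Returns null after the final failure so the caller can mark the image failed and move on.
export async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 500,
  sleep: (ms: number) => Promise<void> = defaultSleep,
): Promise<T | null> {
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      if (i === attempts - 1) {
        console.warn(`giving up after ${attempts} attempts:`, err);
        return null;
      }
      await sleep(baseDelayMs * 2 ** i);
    }
  }
  return null;
}
```

Returning null instead of rethrowing matches the "log + skip that image" handling: the build continues and the image is retried on the next pipeline run.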
11 · Resolved decisions + open questions
Resolved 2026-05-16:
- Manifest model (RESOLVED): New sidecar layer (image-vision.json) joining image-provenance.json by image-path key. ADR-047 stays untouched. v2 is purely additive.
- Vision model (RESOLVED): Claude Sonnet 4.6. ~$0.05/image; ~$67 first-build for whole corpus.
- Corpus scope (RESOLVED): Whole corpus (1345 entries). Editorial coverage everywhere.
- Smart-crop (RESOLVED): v2.0 ships 1:1 + 4:3 + 16:9 variants via sharp at build time. Mobile bundle savings (~30 MB) land immediately.
- Human curation feedback loop (RESOLVED): image-curation.json deny-list (committed) + recent-5 entries injected into scoring prompt as in-context bias. Audit-report Flag button generates clipboard payload → flag-image.ts helper appends. No server.
- Granular pipeline scoping (RESOLVED): 6 explicit scope flags (--mission, --agency, --source, --fleet-asset, --segment, --all), plus --new-only and --changed-since. Running with no flags defaults to incremental --new-only (§6.1); --all is an explicit operator opt-in.
- Provider abstraction (RESOLVED): VisionProvider interface (mirrors PRD-016 TtsProvider). v2.0 ships Anthropic Sonnet 4.6; future swap to OpenAI / Google Vision is config + new implementation, no pipeline rewrite.
- Runtime NASA API removal (RESOLVED): Yes (M10). Galleries fully offline post-v2.0.
- Cost-cap policy (RESOLVED): Same as PRD-016 ($50 soft warn, $200 hard halt per build). Image-vision pipeline shares a ledger pattern but a separate file (image-vision-cost-ledger.json).
- Validate-data integration (RESOLVED): NEW optional checks added (manifest + variant existence + curation schema). Existing ADR-047 fail-closed image-provenance check unchanged.
Open follow-ups:
- image-provenance.json schema fields needed for granular scoping. v2.0 scoping CLI filters on agency, source, asset_type, segment. Verify these fields exist in current schema; if missing, an additive schema bump is a v2.0 prerequisite (within "evolution" framing). Implementation-time check.
- Score-threshold calibration. PRD M2 sets threshold ≥ 5. After first scoring pass, Marko reviews the score distribution + adjusts. Implementation-time decision.
- Audit-report retention. v2.0 overwrites on every run. v2.1 candidate: retain last 10 reports for diff comparison.
- @anthropic-ai/sdk minimum version supporting Sonnet 4.6. Verify at implementation time (likely >= 0.30.0).
- 14b. API access prerequisite: ANTHROPIC_API_KEY is operator-managed, NOT bundled with Claude Code. As of 2026-05 Anthropic announced API calls are excluded from Claude Code subscriptions; v2 vision pipeline needs its own paid API key. Setup steps documented in docs/guides/image-pipeline-v2.md. PRD-018 M11 captures this as a hard prerequisite. Same key works for PRD-016 audio (per-account billing).
- Sharp memory budget on whole-corpus rebuild. ~4035 crops in series = OK; in parallel = potentially OOMs on lower-tier dev machines. Recommend sequential or small worker pool (≤ 4); confirm at implementation time.
RFC-022 · Orrery · Image Pipeline v2 · Drafted 2026-05-16 · Closes-into-PRD-018