
RFC-022 · Image Pipeline v2 — vision-model scoring, smart cropping, curation loop, granular scoping

Status: Draft v0.4 · 2026-05-16 · Closes: PRD-018

Why this is an RFC. The architecture binds every future asset fetch in the corpus (1345 provenance entries today, growing): the vision provider abstraction (so swapping Claude Sonnet → another vision model is config, not a rewrite), the cache-key shape that determines per-build cost ($67 cold vs $0 warm), the smart-crop variant layout that ripples through every hero/card/thumbnail component, the join model with image-provenance.json (ADR-047 stays untouched per Marko's "evolution, new layer on" directive), the human-curation feedback loop (image-curation.json deny-list + in-context model bias), and the granular CLI scope flags (--mission, --agency, --source, --fleet-asset, --segment, --all) that determine whether iteration costs $0.50 or $67. These are interlocking commitments; one wrong cut early forces ugly retrofits later.


1 · Architecture overview

┌──────────────────────────────────────────────────┐
│  scripts/fetch-assets.ts                         │  EXISTING (ADR-016 / ADR-046)
│   Phase 1 — fetch candidates from sources        │  ← unchanged
│   Phase 2 — download to local cache              │  ← unchanged
└──────────────────┬───────────────────────────────┘
                   │
                   ▼
┌──────────────────────────────────────────────────┐
│  scripts/score-images.ts                         │  NEW (Phase 3)
│   • read image-curation.json deny-list           │
│   • for each candidate: cache-key check          │
│   • cache-miss → Anthropic Vision API call       │
│   • write per-image score / focal / category     │
│     to .image-cache/{hash}.json                  │
│   • subset of corpus per CLI scope flags         │
└──────────────────┬───────────────────────────────┘
                   │
                   ▼
┌──────────────────────────────────────────────────┐
│  scripts/crop-variants.ts                        │  NEW (Phase 4)
│   • for each scored image: read focal_point      │
│   • sharp() crops to 1:1 + 4:3 + 16:9            │
│   • cache-key = SHA-256(source + focal + ratio)  │
│   • output: {base}.{ratio}.jpg next to source    │
└──────────────────┬───────────────────────────────┘
                   │
                   ▼
┌──────────────────────────────────────────────────┐
│  scripts/build-image-vision-manifest.ts          │  NEW (Phase 5)
│   • merge per-image cache files                  │
│   • write static/data/image-vision.json          │
│   • write static/data/audit-report.html          │
│   • write static/data/image-vision-cost-ledger.json │
└──────────────────┬───────────────────────────────┘
                   │
                   ▼
┌──────────────────────────────────────────────────┐
│  Frontend (Vite static import)                   │
│   • imports image-vision.json + image-provenance │
│   • joins by image-path key                      │
│   • picks variant by container aspect + viewport │
│   • applies object-position from focal_point     │
└──────────────────────────────────────────────────┘

Five new scripts. Zero modifications to image-provenance.json (ADR-047 stays). Zero runtime API calls (ADR-016 stays). All side-effects are file I/O + HTTPS fetches at build time.


2 · Two new sidecar files (the v2 layer)

2.1 · static/data/image-vision.json

Generated by build-image-vision-manifest.ts. Keyed by image-path string (matches the keys in image-provenance.json for the join):

jsonc
{
  "version": 1,
  "generated_at": "2026-05-16T14:32:11Z",
  "vision_provider": "anthropic",
  "vision_model": "claude-sonnet-4-6",
  "prompt_version": "v1.0.0",
  "entries": {
    "static/images/missions/curiosity-hero.jpg": {
      "score": 9,
      "subject": "Curiosity rover wheel + Mars surface, sol 2370",
      "category": "surface",
      "focal_point": { "x": 0.42, "y": 0.58 },
      "variants": {
        "1x1":  "static/images/missions/curiosity-hero.1x1.jpg",
        "4x3":  "static/images/missions/curiosity-hero.4x3.jpg",
        "16x9": "static/images/missions/curiosity-hero.16x9.jpg"
      },
      "rejected_by": null,
      "fallback": false,
      "scored_at": "2026-05-16T14:30:02Z",
      "scoring_cost_usd": 0.0512
    },
    "static/images/missions/saturn-v-launch.jpg": { /* ... */ }
  }
}

Key invariants:

  • The keys are EXACTLY the same paths as image-provenance.json (the join is by string equality).
  • variants paths point to actual files on disk (validated by the new validate-data check, M6).
  • rejected_by is null (model accepted) OR a string ("score-below-threshold", "category-people", "category-diagram", "human"). When rejected_by is set, the entry is still in the manifest — frontend uses this to skip the image AND to surface "rejected" in the audit report.
  • fallback: true when zero acceptable candidates existed and the highest-scoring candidate was used even though it fell below the threshold.

2.2 · static/data/image-curation.json

Manually maintained (committed to the repo) — the human-feedback deny-list:

jsonc
{
  "version": 1,
  "deny_list": [
    {
      "image_path": "static/images/missions/some-bad-pick.jpg",
      "reason": "Image is a press conference photo; subject is not the spacecraft",
      "flagged_by": "marko",
      "flagged_at": "2026-05-16T14:45:00Z"
    },
    {
      "image_path": "static/images/fleet/wrong-rover-thumbnail.jpg",
      "reason": "Shows Curiosity but caption claims Perseverance — wrong identification",
      "flagged_by": "marko",
      "flagged_at": "2026-05-15T09:12:00Z"
    }
  ]
}

Pipeline reads on every run; flagged images are scored as score: 0, rejected_by: "human" regardless of model output. Recent ~5 entries are injected into the scoring prompt as in-context "avoid this" examples (M16). Older entries stay in the deny-list but rotate out of the prompt context once the list grows past 100.

The audit-report HTML's "🚩 Flag" button doesn't write to disk directly (there is no server). Instead it copies a clipboard payload (one JSON object), which the operator feeds to node scripts/flag-image.ts on stdin (e.g. node scripts/flag-image.ts < /path/to/clipboard) to append it to the deny-list. Workflow: click flag → fill reason → payload copied to clipboard → run helper → commit.

2.3 · Why two sidecars, not one big file

  • image-vision.json is machine-generated, large (~1345 entries), changes on every scoring run, deterministic.
  • image-curation.json is human-authored, small (~10–100 entries over time), changes rarely, intentional.

Mixing them in one file means every model rerun rewrites the file Marko hand-authored. Separating them keeps the human-edited file diff-friendly and cleanly reviewable in source control.


3 · Vision provider abstraction

Same pattern as PRD-016/RFC-019 TtsProvider — a thin interface so a future model swap (or escape to OpenAI Vision / Google Vision) is config, not a rewrite.

typescript
// scripts/vision/provider.ts
export interface VisionProvider {
  readonly name: 'anthropic' | 'openai' | 'google';
  readonly model: string;

  score(input: {
    imageBytes: Buffer;
    imagePath: string;
    contextHint?: string;     // e.g. "Mars rover mission, NASA, ACTIVE"
    denyListExamples: string[]; // last ~5 deny-list reasons as in-context bias
  }): Promise<{
    score: number;            // 1–10
    subject: string;
    category: 'spacecraft' | 'surface' | 'launch' | 'orbital'
            | 'hardware' | 'people' | 'diagram' | 'render' | 'other';
    focal_point: { x: number; y: number };
    reject_reason: string | null;
    cost_usd: number;
  }>;
}

v2.0 ships with Anthropic Sonnet 4.6 (scripts/vision/anthropic.ts). Adding a new provider means implementing the interface and registering it in a future voices.json-style config mapping.
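One way the config-driven swap could look is sketched below. It assumes a hypothetical `VisionConfig` shape (`{ provider, model }`) and a hypothetical `createVisionProvider` registry; this RFC specifies neither. A stub stands in for scripts/vision/anthropic.ts so the sketch runs offline.

```typescript
// Sketch only: hypothetical config-driven provider selection.
type Category =
  | 'spacecraft' | 'surface' | 'launch' | 'orbital'
  | 'hardware' | 'people' | 'diagram' | 'render' | 'other';

interface ScoreResult {
  score: number;
  subject: string;
  category: Category;
  focal_point: { x: number; y: number };
  reject_reason: string | null;
  cost_usd: number;
}

interface VisionProvider {
  readonly name: 'anthropic' | 'openai' | 'google';
  readonly model: string;
  score(input: {
    imageBytes: Buffer;
    imagePath: string;
    contextHint?: string;
    denyListExamples: string[];
  }): Promise<ScoreResult>;
}

// Assumed config shape, e.g. loaded from a voices.json-style file.
interface VisionConfig {
  provider: VisionProvider['name'];
  model: string;
}

// Offline stub standing in for the real Anthropic implementation.
class StubAnthropicProvider implements VisionProvider {
  readonly name = 'anthropic' as const;
  readonly model: string;
  constructor(model: string) {
    this.model = model;
  }
  async score(): Promise<ScoreResult> {
    return {
      score: 7, subject: 'stub', category: 'other',
      focal_point: { x: 0.5, y: 0.5 }, reject_reason: null, cost_usd: 0,
    };
  }
}

// Adding a provider = one more entry here plus its implementation file.
const registry: Partial<Record<VisionProvider['name'], (model: string) => VisionProvider>> = {
  anthropic: (model) => new StubAnthropicProvider(model),
};

function createVisionProvider(config: VisionConfig): VisionProvider {
  const factory = registry[config.provider];
  if (!factory) {
    throw new Error(`No VisionProvider registered for "${config.provider}"`);
  }
  return factory(config.model);
}
```

With this layout the pipeline only ever sees `VisionProvider`. Swapping `claude-sonnet-4-6` for another model is a config edit, and swapping vendors adds one registry entry.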


4 · Scoring prompt (vision-API call shape)

Each call sends:

  • The image bytes (base64-encoded in an image content block of the Messages API request, sent via the Anthropic SDK)
  • A structured system prompt describing the scoring rubric
  • A user message with the per-image context hint + deny-list examples
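The per-image request can be assembled as a plain Messages API body. The sketch below rests on a few assumptions: the helper name `buildScoreRequest` is hypothetical, `max_tokens: 512` is an assumed budget, and the context hint and deny-list reasons travel in the user message as the bullets above describe. The Messages API accepts images as `image` content blocks with a base64 `source` (JPEG, PNG, GIF or WebP). The returned object has the shape that `client.messages.create()` in @anthropic-ai/sdk takes.

```typescript
// Sketch: one Messages API request body per image (no SDK import needed to build it).
type MediaType = 'image/jpeg' | 'image/png' | 'image/gif' | 'image/webp';

function mediaTypeFor(path: string): MediaType {
  const ext = path.slice(path.lastIndexOf('.') + 1).toLowerCase();
  switch (ext) {
    case 'jpg':
    case 'jpeg':
      return 'image/jpeg';
    case 'png':
      return 'image/png';
    case 'gif':
      return 'image/gif';
    case 'webp':
      return 'image/webp';
    default:
      throw new Error(`Unsupported image type: ${path}`);
  }
}

function buildScoreRequest(args: {
  model: string;
  systemPrompt: string;
  imageBytes: Buffer;
  imagePath: string;
  contextHint?: string;
  denyListExamples: string[];
}) {
  // Text that accompanies the image in the user turn.
  const lines = [`Path: ${args.imagePath}`];
  if (args.contextHint) lines.push(`Context: ${args.contextHint}`);
  if (args.denyListExamples.length > 0) {
    lines.push('Recent deny-list reasons:', ...args.denyListExamples.map((r) => `- ${r}`));
  }
  return {
    model: args.model,
    max_tokens: 512, // assumed; the JSON reply is short
    system: args.systemPrompt,
    messages: [
      {
        role: 'user' as const,
        content: [
          {
            type: 'image' as const,
            source: {
              type: 'base64' as const,
              media_type: mediaTypeFor(args.imagePath),
              data: args.imageBytes.toString('base64'),
            },
          },
          { type: 'text' as const, text: lines.join('\n') },
        ],
      },
    ],
  };
}
```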

System prompt outline:

You are an editorial image-quality scorer for an interactive space-history product (Orrery).
Your job: rate each image on visual quality + subject relevance for the use case.

Return STRICT JSON ONLY:
{
  "score": <1-10>,
  "subject": "<one sentence describing what's in the frame>",
  "category": "<one of: spacecraft | surface | launch | orbital | hardware | people | diagram | render | other>",
  "focal_point": { "x": <0.0-1.0>, "y": <0.0-1.0> },
  "reject_reason": null OR "<short reason>"
}

SCORING RUBRIC:
  9-10: Iconic, museum-quality. Pristine subject, clean composition, high resolution.
  7-8:  Strong editorial pick. Subject clear and centered or compositional, good resolution.
  5-6:  Acceptable. Subject visible, decent quality, some compositional weakness.
  3-4:  Marginal. Subject hard to read, low resolution, or composition flawed.
  1-2:  Reject. Wrong subject, bad quality, or category we don't surface.

CATEGORY RULES:
  - "people":   reject for general use (we surface space hardware, not press conferences)
  - "diagram":  reject for general use (we have hand-authored diagrams; raw infographics are noise)
  - "render":   ACCEPT for PLANNED missions (artist concepts), REJECT for FLOWN/ACTIVE
  - all others: accept by score

FOCAL POINT:
  Locate the visual center of the subject (rover, rocket, planet, instrument).
  Express as { x: 0.0-1.0 horizontal, y: 0.0-1.0 vertical } from top-left.
  This will be used as the crop anchor for 1:1, 4:3, 16:9 variants.

CONTEXT (mission / asset / agency):
  <context_hint from caller>

EDITORIAL DENY-LIST (recent operator feedback — avoid producing similar results):
  <last 5 deny-list reasons, one per line>

User message:

Image: <inline base64 or attached>
Path: <imagePath>

Returns the JSON shape above. Pipeline parses + writes to per-image cache file.
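Because the prompt demands strict JSON, the parse step can reject anything outside the contract and hand it to the retry path for invalid JSON. Below is a minimal sketch of that validation under a hypothetical `parseScoreResponse` name. It omits `cost_usd` on the assumption that the provider computes cost from the API's token usage rather than reading it from the model's JSON.

```typescript
// Hypothetical validator for the strict-JSON reply described in the system prompt.
const CATEGORIES = [
  'spacecraft', 'surface', 'launch', 'orbital',
  'hardware', 'people', 'diagram', 'render', 'other',
] as const;
type Category = (typeof CATEGORIES)[number];

interface ParsedScore {
  score: number;
  subject: string;
  category: Category;
  focal_point: { x: number; y: number };
  reject_reason: string | null;
}

const inUnitRange = (v: unknown): v is number => typeof v === 'number' && v >= 0 && v <= 1;

// Throws on any contract violation so the caller can retry with a stricter prompt.
function parseScoreResponse(raw: string): ParsedScore {
  const data: unknown = JSON.parse(raw.trim());
  if (typeof data !== 'object' || data === null) throw new Error('reply is not a JSON object');
  const o = data as Record<string, unknown>;

  const score = o.score;
  if (typeof score !== 'number' || !Number.isInteger(score) || score < 1 || score > 10) {
    throw new Error(`score outside 1-10: ${String(score)}`);
  }
  const subject = o.subject;
  if (typeof subject !== 'string' || subject.length === 0) throw new Error('subject missing');
  const category = o.category;
  if (typeof category !== 'string' || !(CATEGORIES as readonly string[]).includes(category)) {
    throw new Error(`unknown category: ${String(category)}`);
  }
  const fp = o.focal_point as { x?: unknown; y?: unknown } | null | undefined;
  if (!fp || !inUnitRange(fp.x) || !inUnitRange(fp.y)) throw new Error('focal_point outside [0, 1]');
  const reject = o.reject_reason;
  if (reject !== null && typeof reject !== 'string') throw new Error('reject_reason must be null or a string');

  return {
    score,
    subject,
    category: category as Category,
    focal_point: { x: fp.x, y: fp.y },
    reject_reason: reject,
  };
}
```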


5 · Cache strategy

5.1 · Per-image cache key

SHA-256(
  imageBytes              // changes when source image changes
  + scoringPromptVersion  // changes when prompt rubric is updated
  + visionModel           // changes when model is swapped
)

Cache file: .image-cache/scores/{hash16}.json (16-hex-char prefix of SHA-256). Contents = exact JSON returned by VisionProvider.score() plus a timestamp.
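The score-cache path can be computed with Node's `crypto.createHash('sha256')`, as sketched below. The RFC writes the key as a plain concatenation. This sketch instead length-prefixes each component (its own choice) so that two different inputs can never concatenate to the same byte stream. `hashParts` and `scoreCachePath` are hypothetical names. The per-variant key in §5.2 could pass focal point, ratio and sharp version through the same helper.

```typescript
import { createHash } from 'node:crypto';

// SHA-256 over length-prefixed parts: ("ab", "c") and ("a", "bc") hash differently.
function hashParts(parts: Array<Uint8Array | string>): string {
  const h = createHash('sha256');
  for (const part of parts) {
    const bytes = typeof part === 'string' ? Buffer.from(part, 'utf8') : part;
    const len = Buffer.alloc(8);
    len.writeBigUInt64BE(BigInt(bytes.length));
    h.update(len);
    h.update(bytes);
  }
  return h.digest('hex');
}

// .image-cache/scores/{hash16}.json, keyed on image bytes + prompt version + model.
function scoreCachePath(imageBytes: Uint8Array, promptVersion: string, visionModel: string): string {
  const hash16 = hashParts([imageBytes, promptVersion, visionModel]).slice(0, 16);
  return `.image-cache/scores/${hash16}.json`;
}
```

Bumping `promptVersion` or changing `visionModel` yields a new path, so stale entries are simply never read again. That matches the explicit-trigger rule in §5.4.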

5.2 · Per-variant cache key

SHA-256(
  imageBytes              // changes when source changes
  + focalPoint            // changes when scoring re-rates the focal point
  + targetAspect          // 1x1 | 4x3 | 16x9
  + sharpVersion          // changes when sharp dependency upgrades — invalidates all crops
)

Cache file: .image-cache/variants/{hash16}.{ratio}.jpg (the cropped output, ready to copy to static/).
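One plausible crop geometry is sketched below; the RFC says only that crops are anchored on the focal point, so this is an assumption. Take the largest box of the target aspect that fits the source and centre it on the focal point. Then shift it back inside the frame when the focal point sits near an edge. `cropBoxFor` is a hypothetical name.

```typescript
type Ratio = '1x1' | '4x3' | '16x9';

// Width / height of each variant.
const ASPECT: Record<Ratio, number> = { '1x1': 1, '4x3': 4 / 3, '16x9': 16 / 9 };

interface CropBox {
  left: number;
  top: number;
  width: number;
  height: number;
}

const clamp = (v: number, lo: number, hi: number): number => Math.min(Math.max(v, lo), hi);

// Largest box of the target aspect that fits the source, centred on the focal point
// where possible and shifted back inside the frame when the focal point is near an edge.
function cropBoxFor(
  srcWidth: number,
  srcHeight: number,
  focal: { x: number; y: number },
  ratio: Ratio,
): CropBox {
  const aspect = ASPECT[ratio];
  let width = srcWidth;
  let height = Math.round(srcWidth / aspect);
  if (height > srcHeight) {
    // Source is wider than the target aspect: keep full height, trim width.
    height = srcHeight;
    width = Math.min(srcWidth, Math.round(srcHeight * aspect));
  }
  const left = clamp(Math.round(focal.x * srcWidth - width / 2), 0, srcWidth - width);
  const top = clamp(Math.round(focal.y * srcHeight - height / 2), 0, srcHeight - height);
  return { left, top, width, height };
}
```

The returned box matches the `{ left, top, width, height }` region that sharp's `extract()` takes. crop-variants.ts could therefore run `sharp(src).extract(box).jpeg().toFile(out)` once per ratio.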

5.3 · Curation deny-list cache invalidation

When image-curation.json changes (file mtime newer than the per-image cache), the SCORING cache for the deny-listed image is invalidated AND the recent-5-examples in the prompt change → next run re-scores any images that landed within the prompt-context window. This is intentional — a new "avoid this" example should ripple through similar image scoring.

5.4 · Architectural guarantee — never reprocess unchanged entries

Cache invalidation triggers are EXPLICIT and finite:

| Trigger | Effect |
|---|---|
| Source image bytes change (file edited / replaced) | Per-image score cache invalidated; per-variant cache invalidated for that image |
| scoring_prompt_version constant bumped (in scripts/vision/prompt.ts) | All score caches invalidated; variant caches unaffected |
| vision_model config string changes (e.g. claude-sonnet-4-6 → claude-opus-5-0) | All score caches invalidated; variant caches unaffected |
| Image added to image-curation.json deny-list | That image's score cache invalidated + the ~5 nearest in the prompt-context window are re-scored |
| sharp major-version upgrade | All variant caches invalidated; score caches unaffected |

Time-based invalidation (e.g. "re-score everything monthly", "re-process if older than X") is explicitly NOT a trigger. Routine builds must be effectively free for unchanged entries. The --force-score flag (most drastically as --all --force-score) bypasses the scoring cache. It is the only way to forcibly reprocess unchanged entries, and it is an explicit operator gesture.

5.5 · Cost-per-iteration table

| Operation | Vision API calls | Sharp crops | Wall clock | Cost |
|---|---|---|---|---|
| Default (no flags), incremental, nothing changed | 0 | 0 | ~30 s | $0 |
| Default (no flags), 5 new images added | 5 | 15 | ~20 s | ~$0.25 |
| Default (no flags), prompt-version bump | ~1345 | 0 | ~22 min | ~$67 (rare; explicit prompt edit triggered it) |
| --all cold (true first build) | ~1345 | ~4035 | ~25 min | ~$67 |
| --all cached (no source changes) | 0 | 0 | ~30 s | $0 |
| --all --force-score | ~1345 | ~4035 | ~25 min | ~$67 |
| --mission curiosity cold | ~15 | ~45 | ~30 s | ~$0.75 |
| --mission curiosity cached | 0 | 0 | ~5 s | $0 |
| --agency NASA cold | ~600 | ~1800 | ~12 min | ~$30 |
| --source nasa-images-api cold | ~800 | ~2400 | ~16 min | ~$40 |
| --fleet-asset patches cold | ~60 | ~180 | ~3 min | ~$3 |
| --segment fleet-galleries cold | ~440 | ~1320 | ~9 min | ~$22 |
| --changed-since HEAD~5 (typical PR diff) | 5–20 | 15–60 | ~30–60 s | ~$0.25–$1 |
| Single image flagged (deny-list updated) | 1 + ~5 nearby (prompt context shift) | ~18 | ~10 s | ~$0.30 |

The default CLI behaviour is incremental (--new-only implicit). --all exists for the rare full-rebuild case (prompt change, model swap, true first build). Routine builds are effectively free.
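The table's call and cost figures follow from two constants: roughly $0.05 per vision call (§11) and three crops per processed image. The small sketch below reproduces those rows. `estimateRun` is a hypothetical helper. Wall-clock time is left out because it depends on API latency.

```typescript
// Assumed unit costs: ~$0.05 per Sonnet 4.6 vision call (§11), 3 crops per image.
const USD_PER_VISION_CALL = 0.05;
const CROPS_PER_IMAGE = 3; // 1x1, 4x3, 16x9

function estimateRun(imagesToScore: number, imagesToCrop: number) {
  return {
    visionCalls: imagesToScore,
    sharpCrops: imagesToCrop * CROPS_PER_IMAGE,
    // Round to whole cents to hide floating-point noise.
    costUsd: Math.round(imagesToScore * USD_PER_VISION_CALL * 100) / 100,
  };
}
```

`estimateRun(1345, 1345)` gives 1345 calls, 4035 crops and $67.25 (the ~$67 rows). `estimateRun(6, 6)` gives the deny-list row's 18 crops and $0.30.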


6 · Granular CLI scope flags + incremental default

bash
# DEFAULT — incremental. Routine workflow. $0–$5 typical, $0 if nothing changed.
npm run images:score                    # implicit --new-only

# Equivalent explicit form
npm run images:score -- --new-only

# Git-aware incremental — process anything modified since a ref
npm run images:score -- --changed-since HEAD~5
npm run images:score -- --changed-since main           # vs. the last release branch

# Per mission — fastest iteration loop when you know exactly what changed
npm run images:score -- --mission curiosity
npm run images:score -- --mission curiosity --force-score   # bypass scoring cache

# Per agency
npm run images:score -- --agency JAXA
npm run images:score -- --agency NASA --skip-crops          # only re-score, no variant regen

# Per source
npm run images:score -- --source nasa-images-api
npm run images:score -- --source wikimedia-commons

# Per fleet-asset type
npm run images:score -- --fleet-asset heroes
npm run images:score -- --fleet-asset patches

# Per content segment
npm run images:score -- --segment mission-galleries

# Catch-all — full corpus (~$67 cold; operator must opt in explicitly)
npm run images:score -- --all
npm run images:score -- --all --force-score                 # nuke cache + full rebuild

6.1 · Default = --new-only (incremental)

Running with no flags processes ONLY entries that need processing:

  • Images present in image-provenance.json but absent from image-vision.json (newly added).
  • Images whose source bytes changed since the last cache entry.
  • Images whose scoring cache was invalidated by a prompt-version bump, model swap, or curation deny-list update (per §5.4).

If nothing changed, the run completes in ~30 seconds doing zero API calls and zero sharp work.

This is the routine workflow. --all still exists but is no longer the only catch-all: operators reach for it only after a prompt-rubric or model change.
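The three bullets above reduce to a per-path predicate. The minimal sketch below assumes a hypothetical cache record that stores the source hash from the last scoring run, plus an `invalidated` flag set by the §5.4 triggers. `selectNewOnly` and `CacheState` are illustrative names, not the RFC's API.

```typescript
// Hypothetical per-image cache record.
interface CacheState {
  sourceHash: string;   // hash of the source bytes when last scored
  invalidated: boolean; // set by a prompt bump, model swap or deny-list update (§5.4)
}

// Implicit --new-only: keep only paths that actually need work.
function selectNewOnly(
  provenanceKeys: string[],
  manifestKeys: Set<string>,
  currentHash: (path: string) => string,
  cache: Map<string, CacheState>,
): string[] {
  return provenanceKeys.filter((path) => {
    if (!manifestKeys.has(path)) return true;                // newly added
    const entry = cache.get(path);
    if (!entry) return true;                                 // no cache record yet
    if (entry.sourceHash !== currentHash(path)) return true; // source bytes changed
    return entry.invalidated;                                // explicit trigger fired
  });
}
```

When every path fails all four tests, the result is empty and the run makes zero API calls, which is the ~30 s no-op case.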

6.2 · --changed-since <git-ref>

Resolves to the set of files modified between <git-ref> and HEAD via git diff --name-only <ref>...HEAD, intersected with the keys in image-provenance.json. Useful for CI workflows that want to score only what a PR touched.

bash
# In a GH Action
npm run images:score -- --changed-since "${{ github.event.pull_request.base.sha }}"
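Resolving --changed-since is a git call plus a set intersection, sketched below. It assumes provenance keys are repo-root-relative paths, as `git diff --name-only` prints them. `changedFiles` and `scopeChangedSince` are hypothetical names. Passing an argument array to `execFileSync` means the ref is never interpreted by a shell.

```typescript
import { execFileSync } from 'node:child_process';

// Paths changed between the merge base of `ref` and HEAD (the three-dot form above).
function changedFiles(ref: string): string[] {
  const out = execFileSync('git', ['diff', '--name-only', `${ref}...HEAD`], { encoding: 'utf8' });
  return out.split('\n').map((line) => line.trim()).filter((line) => line.length > 0);
}

// Keep only changed paths that are image-provenance keys; code, docs, etc. drop out.
function scopeChangedSince(changed: string[], provenanceKeys: Iterable<string>): string[] {
  const keys = new Set(provenanceKeys);
  return changed.filter((file) => keys.has(file));
}
```

A caller would combine them as `scopeChangedSince(changedFiles('main'), Object.keys(provenance))`.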

6.3 · CLI implementation notes

  • Each scope flag resolves to a list of image paths by reading image-provenance.json and filtering on the matching field (subject_id, agency, source, asset_type, segment).
  • The image-provenance.json schema must already carry these fields for filtering to work — verify at v2.0 implementation start. If any field is missing, that's a v2.0 prerequisite (an add-only schema bump in the existing manifest, not a structural change — falls within "evolution" rather than "change").
  • Combinations are AND-joined: --agency NASA --fleet-asset patches scores NASA mission patches only.
  • --new-only is also AND-joinable: --agency NASA --new-only processes NEW NASA images only (skip any NASA images already in the manifest).
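A sketch of the AND-join follows. It assumes the provenance fields listed above and a flag-to-field mapping in which --mission filters on subject_id; the RFC lists the fields but not the pairing. `resolveScope` is hypothetical. With no scope flags it returns every key, which is the --all case. --new-only would then be one more intersection with the incremental selection set.

```typescript
// Assumed provenance fields (an additive schema bump may be needed, per the notes above).
interface ProvenanceEntry {
  subject_id?: string;
  agency?: string;
  source?: string;
  asset_type?: string;
  segment?: string;
}

interface ScopeFlags {
  mission?: string;
  agency?: string;
  source?: string;
  fleetAsset?: string;
  segment?: string;
}

// Which provenance field each CLI flag filters on (assumed mapping).
const FIELD_FOR: Record<keyof ScopeFlags, keyof ProvenanceEntry> = {
  mission: 'subject_id',
  agency: 'agency',
  source: 'source',
  fleetAsset: 'asset_type',
  segment: 'segment',
};

// Every supplied flag must match (AND); unsupplied flags don't constrain.
function resolveScope(provenance: Record<string, ProvenanceEntry>, flags: ScopeFlags): string[] {
  const active = (Object.keys(FIELD_FOR) as Array<keyof ScopeFlags>).filter(
    (flag) => flags[flag] !== undefined,
  );
  return Object.entries(provenance)
    .filter(([, entry]) => active.every((flag) => entry[FIELD_FOR[flag]] === flags[flag]))
    .map(([path]) => path);
}
```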

7 · Frontend integration

7.1 · Manifest import

typescript
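// src/lib/image-vision.ts
// static/ lives at the project root, two levels up from src/lib/.
import vision from '../../static/data/image-vision.json';
import provenance from '../../static/data/image-provenance.json';

type VisionEntry = {
  score: number;
  subject: string;
  focal_point: { x: number; y: number };
  variants: Record<'1x1' | '4x3' | '16x9', string> | null; // null when cropping failed
  rejected_by: string | null;
};
type ProvenanceEntry = { license?: string; credit?: string };

// JSON imports are typed as literals; index them as string-keyed records.
const entries = vision.entries as unknown as Record<string, VisionEntry | undefined>;
const prov = provenance as unknown as Record<string, ProvenanceEntry | undefined>;

export function getImage(path: string) {
  const v = entries[path];
  const p = prov[path];
  return {
    src: path,                                     // original (lightbox)
    variant_1x1: v?.variants?.['1x1'],             // mobile thumbnail
    variant_4x3: v?.variants?.['4x3'],             // gallery card
    variant_16x9: v?.variants?.['16x9'],           // hero
    focal_point: v?.focal_point,                   // CSS object-position
    score: v?.score,
    subject: v?.subject,                           // alt text
    rejected: v != null && v.rejected_by !== null, // unscored images are not "rejected"
    license: p?.license,
    credit: p?.credit,
  };
}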

7.2 · Component selection

| Use | Variant |
|---|---|
| Hero (desktop) | variant_16x9 |
| Hero (mobile) | variant_4x3 (better for portrait viewport) |
| Gallery card (desktop + mobile) | variant_4x3 |
| Mobile thumbnail / fleet gallery row | variant_1x1 |
| Lightbox / full-screen | original (no variant) |

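Components apply object-position: {focal_point.x * 100}% {focal_point.y * 100}% when using object-fit: cover. The browser crops at render time and keeps the focal point in view. A percentage object-position aligns the same relative point of the image and the container, so the focal point is anchored in place rather than strictly centred.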
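The variant table and the object-position rule can be captured in two small helpers, sketched below with hypothetical names (`pickSrc`, `objectPosition`). The sketch assumes a fallback to the original image whenever a variant is missing, which also covers entries whose variants are null.

```typescript
type Container = 'hero' | 'gallery-card' | 'thumbnail' | 'lightbox';

interface ImageInfo {
  src: string;
  variant_1x1?: string;
  variant_4x3?: string;
  variant_16x9?: string;
  focal_point?: { x: number; y: number };
}

// Mirrors the §7.2 table; falls back to the original when a variant is missing.
function pickSrc(img: ImageInfo, use: Container, isMobile: boolean): string {
  switch (use) {
    case 'hero':
      return (isMobile ? img.variant_4x3 : img.variant_16x9) ?? img.src;
    case 'gallery-card':
      return img.variant_4x3 ?? img.src;
    case 'thumbnail':
      return img.variant_1x1 ?? img.src;
    case 'lightbox':
      return img.src;
  }
}

// CSS object-position from a 0..1 focal point, rounded to 0.1%; centre when unknown.
function objectPosition(fp?: { x: number; y: number }): string {
  if (!fp) return '50% 50%';
  const pct = (v: number) => `${Math.round(v * 1000) / 10}%`;
  return `${pct(fp.x)} ${pct(fp.y)}`;
}
```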

7.3 · Mobile build picks 1:1 for fleet galleries

vite.config.ts MOBILE=1 branch (RFC-018 §4):

  • Fleet gallery components use variant_1x1 instead of source.
  • Hero components use variant_4x3 instead of variant_16x9.
  • Net: fleet-gallery bucket drops from ~120 MB → ~90 MB on mobile (~30 MB win, success criterion #6 in PRD-018).

7.4 · Runtime NASA API removal (M10)

Components today call fetch('https://images-api.nasa.gov/...') for runtime gallery population. Post-v2.0:

  • All gallery imagery is scored + cropped + bundled at build time.
  • Runtime fetch removed entirely from gallery components.
  • "LIVE" indicator (which signalled the runtime fetch) removed.
  • Loss: galleries no longer auto-update with newly-released NASA imagery between deploys. Acceptable trade-off — mission galleries change rarely.

8 · Audit report HTML

Generated by build-image-vision-manifest.ts as static/data/audit-report.html (gitignored — dev-only artefact).

Structure: one section per image-provenance-key, showing all candidates considered (the API may return multiple per slot when variants exist), with score / category / focal-point crosshair overlay / selection status / reject reason / per-image cost / Flag button.

Marko opens locally (open static/data/audit-report.html after a build) to:

  • Spot bad picks → click 🚩 → enter reason → paste-run node scripts/flag-image.ts
  • Sanity-check focal-point placement on tricky compositions
  • Watch the cost ledger build up over a multi-iteration session

The Flag button:

  1. Opens a small modal overlaid on the audit-report page.
  2. Pre-fills image_path from the candidate row.
  3. User types reason + clicks Submit.
  4. Generates a JSON payload + copies it to the clipboard.
  5. Operator pipes the payload into node scripts/flag-image.ts, which reads it from stdin and appends the entry to image-curation.json.
  6. Operator commits the deny-list update.

No server. No write API. The clipboard is the bridge. The audit report stays a static file that is never deployed and leaks no secrets.
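Below is a minimal sketch of scripts/flag-image.ts under stated assumptions. The payload is one deny-list entry with the §2.2 fields, and it arrives on stdin. Hand-written checks stand in for the ajv validation the failure-mode table mentions. `appendFlag` is a hypothetical name, and the file is rewritten with two-space indentation.

```typescript
import { readFileSync, writeFileSync } from 'node:fs';

interface DenyEntry {
  image_path: string;
  reason: string;
  flagged_by: string;
  flagged_at: string;
}

interface Curation {
  version: number;
  deny_list: DenyEntry[];
}

const REQUIRED = ['image_path', 'reason', 'flagged_by', 'flagged_at'] as const;

// Checks one clipboard payload and returns a new curation object with it appended.
function appendFlag(curation: Curation, payload: unknown): Curation {
  if (typeof payload !== 'object' || payload === null) throw new Error('payload must be a JSON object');
  const p = payload as Record<string, unknown>;
  for (const key of REQUIRED) {
    const value = p[key];
    if (typeof value !== 'string' || value.trim() === '') {
      throw new Error(`payload.${key} must be a non-empty string`);
    }
  }
  // Copy only the known fields so stray keys never reach the committed file.
  const entry: DenyEntry = {
    image_path: p.image_path as string,
    reason: p.reason as string,
    flagged_by: p.flagged_by as string,
    flagged_at: p.flagged_at as string,
  };
  return { ...curation, deny_list: [...curation.deny_list, entry] };
}

// CLI entry point: payload on stdin (fd 0), curation file rewritten in place.
function main(curationPath = 'static/data/image-curation.json'): void {
  const payload: unknown = JSON.parse(readFileSync(0, 'utf8'));
  const curation = JSON.parse(readFileSync(curationPath, 'utf8')) as Curation;
  writeFileSync(curationPath, JSON.stringify(appendFlag(curation, payload), null, 2) + '\n');
}

if (process.argv[1]?.endsWith('flag-image.ts')) main();
```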


9 · Validate-data integration

scripts/validate-data.ts gains THREE new optional checks (not fail-closed by default; v2 is purely additive):

  1. Manifest existence + schema check. If static/data/image-vision.json exists, validate it against an ajv schema (scripts/schemas/image-vision.schema.json). Malformed manifest = fail. Missing manifest = warn (v2 not deployed yet).
  2. Variant file existence check. For every manifest entry with a non-null variants field, the three variant paths must exist on disk. Missing variant = fail (the manifest references a file that wasn't generated).
  3. Curation deny-list schema check. If static/data/image-curation.json exists, validate it. Malformed = fail. Missing = warn.

The EXISTING image-provenance check (the fail-closed gate from ADR-047) stays unchanged. v2's checks are NEW additions, not replacements.
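The variant-existence check could look like the sketch below. `missingVariants` is a hypothetical name. Its `exists` parameter defaults to `fs.existsSync` and can be injected for tests. Since §10 lets an entry keep variants: null when sharp fails, this sketch skips such entries rather than failing them; that treatment is an assumption.

```typescript
import { existsSync } from 'node:fs';

interface VisionEntry {
  variants: Record<string, string> | null;
}

const RATIOS = ['1x1', '4x3', '16x9'];

// One message per problem; an empty array means the check passes.
function missingVariants(
  entries: Record<string, VisionEntry>,
  exists: (path: string) => boolean = existsSync,
): string[] {
  const problems: string[] = [];
  for (const [key, entry] of Object.entries(entries)) {
    if (entry.variants === null) continue; // variant generation failed earlier; nothing to check
    for (const ratio of RATIOS) {
      const file = entry.variants[ratio];
      if (!file) problems.push(`${key}: no ${ratio} variant listed`);
      else if (!exists(file)) problems.push(`${key}: ${file} missing on disk`);
    }
  }
  return problems;
}
```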


10 · Failure modes + handling

| Failure | Detection | Handling |
|---|---|---|
| Anthropic API outage during scoring | HTTP 5xx from VisionProvider.score() | Retry with exponential backoff (3 attempts). Final failure → log + skip that image (cache file marked failed: true). Build continues; failed images get re-tried on next pipeline run. |
| API rate-limit | HTTP 429 | Sleep per Retry-After header, resume. |
| Cost ledger threshold breached during run | Post-call check against image-vision-cost-ledger.json | Soft-warn at $50/build (continues); hard-halt at $200/build (pipeline exits non-zero, cache for completed images preserved, operator restarts after investigation). |
| Image bytes corrupt or unreadable | sharp exception during variant generation | Log + skip variant generation for that image (manifest entry retains variants: null); image still gets a score. |
| Invalid JSON returned by vision API | JSON parse fails | Retry with stricter prompt; second failure → mark failed: true; manual review. |
| Source image deleted between fetch and score | fs check before score call | Drop from manifest; warn in audit report. |
| Curation deny-list malformed | ajv validation fails | Pipeline exits with clear error pointing at the bad entry; fix the deny-list, re-run. |

Build never fails closed because of API result quality (only because of structural validation failures or hard cost-cap breach).
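The retry and cost-cap rows of the failure-mode table translate into small, testable pieces, sketched below. The sketch assumes a 1 s backoff base and errors that carry an HTTP `status` (plus an optional `retryAfter`) field. It reads only the delta-seconds form of Retry-After; an HTTP-date value falls back to exponential backoff. The $50 / $200 thresholds come from the table. `costCapStatus`, `retryDelayMs` and `withRetry` are hypothetical names. Here the 3-attempt limit also caps 429 retries, which the table leaves open.

```typescript
type CapStatus = 'ok' | 'warn' | 'halt';

// Thresholds from the table: soft warn at $50 per build, hard halt at $200.
const SOFT_CAP_USD = 50;
const HARD_CAP_USD = 200;
const MAX_ATTEMPTS = 3;

function costCapStatus(buildSpendUsd: number): CapStatus {
  if (buildSpendUsd >= HARD_CAP_USD) return 'halt';
  if (buildSpendUsd >= SOFT_CAP_USD) return 'warn';
  return 'ok';
}

// Delay after failed attempt `attempt` (1-based): Retry-After seconds on 429 when
// parseable, otherwise 1 s, 2 s, 4 s, ... (assumed base).
function retryDelayMs(attempt: number, status: number, retryAfter?: string | null): number {
  if (status === 429 && retryAfter) {
    const seconds = Number(retryAfter);
    if (Number.isFinite(seconds) && seconds >= 0) return seconds * 1000;
  }
  return 1000 * 2 ** (attempt - 1);
}

interface HttpError {
  status: number;
  retryAfter?: string | null;
}

// Retries 429 and 5xx up to MAX_ATTEMPTS; anything else is rethrown immediately.
async function withRetry<T>(
  call: () => Promise<T>,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms)),
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      const e = err as Partial<HttpError>;
      const retryable = e.status === 429 || (e.status !== undefined && e.status >= 500);
      if (!retryable || attempt >= MAX_ATTEMPTS) throw err;
      await sleep(retryDelayMs(attempt, e.status as number, e.retryAfter));
    }
  }
}
```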


11 · Resolved decisions + open questions

Resolved 2026-05-16:

  1. Manifest model — RESOLVED: New sidecar layer (image-vision.json) joining image-provenance.json by image-path key. ADR-047 stays untouched. v2 is purely additive.
  2. Vision model — RESOLVED: Claude Sonnet 4.6. ~$0.05/image; ~$67 first-build for whole corpus.
  3. Corpus scope — RESOLVED: Whole corpus (1345 entries). Editorial coverage everywhere.
  4. Smart-crop — RESOLVED: v2.0 ships 1:1 + 4:3 + 16:9 variants via sharp at build time. Mobile bundle savings (~30 MB) land immediately.
  5. Human curation feedback loop — RESOLVED: image-curation.json deny-list (committed) + recent-5 entries injected into scoring prompt as in-context bias. Audit-report Flag button generates clipboard payload → flag-image.ts helper appends. No server.
  6. Granular pipeline scoping — RESOLVED: explicit CLI scope flags (--mission, --agency, --source, --fleet-asset, --segment, --all, plus the git-aware --changed-since). With no flags the CLI runs incrementally (implicit --new-only, §6.1).
  7. Provider abstraction — RESOLVED: VisionProvider interface (mirrors PRD-016 TtsProvider). v2.0 ships Anthropic Sonnet 4.6; future swap to OpenAI / Google Vision is config + new implementation, no pipeline rewrite.
  8. Runtime NASA API removal — RESOLVED: Yes (M10). Galleries fully offline post-v2.0.
  9. Cost-cap policy — RESOLVED: Same as PRD-016 ($50 soft warn, $200 hard halt per build). Image-vision pipeline shares a ledger pattern but a separate file (image-vision-cost-ledger.json).
  10. Validate-data integration — RESOLVED: NEW optional checks added (manifest + variant existence + curation schema). Existing ADR-047 fail-closed image-provenance check unchanged.

Open follow-ups:

  1. image-provenance.json schema fields needed for granular scoping. v2.0 scoping CLI filters on agency, source, asset_type, segment. Verify these fields exist in current schema; if missing, an additive schema bump is a v2.0 prerequisite (within "evolution" framing). Implementation-time check.
  2. Score-threshold calibration. PRD M2 sets threshold ≥ 5. After first scoring pass, Marko reviews the score distribution + adjusts. Implementation-time decision.
  3. Audit-report retention. v2.0 overwrites on every run. v2.1 candidate: retain last 10 reports for diff comparison.
  4. @anthropic-ai/sdk minimum version supporting Sonnet 4.6. Verify at implementation time (likely >= 0.30.0).
     • API access prerequisite: ANTHROPIC_API_KEY is operator-managed and NOT bundled with Claude Code. As of 2026-05, Anthropic announced that API calls are excluded from Claude Code subscriptions, so the v2 vision pipeline needs its own paid API key. Setup steps are documented in docs/guides/image-pipeline-v2.md; PRD-018 M11 captures this as a hard prerequisite. The same key works for PRD-016 audio (per-account billing).
  5. Sharp memory budget on whole-corpus rebuild. ~4035 crops in series = OK; in parallel = potentially OOMs on lower-tier dev machines. Recommend sequential or small worker pool (≤ 4); confirm at implementation time.
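A small worker pool of the kind recommended above is sketched below with a hypothetical `runPool` helper. At most `limit` tasks are in flight, and results keep input order. Each task would wrap one image's sharp crops; a limit of 4 matches the suggested ceiling.

```typescript
// Runs `task` over `items` with at most `limit` in flight; results keep input order.
async function runPool<T, R>(items: T[], limit: number, task: (item: T) => Promise<R>): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;
  // Each worker claims the next index synchronously, so no two workers share an item.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await task(items[i]);
    }
  }
  const workers = Array.from({ length: Math.min(limit, items.length) }, () => worker());
  await Promise.all(workers);
  return results;
}
```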

RFC-022 · Orrery · Image Pipeline v2 · Drafted 2026-05-16 · Closes-into-PRD-018

Orrery — architecture documentation · MIT · No tracking