Standardized testing reference guide

Companion to RFC-019. Industry methodology and resources for reference-image validation in Chemigram.

Why standardized test targets

The photography and imaging industry — Imatest, DxOMark, X-Rite, the ISO TC-42 and IEEE P1858 committees — converges on a single principle: test against inputs with known, published physical properties, then measure the delta between expected and actual. Without known ground truth, any evaluation is subjective opinion.

For Chemigram this means: replace the arbitrary raw-test.NEF (which tells us "did darktable crash?") with standardized chart photographs (which tell us "did the pipeline produce correct color and tone?").

The two reference targets

Calibrite ColorChecker Passport Photo 2

24 patches: 18 color + 6 neutral grayscale
Published CIE L*a*b* D50 reference values from X-Rite
~€70 from calibrite.com or Amazon
Includes free DNG profiling software
Check manufacturing date (back of chart): formulations changed Nov 2014

What it tests in Chemigram:

Metric	What it catches
Per-patch Delta E	Wrong color matrix, WB error, color-grading vocabulary producing unintended shifts
Mean/max Delta E	Overall color accuracy regression
Grayscale neutrality	Color cast in neutral tones (WB or channel mixer bugs)
Patch-to-patch relationships	Relative color accuracy even when absolute is off
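
The grayscale neutrality metric can be sketched as a chroma check on the neutral patches: C*ab is the distance from the neutral axis in the a*b* plane, so a color cast shows up as chroma growing on patches that should stay near zero. This is a minimal sketch, not Chemigram's implementation; the function names and the `max_chroma` threshold are hypothetical, and the reference values are copied from the post-Nov 2014 table later in this guide.

```python
import math

# Reference (a*, b*) for the six neutral patches, copied from the
# post-Nov 2014 table in this guide.
NEUTRAL_REFERENCE_AB = {
    "White": (-0.43, 1.19),
    "Neutral 8": (-0.64, -0.34),
    "Neutral 6.5": (-0.73, -0.50),
    "Neutral 5": (-0.15, -0.27),
    "Neutral 3.5": (-0.42, -1.23),
    "Black": (-0.08, -0.97),
}


def chroma(a: float, b: float) -> float:
    """CIE C*ab: distance from the neutral axis in the a*b* plane."""
    return math.hypot(a, b)


def neutrality_report(measured_ab, max_chroma=3.0):
    """Return (per-patch chroma, worst patch name, passed flag).

    max_chroma is a hypothetical threshold, not a Chemigram setting.
    """
    per_patch = {name: chroma(a, b) for name, (a, b) in measured_ab.items()}
    worst = max(per_patch, key=per_patch.get)
    return per_patch, worst, per_patch[worst] <= max_chroma


# The physical chart is not perfectly neutral, so even the reference
# values carry a small chroma (largest on Neutral 3.5).
per_patch, worst, passed = neutrality_report(NEUTRAL_REFERENCE_AB)
```

Because the reference patches themselves sit slightly off the neutral axis, a stricter variant would compare measured chroma against the reference chroma per patch rather than against zero.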

Calibrite ColorChecker Grayscale

Neutral step wedge with known optical density values
~€40
Isolates tonal response from color

What it tests in Chemigram:

Metric	What it catches
Tonal linearity (R²)	Broken tone curve, sigmoid misconfiguration
Gamma fit	Wrong contrast/gamma after vocabulary application
Shadow/highlight clipping	Exposure vocabulary pushing values out of range
Noise floor	Denoise vocabulary effectiveness
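
One way to read the tonal linearity and gamma fit rows is as a straight-line fit in log-log space: the slope of log10(measured luminance) against log10(reference reflectance) is the effective gamma, and R² of that fit measures how well a single power law describes the response. The sketch below assumes that interpretation. It uses the approximate optical densities of the CC24 neutral patches from this guide as stand-in data, since the Grayscale chart's own densities are not listed here, and `fit_gamma` is a hypothetical helper name.

```python
import math

# Approximate optical densities of CC24 neutral patches 19-24 (from this guide).
DENSITIES = [0.05, 0.23, 0.44, 0.70, 1.05, 1.50]


def fit_gamma(densities, measured_luminance):
    """Least-squares fit of log10(measured) = gamma * log10(reflectance) + c.

    Reflectance is 10 ** -density, so log10(reflectance) is simply -density.
    Returns (gamma, r_squared).
    """
    x = [-d for d in densities]
    y = [math.log10(v) for v in measured_luminance]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    gamma = sxy / sxx
    intercept = mean_y - gamma * mean_x
    ss_res = sum((yi - (gamma * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return gamma, 1.0 - ss_res / ss_tot


# Synthetic "measurement": a pure power law with gamma 1.2 and gain 0.9.
measured = [0.9 * (10 ** -d) ** 1.2 for d in DENSITIES]
gamma, r_squared = fit_gamma(DENSITIES, measured)
```

A broken tone curve would show up as R² dropping well below 1, while a contrast change would mainly move the fitted gamma.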

Shooting protocol

One afternoon, controlled conditions:

Camera: Nikon D850 (same body as existing CHEMIGRAM_TEST_RAW)
Lens: prime, 50mm or longer, stopped down to f/8 (at or near the sharpest aperture for most primes)
Lighting: overcast daylight OR a 5500K LED panel at 45° angle
Chart distance: chart centered and filling ~⅓ of the frame (keeps the patches out of the vignetted corners)
Exposure: white patch of CC24 just below highlight clipping in RAW histogram
WB: fixed preset (daylight 5500K), not auto
Format: 14-bit lossless compressed NEF
Shoot: 3 frames of each chart, pick the sharpest
Save: two files, ~50 MB each

Store alongside existing test raw, discovered via:

export CHEMIGRAM_TEST_CC24=~/chemigram-reference/cc24.NEF
export CHEMIGRAM_TEST_GRAYSCALE=~/chemigram-reference/grayscale.NEF
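
On the test side, discovery could look like the sketch below: read the variable, expand the path, and treat a missing or unset file as "reference target not available" so the test can be skipped rather than failed. The helper name `reference_raw` is hypothetical; only the two environment variable names come from this guide.

```python
import os
from pathlib import Path
from typing import Optional


def reference_raw(env_var: str) -> Optional[Path]:
    """Return the reference RAW path named by env_var, or None if unavailable."""
    value = os.environ.get(env_var)
    if not value:
        return None
    # expanduser() is defensive: the shell normally expands "~" already.
    path = Path(value).expanduser()
    return path if path.is_file() else None


cc24_raw = reference_raw("CHEMIGRAM_TEST_CC24")
grayscale_raw = reference_raw("CHEMIGRAM_TEST_GRAYSCALE")
```

A test that needs the chart can then skip itself when the helper returns `None`, which keeps CI green on machines without the reference files.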

Reference data sources

Official L*a*b* values (required)

X-Rite CGATS files (the ground truth):

- ColorChecker24_Before_Nov2014.txt
- ColorChecker24_After_Nov2014.txt
- Download: https://www.xrite.com/service-support/new_color_specifications_for_colorchecker_sg_and_classic_charts

Community-validated data (supplementary)

BabelColor (averaged spectral data from 30 charts, synthetic images, comparison spreadsheets):

- Page 2 (data): https://babelcolor.com/colorchecker-2.htm
- RGB coordinates PDF: https://babelcolor.com/tutorials.htm
- CxF2 format file with both averaged and X-Rite reference data
- Note: BabelColor CT&A and PatchTool are now freeware (since Jan 2025)

Bruce Lindbloom (L*a*b* TIFF and comparison spreadsheets):

- https://www.brucelindbloom.com/ColorCheckerRGB.html
- Computer-generated L*a*b* TIFF for synthetic testing

RIT Munsell Color Science Lab (independent spectral reflectance measurements):

- https://www.rit.edu/science/munsell-color-science-lab

Post-Nov 2014 reference L*a*b* D50 (for the JSON fixture)

These are the values that go into tests/fixtures/reference-targets/colorchecker24_lab_d50.json:

#	Patch	L*	a*	b*
1	Dark Skin	37.54	14.37	14.92
2	Light Skin	65.71	17.64	17.67
3	Blue Sky	49.59	−3.82	−22.54
4	Foliage	43.72	−13.39	22.18
5	Blue Flower	55.47	9.75	−24.79
6	Bluish Green	71.77	−33.13	0.68
7	Orange	62.66	35.83	56.50
8	Purplish Blue	40.56	10.09	−45.17
9	Moderate Red	52.10	48.24	16.23
10	Purple	30.67	21.19	−20.81
11	Yellow Green	72.53	−23.71	57.26
12	Orange Yellow	71.94	19.36	67.86
13	Blue	28.78	15.42	−49.80
14	Green	55.26	−38.34	31.37
15	Red	42.43	51.05	28.62
16	Yellow	82.45	2.41	80.25
17	Magenta	51.98	49.99	−14.57
18	Cyan	50.98	−28.78	−28.35
19	White (.05)*	96.54	−0.43	1.19
20	Neutral 8 (.23)*	81.26	−0.64	−0.34
21	Neutral 6.5 (.44)*	66.77	−0.73	−0.50
22	Neutral 5 (.70)*	50.87	−0.15	−0.27
23	Neutral 3.5 (1.05)*	35.66	−0.42	−1.23
24	Black (1.50)*	20.46	−0.08	−0.97

* Parenthesized values are approximate optical density.

Source: X-Rite, "After November 2014" CGATS file. Illuminant D50, 2° observer.
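
When transcribing the table into the JSON fixture, note that the negative values above are typeset with a typographic minus sign (U+2212), which Python's `float()` and JSON both reject. The sketch below shows one way to convert tab-separated rows into fixture entries. The `"L"`, `"a"`, `"b"` keys match what the synthetic fixture code in this guide reads; the `"index"` and `"name"` keys and the helper names are assumptions for illustration.

```python
import json


def parse_number(text: str) -> float:
    """Parse a table cell, accepting the typographic minus sign (U+2212)."""
    return float(text.replace("\u2212", "-"))


def parse_row(line: str) -> dict:
    """Turn one tab-separated table row into a fixture entry."""
    index, name, l_star, a_star, b_star = line.split("\t")
    return {
        "index": int(index),
        "name": name,
        "L": parse_number(l_star),
        "a": parse_number(a_star),
        "b": parse_number(b_star),
    }


# Two rows from the table above; the second uses U+2212 minus signs.
rows = [
    "1\tDark Skin\t37.54\t14.37\t14.92",
    "3\tBlue Sky\t49.59\t\u22123.82\t\u221222.54",
]
fixture = [parse_row(row) for row in rows]
fixture_json = json.dumps(fixture, indent=2)
```

The X-Rite CGATS file remains the authoritative source; a transcription like this is only a convenience and is worth spot-checking against it.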

Delta E 2000: the primary metric

Delta E 2000 (CIE DE2000) is the industry-standard perceptual color difference formula. It accounts for human visual sensitivity: we're more sensitive to hue differences than chroma differences, and more sensitive in low-chroma regions than high-chroma.

Interpretation:

Delta E	Meaning
< 1.0	Not perceptible to the human eye
1.0–2.0	Perceptible through close observation
2.0–3.5	Perceptible at a glance
3.5–5.0	Clear difference
> 5.0	Colors appear noticeably different

Python implementation: the colour-science package (pip install colour-science) provides colour.delta_E(lab1, lab2, method='CIE 2000').
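
For readable test reports, the interpretation table above can be encoded as a small lookup. This is only a sketch: the table's bands share their boundary values (2.0 appears in two rows), so the code assumes each boundary belongs to the higher band except 5.0, which stays in "Clear difference". The function name is hypothetical.

```python
def describe_delta_e(de: float) -> str:
    """Map a Delta E 2000 value to the interpretation bands in this guide."""
    if de < 0:
        raise ValueError("Delta E cannot be negative")
    if de < 1.0:
        return "Not perceptible to the human eye"
    if de < 2.0:
        return "Perceptible through close observation"
    if de < 3.5:
        return "Perceptible at a glance"
    if de <= 5.0:
        return "Clear difference"
    return "Colors appear noticeably different"


# Example: annotate a few per-patch values for a report.
report = {value: describe_delta_e(value) for value in (0.4, 1.7, 2.9, 4.2, 7.5)}
```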

For Chemigram assertions:

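from dataclasses import dataclass

import colour
import numpy as np

# Minimal result container; the project may define its own
# ColorAccuracyResult elsewhere.
@dataclass
class ColorAccuracyResult:
    passed: bool
    mean_de: float
    max_de: float
    per_patch: list[float]

def compute_delta_e(measured_lab: np.ndarray, reference_lab: np.ndarray) -> np.ndarray:
    """Compute per-patch Delta E 2000 for (N, 3) arrays of L*a*b* values."""
    return colour.delta_E(measured_lab, reference_lab, method='CIE 2000')

def assert_color_accuracy(measured, reference, max_mean_de=3.0, max_max_de=6.0):
    """Check mean and worst-case Delta E against the given thresholds."""
    de = compute_delta_e(measured, reference)
    mean_de = float(np.mean(de))
    max_de = float(np.max(de))
    passed = mean_de <= max_mean_de and max_de <= max_max_de
    return ColorAccuracyResult(
        passed=passed, mean_de=mean_de, max_de=max_de,
        per_patch=de.tolist()
    )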

Vocabulary move assertions: direction and magnitude

Beyond absolute accuracy, each vocabulary entry has an expected effect that can be asserted:

Vocabulary entry	Expected effect	Assertion
`expo_plus_0p5`	+0.5 EV exposure	Mean L* of grayscale patches increases by 4–8 units
`expo_minus_0p5`	−0.5 EV exposure	Mean L* decreases by 4–8 units
`wb_warm_subtle`	Warm white balance shift	Mean b* of neutral patches increases (shifts yellow)
`wb_cool_subtle`	Cool white balance shift	Mean b* of neutral patches decreases (shifts blue)
`tone_lift_shadows`	Lift dark values	L* of dark patches (22–24) increases; light patches (19–20) stable
`tone_compress_highlights`	Reduce bright values	L* of bright patches (19–20) decreases; dark patches stable

These are relative assertions: compare "before" vs "after" applying the vocabulary entry to the reference RAW. The direction must be correct; the magnitude should be within a documented range.
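
A relative assertion of this kind can be sketched as a comparison of mean patch values before and after the move. The code below is illustrative only: `check_move` is a hypothetical helper, the "before" values are the reference L* of neutral patches 19–24 from this guide, and the "after" values are invented to stand in for a rendered +0.5 EV result.

```python
from statistics import fmean


def check_move(before, after, direction, min_shift, max_shift):
    """Check that the mean moved the right way by a plausible amount.

    direction is +1 for an expected increase and -1 for a decrease.
    Returns (signed shift of the mean, passed flag).
    """
    shift = fmean(after) - fmean(before)
    direction_ok = shift * direction > 0
    magnitude_ok = min_shift <= abs(shift) <= max_shift
    return shift, direction_ok and magnitude_ok


# Reference L* of neutral patches 19-24.
before_l = [96.54, 81.26, 66.77, 50.87, 35.66, 20.46]
# Invented "after" values: every patch brightened by 6 L* units.
after_l = [value + 6.0 for value in before_l]

shift, passed = check_move(before_l, after_l, direction=+1, min_shift=4.0, max_shift=8.0)
```

Checking direction separately from magnitude means a move that goes the wrong way fails even if its size happens to fall inside the documented range.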

This is the key insight: each vocabulary .dtstyle file should eventually carry a companion assertion spec. When someone contributes a new vocabulary entry, they also specify what effect it should have on the reference targets. This makes vocabulary entries testable, not just parseable.
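
What such a companion spec might look like is sketched below as a small JSON document plus a validating loader. Nothing here is an existing Chemigram format: the field names, the `AssertionSpec` class, and `load_spec` are all hypothetical. The example values follow the `expo_plus_0p5` row of the table above, assuming "grayscale patches" means CC24 patches 19–24.

```python
import json
from dataclasses import dataclass

VALID_CHANNELS = {"L", "a", "b"}
VALID_DIRECTIONS = {"increase", "decrease", "stable"}


@dataclass(frozen=True)
class AssertionSpec:
    entry: str         # vocabulary entry name
    channel: str       # which L*a*b* channel to compare
    patches: tuple     # 1-based CC24 patch numbers
    direction: str     # expected direction of the mean shift
    min_shift: float   # documented magnitude range, in channel units
    max_shift: float


def load_spec(text: str) -> AssertionSpec:
    """Parse and validate a companion assertion spec (hypothetical format)."""
    raw = json.loads(text)
    if raw["channel"] not in VALID_CHANNELS:
        raise ValueError(f"unknown channel: {raw['channel']}")
    if raw["direction"] not in VALID_DIRECTIONS:
        raise ValueError(f"unknown direction: {raw['direction']}")
    if not all(1 <= p <= 24 for p in raw["patches"]):
        raise ValueError("patch numbers must be in 1..24")
    if raw["min_shift"] > raw["max_shift"]:
        raise ValueError("min_shift must not exceed max_shift")
    return AssertionSpec(
        entry=raw["entry"], channel=raw["channel"],
        patches=tuple(raw["patches"]), direction=raw["direction"],
        min_shift=float(raw["min_shift"]), max_shift=float(raw["max_shift"]),
    )


EXAMPLE = """
{
  "entry": "expo_plus_0p5",
  "channel": "L",
  "patches": [19, 20, 21, 22, 23, 24],
  "direction": "increase",
  "min_shift": 4.0,
  "max_shift": 8.0
}
"""
spec = load_spec(EXAMPLE)
```

Validating the spec at load time means a malformed contribution is rejected before any rendering happens.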

Open-source tools

Tool	Use in Chemigram
`colour-science` (Python)	Delta E computation, L*a*b* conversions, chromatic adaptation
`Pillow` / `rawpy`	Image loading (TIFF for synthetic, rendered output for real)
`numpy`	Patch extraction, histogram stats
ArgyllCMS	ICC profile creation, display calibration (if needed)
DCamProf	DNG/DCP profile creation from CC24 shots
BabelColor CT&A (freeware)	Color measurement and analysis

Synthetic fixture generation

For the CI tier, generate synthetic ColorChecker and grayscale TIFFs from the published L*a*b* values:

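import json

import colour
import numpy as np
from PIL import Image

# D50 white point (CIE 1931 2° observer), matching the reference data
D50 = colour.CCS_ILLUMINANTS["CIE 1931 2 Degree Standard Observer"]["D50"]

# Load reference L*a*b* values (a list of patches with "L", "a", "b" keys)
with open("colorchecker24_lab_d50.json") as f:
    reference_lab = json.load(f)

# Convert L*a*b* D50 → sRGB
srgb_values = []
for patch in reference_lab:
    lab = np.array([patch["L"], patch["a"], patch["b"]])
    xyz = colour.Lab_to_XYZ(lab, illuminant=D50)
    # XYZ is relative to D50, so XYZ_to_sRGB must adapt from D50 to sRGB's D65;
    # its default source illuminant is D65, which would skew every patch
    srgb = colour.XYZ_to_sRGB(xyz, illuminant=D50)
    srgb_clipped = np.clip(srgb, 0, 1)  # out-of-gamut patches are clipped
    # Round before casting; a bare astype() would truncate toward zero
    srgb_values.append(np.round(srgb_clipped * 255).astype(np.uint8))

# Render as 6×4 grid of 100×100 pixel patches
img = np.zeros((400, 600, 3), dtype=np.uint8)
for i, rgb in enumerate(srgb_values):
    row, col = divmod(i, 6)
    img[row*100:(row+1)*100, col*100:(col+1)*100] = rgb

Image.fromarray(img).save("colorchecker_synthetic_srgb.tiff")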

This synthetic image is the "perfect" digital ColorChecker. Passing it through an identity transform and comparing against the reference L*a*b* values should produce Delta E ≈ 0.0, limited only by 8-bit quantization and by sRGB gamut clipping on out-of-gamut patches such as Cyan (#18).
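
To measure patches back out of the rendered grid, each patch index maps to a pixel box in the same 6×4 layout. The sketch below computes an inset sampling box so that the same helper could later tolerate slightly misaligned real-chart crops; `patch_box` and its `margin` parameter are hypothetical, and the 100-pixel patch size is the one used in the generator above.

```python
def patch_box(index, patch_size=100, columns=6, margin=0.25):
    """Return (top, bottom, left, right) pixel bounds for a 0-based patch index.

    The box is inset by `margin` (a fraction of the patch size) on every
    side, so only the patch center is sampled.
    """
    row, col = divmod(index, columns)
    inset = int(patch_size * margin)
    top = row * patch_size + inset
    bottom = (row + 1) * patch_size - inset
    left = col * patch_size + inset
    right = (col + 1) * patch_size - inset
    return top, bottom, left, right


# Patch #1 (Dark Skin) is index 0; patch #18 (Cyan) is index 17.
first_box = patch_box(0)
cyan_box = patch_box(17)
```

With a NumPy image array, the mean color of a patch would then be `img[top:bottom, left:right].mean(axis=(0, 1))`.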