chemigram.core.xmp

Parse and write darktable XMP sidecar files.

XMP is darktable's per-image edit-state format: RDF/XML with an <rdf:Seq> of history entries (per-module configurations). This module is calibrated to darktable 5.4.1 (see tests/fixtures/README.md and docs/adr/TA.md, contracts/xmp-darktable-history).

Binary blobs (params, blendop_params) are opaque (ADR-008) and are never decoded. Parsing uses defusedxml because input may be untrusted; output is built with the standard library's ElementTree, since the module controls everything it writes.

Round-trip property: after write_xmp(x, p), parse_xmp(p) == x for any well-formed Xmp x (semantic equality of the dataclass, not byte identity of the file). write_xmp itself returns None, so the property is stated over the written path.
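The distinction between semantic equality and byte identity can be illustrated with a toy parser. This is a minimal sketch, not chemigram code: MiniXmp and parse_mini are hypothetical, and the darktable namespace URI is an assumption. Two documents that differ in prefix, quoting, attribute order, and whitespace are different bytes but parse to equal frozen dataclasses.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

DT = "http://darktable.sf.net/"  # assumed darktable namespace URI


@dataclass(frozen=True)
class MiniXmp:
    """Hypothetical two-field stand-in for chemigram's Xmp."""

    history_end: int
    iop_order_version: int


def parse_mini(text: str) -> MiniXmp:
    """Hypothetical parser: reads attributes by namespace URI, not by prefix."""
    root = ET.fromstring(text)
    return MiniXmp(
        history_end=int(root.get(f"{{{DT}}}history_end")),
        iop_order_version=int(root.get(f"{{{DT}}}iop_order_version")),
    )


# Same data, but different prefix, quoting, attribute order and whitespace.
a = (
    '<d xmlns:darktable="http://darktable.sf.net/" '
    'darktable:history_end="3" darktable:iop_order_version="4"/>'
)
b = (
    "<d xmlns:dt='http://darktable.sf.net/'\n"
    "   dt:iop_order_version='4' dt:history_end='3'></d>"
)

assert a != b                          # not byte-identical...
assert parse_mini(a) == parse_mini(b)  # ...but semantically equal
```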

Public API

parse_xmp, parse_xmp_from_bytes, write_xmp, synthesize_xmp
Xmp, HistoryEntry: frozen dataclasses
XmpParseError: exception raised on malformed input

XmpParseError

Bases: Exception

Raised when an XMP file cannot be parsed.

HistoryEntry (dataclass)

HistoryEntry(num, operation, enabled, modversion, params, multi_name, multi_name_hand_edited, multi_priority, blendop_version, blendop_params, iop_order=None)

One <rdf:li> entry in a darktable XMP <rdf:Seq>.

Calibrated to darktable 5.4.1. params and blendop_params are opaque blobs (ADR-008) and are never decoded.

iop_order is None in darktable 5.4.1 .dtstyle files and is unnecessary for Path B (per the empirical evidence in RFC-018 v0.2): darktable resolves pipeline order from the parent's darktable:iop_order_version together with an internal iop_list. The field remains Optional[float] because rendered XMP sidecars can carry a per-entry iop_order as a float (e.g. 47.4747), and the parser must round-trip those values.
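Round-tripping an optional float attribute needs two small conversions. The sketch below is illustrative only: parse_iop_order and format_iop_order are hypothetical helpers, not the module's actual code. It relies on the fact that Python's repr() of a float is the shortest string that parses back to the same value.

```python
from __future__ import annotations


def parse_iop_order(raw: str | None) -> float | None:
    """Hypothetical helper: read an optional per-entry iop_order attribute."""
    # A missing attribute (the usual case in 5.4.1 .dtstyle entries) stays None.
    return None if raw is None else float(raw)


def format_iop_order(value: float | None) -> str | None:
    """Hypothetical helper: render iop_order for output, or None to omit it."""
    # repr() yields the shortest string that parses back to the same float,
    # so 47.4747 is written as "47.4747" rather than a long binary expansion.
    return None if value is None else repr(value)


print(format_iop_order(parse_iop_order("47.4747")))  # 47.4747
print(parse_iop_order(None))                          # None
```

Returning None from the formatter lets a writer skip the attribute entirely instead of emitting an empty string.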

Xmp (dataclass)

Xmp(rating, label, auto_presets_applied, history_end, iop_order_version, history, raw_extra_fields=())

A parsed darktable XMP file.

First-class fields are those the synthesizer (Issue #3) reads or writes. Everything else on <rdf:Description> (timestamps, hashes, creator metadata, masks_history, etc.) is preserved opaquely in raw_extra_fields for round-trip fidelity.

raw_extra_fields entries are 3-tuples (kind, qname, value):

kind == "attr": an attribute on <rdf:Description>. qname is the prefixed name (e.g. "darktable:xmp_version"); value is the raw attribute string.
kind == "elem": a child element of <rdf:Description>. qname is the prefixed element name; value is the entire subtree serialized as XML text.
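One way to collect those (kind, qname, value) tuples from an rdf:Description element is sketched below. This is an illustration under assumptions, not the module's implementation: the namespace table is assumed (the real module keeps its own _NS), collect_extras and clark_to_qname are hypothetical, every namespace in the input is assumed to appear in NS, and the stdlib parser stands in for defusedxml.

```python
from __future__ import annotations

import copy
import xml.etree.ElementTree as ET

# Assumed prefix -> namespace URI table; the real module keeps its own (_NS).
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "xmp": "http://ns.adobe.com/xap/1.0/",
    "darktable": "http://darktable.sf.net/",
}
URI_TO_PREFIX = {uri: prefix for prefix, uri in NS.items()}
for _prefix, _uri in NS.items():
    # Make ET.tostring emit "darktable:..." rather than "ns0:...".
    ET.register_namespace(_prefix, _uri)


def clark_to_qname(name: str) -> str:
    """Convert Clark notation '{uri}local' to 'prefix:local'."""
    if not name.startswith("{"):
        return name  # attribute without a namespace
    uri, local = name[1:].split("}", 1)
    return f"{URI_TO_PREFIX[uri]}:{local}"  # KeyError if the URI is not in NS


def collect_extras(desc: ET.Element, modeled: set[str]) -> tuple[tuple[str, str, str], ...]:
    """Hypothetical: gather unmodeled attributes and children as (kind, qname, value)."""
    extras: list[tuple[str, str, str]] = []
    for key, value in desc.attrib.items():
        qname = clark_to_qname(key)
        if qname not in modeled:
            extras.append(("attr", qname, value))
    for child in desc:
        qname = clark_to_qname(child.tag)
        if qname not in modeled:
            # Serialize the whole subtree; a shallow copy drops the trailing
            # whitespace (tail) without touching the original tree.
            trimmed = copy.copy(child)
            trimmed.tail = None
            extras.append(("elem", qname, ET.tostring(trimmed, encoding="unicode")))
    return tuple(extras)


desc = ET.fromstring(
    '<rdf:Description xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"'
    ' xmlns:xmp="http://ns.adobe.com/xap/1.0/"'
    ' xmlns:darktable="http://darktable.sf.net/"'
    ' xmp:Rating="1" darktable:xmp_version="5">'
    "<darktable:masks_history><rdf:Seq/></darktable:masks_history>"
    "</rdf:Description>"
)
for kind, qname, value in collect_extras(desc, {"xmp:Rating"}):
    print(kind, qname, value)
```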

parse_xmp

parse_xmp(path)

Parse a darktable XMP sidecar file.

Parameters:

Name	Type	Description	Default
`path`	`Path`	Path to an XMP file.	required

Returns:

Type	Description
`Xmp`	An `Xmp` capturing the file's first-class fields and any unmodeled attributes / nested elements via `raw_extra_fields`.

Raises:

Type	Description
`XmpParseError`	malformed XML; missing `<rdf:Description>`; invalid attribute values (e.g., non-integer `history_end`).
`FileNotFoundError`	`path` does not exist.

Source code in src/chemigram/core/xmp.py

def parse_xmp(path: Path) -> Xmp:
    """Parse a darktable XMP sidecar file.

    Args:
        path: Path to an XMP file.

    Returns:
        An :class:`Xmp` capturing the file's first-class fields and any
        unmodeled attributes / nested elements via ``raw_extra_fields``.

    Raises:
        XmpParseError: malformed XML; missing ``<rdf:Description>``;
            invalid attribute values (e.g., non-integer ``history_end``).
        FileNotFoundError: ``path`` does not exist.
    """
    if not path.exists():
        raise FileNotFoundError(path)

    try:
        tree = _defused_parse(path)
    except _DefusedParseError as exc:
        raise XmpParseError(f"{path}: malformed XML: {exc}") from exc

    root = tree.getroot()
    description = root.find(f".//{{{_NS['rdf']}}}Description")
    if description is None:
        raise XmpParseError(f"{path}: missing rdf:Description")

    return _parse_description_to_xmp(description, path)
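The lookup performed by parse_xmp (find the first rdf:Description anywhere under the root, then read its darktable:history sequence) can be sketched with the standard library alone. This is a minimal sketch under stated assumptions: the darktable namespace URI is assumed, history_operations and the local XmpParseError are hypothetical stand-ins, and xml.etree.ElementTree replaces the module's defusedxml wrappers, so this version should not be used on untrusted input.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DT = "http://darktable.sf.net/"  # assumed darktable namespace URI


class XmpParseError(Exception):
    """Local stand-in for chemigram.core.xmp.XmpParseError."""


def history_operations(text: str) -> list[str]:
    """Hypothetical: return the operation name of each history entry, in order."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError as exc:
        raise XmpParseError(f"malformed XML: {exc}") from exc
    # Same search pattern as parse_xmp: any descendant rdf:Description.
    desc = root.find(f".//{{{RDF}}}Description")
    if desc is None:
        raise XmpParseError("missing rdf:Description")
    path = f"{{{DT}}}history/{{{RDF}}}Seq/{{{RDF}}}li"
    return [li.get(f"{{{DT}}}operation") for li in desc.findall(path)]


sample = (
    '<x:xmpmeta xmlns:x="adobe:ns:meta/">'
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
    '<rdf:Description xmlns:darktable="http://darktable.sf.net/">'
    "<darktable:history><rdf:Seq>"
    '<rdf:li darktable:operation="exposure"/>'
    '<rdf:li darktable:operation="colorin"/>'
    "</rdf:Seq></darktable:history>"
    "</rdf:Description></rdf:RDF></x:xmpmeta>"
)
print(history_operations(sample))  # ['exposure', 'colorin']
```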

parse_xmp_from_bytes

parse_xmp_from_bytes(data, *, source='<bytes>')

Parse an XMP from in-memory bytes.

Counterpart to parse_xmp that avoids a filesystem round-trip. Useful when the bytes already live in memory, e.g., content-addressed reads from chemigram.core.versioning.repo.ImageRepo.

Parameters:

Name	Type	Description	Default
`data`	`bytes`	UTF-8 encoded XMP bytes.	required
`source`	`str`	Human-readable label used in error messages (e.g., `"sha256:abc..."` for content-addressed reads). Defaults to `"<bytes>"`.	`'<bytes>'`

Returns:

Type	Description
`Xmp`	The parsed `Xmp`.

Raises:

Type	Description
`XmpParseError`	malformed XML, invalid UTF-8, or missing `<rdf:Description>`.

Source code in src/chemigram/core/xmp.py

def parse_xmp_from_bytes(data: bytes, *, source: str = "<bytes>") -> Xmp:
    """Parse an XMP from in-memory bytes.

    Counterpart to :func:`parse_xmp` that avoids a filesystem
    round-trip. Useful when the bytes already live in memory — e.g.,
    content-addressed reads from
    :class:`~chemigram.core.versioning.repo.ImageRepo`.

    Args:
        data: UTF-8 encoded XMP bytes.
        source: Human-readable label used in error messages (e.g.,
            ``"sha256:abc..."`` for content-addressed reads). Defaults
            to ``"<bytes>"``.

    Returns:
        An :class:`Xmp`.

    Raises:
        XmpParseError: malformed XML, invalid UTF-8, or missing
            ``<rdf:Description>``.
    """
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise XmpParseError(f"{source}: not valid UTF-8: {exc}") from exc

    try:
        root = _defused_fromstring(text)
    except _DefusedParseError as exc:
        raise XmpParseError(f"{source}: malformed XML: {exc}") from exc

    description = root.find(f".//{{{_NS['rdf']}}}Description")
    if description is None:
        raise XmpParseError(f"{source}: missing rdf:Description")

    return _parse_description_to_xmp(description, Path(source))
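The two-stage error handling above (decode first, then parse, labelling both failures with source) can be sketched as follows. The local XmpParseError and root_from_bytes are hypothetical stand-ins, and the stdlib parser replaces _defused_fromstring purely to keep the example self-contained.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


class XmpParseError(Exception):
    """Local stand-in for chemigram.core.xmp.XmpParseError."""


def root_from_bytes(data: bytes, *, source: str = "<bytes>") -> ET.Element:
    """Hypothetical: decode, parse, and wrap both failure modes with a label."""
    # Decoding explicitly turns bad bytes into an XmpParseError that names
    # the source, instead of a bare UnicodeDecodeError.
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise XmpParseError(f"{source}: not valid UTF-8: {exc}") from exc
    try:
        return ET.fromstring(text)
    except ET.ParseError as exc:
        raise XmpParseError(f"{source}: malformed XML: {exc}") from exc


try:
    root_from_bytes(b"\xff\xfe", source="sha256:abc")
except XmpParseError as err:
    print(err)
```

Using a label such as "sha256:abc" makes errors from content-addressed reads traceable even though no file path exists.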

write_xmp

write_xmp(xmp, path)

Serialize an Xmp back to an XMP file.

Round-trip property (semantic equality): after write_xmp(x, p), parse_xmp(p) == x

Field ordering on output: raw_extra_fields attributes come first (in their stored order), then first-class fields (rating, label if non-empty, auto_presets_applied, history_end, iop_order_version), then raw_extra_fields child elements, then the synthesized <darktable:history> if non-empty.
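The stated ordering depends on ElementTree preserving attribute insertion order, which it does from Python 3.8 onward. The sketch below mimics the sequence write_xmp uses (rdf:about, then raw attributes, then first-class fields). The namespace URIs are assumptions, and plain f-string Clark names stand in for the module's _clark helper.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs; the real module keeps its own table.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XMP = "http://ns.adobe.com/xap/1.0/"
DT = "http://darktable.sf.net/"
for prefix, uri in (("rdf", RDF), ("xmp", XMP), ("darktable", DT)):
    ET.register_namespace(prefix, uri)

desc = ET.Element(f"{{{RDF}}}Description")
desc.set(f"{{{RDF}}}about", "")
# 1. raw_extra_fields attributes, in their stored order
desc.set(f"{{{DT}}}xmp_version", "5")
# 2. first-class fields
desc.set(f"{{{XMP}}}Rating", "1")
desc.set(f"{{{DT}}}history_end", "3")

# Since Python 3.8, ElementTree serializes attributes in insertion order.
serialized = ET.tostring(desc, encoding="unicode")
print(serialized)
```

Because Element.attrib behaves like a dict, setting a key that already exists updates its value but keeps its original position.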

Parameters:

Name	Type	Description	Default
`xmp`	`Xmp`	The `Xmp` to serialize.	required
`path`	`Path`	Destination path. Parent directory must exist; file is overwritten if present.	required

Source code in src/chemigram/core/xmp.py

def write_xmp(xmp: Xmp, path: Path) -> None:
    """Serialize an :class:`Xmp` back to an XMP file.

    Round-trip property (semantic equality): after
    ``write_xmp(x, p)``, ``parse_xmp(p) == x``.

    Field ordering on output: ``raw_extra_fields`` attributes come
    first (in their stored order), then first-class fields (rating,
    label if non-empty, auto_presets_applied, history_end,
    iop_order_version), then ``raw_extra_fields`` child elements,
    then the synthesized ``<darktable:history>`` if non-empty.

    Args:
        xmp: The :class:`Xmp` to serialize.
        path: Destination path. Parent directory must exist; file is
            overwritten if present.
    """
    xmpmeta = ET.Element(_clark("x", "xmpmeta"))
    xmpmeta.set(_clark("x", "xmptk"), "XMP Core 4.4.0-Exiv2")
    rdf = ET.SubElement(xmpmeta, _clark("rdf", "RDF"))
    desc = ET.SubElement(rdf, _clark("rdf", "Description"))
    desc.set(_clark("rdf", "about"), "")

    for kind, qname, value in xmp.raw_extra_fields:
        if kind == "attr":
            desc.set(_qname_to_clark(qname), value)

    desc.set(_clark("xmp", "Rating"), str(xmp.rating))
    if xmp.label:
        desc.set(_clark("xmp", "Label"), xmp.label)
    desc.set(
        _clark("darktable", "auto_presets_applied"),
        "1" if xmp.auto_presets_applied else "0",
    )
    desc.set(_clark("darktable", "history_end"), str(xmp.history_end))
    desc.set(_clark("darktable", "iop_order_version"), str(xmp.iop_order_version))

    for kind, _qname, value in xmp.raw_extra_fields:
        if kind == "elem":
            child = _defused_fromstring(value)
            desc.append(child)

    if xmp.history:
        history_elem = ET.SubElement(desc, _clark("darktable", "history"))
        seq = ET.SubElement(history_elem, _clark("rdf", "Seq"))
        for entry in xmp.history:
            ET.SubElement(seq, _clark("rdf", "li"), _history_entry_attrs(entry))

    tree = ET.ElementTree(xmpmeta)
    ET.indent(tree, space=" ", level=0)
    tree.write(path, encoding="utf-8", xml_declaration=True)
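The write path (build the x:xmpmeta/rdf:RDF/rdf:Description skeleton, re-parse each raw "elem" string and append it, indent, then write with an XML declaration) can be exercised in memory. This sketch is hypothetical: write_minimal is not part of chemigram, the namespace URIs are assumed, it writes to io.BytesIO instead of a path, and ET.indent requires Python 3.9 or later.

```python
import io
import xml.etree.ElementTree as ET

# Assumed namespace URIs for illustration.
X = "adobe:ns:meta/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DT = "http://darktable.sf.net/"
for prefix, uri in (("x", X), ("rdf", RDF), ("darktable", DT)):
    ET.register_namespace(prefix, uri)


def write_minimal(history_end: int, raw_elem: str) -> bytes:
    """Hypothetical: a cut-down write_xmp that returns bytes instead of writing a file."""
    xmpmeta = ET.Element(f"{{{X}}}xmpmeta")
    rdf = ET.SubElement(xmpmeta, f"{{{RDF}}}RDF")
    desc = ET.SubElement(rdf, f"{{{RDF}}}Description")
    desc.set(f"{{{DT}}}history_end", str(history_end))
    # A raw_extra_fields "elem" value is re-parsed and appended as a subtree.
    desc.append(ET.fromstring(raw_elem))
    tree = ET.ElementTree(xmpmeta)
    ET.indent(tree, space=" ", level=0)  # Python 3.9+
    buf = io.BytesIO()
    tree.write(buf, encoding="utf-8", xml_declaration=True)
    return buf.getvalue()


raw = (
    '<darktable:masks_history xmlns:darktable="http://darktable.sf.net/">'
    '<rdf:Seq xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>'
    "</darktable:masks_history>"
)
data = write_minimal(3, raw)
print(data.decode("utf-8"))
```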

synthesize_xmp

synthesize_xmp(baseline, entries)

Compose vocabulary entries onto a baseline XMP (Path A and Path B).

SET semantics (ADR-002, RFC-006 closure / ADR-051): a plugin entry whose (operation, multi_priority) tuple matches a baseline history entry REPLACES that entry in place. num and iop_order are preserved from the baseline slot — Phase 0 finding: SET-replace inherits position implicitly because darktable computes pipeline ordering from the parent iop_order_version and an internal iop_list, not per-<rdf:li> metadata.

Path B (new-instance addition at a previously-unused (operation, multi_priority)) appends a fresh HistoryEntry at num = max(existing) + 1 with iop_order=None. Per RFC-018 v0.2's empirical evidence (tests/fixtures/preflight-evidence/), darktable 5.4.1 resolves pipeline order from the description-level iop_order_version + internal iop_list, so per-entry iop_order is unnecessary. history_end increments to match. Closes RFC-001's iop_order open question (deferred under ADR-051) and supersedes that ADR's NotImplementedError stance.

Among multiple input plugins targeting the same (operation, multi_priority), the last one wins (input order). This deviates from RFC-006's original "synthesizer error" proposal; the closing ADR-051 captures the rationale.
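The three rules (SET-replace in place, Path B append, last writer wins) can be modelled on a simplified entry type. This is a hedged sketch rather than the module's code: Entry is a hypothetical, trimmed stand-in for HistoryEntry, the plugins are passed as one flat list instead of being read from each DtstyleEntry's plugins, and the conversion done by _plugin_to_history is not modelled.

```python
from __future__ import annotations

import dataclasses
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical, trimmed stand-in for HistoryEntry."""

    num: int
    operation: str
    multi_priority: int
    params: str
    iop_order: float | None = None


def merge(history: list[Entry], plugins: list[Entry]) -> list[Entry]:
    """Apply plugins in order: SET onto a matching slot, else append (Path B)."""
    current = list(history)
    for plugin in plugins:
        key = (plugin.operation, plugin.multi_priority)
        idx = next(
            (i for i, e in enumerate(current) if (e.operation, e.multi_priority) == key),
            None,
        )
        if idx is None:
            # Path B: append at max(num) + 1 with no per-entry iop_order.
            num = max((e.num for e in current), default=-1) + 1
            current.append(dataclasses.replace(plugin, num=num, iop_order=None))
        else:
            # SET: replace in place, inheriting the slot's num and iop_order.
            slot = current[idx]
            current[idx] = dataclasses.replace(plugin, num=slot.num, iop_order=slot.iop_order)
    return current


baseline = [Entry(0, "exposure", 0, "base", 47.4747), Entry(1, "colorin", 0, "base")]
plugins = [
    Entry(-1, "exposure", 0, "first"),   # SET onto slot 0
    Entry(-1, "exposure", 0, "second"),  # same key again: last writer wins
    Entry(-1, "exposure", 1, "new"),     # unused multi_priority: Path B
]
out = merge(baseline, plugins)
history_end = len(out)  # recomputed as in synthesize_xmp
```

Processing plugins strictly in input order is what makes "last writer wins" fall out naturally: a later plugin with the same key simply overwrites the slot an earlier one already replaced.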

Parameters:

Name	Type	Description	Default
`baseline`	`Xmp`	Starting `Xmp`; not mutated.	required
`entries`	`list[DtstyleEntry]`	vocabulary entries; order matters for last-writer-wins among entries that share `(operation, multi_priority)`.	required

Returns:

Type	Description
`Xmp`	A new frozen `Xmp` with synthesized history. Top-level metadata (`rating`, `label`, `auto_presets_applied`, `iop_order_version`, `raw_extra_fields`) is preserved verbatim. `history_end` is recomputed as `len(history)`: typically equal to the baseline value for Path A, larger for Path B.

Source code in src/chemigram/core/xmp.py

def synthesize_xmp(baseline: Xmp, entries: list[DtstyleEntry]) -> Xmp:
    """Compose vocabulary entries onto a baseline XMP (Path A and Path B).

    SET semantics (ADR-002, RFC-006 closure / ADR-051): a plugin entry
    whose ``(operation, multi_priority)`` tuple matches a baseline
    history entry REPLACES that entry in place. ``num`` and
    ``iop_order`` are preserved from the baseline slot — Phase 0 finding:
    SET-replace inherits position implicitly because darktable computes
    pipeline ordering from the parent ``iop_order_version`` and an
    internal iop_list, not per-``<rdf:li>`` metadata.

    Path B (new-instance addition at a previously-unused
    ``(operation, multi_priority)``) appends a fresh ``HistoryEntry``
    at ``num = max(existing) + 1`` with ``iop_order=None``. Per
    RFC-018 v0.2's empirical evidence
    (``tests/fixtures/preflight-evidence/``), darktable 5.4.1 resolves
    pipeline order from the description-level ``iop_order_version`` +
    internal iop_list, so per-entry ``iop_order`` is unnecessary.
    ``history_end`` increments to match. Closes RFC-001's iop_order
    open question (deferred under ADR-051) and supersedes that ADR's
    NotImplementedError stance.

    Among multiple input plugins targeting the same
    ``(operation, multi_priority)``, the last one wins (input order).
    This deviates from RFC-006's original "synthesizer error" proposal;
    the closing ADR-051 captures the rationale.

    Args:
        baseline: starting :class:`Xmp`; not mutated.
        entries: vocabulary entries; order matters for last-writer-wins
            among entries that share ``(operation, multi_priority)``.

    Returns:
        A new frozen :class:`Xmp` with synthesized history. Top-level
        metadata (``rating``, ``label``, ``auto_presets_applied``,
        ``iop_order_version``, ``raw_extra_fields``) is preserved
        verbatim. ``history_end`` is recomputed as ``len(history)``
        — typically equal to the baseline value for Path A, larger
        for Path B.
    """
    current: list[HistoryEntry] = list(baseline.history)

    for entry in entries:
        for plugin in entry.plugins:
            target_idx: int | None = None
            for i, existing in enumerate(current):
                if (
                    existing.operation == plugin.operation
                    and existing.multi_priority == plugin.multi_priority
                ):
                    target_idx = i
                    break

            if target_idx is None:
                # Path B — new-instance addition. Per RFC-018 v0.2's
                # empirical evidence (tests/fixtures/preflight-evidence/),
                # darktable 5.4.1 resolves pipeline order from the
                # description-level iop_order_version + internal iop_list,
                # so per-entry iop_order stays None. Append a fresh
                # HistoryEntry at num = max(existing) + 1.
                new_num = max((e.num for e in current), default=-1) + 1
                appended = dataclasses.replace(
                    _plugin_to_history(plugin),
                    num=new_num,
                    iop_order=None,
                )
                current.append(appended)
                continue

            replacement = dataclasses.replace(
                _plugin_to_history(plugin),
                num=current[target_idx].num,
                iop_order=current[target_idx].iop_order,
            )
            current[target_idx] = replacement

    return dataclasses.replace(
        baseline,
        history=tuple(current),
        history_end=len(current),
    )
