RFC-005 — Pipeline stage protocol — abstract now or YAGNI

Status · Decided (closed by ADR-052 at Slice 1 gate, 2026-04-28)

TA anchor · /components/render-pipeline

Related · ADR-004

Closed by · ADR-052 (PipelineStage Protocol with single v1 stage). The Protocol approach was justified: fake stages make unit-testing trivial, and the seam cost ~50 lines.

Why this is an RFC · The architecture (04/8) describes a pipeline with multiple stages, but v1 only has one stage (darktable-cli). YAGNI says "just write the function": abstraction adds complexity for a non-existent second stage. The architecture says "build the seam now to make Phase 3 cleaner." Both have merit. This RFC argues the choice and locks it.

The question

Should v1's render pipeline be:

  • Option A — A PipelineStage Protocol with one implementation (DarktableCliStage). The framework exists; adding stages later is straightforward.
  • Option B — A direct function call (render(raw, xmp, output)) that shells out to darktable-cli. Simpler. Refactor to a Protocol if a second stage ever lands.
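For comparison, Option B could look roughly like the sketch below. It assumes the positional form `darktable-cli <raw> <xmp> <output>` with an optional `--width`; the exact invocation is fixed by ADR-004 and is not reproduced here, so the `build_command` helper and the `width` parameter are illustrative rather than the project's actual code.

```python
import subprocess
from pathlib import Path
from typing import Optional


def build_command(raw: Path, xmp: Path, output: Path,
                  width: Optional[int] = None) -> list[str]:
    """Assemble the argv for darktable-cli (assumed form; ADR-004 is authoritative)."""
    cmd = ["darktable-cli", str(raw), str(xmp), str(output)]
    if width is not None:
        cmd += ["--width", str(width)]
    return cmd


def render(raw: Path, xmp: Path, output: Path,
           width: Optional[int] = None) -> Path:
    """Option B: one function, one subprocess call, no Protocol."""
    # check=True raises CalledProcessError when darktable-cli exits nonzero.
    subprocess.run(build_command(raw, xmp, output, width),
                   check=True, capture_output=True)
    return output
```

Splitting out `build_command` keeps the argv testable without a darktable install, but there is still no seam for swapping the stage itself.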

The tension is the YAGNI principle (Option B) versus preserving the architectural seam (Option A). The architecture document committed to the Protocol shape; this RFC examines whether v1 should ship it.

Use cases

What would a "second stage" look like in practice?

  • A future LocalProcessingStage for custom Python algorithms ("apply this denoise, then darktable-cli for the rest").
  • A ScalingStage that delivers different output sizes from one render.
  • An ExternalServiceStage that sends to a remote service (probably never wanted, but possible).

Are any of these likely to ship in the next 6 months? Probably not. Mode B might want a different stage architecture, but Mode B is itself unbuilt.

Goals

  • v1 ships the simplest correct render pipeline
  • Refactoring to Protocol later (if needed) doesn't require breaking changes
  • The right amount of abstraction at the right time

Constraints

  • TA/components/render-pipeline — render pipeline exists; how it's structured is open
  • ADR-004 — darktable-cli invocation form is fixed; the question is whether it's wrapped in a Protocol or called directly

Proposed approach

Option A: Ship the Protocol. Single stage, but the seam exists.

Concrete code shape:

from dataclasses import dataclass
from pathlib import Path
from typing import Protocol


@dataclass
class StageResult:
    success: bool
    output_paths: dict[str, Path]
    elapsed_seconds: float
    diagnostics: dict


class PipelineStage(Protocol):
    @property
    def inputs(self) -> set[str]: ...
    @property
    def outputs(self) -> set[str]: ...
    def run(self, context: dict) -> StageResult: ...


class DarktableCliStage:
    @property
    def inputs(self) -> set[str]:
        return {"raw_path", "xmp_path"}

    @property
    def outputs(self) -> set[str]:
        return {"image_path"}

    def run(self, context: dict) -> StageResult:
        # ... shell out to darktable-cli ...
        return StageResult(...)


@dataclass
class Pipeline:
    stages: list[PipelineStage]

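    def run(self, context: dict) -> StageResult:
        # Assumption: an empty pipeline succeeds trivially. Without this
        # default, `result` would be unbound when `stages` is empty.
        result = StageResult(success=True, output_paths={}, elapsed_seconds=0.0, diagnostics={})
        for stage in self.stages:
            result = stage.run(context)
            if not result.success:
                return result  # stop at the first failing stage
            # Make this stage's outputs available to later stages.
            context.update(result.output_paths)
        return result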

Yes, the Protocol adds ~30 lines vs a direct function call. But:

  • It's consistent with what's committed in the architecture doc, section 8.
  • The MCP server's render tools can compose stages declaratively (configuring which stages to run for previews vs exports).
  • It costs almost nothing in v1 to maintain, and earns refactoring slack later.

The chosen path is Option A.
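One practical payoff of the seam is that the pipeline loop can be exercised with fake stages and no darktable install. The sketch below is a minimal illustration, not the project's test suite: `FakeStage` and `run_pipeline` are hypothetical names, and the Protocol is trimmed to `run()` only (the `inputs`/`outputs` properties are omitted for brevity).

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Protocol


@dataclass
class StageResult:
    success: bool
    output_paths: dict[str, Path]
    elapsed_seconds: float = 0.0
    diagnostics: dict = field(default_factory=dict)


class PipelineStage(Protocol):
    def run(self, context: dict) -> StageResult: ...


@dataclass
class FakeStage:
    """Test double (hypothetical): returns a canned result, records each call."""
    result: StageResult
    calls: list = field(default_factory=list)

    def run(self, context: dict) -> StageResult:
        self.calls.append(dict(context))  # snapshot of the context this stage saw
        return self.result


def run_pipeline(stages: list[PipelineStage], context: dict) -> StageResult:
    """Same loop as Pipeline.run: stop at the first failure, else merge outputs."""
    result = StageResult(success=True, output_paths={})
    for stage in stages:
        result = stage.run(context)
        if not result.success:
            return result
        context.update(result.output_paths)
    return result


ok = FakeStage(StageResult(True, {"image_path": Path("out.jpg")}))
bad = FakeStage(StageResult(False, {}, diagnostics={"exit_code": 1}))
never_run = FakeStage(StageResult(True, {}))

final = run_pipeline([ok, bad, never_run], {"raw_path": Path("a.raw")})
```

Here the second stage sees the first stage's `image_path` in its context, the failure is returned as-is, and the third stage is never called.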

Alternatives considered

  • Option B (YAGNI, direct function): considered carefully. Saves ~30 lines now, but trades that for a future migration if a second stage lands. Migrations are exactly the kind of work LLM agents struggle with — they pattern-match the existing shape. The seam is cheap insurance.

  • Option C (Pipeline framework but only as docs, not code): rejected — having a documented abstraction with no implementation is the worst of both worlds.

  • Option D (Use a real workflow library, e.g., Prefect): rejected — wildly disproportionate to v1 scope. Custom Protocol is small and right-sized.

Trade-offs

  • Option A's main cost is mental overhead for new contributors reading the code: "why is there a Protocol with one implementation?" Mitigated by clear comments and the architecture document.
  • Option A means the simplest "render this image" call goes through more layers: synthesizer → pipeline → stage → subprocess. Each layer is thin (~5 lines), so the overhead is modest, but stack traces are longer. Acceptable.
  • Option A's Protocol shape might turn out to be wrong when a second stage lands. The first stage tells us a lot about pipeline shape; the second tells us whether that shape generalizes. We may need to revisit the Protocol when stage 2 lands. Acceptable: the current shape is informed by the architecture work in 04/8, not arbitrary.

Open questions

  • Sync vs async stages. v1 stages are synchronous (subprocess). Future stages might want to be async (network calls, background processing). Should run() be async from the start? Proposed: keep synchronous in v1; if an async stage lands, refactor to async-everywhere then. Mixing sync/async is worse than either consistently.
  • Stage configuration. How does a stage receive its config (path to darktable-cli, default --width, etc.)? Proposed: constructor parameters; stage instances are configured once and reused.
  • Stage error handling. What does a stage emit when darktable-cli returns nonzero? Proposed: StageResult.success = False, diagnostics populated. The caller decides whether to abort the pipeline or fall through.
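The configuration and error-handling proposals above can be sketched together. This is an illustration under stated assumptions, not the shipped stage: the constructor parameter names (`executable`, `default_width`, `runner`), the `output_path` context key, and the diagnostics keys are all hypothetical, and the argv form is assumed rather than taken from ADR-004.

```python
import subprocess
import time
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, Optional


@dataclass
class StageResult:
    success: bool
    output_paths: dict[str, Path]
    elapsed_seconds: float
    diagnostics: dict = field(default_factory=dict)


def _run_subprocess(cmd: list[str]) -> subprocess.CompletedProcess:
    return subprocess.run(cmd, capture_output=True, text=True)


class DarktableCliStage:
    """Configured once via the constructor, then reused for every render."""

    def __init__(
        self,
        executable: str = "darktable-cli",    # hypothetical parameter name
        default_width: Optional[int] = None,  # hypothetical parameter name
        runner: Callable[[list[str]], subprocess.CompletedProcess] = _run_subprocess,
    ) -> None:
        self.executable = executable
        self.default_width = default_width
        self._runner = runner  # injectable so tests need no darktable install

    def run(self, context: dict) -> StageResult:
        output = Path(context["output_path"])  # assumed context key
        cmd = [self.executable, str(context["raw_path"]),
               str(context["xmp_path"]), str(output)]
        if self.default_width is not None:
            cmd += ["--width", str(self.default_width)]
        started = time.monotonic()
        proc = self._runner(cmd)
        elapsed = time.monotonic() - started
        if proc.returncode != 0:
            # Nonzero exit: report failure; the caller decides what happens next.
            return StageResult(False, {}, elapsed,
                               {"returncode": proc.returncode,
                                "stderr": proc.stderr, "cmd": cmd})
        return StageResult(True, {"image_path": output}, elapsed, {"cmd": cmd})


def failing_runner(cmd: list[str]) -> subprocess.CompletedProcess:
    # Stand-in for a darktable-cli run that exits with status 1.
    return subprocess.CompletedProcess(cmd, returncode=1, stdout="", stderr="boom")


stage = DarktableCliStage(default_width=2048, runner=failing_runner)
result = stage.run({"raw_path": "a.raw", "xmp_path": "a.xmp", "output_path": "out.jpg"})
```

With the failing runner, `result.success` is `False` and `result.diagnostics` carries the return code, stderr, and the command that was attempted.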

How this closes

This RFC closes into two things: an ADR locking Option A (Protocol with one stage in v1), and the PipelineStage Protocol shape as documented above, which subsequent ADRs (or amendments) refine as second stages land.

  • TA/components/render-pipeline
  • ADR-004 (darktable-cli invocation form)
  • 04/8 (Pipeline architecture)