RFC-005 — Pipeline stage protocol — abstract now or YAGNI¶
Status · Decided (closed by ADR-052 at Slice 1 gate, 2026-04-28)
TA anchor · /components/render-pipeline
Related · ADR-004
Closed by · ADR-052 (PipelineStage Protocol with single v1 stage). The Protocol approach was justified: fake stages make unit-testing trivial, and the seam cost ~50 lines.
Why this is an RFC · The architecture (04/8) describes a pipeline with multiple stages, but v1 only has one stage (darktable-cli). YAGNI says "just write the function": abstraction adds complexity for a non-existent second stage. The architecture says "build the seam now to make Phase 3 cleaner." Both have merit. This RFC argues the choice and locks it.
The question¶
Should v1's render pipeline be:
- Option A: A PipelineStage Protocol with one implementation (DarktableCliStage). The framework exists; adding stages later is straightforward.
- Option B: A direct function call (render(raw, xmp, output)) that shells out to darktable-cli. Simpler. Refactor to a Protocol if a second stage ever lands.
YAGNI principle (Option B) vs preserve-the-architectural-seam (Option A). The architecture document committed to the Protocol shape; this RFC examines whether v1 should ship it.
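For comparison, Option B could be as small as the sketch below. It is an illustration under stated assumptions, not the project's code: the helper build_command is hypothetical, and the argument order (input, sidecar, output) follows darktable-cli's general positional form rather than the exact invocation fixed by ADR-004, which is not reproduced here.

```python
import subprocess
from pathlib import Path


def build_command(raw: Path, xmp: Path, output: Path,
                  binary: str = "darktable-cli") -> list[str]:
    # Hypothetical helper: positional input, sidecar, output.
    # Any options required by ADR-004 are deliberately left out.
    return [binary, str(raw), str(xmp), str(output)]


def render(raw: Path, xmp: Path, output: Path) -> bool:
    """Option B: one direct call, no stage abstraction."""
    completed = subprocess.run(
        build_command(raw, xmp, output), capture_output=True, text=True
    )
    return completed.returncode == 0
```

Everything a caller can observe here is a boolean; timing and diagnostics would have to be bolted on later, which is part of what the StageResult shape in Option A provides up front.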
Use cases¶
What would a "second stage" look like in practice?
- A future LocalProcessingStage for custom Python algorithms ("apply this denoise, then darktable-cli for the rest").
- A ScalingStage that delivers different output sizes from one render.
- An ExternalServiceStage that sends to a remote service (probably never wanted, but possible).
Are any of these likely to ship in the next 6 months? Probably not. Mode B might want a different stage architecture, but Mode B is itself unbuilt.
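One way to picture what a second stage would demand of the pipeline is to look only at the context keys each stage declares. The sketch below is an assumption-laden illustration: StageSpec, missing_inputs, and the thumbnail_path key are hypothetical names, and nothing here implies that v1 validates wiring this way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageSpec:
    """Hypothetical stand-in for a stage's declared context keys."""
    name: str
    inputs: frozenset[str]
    outputs: frozenset[str]


def missing_inputs(initial_keys: set[str],
                   stages: list[StageSpec]) -> dict[str, set[str]]:
    """Return, per stage name, the input keys that nothing upstream provides."""
    available = set(initial_keys)
    problems: dict[str, set[str]] = {}
    for stage in stages:
        missing = set(stage.inputs) - available
        if missing:
            problems[stage.name] = missing
        available |= stage.outputs  # later stages may consume these keys
    return problems


darktable = StageSpec("DarktableCliStage",
                      frozenset({"raw_path", "xmp_path"}),
                      frozenset({"image_path"}))
# Hypothetical second stage; "thumbnail_path" is an invented key.
scaling = StageSpec("ScalingStage",
                    frozenset({"image_path"}),
                    frozenset({"thumbnail_path"}))
```

With these two specs, the order darktable then scaling has no missing inputs, while the reverse order reports that ScalingStage lacks image_path.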
Goals¶
- v1 ships the simplest correct render pipeline
- Refactoring to Protocol later (if needed) doesn't require breaking changes
- The right amount of abstraction at the right time
Constraints¶
- TA/components/render-pipeline: render pipeline exists; how it's structured is open
- ADR-004: darktable-cli invocation form is fixed; the question is whether it's wrapped in a Protocol or called directly
Proposed approach¶
Option A: Ship the Protocol. Single stage, but the seam exists.
Concrete code shape:

    from dataclasses import dataclass
    from pathlib import Path
    from typing import Protocol


    @dataclass
    class StageResult:
        success: bool
        output_paths: dict[str, Path]  # merged into the context for later stages
        elapsed_seconds: float
        diagnostics: dict


    class PipelineStage(Protocol):
        @property
        def inputs(self) -> set[str]: ...

        @property
        def outputs(self) -> set[str]: ...

        def run(self, context: dict) -> StageResult: ...


    class DarktableCliStage:
        @property
        def inputs(self) -> set[str]:
            return {"raw_path", "xmp_path"}

        @property
        def outputs(self) -> set[str]:
            return {"image_path"}

        def run(self, context: dict) -> StageResult:
            # ... shell out to darktable-cli ...
            return StageResult(...)  # result fields elided in this sketch


    @dataclass
    class Pipeline:
        stages: list[PipelineStage]

        def run(self, context: dict) -> StageResult:
            # Assumes at least one stage; with an empty list `result` is never bound.
            for stage in self.stages:
                result = stage.run(context)
                if not result.success:
                    return result  # stop at the first failing stage
                context.update(result.output_paths)
            return result

Yes, the Protocol adds ~30 lines vs a direct function call. But:
- It's consistent with what's committed in the architecture doc, section 8.
- The MCP server's render tools can compose stages declaratively (configuring which stages to run for previews vs exports).
- It costs almost nothing in v1 to maintain, and earns refactoring slack later.
The chosen path is Option A.
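A practical payoff of the seam is that the pipeline can be exercised without darktable-cli installed. The sketch below repeats the shapes from the proposed approach so it runs on its own, and adds a FakeStage test double; FakeStage, its calls counter, and the fake.jpg path are hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Protocol


@dataclass
class StageResult:
    success: bool
    output_paths: dict[str, Path]
    elapsed_seconds: float
    diagnostics: dict


class PipelineStage(Protocol):
    @property
    def inputs(self) -> set[str]: ...

    @property
    def outputs(self) -> set[str]: ...

    def run(self, context: dict) -> StageResult: ...


@dataclass
class Pipeline:
    stages: list[PipelineStage]

    def run(self, context: dict) -> StageResult:
        for stage in self.stages:
            result = stage.run(context)
            if not result.success:
                return result
            context.update(result.output_paths)
        return result


class FakeStage:
    """Hypothetical test double: no subprocess, just a canned result."""

    def __init__(self, succeed: bool = True) -> None:
        self.succeed = succeed
        self.calls = 0  # lets a test check whether the stage was reached

    @property
    def inputs(self) -> set[str]:
        return {"raw_path", "xmp_path"}

    @property
    def outputs(self) -> set[str]:
        return {"image_path"}

    def run(self, context: dict) -> StageResult:
        self.calls += 1
        paths = {"image_path": Path("fake.jpg")} if self.succeed else {}
        return StageResult(self.succeed, paths, 0.0, {"fake": True})
```

Because PipelineStage is a structural Protocol, FakeStage needs no inheritance; a test can place a failing fake ahead of another stage and check that the second stage's calls counter stays at zero.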
Alternatives considered¶
- Option B (YAGNI, direct function): considered carefully. Saves ~30 lines now, but trades that for a future migration if a second stage lands. Migrations are exactly the kind of work LLM agents struggle with, because they pattern-match the existing shape. The seam is cheap insurance.
- Option C (Pipeline framework but only as docs, not code): rejected. Having a documented abstraction with no implementation is the worst of both worlds.
- Option D (Use a real workflow library, e.g., Prefect): rejected. Wildly disproportionate to v1 scope. Custom Protocol is small and right-sized.
Trade-offs¶
- Option A's main cost is mental overhead for new contributors reading the code: "why is there a Protocol with one implementation?" Mitigated by clear comments and the architecture document.
- Option A means the simplest "render this image" call goes through more layers: synthesizer → pipeline → stage → subprocess. Each layer is thin (~5 lines), so the overhead is modest, but stack traces are longer. Acceptable.
- Option A's Protocol shape might turn out to be wrong when a second stage lands. The first stage tells us a lot about pipeline shape; the second tells us whether that shape generalizes. We may need to revisit the Protocol when stage 2 lands. Acceptable: the current shape is informed by 04 thinking, not arbitrary.
Open questions¶
- Sync vs async stages. v1 stages are synchronous (subprocess). Future stages might want to be async (network calls, background processing). Should run() be async from the start? Proposed: keep synchronous in v1; if an async stage lands, refactor to async-everywhere then. Mixing sync/async is worse than either consistently.
- Stage configuration. How does a stage receive its config (path to darktable-cli, default --width, etc.)? Proposed: constructor parameters; stage instances are configured once and reused.
- Stage error handling. What does a stage emit when darktable-cli returns nonzero? Proposed: StageResult.success = False, diagnostics populated. The caller decides whether to abort the pipeline or fall through.
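The configuration and error-handling proposals above could look roughly like the sketch below. It is illustrative only: ConfiguredCliStage, the injectable runner, the output_path context key, and the default width of 2048 are all assumptions, and the command omits whatever else ADR-004 requires.

```python
import subprocess
import time
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Sequence


@dataclass
class StageResult:
    success: bool
    output_paths: dict[str, Path]
    elapsed_seconds: float
    diagnostics: dict


# Hypothetical seam so the sketch can run without darktable-cli installed.
Runner = Callable[[Sequence[str]], subprocess.CompletedProcess]


def _default_runner(cmd: Sequence[str]) -> subprocess.CompletedProcess:
    return subprocess.run(list(cmd), capture_output=True, text=True)


class ConfiguredCliStage:
    """Illustrative stage: configured once in the constructor, reused per render."""

    def __init__(self, binary: str = "darktable-cli", width: int = 2048,
                 runner: Runner = _default_runner) -> None:
        self.binary = binary
        self.width = width  # assumed default; the real value is not specified
        self.runner = runner

    def run(self, context: dict) -> StageResult:
        output = Path(context["output_path"])  # assumed context key
        cmd = [self.binary, str(context["raw_path"]), str(context["xmp_path"]),
               str(output), "--width", str(self.width)]
        started = time.perf_counter()
        completed = self.runner(cmd)
        elapsed = time.perf_counter() - started
        if completed.returncode != 0:
            # Nonzero exit: report failure and leave the decision to the caller.
            return StageResult(False, {}, elapsed, {
                "returncode": completed.returncode,
                "stderr": completed.stderr,
                "command": cmd,
            })
        return StageResult(True, {"image_path": output}, elapsed, {"command": cmd})
```

Passing a runner that returns a canned subprocess.CompletedProcess lets a test cover the nonzero-exit path without shelling out.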
How this closes¶
This RFC closes into:
- An ADR locking Option A (Protocol with one stage in v1).
- The PipelineStage Protocol shape as documented above; subsequent ADRs (or amendments) refine it as second stages land.
Links¶
- TA/components/render-pipeline
- ADR-004 (darktable-cli invocation form)
- 04/8 (Pipeline architecture)