Skip to content

RFC-009 — Mask provider protocol shape

Status · Decided (closed by ADR-057 at v0.4.0, 2026-04-29); superseded by ADR-076 at v1.5.0, 2026-05-03 — masking-provider Protocol retired in favor of drawn-mask-only architecture. TA anchor ·/components/ai-providers ·/contracts/mcp-tools Related · ADR-007, ADR-021, ADR-022, RFC-004 Closed by · ADR-057 (MaskingProvider Protocol shape); superseded by ADR-076. Why this is an RFC · ADR-007 commits to BYOA via pluggable providers. RFC-004 chooses the v1 default. But the actual MaskingProvider Protocol shape — the parameters, return types, error contracts — is open. This shape will be the hardest thing to change later: every provider implementation is bound to it. Getting it close-to-right now matters; the alternative is a v2 break that disrupts every external provider.

The question

What's the right shape for the MaskingProvider Protocol such that: - The bundled coarse-agent provider implements it cleanly - The sibling SAM-based provider (chemigram-masker-sam) implements it cleanly - A future hosted/cloud provider can implement it without contortions - A future specialist provider (e.g., chemigram-masker-fish, trained on marine life) fits naturally - Caching, retry, and error reporting are clean

Use cases

  • Agent calls generate_mask(image_id, "subject") → engine routes to configured provider → receives a PNG.
  • Agent calls generate_mask(image_id, "fish", prompt="adult tuna in left third of frame") → provider with prompt support uses it; provider without ignores it gracefully.
  • Provider takes 3 seconds; engine returns a placeholder while masking runs in background. (Async or not?)
  • Same target masked twice (session interrupted, resumed); cache returns the prior result.
  • Provider fails (model error, API outage). Engine surfaces the error to the agent without crashing the session.

Goals

  • Protocol fits multiple provider styles (local model, hosted service, agent-vision-coarse, specialist)
  • Caching works without provider-side complexity
  • Error handling is consistent across providers
  • Async-friendly (without forcing async on simple synchronous providers)

Constraints

  • TA/components/ai-providers — pluggable via Protocol
  • TA/constraints/byoa — engine doesn't bundle ML
  • ADR-022 — masks integrate with the mask registry

Proposed approach

Synchronous Protocol with optional async support. Providers that want async can implement the async method too; engine prefers async if available, falls back to sync.

@dataclass
class MaskRequest:
    image_id: str
    render_path: Path           # current preview to mask against
    target: str                 # "subject", "sky", "highlights", etc.
    prompt: str | None
    name: str | None            # optional symbolic name (default: "current_<target>_mask")
    request_id: str             # for caching/dedup


@dataclass
class MaskResult:
    success: bool
    mask_path: Path | None      # PNG file
    diagnostics: dict           # provider name, generation params, cache hit/miss
    quality_estimate: Literal["approximate", "production"] | None
    error_message: str | None   # if success=False


class MaskingProvider(Protocol):
    @property
    def name(self) -> str: ...
    @property
    def supports_prompts(self) -> bool: ...
    @property
    def quality_tier(self) -> Literal["approximate", "production"]: ...

    def generate(self, request: MaskRequest) -> MaskResult: ...

    # Optional async variant. Engine calls this if available.
    async def generate_async(self, request: MaskRequest) -> MaskResult:
        return self.generate(request)

Caching: the engine wraps providers. Cache key = (image_id, target, prompt, render_hash). Cache lives in the mask registry. Provider doesn't see cache; engine asks once, stores result, returns from cache on subsequent same requests.

Provider configuration: ~/.chemigram/config.toml:

[masking]
provider = "sam"   # or "coarse_agent" or "custom"

[masking.sam]
mcp_server = "chemigram-masker-sam"
model = "sam2_hiera_b"

[masking.coarse_agent]
# uses the photo agent's vision; no extra config

Provider registration via MCP: when a provider is an MCP server (e.g., chemigram-masker-sam), it registers itself via MCP service discovery. The engine discovers it from config.toml.

Error reporting: failures return MaskResult(success=False, error_message="...") rather than raising. The agent sees the error in tool result; can choose to fall back to a different provider, retry, or abort the operation.

Alternatives considered

  • Full async-only Protocol: rejected — forces async on simple synchronous providers (e.g., the coarse_agent). Mixing sync calls across the engine adds complexity without clear benefit at v1's scale.

  • Streaming results (provider yields multiple masks): rejected — masks are single results per call. If a provider can produce multiple candidate masks, the engine model is "choose the best one before returning"; if multiple-mask comparison becomes a real use case, revisit later.

  • Promote prompt to required parameter: rejected — forces providers without prompt support to implement a no-op. Optional with supports_prompts introspection is cleaner.

  • Embed caching in the provider Protocol (each provider implements its cache): rejected — gives providers more responsibility than they should have. Engine-side cache is one place; cache invalidation is one logic path.

  • Use a different approach entirely (provider chain, pipeline of maskers): considered for future. v1 has one configured provider per request; chains can be added later if a clear use case emerges.

Trade-offs

  • The Protocol adds one method (generate) plus three properties (name, supports_prompts, quality_tier). Slight verbosity for simple providers, but the introspection is what makes the engine's caching and reporting correct.
  • MaskResult.diagnostics is a free dict; provider-specific fields can land there. The agent reads it as opaque metadata. Mild typing weakness; acceptable for evolution speed.
  • Async-optional means the engine has two code paths (call sync; call async). Mitigated: Python's asyncio.iscoroutinefunction checks make this cleanly conditional.

Open questions

  • Is quality_estimate per-mask or per-provider? Provider-level (declared) and per-mask (in diagnostics) both have value. Proposed: provider-level via quality_tier, per-mask via diagnostics["confidence"] if the provider supports it.
  • Versioning the Protocol. When the Protocol shape evolves (e.g., future v2 adds streaming or chains), how do existing providers handle it? Proposed: providers declare protocol_version; engine warns or rejects mismatched providers.
  • MCP-server providers vs in-process providers. Both need to work. Proposed: in-process for the bundled coarse_agent; MCP-server for any external provider. Both implement the same Python Protocol; the MCP wrapper layer translates.
  • Mask file format. Specified as PNG (8-bit grayscale, single channel). Should we allow other formats (16-bit, alpha)? Proposed: PNG 8-bit grayscale is sufficient for v1. Document limitations.
  • Render hash for caching. The cache key includes render_hash — but rendering is non-deterministic with --apply-custom-presets false? Proposed: hash of the synthesized XMP serves as the render hash (each XMP produces one render; same XMP → same render).

How this closes

This RFC closes into: - An ADR locking the MaskingProvider Protocol shape as proposed. - An ADR for the engine-side caching mechanism (cache key, lifecycle, invalidation). - An ADR or amendment to ADR-022 specifying how the registry integrates with the Protocol.

  • TA/components/ai-providers
  • ADR-007, ADR-021, ADR-022
  • RFC-004 (default masking provider)