RFC-009 — Mask provider protocol shape
Status · Decided (closed by ADR-057 at v0.4.0, 2026-04-29); superseded by ADR-076 at v1.5.0, 2026-05-03 (masking-provider Protocol retired in favor of drawn-mask-only architecture).
TA anchor · /components/ai-providers · /contracts/mcp-tools
Related · ADR-007, ADR-021, ADR-022, RFC-004
Closed by · ADR-057 (MaskingProvider Protocol shape); superseded by ADR-076.
Why this is an RFC · ADR-007 commits to BYOA via pluggable providers. RFC-004 chooses the v1 default. But the actual MaskingProvider Protocol shape (the parameters, return types, error contracts) is open. This shape will be the hardest thing to change later: every provider implementation is bound to it. Getting it close-to-right now matters; the alternative is a v2 break that disrupts every external provider.
The question
What's the right shape for the MaskingProvider Protocol such that:
- The bundled coarse-agent provider implements it cleanly
- The sibling SAM-based provider (chemigram-masker-sam) implements it cleanly
- A future hosted/cloud provider can implement it without contortions
- A future specialist provider (e.g., chemigram-masker-fish, trained on marine life) fits naturally
- Caching, retry, and error reporting are clean
Use cases
- Agent calls generate_mask(image_id, "subject") → engine routes to configured provider → receives a PNG.
- Agent calls generate_mask(image_id, "fish", prompt="adult tuna in left third of frame") → provider with prompt support uses it; provider without ignores it gracefully.
- Provider takes 3 seconds; engine returns a placeholder while masking runs in background. (Async or not?)
- Same target masked twice (session interrupted, resumed); cache returns the prior result.
- Provider fails (model error, API outage). Engine surfaces the error to the agent without crashing the session.
Goals
- Protocol fits multiple provider styles (local model, hosted service, agent-vision-coarse, specialist)
- Caching works without provider-side complexity
- Error handling is consistent across providers
- Async-friendly (without forcing async on simple synchronous providers)
Constraints
- TA/components/ai-providers — pluggable via Protocol
- TA/constraints/byoa — engine doesn't bundle ML
- ADR-022 — masks integrate with the mask registry
Proposed approach
Synchronous Protocol with optional async support. Providers that want async can implement the async method too; engine prefers async if available, falls back to sync.
    from dataclasses import dataclass
    from pathlib import Path
    from typing import Literal, Protocol

    @dataclass
    class MaskRequest:
        image_id: str
        render_path: Path      # current preview to mask against
        target: str            # "subject", "sky", "highlights", etc.
        prompt: str | None
        name: str | None       # optional symbolic name (default: "current_<target>_mask")
        request_id: str        # for caching/dedup

    @dataclass
    class MaskResult:
        success: bool
        mask_path: Path | None     # PNG file
        diagnostics: dict          # provider name, generation params, cache hit/miss
        quality_estimate: Literal["approximate", "production"] | None
        error_message: str | None  # if success=False

    class MaskingProvider(Protocol):
        @property
        def name(self) -> str: ...

        @property
        def supports_prompts(self) -> bool: ...

        @property
        def quality_tier(self) -> Literal["approximate", "production"]: ...

        def generate(self, request: MaskRequest) -> MaskResult: ...

        # Optional async variant. Engine calls this if available.
        # The default body simply delegates to the synchronous method.
        async def generate_async(self, request: MaskRequest) -> MaskResult:
            return self.generate(request)
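
The "prefers async if available, falls back to sync" rule could look roughly like the sketch below. It is an illustration, not the engine's actual code: call_provider and both toy providers are hypothetical names, the request and result are reduced to plain dicts so the block stands alone, and running the sync path in a worker thread via asyncio.to_thread is an assumption about how the engine would avoid blocking.

```python
import asyncio
import inspect


async def call_provider(provider, request):
    """Hypothetical engine helper: prefer generate_async, else run generate off-loop."""
    async_fn = getattr(provider, "generate_async", None)
    if async_fn is not None and inspect.iscoroutinefunction(async_fn):
        return await async_fn(request)
    # Sync-only provider: run it in a worker thread so the event loop stays free.
    return await asyncio.to_thread(provider.generate, request)


class SyncOnlyProvider:
    """Toy provider that implements only the synchronous method."""

    def generate(self, request):
        return {"success": True, "via": "sync", "target": request["target"]}


class AsyncCapableProvider(SyncOnlyProvider):
    """Toy provider that also offers the async variant."""

    async def generate_async(self, request):
        return {"success": True, "via": "async", "target": request["target"]}


sync_result = asyncio.run(call_provider(SyncOnlyProvider(), {"target": "sky"}))
async_result = asyncio.run(call_provider(AsyncCapableProvider(), {"target": "sky"}))
```

The sketch uses inspect.iscoroutinefunction rather than the asyncio.iscoroutinefunction check named under Trade-offs, since newer Python releases deprecate the asyncio variant; either performs the same test here.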
Caching: the engine wraps providers. Cache key = (image_id, target, prompt, render_hash). The cache lives in the mask registry. The provider never sees the cache: the engine asks once, stores the result, and serves identical later requests from the cache.
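
A minimal sketch of that engine-side wrapper follows. CacheKey, CachingEngine, and fake_generate are hypothetical names; a plain dict stands in for the mask registry, and the choice to store only successful results is an assumption the RFC does not state.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass(frozen=True)
class CacheKey:
    # Mirrors the tuple (image_id, target, prompt, render_hash) from the text.
    image_id: str
    target: str
    prompt: Optional[str]
    render_hash: str


@dataclass
class CachingEngine:
    """Hypothetical engine-side wrapper; the provider never sees the cache."""

    generate: Callable[[CacheKey], dict]        # stands in for provider.generate
    _cache: dict = field(default_factory=dict)  # stands in for the mask registry

    def get_mask(self, key: CacheKey) -> dict:
        if key in self._cache:
            return {**self._cache[key], "cache": "hit"}
        result = self.generate(key)
        if result.get("success"):  # assumption: only successful results are stored
            self._cache[key] = result
        return {**result, "cache": "miss"}


calls = []


def fake_generate(key: CacheKey) -> dict:
    """Toy provider call that records how often it is invoked."""
    calls.append(key)
    return {"success": True, "mask_path": f"/tmp/{key.image_id}_{key.target}.png"}


engine = CachingEngine(generate=fake_generate)
key = CacheKey("img-001", "subject", None, "abc123")
first = engine.get_mask(key)    # provider is called
second = engine.get_mask(key)   # served from the cache
rerendered = engine.get_mask(CacheKey("img-001", "subject", None, "def456"))
```

Because render_hash is part of the key, a changed render produces a new entry instead of returning a stale mask.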
Provider configuration: ~/.chemigram/config.toml:
    [masking]
    provider = "sam"   # or "coarse_agent" or "custom"

    [masking.sam]
    mcp_server = "chemigram-masker-sam"
    model = "sam2_hiera_b"

    [masking.coarse_agent]
    # uses the photo agent's vision; no extra config
Provider registration via MCP: when a provider is an MCP server (e.g., chemigram-masker-sam), it registers itself via MCP service discovery. The engine discovers it from config.toml.
Error reporting: failures return MaskResult(success=False, error_message="...") rather than raising. The agent sees the error in the tool result and can choose to fall back to a different provider, retry, or abort the operation.
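
One way the result-not-exception contract might play out is sketched below. The MaskResult here is a trimmed copy with defaults added for brevity; safe_generate, first_success, and the two toy providers are hypothetical. Wrapping stray exceptions on the engine side is an assumption about how the engine keeps a misbehaving provider from crashing the session.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class MaskResult:
    # Trimmed copy of the RFC's MaskResult, just enough for this sketch.
    success: bool
    mask_path: Optional[str] = None
    error_message: Optional[str] = None


def safe_generate(generate: Callable[[], MaskResult]) -> MaskResult:
    """Hypothetical engine guard: convert stray exceptions into failed results."""
    try:
        return generate()
    except Exception as exc:  # a provider that raises must still not crash the session
        return MaskResult(success=False, error_message=f"{type(exc).__name__}: {exc}")


def first_success(attempts: list[Callable[[], MaskResult]]) -> MaskResult:
    """Caller-side fallback: try providers in order, keep the last failure if none work."""
    result = MaskResult(success=False, error_message="no providers configured")
    for attempt in attempts:
        result = safe_generate(attempt)
        if result.success:
            break
    return result


def broken_provider() -> MaskResult:
    raise RuntimeError("model weights missing")


def working_provider() -> MaskResult:
    return MaskResult(success=True, mask_path="/tmp/subject.png")


failure = safe_generate(broken_provider)
outcome = first_success([broken_provider, working_provider])
```

Whether fallback lives in the engine or is left to the agent is a design choice; the text above leaves it to the agent, so first_success is best read as agent-side logic.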
Alternatives considered
- Full async-only Protocol: rejected; it forces async on simple synchronous providers (e.g., the coarse_agent). Mixing sync calls across the engine adds complexity without clear benefit at v1's scale.
- Streaming results (provider yields multiple masks): rejected; masks are single results per call. If a provider can produce multiple candidate masks, the engine model is "choose the best one before returning"; if multiple-mask comparison becomes a real use case, revisit later.
- Promote prompt to required parameter: rejected; it forces providers without prompt support to implement a no-op. Optional with supports_prompts introspection is cleaner.
- Embed caching in the provider Protocol (each provider implements its cache): rejected; it gives providers more responsibility than they should have. Engine-side cache is one place; cache invalidation is one logic path.
- Use a different approach entirely (provider chain, pipeline of maskers): considered for future. v1 has one configured provider per request; chains can be added later if a clear use case emerges.
Trade-offs
- The Protocol adds one method (generate) plus three properties (name, supports_prompts, quality_tier). Slight verbosity for simple providers, but the introspection is what makes the engine's caching and reporting correct.
- MaskResult.diagnostics is a free dict; provider-specific fields can land there. The agent reads it as opaque metadata. Mild typing weakness; acceptable for evolution speed.
- Async-optional means the engine has two code paths (call sync; call async). Mitigated: Python's asyncio.iscoroutinefunction checks make this cleanly conditional.
Open questions
- Is quality_estimate per-mask or per-provider? Provider-level (declared) and per-mask (in diagnostics) both have value. Proposed: provider-level via quality_tier, per-mask via diagnostics["confidence"] if the provider supports it.
- Versioning the Protocol. When the Protocol shape evolves (e.g., future v2 adds streaming or chains), how do existing providers handle it? Proposed: providers declare protocol_version; engine warns or rejects mismatched providers.
- MCP-server providers vs in-process providers. Both need to work. Proposed: in-process for the bundled coarse_agent; MCP-server for any external provider. Both implement the same Python Protocol; the MCP wrapper layer translates.
- Mask file format. Specified as PNG (8-bit grayscale, single channel). Should we allow other formats (16-bit, alpha)? Proposed: PNG 8-bit grayscale is sufficient for v1. Document limitations.
- Render hash for caching. The cache key includes render_hash, but rendering is non-deterministic with --apply-custom-presets false? Proposed: hash of the synthesized XMP serves as the render hash (each XMP produces one render; same XMP → same render).
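
Two of the proposals above, the XMP-derived render hash and the protocol_version check, can be sketched together. Everything here is illustrative: the function names, the use of SHA-256, the integer version scheme, and the policy of warning on an undeclared version are assumptions the RFC does not fix.

```python
import hashlib

SUPPORTED_PROTOCOL_VERSION = 1  # hypothetical; the RFC does not define a numbering scheme


def render_hash_from_xmp(xmp_text: str) -> str:
    """Hash the synthesized XMP so identical edits map to the same cache entry."""
    return hashlib.sha256(xmp_text.encode("utf-8")).hexdigest()


def check_protocol_version(provider) -> str:
    """Return 'ok', 'warn', or 'reject' for a provider's declared protocol_version."""
    declared = getattr(provider, "protocol_version", None)
    if declared is None:
        return "warn"    # assumption: undeclared versions are tolerated with a warning
    if declared == SUPPORTED_PROTOCOL_VERSION:
        return "ok"
    return "reject"      # assumption: any mismatch is rejected outright


class CurrentProvider:
    protocol_version = 1


class FutureProvider:
    protocol_version = 2


class LegacyProvider:
    pass  # declares nothing


same_a = render_hash_from_xmp("<xmp exposure='+0.5'/>")
same_b = render_hash_from_xmp("<xmp exposure='+0.5'/>")
changed = render_hash_from_xmp("<xmp exposure='+0.7'/>")
statuses = [check_protocol_version(p()) for p in (CurrentProvider, FutureProvider, LegacyProvider)]
```

Hashing the XMP text sidesteps the question of whether two renders of the same edit are byte-identical, which is the point of the proposal.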
How this closes
This RFC closes into:
- An ADR locking the MaskingProvider Protocol shape as proposed.
- An ADR for the engine-side caching mechanism (cache key, lifecycle, invalidation).
- An ADR or amendment to ADR-022 specifying how the registry integrates with the Protocol.
Links
- TA/components/ai-providers
- ADR-007, ADR-021, ADR-022
- RFC-004 (default masking provider)