ADR-052: Separate GIL and KG Artifact Layers¶
- Status: Accepted
- Date: 2026-04-03
- Authors: Podcast Scraper Team
- Related RFCs: RFC-049, RFC-055
- Related PRDs: PRD-017, PRD-019
Context & Problem Statement¶
The system produces two kinds of structured graph data per episode: Grounded Insights (GIL) — evidence-backed takeaways with supporting quotes — and Knowledge Graph (KG) — entities, topics, and their relationships. Both use a nodes-and-edges format, which creates a temptation to merge them into a single artifact. The decision of whether to keep them separate or combined affects contracts, feature flags, CLI namespaces, and every downstream consumer.
Decision¶
We maintain GIL and KG as separate, independent artifact layers:
- Separate files:
*.gi.jsonand*.kg.jsonwith distinct schemas andschema_versionfields. - Separate feature flags:
generate_giandgenerate_kgare independent boolean config fields. Either can be enabled without the other. - Separate CLI namespaces:
podcast_scraper gifor GIL operations,kgfor KG operations. - Separate ontology docs:
docs/architecture/gi/ontology.mdanddocs/architecture/kg/ontology.md. - No cross-references required in v1: Optional links between GIL and KG nodes
(e.g.
insight_idin a KG entity) are permitted but not required.
Rationale¶
- Different product questions: GIL answers "what are the evidence-backed insights?" KG answers "what entities and topics are discussed and how do they relate?" Merging conflates trust (grounding) with structure (linking).
- Independent evolution: GIL and KG have different maturity timelines, different extraction models, and different quality contracts. Coupling them forces synchronized releases.
- Consumer clarity: A consumer that only needs entities (KG) should not parse insight-specific fields, and vice versa.
- Schema stability: Changes to KG node types don't break GIL consumers.
Alternatives Considered¶
- Single merged
graph.json: Rejected; conflates two distinct product contracts, forces consumers to handle all node types, couples release cycles. - KG as extension of GIL (
gi.jsonwith KG section): Rejected; RFC-055 explicitly states "must not store KG as primary payload insidegi.json." Violates schema stability. - Shared node namespace with layer tags: Rejected; adds complexity without benefit. Separate files are simpler and more robust.
Consequences¶
- Positive: Clean separation of concerns. Independent feature flags, schemas, CLI commands, and evolution paths. Consumers opt into what they need.
- Negative: Cross-layer queries (e.g. "insights about entity X") require joining data from two files. Viewer merge logic handles this (RFC-062).
- Neutral: Naming convention (
givskg) must be consistent across CLI, config, docs, and file extensions.
Implementation Notes¶
- Config:
config.Config.generate_gi(bool),config.Config.generate_kg(bool) - CLI:
podcast_scraper gi <subcommand>,podcast_scraper kg <subcommand> - Pipeline: GIL and KG extraction stages run independently; both can run in the same pipeline invocation
- Viewer: Merge logic unifies episodes across layers with prefixed IDs (
g:,k:)
References¶
- RFC-049: GIL Core
- RFC-055: KG Core
- RFC-062: Viewer v2 — cross-layer merge logic