RFC-056: Knowledge Graph Layer – Use Cases & End-to-End Consumption
- Status: Draft
- Authors: Podcast Scraper Team
- Stakeholders: Core team, downstream consumers, integrators
- Execution Timing: Parallel with RFC-055 implementation. Consumption patterns,
CLI contracts, and query shapes evolve as the KG artifact and
kg namespace land.
Depends on RFC-055 for per-episode KG JSON and ontology.
- Related PRDs:
  - docs/prd/PRD-019-knowledge-graph-layer.md (Knowledge Graph Layer, KG)
  - docs/prd/PRD-017-grounded-insight-layer.md (separate: GIL / gi; not KG)
  - docs/prd/PRD-018-database-projection-gil-kg.md (optional relational serving via RFC-051)
- Related RFCs:
  - docs/rfc/RFC-055-knowledge-graph-layer-core.md (primary dependency: artifact, schema, config, kg vs gi separation)
  - docs/rfc/RFC-051-database-projection-gil-kg.md (Postgres projection for KG tables when enabled)
  - docs/rfc/RFC-053-adaptive-summarization-routing.md (optional route_kg_extraction alignment with episode profile)
  - docs/rfc/RFC-004-filesystem-layout.md (output layout and run scoping)
- Related Documents:
  - docs/architecture/kg/ontology.md: Human-readable ontology (v1 frozen, GitHub #464)
  - docs/architecture/kg/kg.schema.json: Normative JSON Schema (v1 frozen, #464)
  - docs/guides/KNOWLEDGE_GRAPH_GUIDE.md: User-facing guide (living document)
Abstract
This RFC defines how the Knowledge Graph Layer (KG) delivers user value after
per-episode extraction. RFC-055 specifies what is stored (nodes, edges, provenance);
this RFC specifies how operators and developers consume that data: exploration
patterns, export and merge strategies, optional database-backed queries, and the kg
CLI surface (aligned with PRD-019).
Relationship to GIL (RFC-050): GIL optimizes for insights + verbatim evidence
(trust and navigation). KG optimizes for entities, topics, and typed relationships
(linking and discovery). Consumption shapes differ: GIL responses center on
insights[] and supporting_quotes[]; KG responses center on graph traversals
and aggregations over nodes and edges (see §Output contracts).
Problem Statement
Without explicit consumption design:
- Integrators do not know how to merge episode files into a corpus view
- CLI users lack a documented kg workflow comparable to gi discoverability
- Success criteria for “KG is useful” stay vague compared to GIL’s Insight Explorer narrative
This RFC closes that gap at the specification level; implementation fills in exact commands and flags.
Goals
- Define KG-centric use cases that are distinct from GIL (no grounding requirement)
- Specify query and aggregation patterns that work over per-episode files (and optionally RFC-051 tables)
- Establish illustrative output contracts for common operations (not every future query)
- Align CLI naming with RFC-055: kg for graph operations, gi unchanged for grounded insights
Non-Goals (this RFC)
- Replacing or embedding KG inside gi.json
- Perfect entity resolution across the open web (see PRD-019 non-goals)
- Natural-language query translation (post-v1; structured access first)
Design Principles
- Episode-local production, global consumption: Same pattern as GIL; artifacts are written per episode, and consumers build a logical union or DB views.
- Structure over evidence: User value is who/what links to what, not mandatory quote spans (those remain GIL’s contract).
- Stable IDs within scope: Episode-local or feed-scoped identifiers are acceptable for v1; cross-corpus merging rules live in RFC-055 / ontology.
- Optional join to GIL: Linking KG nodes to insight_id is out of scope for v1 unless explicitly added later.
Minimal v1 Use Cases
UC1: Cross-episode theme and entity exploration
User intent: See which entities or topics recur across episodes from the same show (or run), to prioritize listening or analysis.
Consumption pattern:
- Scan per-episode KG JSON under the run output tree or query KG tables if PRD-018 / RFC-051 export is enabled.
- Aggregate by node label or normalized key (exact strategy per ontology).
- Present counts and episode lists; optional co-occurrence via edges.
Illustrative response shape (logical — not a mandatory wire format):
```json
{
  "scope": "podcast:planet-money",
  "entity": {
    "id": "kg:entity:federal-reserve",
    "label": "Federal Reserve",
    "type": "organization"
  },
  "episode_count": 12,
  "episodes": [
    {
      "episode_id": "episode:abc123",
      "title": "Why the Fed raised rates",
      "mention_count": 3
    }
  ]
}
```
Success criteria:
- User can answer “where did this entity show up?” across processed episodes
- Results are reproducible from files alone (no DB required)
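The UC1 roll-up can be sketched over files alone. This is a minimal sketch, not the project's implementation: the artifact field names (episode_id, title, nodes, and per-node id, label, type, mention_count) are assumptions for illustration, the second sample episode is made-up data, and the normative shape is whatever kg.schema.json defines. In practice each artifact would be loaded with json.load from the run output tree.

```python
from collections import defaultdict


def entity_rollup(artifacts, scope):
    """Aggregate per-episode KG artifacts into one roll-up per entity id.

    `artifacts` is an iterable of dicts, one per episode. Field names are
    hypothetical; adapt them to the published KG schema.
    """
    rollups = {}
    for artifact in artifacts:
        # Sum mentions per entity within this episode first, so an entity
        # listed twice in one artifact still counts as a single episode.
        per_episode = defaultdict(int)
        first_seen = {}
        for node in artifact.get("nodes", []):
            per_episode[node["id"]] += node.get("mention_count", 1)
            first_seen.setdefault(node["id"], node)
        for entity_id, mentions in per_episode.items():
            node = first_seen[entity_id]
            entry = rollups.setdefault(entity_id, {
                "scope": scope,
                "entity": {
                    "id": entity_id,
                    "label": node.get("label"),
                    "type": node.get("type"),
                },
                "episode_count": 0,
                "episodes": [],
            })
            entry["episode_count"] += 1
            entry["episodes"].append({
                "episode_id": artifact["episode_id"],
                "title": artifact.get("title"),
                "mention_count": mentions,
            })
    return rollups


# Made-up sample artifacts in the assumed shape.
sample = [
    {"episode_id": "episode:abc123", "title": "Why the Fed raised rates",
     "nodes": [{"id": "kg:entity:federal-reserve", "label": "Federal Reserve",
                "type": "organization", "mention_count": 3}]},
    {"episode_id": "episode:def456", "title": "Inflation, explained",
     "nodes": [{"id": "kg:entity:federal-reserve", "label": "Federal Reserve",
                "type": "organization", "mention_count": 1}]},
]
result = entity_rollup(sample, scope="podcast:planet-money")
print(result["kg:entity:federal-reserve"]["episode_count"])
```

Keying the roll-up on node id rather than label sidesteps label normalization; whether ids are stable enough across episodes for this depends on the ID rules in RFC-055 and the ontology.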
UC2: Structured export for downstream tools
User intent: Load KG JSON into RAG, search, or visualization tools with a versioned schema.
Consumption pattern:
- Read artifacts that validate against docs/architecture/kg/kg.schema.json (when published)
- Optionally run kg export (or equivalent) to emit NDJSON, a single merged graph snapshot, or SQL insert stubs; exact flags are specified at implementation time
Success criteria:
- Documented path from disk artifact → consumer pipeline (see docs/guides/KNOWLEDGE_GRAPH_GUIDE.md)
- Schema validation in CI when KG is exercised in tests (per RFC-055)
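One way a consumer might flatten per-episode artifacts into NDJSON for a downstream loader is sketched below. The record layout (a kind field plus the episode_id stamped onto each node and edge) and the nodes/edges field names are assumptions for illustration; this is not the output format of kg export.

```python
import json


def to_ndjson_lines(artifacts):
    """Yield one JSON line per node and per edge, tagged with its episode.

    Hypothetical record layout; the real export format is defined by the
    implementation and the KG schema, not by this sketch.
    """
    for artifact in artifacts:
        episode_id = artifact["episode_id"]
        for kind, key in (("node", "nodes"), ("edge", "edges")):
            for item in artifact.get(key, []):
                # Tags are applied last so they win over same-named item keys.
                record = {**item, "kind": kind, "episode_id": episode_id}
                # sort_keys keeps each line byte-stable across runs.
                yield json.dumps(record, sort_keys=True, ensure_ascii=False)


# Made-up artifact in the assumed shape.
artifact = {
    "episode_id": "episode:abc123",
    "nodes": [{"id": "kg:entity:federal-reserve", "type": "organization"}],
    "edges": [{"from": "kg:entity:federal-reserve",
               "to": "kg:topic:interest-rates", "type": "related_to"}],
}
lines = list(to_ndjson_lines([artifact]))
print("\n".join(lines))
```

Stamping episode_id on every record lets a loader stream the file without holding a whole graph in memory, at the cost of some repetition.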
UC3: Parallel operation with GIL
User intent: The same pipeline run produces gi.json (GIL) and the KG artifact when
both flags are on, without cross-contamination.
Consumption pattern:
- Operators enable generate_gi and generate_kg independently or together
- Downstream jobs read two files per episode; routing and naming stay gi vs kg (RFC-055)
Success criteria:
- Disabling one feature does not alter the other’s semantics (PRD-019 FR1)
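A downstream job that treats the two artifacts as independent inputs might look like the sketch below. gi.json is the GIL file name used in this RFC; kg.json is an assumed placeholder for the KG artifact name, which RFC-055 defines. The helper name is hypothetical.

```python
import json
import tempfile
from pathlib import Path


def load_episode_artifacts(episode_dir, kg_name="kg.json"):
    """Load GIL and KG artifacts separately; a missing file yields None.

    `kg_name` is a placeholder; use the file name RFC-055 specifies.
    """
    episode_dir = Path(episode_dir)
    loaded = {}
    for layer, name in (("gi", "gi.json"), ("kg", kg_name)):
        path = episode_dir / name
        if path.is_file():
            loaded[layer] = json.loads(path.read_text(encoding="utf-8"))
        else:
            loaded[layer] = None
    return loaded


# Demo: an episode directory where only KG generation was enabled.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "kg.json").write_text('{"nodes": [], "edges": []}', encoding="utf-8")
    artifacts = load_episode_artifacts(tmp)

print(artifacts)
```

Because each layer is loaded on its own, a run with only one flag enabled is handled the same way as a run with both.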
UC4: Operator inspection (CLI)
User intent: Inspect KG contents from the terminal the same way operators inspect GIL.
Consumption pattern (implemented in CLI):
- kg validate: Validate KG JSON against the published schema (--strict for full JSON Schema).
- kg inspect: Summarize nodes/edges for one episode or path.
- kg export: NDJSON or merged JSON bundle over a run output tree.
- kg entities / kg topics: File-based roll-up and topic co-occurrence (RFC-056 query patterns).
Exact flags are documented in docs/guides/KNOWLEDGE_GRAPH_GUIDE.md and docs/api/CLI.md.
Subcommands live under the kg namespace per RFC-055.
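The kind of per-episode summary an inspection step produces can be approximated in a few lines. This sketch is not the kg inspect implementation; it assumes an artifact with nodes and edges arrays whose items carry a type field, and the function name is hypothetical.

```python
from collections import Counter


def summarize(artifact):
    """Return node and edge totals plus counts per type for one episode."""
    nodes = artifact.get("nodes", [])
    edges = artifact.get("edges", [])
    return {
        "node_count": len(nodes),
        "edge_count": len(edges),
        "nodes_by_type": dict(Counter(n.get("type", "unknown") for n in nodes)),
        "edges_by_type": dict(Counter(e.get("type", "unknown") for e in edges)),
    }


# Made-up artifact in the assumed shape.
summary = summarize({
    "nodes": [
        {"id": "a", "type": "organization"},
        {"id": "b", "type": "topic"},
        {"id": "c", "type": "topic"},
    ],
    "edges": [{"from": "a", "to": "b", "type": "related_to"}],
})
print(summary)
```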
Query Patterns
| Pattern | Description | Typical inputs |
|---|---|---|
| Entity roll-up | Count episodes and mentions per entity | Entity id or label |
| Topic co-occurrence | Pairs of topics that appear in same episode | Optional minimum support |
| Subgraph slice | Nodes/edges for one episode | episode_id |
| Feed corpus view | Union of episodes under one podcast/run | Output root or DB |
Natural-language or embedding-based search over KG is out of scope for v1 unless added in a later RFC.
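The topic co-occurrence pattern with a minimum-support threshold can be sketched as follows. The node fields (id, type) and the "topic" type value are assumptions for illustration, and the sample episodes are made-up data; the ontology defines the real node types.

```python
from collections import Counter
from itertools import combinations


def topic_cooccurrence(artifacts, min_support=1):
    """Count topic pairs that appear in the same episode.

    Returns {(topic_a, topic_b): episode_count} for pairs seen in at least
    `min_support` episodes. Pairs are sorted so (a, b) and (b, a) merge.
    """
    pair_counts = Counter()
    for artifact in artifacts:
        # A set de-duplicates topics repeated within one episode.
        topics = sorted({
            node["id"]
            for node in artifact.get("nodes", [])
            if node.get("type") == "topic"
        })
        pair_counts.update(combinations(topics, 2))
    return {pair: n for pair, n in pair_counts.items() if n >= min_support}


episodes = [
    {"nodes": [{"id": "kg:topic:inflation", "type": "topic"},
               {"id": "kg:topic:interest-rates", "type": "topic"}]},
    {"nodes": [{"id": "kg:topic:inflation", "type": "topic"},
               {"id": "kg:topic:interest-rates", "type": "topic"},
               {"id": "kg:topic:housing", "type": "topic"}]},
]
pairs = topic_cooccurrence(episodes, min_support=2)
print(pairs)
```

With min_support=2, only the inflation / interest-rates pair survives, since the pairs involving housing occur in a single episode.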
Relational consumption (optional)
When RFC-051 is enabled, consumers may use SQL for roll-ups and joins instead of scanning JSON. Table shapes and migrations belong in RFC-051 / PRD-018; this RFC only requires that semantic separation between GIL and KG projections is preserved (separate tables or namespaces).
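To make the SQL roll-up concrete, the sketch below runs an entity roll-up against an in-memory SQLite database standing in for Postgres. The kg_nodes table and its columns are invented for this example and the rows are made-up data; the real table shapes are specified in RFC-051 / PRD-018.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; real KG table shapes are specified in RFC-051.
conn.execute(
    "CREATE TABLE kg_nodes ("
    "episode_id TEXT, node_id TEXT, label TEXT, mention_count INTEGER)"
)
conn.executemany(
    "INSERT INTO kg_nodes VALUES (?, ?, ?, ?)",
    [
        ("episode:abc123", "kg:entity:federal-reserve", "Federal Reserve", 3),
        ("episode:def456", "kg:entity:federal-reserve", "Federal Reserve", 1),
        ("episode:def456", "kg:topic:inflation", "Inflation", 2),
    ],
)
# Entity roll-up: distinct episodes and total mentions per node.
rows = conn.execute(
    """
    SELECT node_id,
           COUNT(DISTINCT episode_id) AS episode_count,
           SUM(mention_count)         AS mentions
    FROM kg_nodes
    GROUP BY node_id
    ORDER BY episode_count DESC, node_id
    """
).fetchall()
conn.close()
print(rows)
```

Keeping KG rows in their own table (here kg_nodes) rather than sharing one with GIL data is one way to preserve the semantic separation this section requires.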
Testing & acceptance hooks
- Unit: Node/edge builders, ID stability, schema validation (RFC-055)
- Integration: Transcript → KG artifact → validate
- E2E (optional): Config path with generate_kg: true and artifact assertions
Consumer-focused acceptance tests may mirror GIL acceptance layout under
config/acceptance/ when KG configs exist (directory name kg/ recommended for
symmetry with gi/).
Rollout
- Keep docs/guides/KNOWLEDGE_GRAPH_GUIDE.md updated as commands and paths stabilize
- Link RFC-056 from PRD-019 and RFC-055
- When CLI is live, update docs/api/CLI.md and docs/api/CONFIGURATION.md