ADR-008: Database-Agnostic Metadata Schema¶
- Status: Accepted
- Date: 2026-01-11
- Authors: Podcast Scraper Team
- Related RFCs: RFC-011
- Related PRDs: PRD-004
Context & Problem Statement¶
Metadata is consumed by various downstream systems: PostgreSQL (JSONB), MongoDB, ClickHouse, and simple web dashboards. We need a format that is rich enough for all but requires zero transformation for ingestion.
Decision¶
Metadata documents follow a Database-Agnostic Schema:
- Format: Strictly valid JSON.
- Naming:
snake_casefor all fields (SQL-friendly). - Timestamps: ISO 8601 strings (universally parsable).
- Structure: Flat logical groups (
feed,episode,content,processing).
Rationale¶
- Zero-ETL Ingestion: Allows piping JSON files directly into Postgres or Mongo without writing custom "adapter" code.
- Human Readability: JSON remains readable for debugging while providing the structure needed for scale.
- Consistency: Centralizing this in a Pydantic model (RFC-008) ensures all episodes adhere to the same contract.
Alternatives Considered¶
- SQLite Database per Run: Rejected as it's harder to version control and diff than text files.
- YAML: Rejected as the primary metadata format due to poor native support in many database ingestion tools (though kept as an optional human-only export).
Consequences¶
- Positive: High interoperability; rapid development of dashboards and analytics.
- Negative: JSON is slightly more verbose than binary formats (though negligible at this scale).