ADR-007: Universal Episode Identity
- Status: Accepted
- Date: 2026-01-11
- Authors: Podcast Scraper Team
- Related RFCs: RFC-011
- Related PRDs: PRD-004
Context & Problem Statement
To enable long-term archival and database integration, we need a stable way to identify an episode across multiple runs, even if the title or RSS feed structure changes slightly.
Decision
We adopt a multi-tiered Universal Episode Identity strategy:
- Primary: Use the RSS <guid> tag. This is the official, stable identifier provided by the feed.
- Secondary (Fallback): If no GUID is present, generate a deterministic SHA-256 hash based on: feed_url + episode_title + published_date.
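The two tiers above can be sketched as a single function. This is a minimal illustration, not the project's actual implementation: the function name `episode_id`, its parameters, and the newline separator used to join the fields are all assumptions (the ADR only specifies the three inputs and SHA-256).

```python
import hashlib
from typing import Optional


def episode_id(
    guid: Optional[str],
    feed_url: str,
    episode_title: str,
    published_date: str,
) -> str:
    """Return the RSS GUID when present, otherwise a SHA-256 fallback hash."""
    # Tier 1: prefer the feed-provided GUID if it is non-empty.
    if guid and guid.strip():
        return guid.strip()

    # Tier 2: deterministic hash of the three fallback fields.
    # Assumption: fields are joined with a newline so that, for example,
    # ("ab", "c") and ("a", "bc") cannot produce the same payload.
    payload = "\n".join([feed_url, episode_title, published_date])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Because the fallback depends only on its inputs, the same episode yields the same ID on any machine or run, as long as the inputs are normalized consistently before hashing.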
Rationale
- Interoperability: Using GUIDs allows our metadata to be joined with other podcasting datasets.
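- Stability: Content hashes provide a deterministic backup when no GUID exists. Because the hash includes the title and the published date, it tolerates only title drift that normalization removes (such as casing or whitespace changes), and it requires the date to remain unchanged.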
- Relational Integrity: Stable IDs enable the "AI Quality Platform" to track model improvements against the same episode over months of testing.
Alternatives Considered
- Random UUIDs: Rejected as they are not reproducible across different machines or runs.
- Filenames as IDs: Rejected as filenames are sanitized and truncated, making them poor unique keys.
Consequences
- Positive: Reliable database primary keys; easy detection of duplicates in large archives.
- Negative: Requires careful normalization of URLs and dates before hashing.
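The normalization step mentioned above might look like the following sketch. The helper names `normalize_feed_url` and `normalize_published_date` are hypothetical, and the specific rules (lowercasing scheme and host, dropping the fragment and trailing slash, converting dates to UTC ISO 8601) are assumptions rather than rules stated in this ADR. It also assumes the feed's date is in the RFC 822 style used by RSS `pubDate`.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime
from urllib.parse import urlsplit, urlunsplit


def normalize_feed_url(url: str) -> str:
    """Canonicalize a feed URL so trivial variants hash identically."""
    parts = urlsplit(url.strip())
    # Assumption: scheme and host are case-insensitive, so lowercase them;
    # drop the fragment and any trailing slash, but keep the query string.
    path = parts.path.rstrip("/")
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


def normalize_published_date(rfc822_date: str) -> str:
    """Convert an RFC 822 style date string to a UTC ISO 8601 timestamp."""
    dt = parsedate_to_datetime(rfc822_date)
    # Assumption: dates without timezone information are treated as UTC.
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```

Normalizing to UTC means the same publication instant expressed in two different timezone offsets produces the same string, and therefore the same fallback hash.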