ADR-060: VectorStore Protocol with Backend Abstraction
- Status: Accepted
- Date: 2026-04-03
- Authors: Podcast Scraper Team
- Related RFCs: RFC-061
- Related PRDs: PRD-021
Context & Problem Statement
Semantic Corpus Search (RFC-061) requires a vector index over GIL insights, quotes, summary bullets, and transcript chunks. The project needs to support FAISS for CLI/local use (Phase 1) and Qdrant for platform/service mode (Phase 2). Consumers of the search API (CLI, server, viewer) should not be coupled to a specific vector database implementation.
Decision
We define a `VectorStore` protocol (PEP 544) with the following interface:
- Core methods: `upsert()`, `batch_upsert()`, `search()`, `delete()`, `persist()`, `stats()`.
- Standard result types: `SearchResult(doc_id, score, metadata)` and `IndexStats(total_vectors, doc_type_counts, feeds_indexed, embedding_model, etc.)`.
- Backend implementations: `FaissVectorStore` (Phase 1), `QdrantVectorStore` (Phase 2). Both implement the same protocol.
- Metadata as flat dict: Known keys (`doc_type`, `episode_id`, `feed_id`, `publish_date`, `speaker_id`, `grounded`, `char_start`, `char_end`, `timestamp_start_ms`). Filtering is backend-specific (post-filter for FAISS, native for Qdrant).
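
As a rough illustration of the interface listed above, the protocol and result types might look like the following. Only the class names, method names, and field names come from this ADR; every parameter, type annotation, and default (for example `top_k`, `filters`, and treating `feeds_indexed` as a count) is an assumption made for the sketch, not the project's actual signature.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Protocol, Sequence


@dataclass(frozen=True)
class SearchResult:
    doc_id: str
    score: float
    # Flat metadata dict, e.g. {"doc_type": ..., "episode_id": ...}
    metadata: dict[str, Any] = field(default_factory=dict)


@dataclass
class IndexStats:
    total_vectors: int
    doc_type_counts: dict[str, int]
    feeds_indexed: int  # assumed to be a count; the ADR does not give the type
    embedding_model: str


class VectorStore(Protocol):
    """Method names are from the ADR; all signatures below are assumed."""

    def upsert(
        self, doc_id: str, embedding: Sequence[float], metadata: dict[str, Any]
    ) -> None: ...

    def batch_upsert(
        self,
        doc_ids: Sequence[str],
        embeddings: Sequence[Sequence[float]],
        metadata: Sequence[dict[str, Any]],
    ) -> None: ...

    def search(
        self,
        query: Sequence[float],
        top_k: int = 10,
        filters: dict[str, Any] | None = None,
    ) -> list[SearchResult]: ...

    def delete(self, doc_ids: Sequence[str]) -> None: ...

    def persist(self) -> None: ...

    def stats(self) -> IndexStats: ...
```

Because PEP 544 protocols are structural, a backend class satisfies `VectorStore` by providing these methods; it does not need to inherit from it.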
Rationale
- Decoupling: CLI, server, viewer, and future digest all call `VectorStore.search()` without knowing which backend is active.
- Migration path: Switching from FAISS to Qdrant is a config change, not a code rewrite. The protocol is ~20 lines.
- Testability: Unit tests mock the `VectorStore` protocol; integration tests exercise specific backends.
- Consistency with project patterns: Follows ADR-020 (Protocol-Based Provider Discovery), the same PEP 544 approach used for transcription/summarization providers.
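
The decoupling and testability points can be sketched with a small test double. `InMemoryVectorStore` and `top_hit_ids` are hypothetical names invented for this example, the protocol is trimmed to two methods with assumed signatures, and the `doc_type` values are illustrative only.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from typing import Any, Protocol, Sequence


@dataclass(frozen=True)
class SearchResult:
    doc_id: str
    score: float
    metadata: dict[str, Any]


class VectorStore(Protocol):
    """Trimmed to the two methods this example needs; signatures are assumed."""

    def upsert(
        self, doc_id: str, embedding: Sequence[float], metadata: dict[str, Any]
    ) -> None: ...

    def search(self, query: Sequence[float], top_k: int = 10) -> list[SearchResult]: ...


class InMemoryVectorStore:
    """Hypothetical test double: brute-force cosine similarity over a dict."""

    def __init__(self) -> None:
        self._rows: dict[str, tuple[list[float], dict[str, Any]]] = {}

    def upsert(self, doc_id, embedding, metadata) -> None:
        self._rows[doc_id] = (list(embedding), dict(metadata))

    def search(self, query, top_k=10):
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return dot / norm if norm else 0.0

        scored = [
            SearchResult(doc_id, cosine(query, vec), meta)
            for doc_id, (vec, meta) in self._rows.items()
        ]
        return sorted(scored, key=lambda r: r.score, reverse=True)[:top_k]


def top_hit_ids(store: VectorStore, query: Sequence[float], top_k: int = 3) -> list[str]:
    """Consumer code: depends only on the protocol, never on a backend class."""
    return [hit.doc_id for hit in store.search(query, top_k=top_k)]


store = InMemoryVectorStore()
store.upsert("insight-1", [1.0, 0.0], {"doc_type": "insight"})
store.upsert("quote-1", [0.0, 1.0], {"doc_type": "quote"})
store.upsert("chunk-1", [0.7, 0.7], {"doc_type": "transcript_chunk"})
ranked = top_hit_ids(store, [1.0, 0.1], top_k=2)
print(ranked)
```

The fake never inherits from `VectorStore`, yet a type checker accepts it wherever the protocol is expected, which is what lets unit tests swap it in for a real backend.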
Alternatives Considered
- Raw FAISS API directly: Rejected; locks in FAISS, no migration path to Qdrant or platform mode.
- ChromaDB as all-in-one: Rejected; heavier than FAISS, SQLite-based storage adds fragility, less mature at scale.
- Postgres pgvector (via RFC-051): Rejected for Phase 1; requires Postgres server, violates CLI-first constraint. Good for Phase 3 (platform).
- LangChain/LlamaIndex abstractions: Rejected; too heavy, too opinionated, pulls in large dependency tree for a thin protocol.
Consequences
- Positive: Clean backend swap. Consumers write to one interface. Testable with mocks. Aligns with existing protocol patterns.
- Negative: Slight abstraction overhead (~20 lines of protocol code). Metadata filtering differs between backends (post-filter for FAISS, native for Qdrant).
- Neutral: New `faiss-cpu` dependency (~20 MB) for Phase 1.
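
To make the filtering difference concrete, here is one way post-filtering could work for a backend whose nearest-neighbor search knows nothing about metadata. This is a sketch under assumptions, not the `FaissVectorStore` implementation: `search_with_post_filter`, the `raw_search` callable, and the `overfetch` factor are all hypothetical, and the ranked list below stands in for a real index.

```python
from __future__ import annotations

from typing import Any, Callable

Hit = tuple[str, float]  # (doc_id, score), best match first


def search_with_post_filter(
    raw_search: Callable[[int], list[Hit]],
    metadata_by_id: dict[str, dict[str, Any]],
    filters: dict[str, Any],
    top_k: int,
    overfetch: int = 4,
) -> list[Hit]:
    """Fetch more candidates than requested, then drop those failing the filters."""
    candidates = raw_search(top_k * overfetch)
    kept = [
        (doc_id, score)
        for doc_id, score in candidates
        if all(metadata_by_id[doc_id].get(key) == value for key, value in filters.items())
    ]
    return kept[:top_k]


# Stand-in for an index: a fixed ranking that ignores metadata entirely.
ranking: list[Hit] = [("a", 0.9), ("b", 0.8), ("c", 0.7), ("d", 0.6), ("e", 0.5)]
metadata = {
    "a": {"doc_type": "quote"},
    "b": {"doc_type": "insight"},
    "c": {"doc_type": "quote"},
    "d": {"doc_type": "insight"},
    "e": {"doc_type": "insight"},
}


def raw_search(k: int) -> list[Hit]:
    return ranking[:k]


wanted = {"doc_type": "insight"}
enough = search_with_post_filter(raw_search, metadata, wanted, top_k=2, overfetch=2)
short = search_with_post_filter(raw_search, metadata, wanted, top_k=2, overfetch=1)
print(enough)
print(short)
```

With too small an over-fetch factor, the filtered list can come back shorter than `top_k` even though matching documents exist further down the ranking. A backend with native filtering avoids this because the filter is applied during the search itself.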
Implementation Notes
- Module: `src/podcast_scraper/search/protocol.py` (`VectorStore`, `SearchResult`, `IndexStats`)
- FAISS: `src/podcast_scraper/search/faiss_store.py` (`FaissVectorStore`)
- Pattern: PEP 544 Protocol (same as ADR-020 provider protocols)
- Config: `vector_backend: Literal["faiss", "qdrant"] = "faiss"`
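
The config field above suggests backend selection could happen in a single factory. The following is a minimal sketch assuming a hypothetical `SearchConfig` dataclass and `build_vector_store` function; the two store classes here are empty placeholders standing in for the real implementations, whose constructors are not described in this ADR.

```python
from dataclasses import dataclass
from typing import Literal

VectorBackend = Literal["faiss", "qdrant"]


@dataclass
class SearchConfig:
    """Hypothetical config holder; only the field and its default are from the ADR."""

    vector_backend: VectorBackend = "faiss"


class FaissVectorStore:
    """Placeholder for the Phase 1 backend."""


class QdrantVectorStore:
    """Placeholder for the Phase 2 backend."""


def build_vector_store(config: SearchConfig):
    """Map the config value to a backend instance (hypothetical helper)."""
    if config.vector_backend == "faiss":
        return FaissVectorStore()
    if config.vector_backend == "qdrant":
        return QdrantVectorStore()
    # Literal is not enforced at runtime by a plain dataclass, so check explicitly.
    raise ValueError(f"Unknown vector_backend: {config.vector_backend!r}")


default_store = build_vector_store(SearchConfig())
qdrant_store = build_vector_store(SearchConfig(vector_backend="qdrant"))
print(type(default_store).__name__, type(qdrant_store).__name__)
```

Keeping the branch on `vector_backend` in one place means callers only ever hold a `VectorStore`, so moving from FAISS to Qdrant touches configuration rather than consumer code.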