Architecture

This directory contains architectural documentation for podcast_scraper — the current system design, quality constraints, testing approach, data contracts, and the platform vision for where the system is heading.

Current state

| Document | Purpose |
| --- | --- |
| Architecture | System design: pipeline flow, module map, configuration, ways to run, ADR index |
| Hosting and infrastructure | Always-on VPS, Tailscale, OpenTofu, GitHub Actions, Compose on host; narrative companion to infra ADRs and RFC-082 |
| Non-Functional Requirements | Quality constraints: performance, security, reliability, observability, maintainability, scalability |
| Testing Strategy | Test pyramid, patterns, decision criteria, CI integration |
| Tech Debt | Recognised technical debt: current coping strategy, options, and triggers to revisit |

HTTP / viewer: There is no separate architecture doc for the HTTP layer. The FastAPI surface, the /api/* endpoints (including Corpus Library, Corpus Digest, semantic search, and index management), and the OpenAPI /docs page are specified in the Server Guide (see also the Ways to run section of Architecture).

Semantic search: FAISS-based vector search over transcript chunks is documented in Architecture — Phase 5a and the Server Guide.
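As a rough illustration of what vector search over transcript chunks means, the sketch below ranks a few chunks by cosine similarity to a query vector. It is a brute-force, standard-library stand-in and does not use FAISS or any podcast_scraper code; the function names (`cosine`, `search`), the chunk texts, and the three-dimensional vectors are all hypothetical. Real embeddings come from a model and have far more dimensions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Treat a zero vector as matching nothing rather than dividing by zero.
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query_vec, chunks, k=2):
    """Return the k (score, text) pairs most similar to query_vec."""
    scored = [(cosine(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical transcript chunks with made-up 3-dimensional embeddings.
chunks = [
    ("intro and sponsor read", [0.9, 0.1, 0.0]),
    ("guest discusses vector databases", [0.1, 0.9, 0.2]),
    ("closing remarks", [0.0, 0.2, 0.9]),
]

results = search([0.0, 1.0, 0.1], chunks, k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

An index library such as FAISS serves the same purpose (nearest-neighbour lookup over chunk embeddings) but avoids scanning every vector in pure Python; the documents referenced above describe how the project actually builds and queries its index.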

Target state

| Document | Purpose |
| --- | --- |
| Platform Architecture Blueprint | Platform vision: multi-tenant platform, distributed ML, two-tier deployment, observability, deployment lifecycle. Concrete RFCs are broken out from individual sections as implementation begins. |

Data contracts (ontology specifications)

| Folder | Contents |
| --- | --- |
| gi/ | Grounded Insight Layer (GIL) ontology: node/edge types, grounding contract, gi.schema.json |
| kg/ | Knowledge Graph (KG) ontology: entities, topics, relationships, kg.schema.json |

Diagrams

Generated architecture visualizations. See diagrams/ for the full list and regeneration instructions.