Architecture

This directory contains architectural documentation for podcast_scraper — the current system design, quality constraints, testing approach, data contracts, and the platform vision for where the system is heading.

Current state

Document	Purpose
Architecture	System design — pipeline flow, module map, configuration, ways to run, ADR index
Hosting and infrastructure	Always-on VPS, Tailscale, OpenTofu, GitHub Actions, Compose on host — narrative companion to infra ADRs and RFC-082
Non-Functional Requirements	Quality constraints — performance, security, reliability, observability, maintainability, scalability
Testing Strategy	Test pyramid, patterns, decision criteria, CI integration
Tech Debt	Recognised technical debt: current coping strategy, options, and triggers to revisit

HTTP / viewer: Not a separate architecture doc — the FastAPI surface, /api/* (including Corpus Library, Corpus Digest, semantic search, and index management endpoints), and OpenAPI /docs are specified in the Server Guide (see also Architecture — Ways to run).

Semantic search: FAISS-based vector search over transcript chunks is documented in Architecture — Phase 5a and the Server Guide.

Target state

Document	Purpose
Platform Architecture Blueprint	Platform vision — multi-tenant platform, distributed ML, two-tier deployment, observability, deployment lifecycle. Concrete RFCs are broken out from individual sections as implementation begins.

Data contracts (ontology specifications)

Folder	Contents
gi/	Grounded Insight Layer (GIL) ontology — node/edge types, grounding contract, `gi.schema.json`
kg/	Knowledge Graph (KG) ontology — entities, topics, relationships, `kg.schema.json`

Diagrams

Generated architecture visualizations. See diagrams/ for the full list and regeneration instructions.