Architecture
This directory contains architectural documentation for podcast_scraper — the current system design, quality constraints, testing approach, data contracts, and the platform vision for where the system is heading.
Current state
| Document | Purpose |
|---|---|
| Architecture | System design — pipeline flow, module map, configuration, ways to run, ADR index |
| Hosting and infrastructure | Always-on VPS, Tailscale, OpenTofu, GitHub Actions, Compose on host — narrative companion to infra ADRs and RFC-082 |
| Non-Functional Requirements | Quality constraints — performance, security, reliability, observability, maintainability, scalability |
| Testing Strategy | Test pyramid, patterns, decision criteria, CI integration |
| Tech Debt | Recognised technical debt, with current coping strategy, options, and triggers to revisit |
HTTP / viewer: There is no separate architecture doc for the HTTP layer. The FastAPI surface, the /api/* endpoints (including Corpus Library, Corpus Digest, semantic search, and index management), and the OpenAPI /docs page are specified in the Server Guide (see also the Ways to run section of Architecture).
Semantic search: FAISS-based vector search over transcript chunks is documented in Architecture — Phase 5a and the Server Guide.
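For readers new to vector search, the idea behind searching transcript chunks can be sketched in a few lines. This is a minimal illustration only, not the project's implementation: it swaps in a brute-force cosine-similarity scan (standard library only) where the real system uses a FAISS index, and the function names (`cosine`, `search`), chunk IDs, and toy embeddings are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec, chunks, k=2):
    """Return the k best (score, chunk_id) pairs, highest score first.

    Brute-force scan over every chunk; a FAISS index serves the same
    purpose but scales to far larger corpora.
    """
    scored = [(cosine(query_vec, vec), chunk_id) for chunk_id, vec in chunks.items()]
    return sorted(scored, reverse=True)[:k]


# Hypothetical toy embeddings keyed by an assumed "episode#chunk" ID scheme.
chunks = {
    "ep1#0": [1.0, 0.0, 0.0],
    "ep1#1": [0.0, 1.0, 0.0],
    "ep2#0": [0.8, 0.6, 0.0],
}

results = search([1.0, 0.0, 0.0], chunks, k=2)
top_ids = [chunk_id for _, chunk_id in results]
print(top_ids)
```

In this toy data the query vector matches the first chunk exactly, so that chunk ranks first, followed by the chunk that points in a similar direction. See Architecture (Phase 5a) and the Server Guide for how the actual index is built and queried.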
Target state
| Document | Purpose |
|---|---|
| Platform Architecture Blueprint | Platform vision — multi-tenant platform, distributed ML, two-tier deployment, observability, deployment lifecycle. Concrete RFCs are broken out from individual sections as implementation begins. |
Data contracts (ontology specifications)
| Folder | Contents |
|---|---|
| gi/ | Grounded Insight Layer (GIL) ontology — node/edge types, grounding contract, gi.schema.json |
| kg/ | Knowledge Graph (KG) ontology — entities, topics, relationships, kg.schema.json |
Diagrams
Generated architecture visualizations. See diagrams/ for the full list and regeneration instructions.