Guides

Practical guides for using and developing Podcast Scraper.

Quick Start

| Guide | Description |
| --- | --- |
| Quick Reference | Common commands cheat sheet |
| Troubleshooting | Common issues and solutions |
| Glossary | Key terms and concepts |

Development

| Guide | Description |
| --- | --- |
| Development Guide | Development environment setup, workflow, and GI/KG browser viewer (`make serve-gi-kg-viz`) |
| Pipeline and Workflow Guide | Pipeline flow, module roles, quirks, and run tracking |
| Git Worktree Guide | Git worktree-based development workflow |
| Dependencies Guide | Third-party dependencies and rationale |
| Markdown Linting | Markdown style and linting practices |

Testing

| Guide | Description |
| --- | --- |
| Testing Guide | Testing overview and test execution |
| Unit Testing Guide | Unit test patterns and mocking |
| Integration Testing Guide | Integration test guidelines |
| E2E Testing Guide | End-to-end test infrastructure |
| Critical Path Testing Guide | Critical path test prioritization |

Provider System

| Guide | Description |
| --- | --- |
| AI Provider Comparison | Compare all 9 providers: cost, quality, speed, privacy |
| ML Model Comparison | Compare ML models: Whisper, spaCy, Transformers (BART/LED) |
| Provider Configuration | Quick provider configuration reference |
| Ollama Provider Guide | Ollama installation, setup, troubleshooting, and testing |
| Provider Implementation | Implementing new providers |
| ML Provider Reference | Technical reference for local ML models |
| Protocol Extension | Extending protocols |

Features

| Guide | Description |
| --- | --- |
| Semantic Search | RFC-061 corpus vector index: configuration (`vector_search`), `search` / `index` CLIs, and semantic `gi explore --topic` |
| Grounded Insights | Grounded insights (insights plus evidence quotes): enabling GIL, `gi.json`, CLI, and schema; optional browser viewer |
| Knowledge Graph | Knowledge graph (entities, topics, relationships): PRD-019 / RFC-055–056, artifacts, `kg` CLI; the same browser viewer also displays `kg.json` |
| Preprocessing Profiles | Understanding and using preprocessing profiles for transcript cleaning |
| Docker Service Guide | Running `podcast_scraper` as a service-oriented Docker container |
| Docker Variants Guide | LLM-only vs ML-enabled Docker image variants |

AI Coding

| Guide | Description |
| --- | --- |
| Cursor AI Best Practices | AI-assisted development with Cursor |
| Documentation Agent Guide | Documentation workflows |