Agent-Browser Closed Loop Guide¶
Status: Reference Guide
Applies to: Local development, agent-assisted UI debugging, E2E test workflows
Last updated: April 2026 (default MCP: DevTools first; symmetry rule unchanged)
Overview¶
This guide documents the closed loop between an AI coding agent and the browser for UI development in this project. The loop operates in two modes:
- Automated mode: the agent runs make test-ui-e2e (Playwright, headless), reads structured feedback, fixes code, and re-runs until green. The browser equivalent of make test for Python.
- Live co-development mode: you and the agent share the same Chrome session. You direct ("fix this", "add a spinner here", "why is this column empty"), the agent sees the DOM/console/network, edits code, Vite hot-reloads, and both of you see the result instantly. Bug fixing and iterative feature work are the same loop; only the starting prompt differs.
Both modes feed back into each other: automated catches regressions; live co-development catches UX issues and lets you steer the agent with visual context.
Companion guide: Agent-Pipeline Feedback Loop Guide
covers the Python-side loop (make ci, acceptance tests, --monitor, metrics.json).
Same principle — give the agent direct access to structured feedback — applied to
the pipeline instead of the browser.
Two tools, two jobs¶
| Tool | Job | Analogy |
|---|---|---|
| Playwright MCP | Driving — navigate, click, fill, assert | Automated QA engineer |
| Chrome DevTools MCP | Debugging — network tab, console, DOM, traces | Human dev with DevTools open |
Default MCP choice (agents — do not flip arbitrarily)¶
Pick one channel per bug and keep reproduce → fix → validate on that channel (see Symmetry rule below).
| Situation | Use |
|---|---|
| Default for in-browser reproduce / validate on the viewer (navigate, snapshot, evaluate_script, console/network) | Chrome DevTools MCP (Cursor: often user-devtools) |
| Only when something clearly fits scripted automation better (e.g. multi-tab drive, file upload, trace capture). If you use it, say in one line why and that validation ran there, so the channel stays explicit. | Playwright MCP (Cursor: often user-playwright) |
make test-ui-e2e / CI Playwright is unchanged: that is the test suite, not a substitute for picking DevTools vs Playwright MCP for a given interactive repro.
For a full closed loop you still use both modes: Playwright runs the automated suite; DevTools MCP is the default for live / agent-driven in-Chrome loops unless Playwright MCP is the better fit for that specific task.
Why MCP?¶
MCP (Model Context Protocol) is the communication layer between the agent and the browser tool. It has nothing to do with remote vs local — the browser, the MCP server, and your app all run on your machine. Without MCP, the agent has no formalized way to invoke browser actions and get structured feedback. MCP is the glue.
Browser choice¶
Use Chrome (or Chromium/Edge) for manual dev debugging and Firefox for automated E2E.
The manual "agent attaches to your live session" workflow requires the Chrome DevTools Protocol (CDP), which is exposed by Chromium-based browsers (Chrome, Chromium, Edge). Firefox does not offer an equivalent that Chrome DevTools MCP can attach to.
This project's Playwright E2E suite (make test-ui-e2e) runs Firefox headlessly
for cross-browser coverage (see web/gi-kg-viewer/playwright.config.ts). That is
separate from the agent-browser loop described here.
| Context | Browser | Why |
|---|---|---|
| make test-ui-e2e (CI, automated) | Firefox | Cross-browser coverage, existing config |
| Agent-driven exploration (MCP) | Chrome (DevTools MCP, default) or Chromium (Playwright MCP) | DevTools attaches to your Chrome; Playwright MCP launches its own Chromium when you chose that channel |
| Live co-development | Chrome | CDP required for DevTools MCP attachment |
Automated mode¶
How it works¶
For Playwright-driven MCP checks (when you chose that channel), the agent opens a headless Chromium browser via Playwright MCP, navigates your app, and gets structured feedback after each action — not just pass/fail at the end. Default interactive repro/validate remains Chrome DevTools MCP (see Default MCP choice above).
What the agent sees is an accessibility snapshot, not a screenshot:
- heading "Podcast Intelligence Platform" [level=1]
- tab "Digest" [selected]
- tab "Library"
- tab "Graph"
- textbox "Search corpus…" [ref=e5]
- button "Search" [ref=e12]
It uses ref=e12 to click — no CSS selectors, no pixel coordinates, no fragile
locators.
With the devtools capability enabled, the agent also sees:
- Console messages: JS errors, warnings, console.log output
- Network requests: URL, method, status, request/response payload
- Performance traces: via Chrome DevTools MCP
Typical flow¶
You: "Navigate to the viewer, load the graph, search for 'machine learning',
and check what API calls fire"
Agent: browser_navigate → http://127.0.0.1:5174
browser_click → "Graph" tab
browser_type → search box
browser_click → "Search" button
browser_network_requests → [sees /api/search?q=machine+learning, 200, payload]
browser_console_messages → [no errors]
→ reports findings
E2E as a validation gate (the primary workflow)¶
make test-ui-e2e is the browser equivalent of make test. After every viewer
change the agent runs it, reads failures, fixes code, and re-runs until green — the
same loop you already have for Python.
| Python workflow | Browser workflow |
|---|---|
| Edit src/ | Edit web/gi-kg-viewer/src/ |
| make test (pytest) | make test-ui-e2e (Playwright) |
| Read failure → fix → re-run | Read failure → fix → re-run |
| Green → done | Green → done |
The E2E suite for the GI/KG viewer:
| Aspect | Value |
|---|---|
| Run command | make test-ui-e2e |
| Config | web/gi-kg-viewer/playwright.config.ts |
| Browser | Firefox (headless) |
| Port | 127.0.0.1:5174 (dedicated, avoids dev server on 5173) |
| Specs | web/gi-kg-viewer/e2e/*.spec.ts |
| Surface contract | web/gi-kg-viewer/e2e/E2E_SURFACE_MAP.md |
When a UI feature is done, the agent should:
- Update or write specs: add/modify e2e/*.spec.ts to cover the new behavior. The E2E_SURFACE_MAP.md lists every surface, its selectors, and which spec owns it; the agent reads this to know what vocabulary to use and which spec to extend.
- Run make test-ui-e2e: the full suite, not just the new spec. This catches regressions in other surfaces.
- Read failures, fix, re-run: use the artifacts described below.
- Green → commit: the feature is validated.
This is the bread-and-butter loop. Everything else in this guide (live co-development, MCP exploration) builds on top of it.
What the agent reads after make test-ui-e2e¶
Up to four artifacts are available after each run (trace files appear only for failing tests). The agent reads the first three directly, with no copy-paste needed; the HTML report is for you to open.
1. Terminal output (always available)
The list reporter prints pass/fail per test with assertion errors inline. The agent
sees this directly from the Shell tool output. For most failures, this is enough.
2. JSON results — web/gi-kg-viewer/e2e-results.json
A structured JSON file with per-test title, status (passed/failed/timedOut),
duration (ms), and on failure: error.message, error.stack, and the failing
location (file + line). The agent can read this to get a machine-parseable summary
without scraping terminal text.
Example of what the agent sees for a failure:
{
  "title": "shows corpus summary counts when API and corpus path are available",
  "status": "unexpected",
  "results": [{
    "status": "failed",
    "duration": 15023,
    "error": {
      "message": "Expected: visible\nReceived: hidden",
      "location": { "file": "dashboard.spec.ts", "line": 52 }
    }
  }]
}
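The summary the agent produces from this file can be sketched as a small reduction. This is a minimal sketch that assumes a flat list of entries shaped like the excerpt above; the real e2e-results.json nests entries inside suites, and `summarizeResults` and the interface names are hypothetical helpers, not part of the project.

```typescript
// Shapes assumed from the excerpt above; the real file nests entries in suites.
interface TestResult {
  status: string; // e.g. "passed", "failed", "timedOut", "skipped"
  duration: number; // milliseconds
  error?: { message: string; location?: { file: string; line: number } };
}

interface TestEntry {
  title: string;
  status: string; // e.g. "expected", "unexpected"
  results: TestResult[];
}

interface Summary {
  passed: number;
  failed: number;
  totalDurationMs: number;
  failures: string[];
}

// Hypothetical helper: reduce a flat list of entries to counts and failure lines.
function summarizeResults(entries: TestEntry[]): Summary {
  const summary: Summary = { passed: 0, failed: 0, totalDurationMs: 0, failures: [] };
  for (const entry of entries) {
    // The last result is the final attempt; earlier ones may be retries.
    const last = entry.results[entry.results.length - 1];
    if (!last) continue;
    summary.totalDurationMs += entry.results.reduce((ms, r) => ms + r.duration, 0);
    if (last.status === "passed") {
      summary.passed += 1;
    } else if (last.status !== "skipped") {
      summary.failed += 1;
      const where = last.error?.location
        ? ` (${last.error.location.file}:${last.error.location.line})`
        : "";
      // Keep only the first line of the assertion message for the summary.
      const firstLine = (last.error?.message ?? last.status).split("\n")[0];
      summary.failures.push(`${entry.title}${where}: ${firstLine}`);
    }
  }
  return summary;
}
```

Counting the final attempt rather than every result keeps a test that passed on retry out of the failure list, while its retry time still shows up in the total duration.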
3. Trace files — web/gi-kg-viewer/test-results/ (on failure)
When a test fails locally, Playwright records a trace zip containing DOM snapshots, network requests, console logs, and screenshots at every step. The agent can:
- Read the trace directory listing to find which tests failed
- Run npx playwright show-trace <path>/trace.zip to open the trace viewer
- Or describe the trace path to you so you can open it in a browser
Traces are only persisted for failing tests (locally: retain-on-failure; CI:
on-first-retry). Passing tests do not leave trace files.
4. HTML report — web/gi-kg-viewer/playwright-report/
A rich HTML report with screenshots, trace links, and error details. The agent cannot read HTML directly, but can tell you to open it:
npx playwright show-report web/gi-kg-viewer/playwright-report
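The four artifacts above map onto a handful of Playwright config options. The fragment below is an illustrative sketch, not the project's actual web/gi-kg-viewer/playwright.config.ts: the reporter list and trace modes follow what this guide describes, while the `retries` value is an assumption (on-first-retry only records a trace when at least one retry is configured).

```typescript
// Illustrative fragment; the project's real config may differ.
import { defineConfig } from "@playwright/test";

export default defineConfig({
  // Assumed value: "on-first-retry" needs at least one retry to record anything.
  retries: process.env.CI ? 1 : 0,
  reporter: [
    ["list"], // 1. terminal output
    ["json", { outputFile: "e2e-results.json" }], // 2. JSON results
    ["html", { outputFolder: "playwright-report", open: "never" }], // 4. HTML report
  ],
  use: {
    // 3. trace zips land under test-results/ (Playwright's default output directory)
    trace: process.env.CI ? "on-first-retry" : "retain-on-failure",
  },
});
```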
How to instruct the agent¶
You don't need special prompts — the agent already runs make test-ui-e2e and reads
the terminal output. But you can get more out of it:
Basic (already works):
"Run make test-ui-e2e and fix any failures"
The agent runs the command, reads terminal output, diagnoses failures, fixes code, re-runs.
With JSON results (richer analysis):
"Run make test-ui-e2e, then read web/gi-kg-viewer/e2e-results.json
and give me a summary: how many passed, failed, total duration,
and details on any failures"
The agent reads the structured JSON and reports a clean summary with test names, durations, and error messages.
With trace inspection (deep debugging):
"Run make test-ui-e2e. If anything fails, check the trace files
in web/gi-kg-viewer/test-results/ — look at the DOM snapshot
and network requests at the point of failure"
The agent lists the trace directory, reads the trace metadata, and reports what the DOM and network looked like when the assertion failed.
Full loop:
"I just finished the Digest topic bands. Update the E2E specs to cover
topic band rendering and click-to-search. Run make test-ui-e2e.
If failures, read e2e-results.json for details, fix, and re-run
until green."
Interactive exploration (beyond the test suite)¶
The agent-browser loop via Playwright MCP complements the test suite — it does not replace it. Use MCP when:
- You need the agent to explore an unfamiliar UI state interactively
- You want to write a new spec and need the agent to discover selectors first
- A test failure is confusing and you want the agent to step through the flow with console and network visibility
E2E surface map as a debugging aid (not only Playwright)¶
web/gi-kg-viewer/e2e/E2E_SURFACE_MAP.md
is the Playwright automation contract, but it is also the best single reference for expected
accessible names, regions, entry paths, and disambiguation. For example, it scopes the semantic
Search submit under the Semantic search section (or #search-q) so that it does not collide with
the collapsed-left-column Search shortcut button or other Search-labeled controls.
Playwright MCP, Chrome DevTools MCP, and human manual repro all read essentially the
same accessibility tree as the test suite. When a spec fails, when an agent mis-clicks in a
snapshot, or when you need a checklist for “what should appear next,” start from the surface map,
then open the owning spec listed there. It does not replace UXS for visual design or the
Server Guide for HTTP; it complements Network / Console / Vue DevTools as the ground truth
for labels and flows.
Live co-development mode¶
You and the agent share the same Chrome session. The agent is attached via Chrome DevTools MCP and can see everything you see — DOM, console, network. You direct changes in natural language, the agent edits code, Vite hot-reloads the change in your browser, and the agent observes the result through the same session. You react, the agent iterates. The loop is continuous.
Bug fixing and feature work are the same workflow — only the opening prompt differs:
- Bug: "The Library episode list shows no publish dates — check the API response"
- Feature: "Add a loading spinner to the Library while episodes are fetching"
- Polish: "Make the topic pills wrap instead of overflow, and use the accent color"
In all three cases the mechanics are identical: agent sees the current state, edits code, sees the hot-reloaded result, you confirm or redirect.
Setup: launch Chrome with debugging port¶
# macOS — launch Chrome with CDP enabled
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome \
--remote-debugging-port=9222 \
--user-data-dir=/tmp/chrome-debug-profile
Add a shell alias for convenience:
# In your .zshrc
alias chrome-dev='/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome \
--remote-debugging-port=9222 --user-data-dir=/tmp/chrome-debug-profile'
Connect DevTools MCP to this Chrome (Chrome 136+ / macOS)¶
From Chrome 136 onward, remote debugging with the default user data directory is
restricted for security. Chrome may not create DevToolsActivePort under
~/Library/Application Support/Google/Chrome/, and nothing may listen on port 9222
if you only pass --remote-debugging-port on your normal profile.
chrome-devtools-mcp with --autoConnect often discovers the browser by reading
that default path. On a hardened macOS Chrome, you then see errors like
Could not find DevToolsActivePort even when Chrome is running.
What works reliably here: always launch chrome-dev (non-default
--user-data-dir and --remote-debugging-port=9222), then point the MCP server at the
HTTP debug endpoint with --browserUrl http://127.0.0.1:9222 and omit
--autoConnect. After editing MCP config, restart the devtools MCP server (or
reload Cursor) so the new args take effect.
Quick check before relying on the agent:
curl -s http://127.0.0.1:9222/json/version
You should get JSON including webSocketDebuggerUrl. If curl fails, the agent cannot
attach either.
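The same check can be scripted, for example as a pre-flight step before an agent session. This is a minimal sketch: `findDebuggerUrl` and the `HttpGet` type are hypothetical names, and the HTTP call is injected so the helper does not depend on a particular fetch implementation.

```typescript
// Minimal response shape so the sketch does not depend on a specific fetch typing.
interface HttpResponse {
  ok: boolean;
  json(): Promise<unknown>;
}
type HttpGet = (url: string) => Promise<HttpResponse>;

// Hypothetical helper: resolves to the WebSocket debugger URL, or null when
// Chrome is not reachable on the debugging port.
async function findDebuggerUrl(
  get: HttpGet,
  baseUrl: string = "http://127.0.0.1:9222",
): Promise<string | null> {
  try {
    const response = await get(`${baseUrl}/json/version`);
    if (!response.ok) return null;
    const body = (await response.json()) as { webSocketDebuggerUrl?: unknown };
    return typeof body.webSocketDebuggerUrl === "string" ? body.webSocketDebuggerUrl : null;
  } catch {
    // Connection refused or an unparseable body: treat both as "cannot attach".
    return null;
  }
}
```

On a Node version with a global fetch, calling `findDebuggerUrl((url) => fetch(url))` performs the same request as the curl command; a null result means the MCP server will not be able to attach either.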
Alternative: --autoConnect plus one-time enablement at
chrome://inspect/#remote-debugging can work on some setups (browser approval flow).
If you use it and still see DevToolsActivePort errors, switch to chrome-dev +
--browserUrl as above.
How the loop works¶
1. You open Chrome (chrome-dev), navigate to http://127.0.0.1:5174
2. Vite dev server is running (make serve or npm run dev in web/gi-kg-viewer)
3. DevTools MCP is configured with --browserUrl http://127.0.0.1:9222 (see IDE setup);
agent attaches — now shares your session
4. You direct: "The episode metadata row is too cramped — add spacing
between the feed name and the publish date"
5. Agent sees:
- The DOM layout you are looking at
- The CSS computed styles on the metadata row
- Any console warnings
6. Agent edits the Vue component (e.g. LibraryView.vue)
7. Vite hot-reloads → your browser updates in place
8. Agent sees the new DOM state through the same attached session
9. Agent confirms: "Spacing added — 8px gap between feed and date.
No console errors. Does this look right?"
10. You: "Good, but also truncate long feed names with ellipsis"
11. Agent edits again → hot-reload → agent verifies → you confirm
12. When satisfied: agent runs make test-ui-e2e to catch regressions
Obligatory validation when fixing a reported UI bug¶
When the user says something in the GI/KG viewer (or any dev-server UI you can reach) is broken — wrong data, broken link, 404 in a new tab, console errors — treat “I fixed it” as a claim that requires evidence. Do not hand the proof obligation back to the user unless you are blocked (see below).
Symmetry rule (non-negotiable): Whatever channel you use to reproduce the bug is the same channel
you must use to confirm the fix, unless that channel becomes impossible after the fix (say so explicitly).
If you reproduced with Chrome DevTools MCP (snapshot, new_page on the failing URL, network row),
you must re-run that same MCP flow after the fix (and after any required API restart). Passing
pytest or make test-ui alone does not replace that check; tests are additional regression
locks, not a substitute for re-checking the reproduction you already ran.
MCP server id: Cursor may expose Chrome DevTools as user-devtools, devtools, or another key—if
call_mcp_tool fails with “server does not exist”, try those names or read the MCP descriptor list; do
not assume the key matches mcp.json verbatim.
Standard sequence (agent runs this, not the user):
1. Attach and inspect: Use Chrome DevTools MCP with the user’s live session when available: list_pages → select_page → take_snapshot to see the relevant control, link url, and layout. Read the tool schema in the MCP descriptor folder before the first call (project policy). For expected control names, regions, and multi-step entry paths (and known duplicate labels), cross-check e2e/E2E_SURFACE_MAP.md, which is the same contract as Playwright and a11y snapshots.
2. Reproduce the failure: For same-tab behavior, interact via snapshot uids (click, etc.) if needed; use list_network_requests / get_network_request for status codes and response bodies. For “open in new tab” links, use new_page with the exact href (or navigate_page on the selected tab) and read the resulting snapshot (e.g. raw JSON error) and network row (e.g. 404 vs 400). This removes guesswork about encoding, proxy path, or server validation.
3. Fix: Change viewer (web/gi-kg-viewer/) and/or API (src/podcast_scraper/server/) as needed. Prefer automated tests as the durable proof: Vitest (make test-ui) for URL builders / pure TS; FastAPI TestClient tests under tests/integration/server/ (and unit tests under tests/unit/podcast_scraper/server/) for HTTP behavior. Add or extend Playwright specs when the surface is E2E-stable (e2e/E2E_SURFACE_MAP.md).
4. Re-validate: First, repeat the same reproduction path as step 2 (same MCP tools, same URL or clicks, same tab vs new tab). Restart long-lived processes you changed (e.g. API after Python edits) before this repeat. Then run Makefile targets that cover your edits (e.g. make test-ui, targeted integration pytest). Treat automated tests as required extras, not as replacing the MCP/browser confirmation when you used MCP/browser to reproduce. Do not say “all good” from tests alone if you reproduced the bug in Chrome and never re-checked Chrome afterward.
5. Report: Summarize what you verified (MCP observation and/or command output), not only what you changed.
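The symmetry rule behind this sequence can be expressed as a small check over a record of what was done. This is only a sketch of the rule: the `Channel` labels, `FixRecord`, and `symmetryViolation` are hypothetical names used for illustration, not identifiers from the project.

```typescript
// Channel names are illustrative labels, not identifiers used by the project.
type Channel = "devtools-mcp" | "playwright-mcp" | "manual-browser" | "automated-tests";

interface FixRecord {
  reproducedVia: Channel;
  recheckedVia: Channel[]; // every channel re-run after the fix
  // Filled in only when the original channel became impossible after the fix.
  channelImpossibleReason?: string;
}

// Returns a description of the violation, or null when the rule is satisfied.
function symmetryViolation(record: FixRecord): string | null {
  if (record.recheckedVia.indexOf(record.reproducedVia) !== -1) return null;
  if (record.channelImpossibleReason) return null;
  return `Reproduced via ${record.reproducedVia} but not re-checked there after the fix`;
}
```

A record with reproducedVia "devtools-mcp" and recheckedVia ["automated-tests"] is flagged, which is the "tests alone" case that step 4 rules out.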
When you are blocked
- DevTools MCP is not enabled, Chrome is not on the debugging port, or no reproducible URL/state — state that explicitly and say what the user must enable or provide.
- The bug requires credentials or production data you cannot access — say so; narrow the claim to what tests prove.
Relationship to E2E
This subsection is the interactive half of the loop; make test-ui-e2e remains the regression
gate before commit. MCP validation catches issues that specs do not yet cover; new or updated specs
lock the behavior in CI.
Graph canvas — neighbourhood dimming and delayed “full brightness”¶
Some bugs only appear after layout or a second store tick (~1–3s). A quick immediate snapshot is not enough.
Typical reproduction (GI/KG viewer): Digest → Open graph and episode details (an episode row) → in the episode rail, Open in graph → wait ≥2.5s before concluding pass/fail.
What to measure in Chrome (DevTools MCP evaluate_script):
- If exposed: window.__GIKG_CY_DEV__ as the Cytoscape instance cy. Compare counts before and after the wait, for example cy.nodes('.graph-dimmed').length, cy.nodes(':selected').length, and related graph-* classes your change touches.
- Cross-check Pinia: graphConnectionsCyId on the subject store and pendingFocusNodeId on graphNavigation (web/gi-kg-viewer/src/stores/subject.ts, graphNavigation.ts). “All bright” with zero dimmed nodes often correlates with focus handoff not sticking (graphConnectionsCyId cleared or never set).
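The before/after comparison can be sketched as two small functions. This sketch models only the slice of the Cytoscape API used above (cy.nodes(selector).length); `takeDimmingSnapshot` and `focusHandoffLost` are hypothetical helpers, and the "dimming disappeared" heuristic is an assumption drawn from the symptom described here, not a rule from the codebase.

```typescript
// Minimal slice of the Cytoscape API used here: cy.nodes(selector).length.
interface CyLike {
  nodes(selector: string): { length: number };
}

interface DimmingSnapshot {
  dimmed: number;
  selected: number;
}

// Capture the two counts this section suggests comparing.
function takeDimmingSnapshot(cy: CyLike): DimmingSnapshot {
  return {
    dimmed: cy.nodes(".graph-dimmed").length,
    selected: cy.nodes(":selected").length,
  };
}

// Assumed heuristic: dimming was present right after the click but is gone
// after the wait, i.e. the graph drifted back to "all bright".
function focusHandoffLost(before: DimmingSnapshot, after: DimmingSnapshot): boolean {
  return before.dimmed > 0 && after.dimmed === 0;
}
```

In practice the two count expressions would be evaluated in the page against window.__GIKG_CY_DEV__, once immediately after Open in graph and once after the ≥2.5s wait.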
Implementation reminders:
- Corpus metadata_relative_path strings may not match graph Episode row metadata text; use a stable episode id as a fallback when mapping corpus → Cytoscape id (see web/gi-kg-viewer/src/utils/graphEpisodeMetadata.ts).
- Any path that clears episode representative state when metadata resolution fails must not wipe a valid graphConnectionsCyId already pointing at an Episode node on the active core graph.
Pipeline and environment pitfalls (viewer bugs)¶
Lessons from fixing merged-graph / filtered-graph behavior (e.g. transcript links, corpus-relative paths):
- Agent curl to localhost is not proof the app is down. The agent environment may not reach your machine’s loopback. If the user says the dev server is up, prefer Chrome DevTools MCP (list_pages, select_page) attached to their session instead of declaring “nothing listening” from a sandbox probe alone.
- Reproduce the same data shape as the user. Single-file load, merged multi-file load, filter toggles, and ego / one-hop focus can hit different code paths. A fix that works for one shape can still fail for another; extend the reproduction (and Vitest) to match how the user loads the graph.
- Trace the full prop/store pipeline before closing. Example: displayArtifact → applyGraphFilters → viewWithEgo / filterArtifactEgoOneHop → rail NodeDetail. If any step rebuilds a model object (new ParsedArtifact, cloned graph state), grep for constructors that copy only a subset of fields. New metadata fields (paths, maps, ids) must be passed through on every rebuild, or the UI will look “fixed” in unit tests on the raw parse while the rail still sees stripped data.
- Ground-truth the broken URL in the browser. For API-backed links, read the actual href or new_page URL and decode relpath (e.g. must start with feeds/… when the artifact lives under a feed run). That catches encoding issues and wrong-relative paths in one step.
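The last check, decoding relpath from a link, can be sketched as below. `decodeRelpath` and `isFeedRelative` are hypothetical helpers, and the endpoint path in the usage note is made up for illustration; only the relpath parameter name and the feeds/ prefix come from this guide.

```typescript
// Hypothetical helper: pull the decoded relpath query parameter out of an href.
function decodeRelpath(href: string): string | null {
  const queryStart = href.indexOf("?");
  if (queryStart === -1) return null;
  // Drop any fragment, then scan key=value pairs.
  const query = href.slice(queryStart + 1).split("#")[0];
  for (const pair of query.split("&")) {
    const eq = pair.indexOf("=");
    const key = eq === -1 ? pair : pair.slice(0, eq);
    if (key !== "relpath") continue;
    const raw = eq === -1 ? "" : pair.slice(eq + 1);
    try {
      return decodeURIComponent(raw);
    } catch {
      return null; // malformed percent-encoding
    }
  }
  return null;
}

// True only when the decoded path sits under a feed run.
function isFeedRelative(relpath: string | null): boolean {
  return relpath !== null && relpath.indexOf("feeds/") === 0;
}
```

For a made-up link such as /api/example?relpath=feeds%2Fshow%2Fep.json the decoded value is feeds/show/ep.json and the check passes. A double-encoded value (feeds%252Fshow) decodes to feeds%2Fshow and fails the prefix check, which is the kind of encoding bug this step is meant to surface.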
What makes this work¶
- Shared session — the agent inherits your exact browser context (navigation state, loaded corpus, scroll position). No reproduction needed.
- Vite HMR — code changes appear in your browser within milliseconds. The agent does not need to refresh or re-navigate.
- Agent observes the effect — after each edit, the agent reads the updated DOM, console, and network through the still-attached DevTools session. It can self-correct before you even react.
- You stay in control — the agent never navigates away or changes your browser state. It only edits code; the browser reflects the change via hot-reload.
The full closed loop¶
The two modes feed into each other naturally:
┌─────────────────────────────────────────────────┐
│ LIVE CO-DEVELOPMENT (you + agent, same Chrome) │
│ │
│ You direct → agent edits → Vite reloads │
│ → agent sees result → you react → repeat │
│ │
│ When satisfied ↓ │
└─────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────┐
│ AUTOMATED VALIDATION (agent alone, headless) │
│ │
│ Agent updates e2e/*.spec.ts │
│ → make test-ui-e2e (full suite, Firefox) │
│ → failures? fix + re-run │
│ → green [ok] │
└─────────────────────────────────────────────────┘
↓
Ready to commit
↓
(or back to live co-dev if
you spot something new)
Console and network access¶
With the devtools capability enabled, the agent has access to:
| Channel | What it sees |
|---|---|
| Console | JS errors, warnings, your console.log output |
| Network | URL, method, status code, request payload, response body |
| Performance | Traces via Chrome DevTools MCP (performance_start_trace, performance_analyze_insight) |
Example prompt that exercises all three:
"Navigate to /dashboard, trigger a data refresh, then check:
1. Did the /api/corpus/stats request return 200?
2. What payload did it return?
3. Any console errors during the refresh?"
The agent answers all three from a single flow — no manual DevTools needed.
MCP vs Playwright CLI¶
Playwright ships both an MCP server and a standalone CLI. The tradeoff is token cost vs reasoning depth:
| | MCP | CLI |
|---|---|---|
| Token cost | Higher (full accessibility tree in context) | ~4x cheaper |
| Best for | Reasoning about unknown page structure, persistent browser state | Well-defined, repeatable automation |
| Session model | Browser stays alive across turns | Command-per-invocation |
Practical rule: start with CLI for known flows (navigate, click, assert). Switch to MCP when the agent needs to reason about what it finds — exploratory debugging, unfamiliar UI states, complex multi-step reproduction.
Use cases for this project¶
UC-1: Feature done → write specs → green gate¶
Problem: You finish a UI feature (new Episode subject rail, Dashboard chart, search
filter). You need to validate it works and doesn't break anything — the same way
make test validates Python changes.
Tool: make test-ui-e2e (the gate) + Playwright MCP (optional exploration).
Flow:
- Agent finishes the Vue component / store / API integration
- Agent reads E2E_SURFACE_MAP.md to find the owning spec and selectors
- Agent updates or creates e2e/*.spec.ts to cover the new behavior
- Agent runs make test-ui-e2e (full suite)
- Failures → agent reads Playwright output, diagnoses, fixes code or spec, re-runs
- Green → ready to commit
When the spec itself is hard to write: the agent can switch to Playwright MCP
to interactively explore the page — discover what the accessibility tree looks like,
which ref values to use, what network calls fire — then translate that into a
proper spec.
Prompt example:
"I just finished the Episode detail rail. Update the E2E specs to cover:
- clicking an episode in Library opens the rail
- the rail shows episode title, metadata, and key points
- clicking 'Open in graph' switches to the Graph tab
Run make test-ui-e2e and fix any failures."
UC-2: Live co-development on the GI/KG viewer¶
Scenario A — bug: The graph loads but search results don't highlight nodes.
- You're in Chrome, looking at the graph after a search
- "Attach to my browser. I searched 'machine learning': 5 hits but no nodes highlight. Check the search API response and console."
- Agent inspects /api/search?q=machine+learning payload, console, .graph-canvas
- Agent finds the mismatch, edits the Vue component, Vite reloads
- You see nodes highlight: "good, but the highlight color is too faint"
- Agent adjusts the CSS, reload, you confirm
- Agent runs make test-ui-e2e to lock it in
Scenario B — feature: You want to add a "Prefill search" button to the Episode detail rail.
- You're in Chrome, looking at the Episode subject rail in the Library tab
- "Add a 'Prefill search' button below the episode title. When clicked, it should fill the search box with the episode title."
- Agent sees the rail DOM, edits EpisodeDetailPanel.vue, adds the button + store wiring
- Vite reloads and the button appears in your browser
- You click it and the search box fills: "works, but put it next to 'Open in graph'"
- Agent moves it, reload, you confirm
- Agent updates the E2E spec, runs make test-ui-e2e → green
Both scenarios are the same loop: you direct, agent edits, hot-reload, agent observes, you react.
UC-3: Exploratory validation of new UI (agent drives browser)¶
Problem: You want the agent to click through the app and report what it sees — a quick smoke test beyond what the spec suite covers.
Tool: Playwright MCP (automated mode).
Flow:
- Agent navigates to http://127.0.0.1:5174
- Clicks through each tab: Digest, Library, Graph, Dashboard
- Checks accessibility snapshot after each click (did the tab render content?)
- Checks console for JS errors
- Checks network: did the right API endpoints fire?
- Reports pass/fail per tab with evidence
Prompt example:
"Navigate to localhost:5174, click each main tab in sequence. After each one:
- confirm the tab content rendered (not blank)
- check for console errors
- list any network calls made
Report a summary table."
UC-4: Run comparison tool (Streamlit) — network profiling¶
Problem: The Streamlit comparison tool (make run-compare, port 8501) loads
slowly or shows stale data. You don't know if it's a slow API call, a large payload,
or a rendering issue.
Tool: Chrome DevTools MCP attached to your live Chrome session.
Flow:
- Open the Streamlit app in Chrome (chrome-dev), navigate to the Performance tab
- Trigger the slow action (load a comparison, switch pages)
- Tell the agent: "Profile the network calls that just fired: which took longest, what did they return, any errors?"
- Agent reports: timing per request, payload sizes, any 4xx/5xx
- If a specific call is the culprit: "Trace that back to the Python handler and tell me where the bottleneck is"
Boundary: this covers browser-visible HTTP traffic only. If the bottleneck is
inside a Python function that Streamlit calls synchronously (e.g., a slow model
inference), the agent sees a slow response but not why it's slow internally — that
is py-spy territory (RFC-064).
UC-5: Catching silent failures in pipeline output UI¶
Problem: The pipeline runs, produces output, the UI shows "success" — but the displayed data is wrong or incomplete. No exception was raised.
Tool: Chrome DevTools MCP (live co-development mode).
Flow:
- You see wrong data in the viewer ("episode 42 shows 0 entities extracted")
- Tell the agent: "Look at my browser — the entities count for episode 42 is wrong. Check what the API returned for that row."
- Agent inspects the network response for that specific data fetch
- Finds: API returned correct data but the Vue component transformation mangled it — or: API itself returned 0 (upstream pipeline issue)
- Narrows the bug to frontend vs backend in one step
- If frontend: agent fixes the component, Vite reloads, you see the correct count
This is the "network tab tells you why, not just what" pattern.
UC-6: Post-run UI validation (future)¶
Problem: After a pipeline run, results are written and the viewer should reflect the new data. You want an automated check that the UI actually updated, not just that the files were written.
Tool: Playwright MCP or CLI (headless, scriptable).
Flow:
Post-run hook:
→ Agent navigates to viewer, checks corpus stats
→ Compares: did the episode count increase? Did the latest run appear?
→ Reports UI validation result alongside pipeline output
This closes a gap: you currently validate pipeline output quality, but not whether the reporting layer correctly reflects it. A headless Playwright check after each run catches silent UI regressions without manual review.
Status: Aspirational — not yet implemented.
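Since this use case is not implemented, the comparison step can only be sketched. The sketch below assumes a hypothetical `CorpusStats` shape with an episode count and a list of run ids; the real corpus stats payload may use different fields, and `checkViewerReflectsRun` is an illustrative name.

```typescript
// Hypothetical shape: the real corpus stats payload may use different field names.
interface CorpusStats {
  episodeCount: number;
  runIds: string[];
}

// Returns a list of problems; an empty list means the viewer reflects the run.
function checkViewerReflectsRun(
  before: CorpusStats,
  after: CorpusStats,
  newRunId: string,
): string[] {
  const problems: string[] = [];
  if (after.episodeCount <= before.episodeCount) {
    problems.push(
      `episode count did not increase (${before.episodeCount} -> ${after.episodeCount})`,
    );
  }
  if (after.runIds.indexOf(newRunId) === -1) {
    problems.push(`run ${newRunId} is not listed in the viewer`);
  }
  return problems;
}
```

A post-run hook would capture the stats shown by the viewer before and after the pipeline run and report any returned problems next to the pipeline output.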
Playwright test agents (future)¶
Playwright ships three composable agents that can be chained for test generation:
| Agent | Job |
|---|---|
| planner | Explores your app, produces a Markdown test plan |
| generator | Transforms the plan into Playwright test files |
| healer | Executes the suite, automatically repairs failing tests |
Initialize:
npx playwright init-agents
These work on top of Playwright MCP — they are higher-level orchestration, not a replacement. Not yet used in this project; listed here as a future option for accelerating test creation.
CI integration¶
The automated mode is already in CI — the viewer-e2e workflow job runs
make test-ui-e2e headlessly on every PR. Live co-development is inherently local.
What CI provides today:
| What | Status |
|---|---|
| make test-ui-e2e as PR gate | Done (viewer-e2e job) |
| Playwright traces on first retry | Done (trace: 'on-first-retry' in config) |
| Trace/report upload as CI artifacts | Not yet — see backlog §1 below |
Practical next step: upload Playwright trace zips and the HTML report as CI artifacts on failure. When a CI-only failure is hard to reproduce locally, the agent can download the trace and inspect DOM snapshots, network calls, and console logs post-mortem — closing the loop between CI and the local agent workflow.
CI enhancement backlog (optional)¶
Ideas not yet implemented in .github/workflows/; pick up when useful.
| # | Idea | Effort | Value | Suggested order |
|---|---|---|---|---|
| 1 | Upload HTML report + test-results/ (traces) as artifacts on failure | Low | High | First |
| 2 | Global page.on('console') gate for unexpected console.error | Low | Medium | Second |
| 3 | toHaveScreenshot visual regression on 4–6 key surfaces | Medium | Medium-high | Third |
| 4 | axe-core scans on a few key states (@axe-core/playwright) | Medium | Medium | Fourth |
1 — Artifact upload (sketch) — in the viewer E2E workflow, on failure():
- name: Upload Playwright report on failure
  if: failure()
  uses: actions/upload-artifact@v4
  with:
    name: playwright-report
    path: web/gi-kg-viewer/playwright-report/
    retention-days: 14
- name: Upload Playwright traces on failure
  if: failure()
  uses: actions/upload-artifact@v4
  with:
    name: playwright-traces
    path: web/gi-kg-viewer/test-results/
    retention-days: 14
Local inspection: npx playwright show-trace test-results/.../trace.zip.
2 — Console error gate — in shared fixtures, collect console.error, allowlist
known benign noise, then annotate or fail in afterEach. Start with annotations
before hard-failing; Vue dev warnings need filtering.
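The collect, allowlist, then annotate-or-fail flow can be sketched as a pure helper that a shared fixture would call. This is a minimal sketch under stated assumptions, not the project's fixture: `ConsoleEntry`, `findUnexpectedErrors`, the allowlist patterns, and the sample messages are all hypothetical.

```typescript
// Hypothetical record of one console message, as a fixture might store it
// from Playwright's ConsoleMessage.type() and ConsoleMessage.text().
interface ConsoleEntry {
  type: string; // e.g. "error", "warning", "log"
  text: string;
}

// Assumed allowlist of benign noise; the real patterns depend on the app.
const BENIGN_PATTERNS: RegExp[] = [
  /\[Vue warn\]/, // Vue dev warnings
  /favicon\.ico/, // missing favicon requests
];

// Returns only console errors that no allowlist pattern matches.
function findUnexpectedErrors(
  entries: ConsoleEntry[],
  allowlist: RegExp[] = BENIGN_PATTERNS,
): ConsoleEntry[] {
  return entries
    .filter((entry) => entry.type === "error")
    .filter((entry) => !allowlist.some((pattern) => pattern.test(entry.text)));
}

// Illustrative input: one log, one allowlisted error, one real error.
const recorded: ConsoleEntry[] = [
  { type: "log", text: "viewer mounted" },
  { type: "error", text: "GET /favicon.ico failed" },
  { type: "error", text: "TypeError: graph is undefined" },
];
const unexpected = findUnexpectedErrors(recorded);
console.log(unexpected.map((entry) => entry.text));
```

In a fixture, the entries would be pushed from a `page.on('console', ...)` listener. In the annotation phase the result can be appended to `testInfo.annotations`. Once the allowlist is stable, an `afterEach` assertion that the list is empty turns it into a hard gate.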
3 — Screenshot diffs — await expect(page).toHaveScreenshot('name.png', {
maxDiffPixelRatio: 0.01, animations: 'disabled' }); commit baselines from Linux CI;
mock APIs for deterministic pixels (see existing mock specs).
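To make the 0.01 tolerance concrete: maxDiffPixelRatio is the share of pixels allowed to differ from the baseline. The sketch below only illustrates that arithmetic. It assumes a 1280×720 screenshot (the real size depends on the project's viewport), and `pixelBudget` is a hypothetical helper that does not model Playwright's exact rounding.

```typescript
// Rough pixel budget implied by a maxDiffPixelRatio for a given image size.
function pixelBudget(
  width: number,
  height: number,
  maxDiffPixelRatio: number,
): number {
  return Math.floor(width * height * maxDiffPixelRatio);
}

// Assumed 1280x720 screenshot with the 0.01 ratio from the item above.
const budget = pixelBudget(1280, 720, 0.01);
console.log(budget);
```

At that size the budget is roughly nine thousand pixels, enough to absorb anti-aliasing noise but small enough that a missing panel or a shifted column still fails the comparison.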
4 — Accessibility audit — AxeBuilder with wcag2a / wcag2aa; triage initial
violations, allowlist minor issues, then enforce no new critical regressions on a
small set of routes/states.
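The triage-then-enforce step can be sketched as a filter over the violations that axe reports. This is a sketch under assumptions: `Violation` mirrors only the `id` and `impact` fields of an axe-core result, while `TRIAGED_RULE_IDS`, `blockingViolations`, and the sample data are hypothetical.

```typescript
// Minimal subset of an axe-core violation; real results carry more fields.
interface Violation {
  id: string; // rule id, e.g. "color-contrast"
  impact: "minor" | "moderate" | "serious" | "critical" | null;
}

// Hypothetical allowlist of rule ids accepted during initial triage.
const TRIAGED_RULE_IDS = new Set<string>(["region"]);

// Keeps only critical violations that were not accepted during triage.
function blockingViolations(
  violations: Violation[],
  allowlist: Set<string> = TRIAGED_RULE_IDS,
): Violation[] {
  return violations.filter(
    (violation) =>
      violation.impact === "critical" && !allowlist.has(violation.id),
  );
}

// Illustrative scan result, not real output from the viewer.
const found: Violation[] = [
  { id: "region", impact: "moderate" },
  { id: "color-contrast", impact: "serious" },
  { id: "button-name", impact: "critical" },
];
const blocking = blockingViolations(found);
console.log(blocking.map((violation) => violation.id));
```

In a spec, the input would come from `(await new AxeBuilder({ page }).withTags(['wcag2a', 'wcag2aa']).analyze()).violations`, and the test would assert that the filtered list is empty for each audited route or state.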
Boundary: what these tools cover¶
| Concern | Right tool | Guide |
|---|---|---|
| Browser-visible HTTP traffic | Chrome DevTools MCP | This guide |
| UI rendering, console errors, DOM state | Chrome DevTools MCP / Playwright MCP | This guide |
| Regression testing UI flows | make test-ui-e2e / Playwright MCP | This guide |
| Python pipeline timing and resources | .monitor.log, metrics.json | Agent-Pipeline Loop |
| CI failure diagnosis | Terminal output, test logs | Agent-Pipeline Loop |
| Python process CPU/memory profiling | py-spy, memray, RFC-064 profiles | Agent-Pipeline Loop |
The clean split: anything that crosses the HTTP boundary → this guide (browser tools). Anything inside a Python process → Agent-Pipeline Feedback Loop Guide.
IDE-specific setup¶
Cursor¶
Cursor provides a built-in browser-use subagent type that can navigate pages,
interact with elements, fill forms, and take screenshots. This is available in agent
mode without additional MCP configuration.
For Playwright MCP and Chrome DevTools MCP, add them to your project or user MCP
config (.cursor/mcp.json).
Recommended (live co-development with chrome-dev, Chrome 136+ on macOS): use
--browserUrl so the server talks to port 9222 instead of scanning the default
profile for DevToolsActivePort. Start chrome-dev before the agent uses DevTools
MCP. After changing mcp.json, restart the devtools MCP server or Cursor.
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["-y", "@playwright/mcp@latest", "--caps", "devtools"],
      "env": {
        "PLAYWRIGHT_MCP_CONSOLE_LEVEL": "warning"
      }
    },
    "devtools": {
      "command": "npx",
      "args": [
        "-y",
        "chrome-devtools-mcp@latest",
        "--browserUrl",
        "http://127.0.0.1:9222"
      ]
    }
  }
}
Console level options: error | warning | info | debug (each includes more
severe levels).
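Because the --browserUrl setup only works once chrome-dev is listening, a quick preflight against Chrome's /json/version endpoint on the debugging port can confirm the browser is reachable before the agent starts. This is a minimal sketch: `describeTarget` and `checkDebugPort` are hypothetical helpers, the sample payload is made up, and the global `fetch` assumes a Node.js version that ships it.

```typescript
// Subset of the fields Chrome returns from GET /json/version.
interface VersionInfo {
  Browser: string;
  webSocketDebuggerUrl: string;
}

// Formats the version payload into a one-line summary.
function describeTarget(info: VersionInfo): string {
  return `${info.Browser} at ${info.webSocketDebuggerUrl}`;
}

// Queries the debugging port; rejects if Chrome is not listening there.
async function checkDebugPort(
  browserUrl: string = "http://127.0.0.1:9222",
): Promise<string> {
  const response = await fetch(`${browserUrl}/json/version`);
  if (!response.ok) {
    throw new Error(`Unexpected status ${response.status} from ${browserUrl}`);
  }
  return describeTarget((await response.json()) as VersionInfo);
}

// Illustrative payload only; real values come from the running browser.
const sample: VersionInfo = {
  Browser: "Chrome/136.0.0.0",
  webSocketDebuggerUrl: "ws://127.0.0.1:9222/devtools/browser/abc",
};
console.log(describeTarget(sample));
```

Calling `checkDebugPort()` with chrome-dev running should resolve to a one-line summary. A rejected promise usually means Chrome was started without the debugging port or on a different one.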
Alternative — autoConnect: some setups can use --autoConnect instead of
--browserUrl. Enable remote debugging in Chrome at
chrome://inspect/#remote-debugging and approve when prompted. If you use Chrome Dev
channel as your daily browser, add "--channel=dev" to the args array so the
server matches the correct channel. On macOS with recent stable Chrome, --autoConnect
often still fails with a DevToolsActivePort error; in that case, fall back to
chrome-dev + --browserUrl as above.
{
  "mcpServers": {
    "devtools": {
      "command": "npx",
      "args": ["-y", "chrome-devtools-mcp@latest", "--autoConnect"]
    }
  }
}
Use the npm package name chrome-devtools-mcp (unscoped). The name
@chrome-devtools/mcp is not published and will make npx fail.
Claude Code¶
Add MCP servers via the CLI:
# Playwright MCP with devtools capability
claude mcp add playwright -- npx -y @playwright/mcp@latest --caps devtools
# Chrome DevTools MCP (recommended: chrome-dev + browserUrl on Chrome 136+ / macOS)
claude mcp add devtools -- npx -y chrome-devtools-mcp@latest \
--browserUrl http://127.0.0.1:9222
# Alternative: autoConnect (chrome://inspect/#remote-debugging) when it works for you
claude mcp add devtools -- npx -y chrome-devtools-mcp@latest --autoConnect
# Verify
claude mcp list
Prerequisites (both IDEs)¶
# Node.js 18+ required (chrome-devtools-mcp documents Node.js 20.19+ as its minimum)
node --version
# Install Playwright browsers (Chromium for MCP, Firefox for E2E suite)
npx playwright install chromium
npx playwright install firefox
Quick reference¶
# Launch Chrome with debugging port (live co-development)
chrome-dev # alias from .zshrc setup above
# Run existing E2E suite (automated, Firefox)
make test-ui-e2e
# Start the GI/KG viewer dev server (for manual browsing)
make serve
# Start the Streamlit comparison tool
make run-compare
Related documentation¶
| Document | Relationship |
|---|---|
| E2E Testing Guide | Playwright E2E suite details, surface map, spec conventions |
| Testing Strategy | Where browser E2E fits in the test pyramid |
| Development Guide | GI/KG viewer dev workflow, make serve |
| Server Guide | FastAPI /api/* endpoints the agent inspects |
| E2E Surface Map | Playwright automation contract — surfaces, selectors, specs |
| VIEWER_IA.md + UXS-001 + feature UXS | Shell IA (VIEWER_IA); shared tokens (UXS-001); per-surface specs (Digest, Library, Graph, Search, Dashboard, …) |
| Polyglot Repo Guide | Python root vs web/gi-kg-viewer/ layout |
| Agent-Pipeline Feedback Loop | Python-side companion: CI, acceptance, --monitor, metrics.json |
External references¶
- Driving vs. Debugging the Browser — Steve Kinney — conceptual breakdown, April 2026
- Chrome DevTools MCP vs Playwright MCP vs CLI — test-lab.ai — decision guide
- Playwright MCP vs CLI — Shipyard — token efficiency tradeoffs
- Playwright MCP official docs
- Chrome DevTools MCP — GitHub