AgentsInFlow

Browser Automation

Control the user's browser through Chrome DevTools Protocol — take screenshots, inspect the DOM, monitor network traffic, and run Lighthouse audits, all from your AI agent.

Overview

Browser automation turns your AI agent from an advisor into an operator. Instead of telling you what to click, the agent clicks it. AgentsInFlow exposes Chrome DevTools Protocol (CDP) through two dedicated CLI bridges — one for the user's Chrome browser and one for the Electron app itself.

[Screenshot: Agent filling a web form through the browser bridge while the terminal shows CDP commands]
| Bridge | Target | CLI | Use Case |
| --- | --- | --- | --- |
| Browser Bridge | Chrome / Chromium tabs | aif-browser | Web app interaction, form filling, scraping |
| Electron Bridge | AgentsInFlow renderer | aif-electron | App introspection, UI debugging, self-testing |

Both bridges require the AgentsInFlow Browser Chrome extension to be installed and the relevant debug port to be open. See the MCP & Browser page for initial setup.


Chrome DevTools MCP

AgentsInFlow connects to Chrome through the Chrome DevTools Protocol using the chrome.debugger extension API. The browser bridge extension attaches to tabs on demand, exposing CDP domains like Page, DOM, Network, and Runtime to the agent.

[Screenshot: Chrome DevTools Protocol connection flow — extension attaches to tab, CDP messages flow to the agent CLI]

How It Works

1. The Chrome extension detects the active tab and attaches the debugger via chrome.debugger.attach().

2. CDP commands from the agent CLI are forwarded to the tab through the extension's native messaging host.

3. Responses and events stream back to the agent, enabling real-time browser control.

The extension attaches with debugger protocol version 1.3 and automatically reattaches if the debugger disconnects (e.g., after a page navigation).
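
The command and event flow described above can be sketched as a small bookkeeping class. This is a minimal illustration, assuming the bridge frames messages the way raw CDP does over a WebSocket (commands carry an integer id, replies echo it, events have no id). The class name CdpSession and its methods are hypothetical and are not part of aif-browser.

```python
import itertools
import json


class CdpSession:
    """Hypothetical helper: tracks outgoing CDP commands and routes replies."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.pending = {}  # command id -> method still awaiting a reply
        self.events = []   # (method, params) pairs received as events

    def command(self, method, params=None):
        """Serialize one command; each command gets a unique integer id."""
        msg_id = next(self._ids)
        self.pending[msg_id] = method
        return json.dumps({"id": msg_id, "method": method, "params": params or {}})

    def receive(self, raw):
        """Route one incoming message: a reply has an id, an event does not."""
        msg = json.loads(raw)
        if "id" in msg:
            method = self.pending.pop(msg["id"])
            if "error" in msg:
                raise RuntimeError(f"{method} failed: {msg['error'].get('message')}")
            return method, msg.get("result", {})
        self.events.append((msg["method"], msg.get("params", {})))
        return None
```

Keeping the pending map separate from the event list lets events such as Page.loadEventFired arrive between a command and its reply without being mistaken for the reply.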


Taking Screenshots

Capture the visible viewport, the full scrollable page, or a single element. The agent uses Page.captureScreenshot under the hood and returns the image as a base64-encoded PNG.

[Screenshot: Terminal output showing aif-browser screenshot command with a captured page thumbnail]
| Command | Description |
| --- | --- |
| aif-browser screenshot | Capture the visible viewport of the active tab |
| aif-browser screenshot --full-page | Capture the entire scrollable page |
| aif-browser screenshot --selector "#hero" | Capture a specific DOM element by CSS selector |

Full-page screenshots on long pages produce large images. Prefer viewport or selector-based captures when possible to keep token costs down.
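
For readers who want to see what sits behind these commands at the CDP level, here is a rough sketch. It assumes a full-page capture is built from the cssContentSize returned by Page.getLayoutMetrics plus the captureBeyondViewport flag; how aif-browser implements --full-page internally is not documented here. The function names are hypothetical.

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def full_page_params(layout_metrics):
    """Build Page.captureScreenshot params from a Page.getLayoutMetrics result."""
    size = layout_metrics["cssContentSize"]
    return {
        "format": "png",
        "captureBeyondViewport": True,
        "clip": {
            "x": size["x"],
            "y": size["y"],
            "width": size["width"],
            "height": size["height"],
            "scale": 1,
        },
    }


def decode_screenshot(result):
    """Decode the base64 'data' field and check that it really is a PNG."""
    data = base64.b64decode(result["data"])
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("screenshot payload is not a PNG")
    return data
```

The decoded bytes can be written straight to a .png file. Checking the signature first catches a truncated or mislabeled payload before it reaches the agent.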


DOM Inspection

Query and traverse the live DOM tree without opening DevTools manually. The agent uses the CDP DOM domain to resolve nodes, read attributes, and extract text content.

| Capability | CDP Method | Example |
| --- | --- | --- |
| Query selector | DOM.querySelector | Find a login button by #submit-btn |
| Get outer HTML | DOM.getOuterHTML | Read rendered markup of a component |
| Set attribute | DOM.setAttributeValue | Toggle a disabled attribute |
| Get box model | DOM.getBoxModel | Measure element dimensions and position |

The agent can also evaluate arbitrary JavaScript via Runtime.evaluate for DOM operations not covered by dedicated CDP methods.
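
A typical lookup chains three DOM calls: fetch the document root, resolve a selector against it, then read the markup. The sketch below assumes a hypothetical send(method, params) callable that forwards one CDP command and returns its result dict. That callable stands in for whatever transport the bridge provides and is not an AgentsInFlow API.

```python
def outer_html(send, selector):
    """Return the outer HTML of the first node matching selector, or None.

    `send` is a hypothetical callable: send(method, params) -> result dict.
    """
    root = send("DOM.getDocument", {"depth": 0})["root"]
    found = send(
        "DOM.querySelector",
        {"nodeId": root["nodeId"], "selector": selector},
    )
    node_id = found["nodeId"]
    if node_id == 0:  # CDP reports "no match" as node id 0
        return None
    return send("DOM.getOuterHTML", {"nodeId": node_id})["outerHTML"]
```

Passing the transport in as a parameter keeps the lookup logic testable with a stub, without a running browser.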


Network Monitoring

Enable the CDP Network domain to watch HTTP requests and responses in real time. This is useful for debugging API calls, checking response payloads, or verifying that a form submission reached the server.

[Screenshot: Network log output showing request URLs, methods, status codes, and response sizes]

Key Capabilities

  • Capture requests and responses, including headers, bodies, and timing
  • Filter by URL pattern or resource type (XHR, Fetch, Document)
  • Retrieve response bodies for JSON APIs or HTML documents
  • Detect failed requests (4xx/5xx) and connection errors

Network monitoring can produce large volumes of data. Use URL filters and limit capture duration to avoid flooding the agent context.
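
One way to apply such filters is to reduce the raw event stream to a short list of rows before it reaches the agent. The following sketch assumes events arrive as (method, params) pairs shaped like the CDP Network events. The function name summarize_network and the glob-style URL pattern are illustrative choices, not aif-browser options.

```python
from fnmatch import fnmatchcase


def summarize_network(events, url_pattern="*", resource_types=None):
    """Reduce Network.* events to rows, keeping only matching URLs and types."""
    urls = {}  # requestId -> URL, needed because loadingFailed carries no URL
    rows = []
    for method, params in events:
        if method == "Network.requestWillBeSent":
            urls[params["requestId"]] = params["request"]["url"]
            continue
        if method == "Network.responseReceived":
            url = params["response"]["url"]
            status = params["response"]["status"]
            error = None
        elif method == "Network.loadingFailed":
            url = urls.get(params["requestId"], "<unknown>")
            status = None
            error = params.get("errorText")
        else:
            continue
        if not fnmatchcase(url, url_pattern):
            continue
        if resource_types and params.get("type") not in resource_types:
            continue
        failed = error is not None or status >= 400
        rows.append({"url": url, "status": status, "failed": failed, "error": error})
    return rows
```

For example, summarize_network(events, "https://api.example.com/*", {"XHR", "Fetch"}) would keep only API traffic and mark both HTTP errors and connection failures as failed.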


Console Access

Read browser console output and evaluate JavaScript expressions in the page context. The agent subscribes to Runtime.consoleAPICalled events and can execute code via Runtime.evaluate.

| Operation | Description |
| --- | --- |
| Read logs | Stream console.log, warn, error messages from the page |
| Evaluate expression | Run arbitrary JS and return the result (e.g., document.title) |
| Catch exceptions | Subscribe to Runtime.exceptionThrown to detect runtime errors |

Combine console access with network monitoring to get full observability into a web application — the agent sees both the JavaScript runtime state and the HTTP traffic.
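
To make the two Runtime events concrete, here is a small sketch that turns each one into a single log line. It assumes the event payloads follow the CDP shapes (args as RemoteObject entries, exceptionDetails with an optional exception object). The function name format_console_event is hypothetical.

```python
import json


def format_console_event(method, params):
    """Render a Runtime console or exception event as one line, else None."""
    if method == "Runtime.consoleAPICalled":
        parts = []
        for arg in params["args"]:
            if "value" in arg:
                value = arg["value"]
                # Strings print as-is; other primitives use their JSON form.
                parts.append(value if isinstance(value, str) else json.dumps(value))
            else:
                # Objects carry no inline value, only a description.
                parts.append(arg.get("description", arg["type"]))
        return f"[{params['type']}] " + " ".join(parts)
    if method == "Runtime.exceptionThrown":
        details = params["exceptionDetails"]
        exception = details.get("exception", {})
        return "[exception] " + exception.get("description", details["text"])
    return None
```

Note that CDP reports console.warn calls with the type "warning", so a filter on log level should match that spelling.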


Page Interaction

The agent can navigate to URLs, click elements, type into inputs, and wait for page events. These operations combine CDP's Page, Input, and DOM domains.

[Screenshot: Agent navigating to a URL, typing into a search field, and clicking a button in sequence]

Common Operations

| Action | How |
| --- | --- |
| Navigate | Page.navigate: go to a URL and wait for load |
| Click | Resolve element via DOM.querySelector, get coordinates from DOM.getBoxModel, dispatch Input.dispatchMouseEvent |
| Type text | Focus the input, then dispatch Input.dispatchKeyEvent for each character |
| Wait for load | Listen for Page.loadEventFired or Page.domContentEventFired |
| Scroll | Evaluate window.scrollTo() via Runtime.evaluate |
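
The click and type rows can be sketched as functions that build the CDP commands to send. This is an illustration under stated assumptions: it clicks the center of the element's content quad from a DOM.getBoxModel result, and it types by sending one "char" key event per character. The function names are hypothetical, and the bridge may sequence these events differently.

```python
def quad_center(quad):
    """Center of a CDP quad: eight numbers, x/y pairs for four corners."""
    xs = quad[0::2]
    ys = quad[1::2]
    return sum(xs) / 4, sum(ys) / 4


def click_commands(box_model):
    """Press and release the left button at the element's content center."""
    x, y = quad_center(box_model["model"]["content"])
    common = {"x": x, "y": y, "button": "left", "clickCount": 1}
    return [
        ("Input.dispatchMouseEvent", {"type": "mousePressed", **common}),
        ("Input.dispatchMouseEvent", {"type": "mouseReleased", **common}),
    ]


def type_commands(text):
    """One 'char' key event per character of the text."""
    return [("Input.dispatchKeyEvent", {"type": "char", "text": ch}) for ch in text]
```

Averaging the four corners rather than using width and height keeps the click point correct for rotated or skewed elements, whose quads are not axis-aligned rectangles.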

Electron DevTools for App Introspection

The Electron bridge (aif-electron) connects to the AgentsInFlow app itself using CDP over its remote debugging port. This lets an agent inspect the app's own UI, evaluate scripts in the renderer, and take screenshots of the application window.

[Screenshot: Electron DevTools bridge connected to the AgentsInFlow renderer showing page list and evaluation output]

Setup

1. Start the app in debug mode: pnpm dev:electron:debug

2. Use aif-electron page list to discover available pages.

3. Evaluate expressions, capture screenshots, or read console messages through the CLI.

| Command | Description |
| --- | --- |
| aif-electron page list | List all Electron renderer pages |
| aif-electron evaluate "document.title" | Run JS in the renderer context |
| aif-electron screenshot | Capture the app window |
| aif-electron console | Stream console messages from the renderer |
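
Chromium-based apps that open a remote debugging port also serve a JSON target list over HTTP at /json/list. The sketch below shows how such a list could be reduced to renderer pages. Whether aif-electron page list works this way is an assumption, and the function name list_pages is hypothetical.

```python
import json


def list_pages(targets_json):
    """Keep only 'page' targets from a /json/list response body."""
    targets = json.loads(targets_json)
    return [
        {
            "id": target["id"],
            "title": target.get("title", ""),
            "url": target.get("url", ""),
            "ws": target.get("webSocketDebuggerUrl"),
        }
        for target in targets
        if target.get("type") == "page"
    ]
```

The response body would come from an HTTP GET against the debug port on localhost. The port number depends on how the debug script launches the app, so it is left out here.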

Lighthouse Audits

Run Google Lighthouse audits through the browser bridge to measure performance, accessibility, best practices, and SEO. The agent triggers audits programmatically and parses the resulting scores, giving it data to suggest concrete improvements.

[Screenshot: Lighthouse audit results showing performance, accessibility, best practices, and SEO scores]

Audit Categories

| Category | What It Checks |
| --- | --- |
| Performance | First Contentful Paint, Largest Contentful Paint, Total Blocking Time, Cumulative Layout Shift |
| Accessibility | ARIA attributes, color contrast, alt text, heading hierarchy |
| Best Practices | HTTPS usage, deprecated APIs, console errors, image aspect ratios |
| SEO | Meta tags, crawlability, structured data, mobile friendliness |

Pair Lighthouse audits with the agent's code-editing capabilities: the agent can run an audit, identify the lowest-scoring category, then open the relevant source file and apply fixes in the same session.
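
The "identify the lowest-scoring category" step can be sketched against Lighthouse's JSON report, where each entry under categories carries a score between 0 and 1, or null when the category could not be scored. How the bridge hands the report to the agent is not specified here, and the function names below are hypothetical.

```python
import json


def category_scores(report_json):
    """Map category id -> score (0..1 or None) from a Lighthouse JSON report."""
    report = json.loads(report_json)
    return {cid: cat["score"] for cid, cat in report["categories"].items()}


def lowest_category(scores):
    """Return the id of the lowest-scoring category, skipping unscored ones."""
    scored = {cid: score for cid, score in scores.items() if score is not None}
    if not scored:
        return None
    return min(scored, key=scored.get)
```

Skipping null scores matters: treating them as zero would send the agent after a category that simply failed to run rather than one that needs fixes.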


Practical Use Cases

Form Filling & Submission

An agent navigates to a web form, fills in fields based on ticket data, and submits. It screenshots the confirmation page as proof.

Data Scraping

Extract structured data from web pages by querying the DOM, reading table rows, and returning JSON. Useful for competitive analysis or content migration.

Web App Monitoring

Periodically navigate to a dashboard, screenshot it, check for error banners via DOM queries, and report status. Combine with assistant workflows to trigger on a schedule.

Visual Regression Testing

Take screenshots before and after code changes. The agent compares them to flag unintended visual shifts, complementing unit and integration tests.