Browser Automation

Control the user's browser through Chrome DevTools Protocol — take screenshots, inspect the DOM, monitor network traffic, and run Lighthouse audits, all from your AI agent.

Overview

Browser automation turns your AI agent from an advisor into an operator. Instead of telling you what to click, the agent clicks it. AgentsInFlow exposes Chrome DevTools Protocol (CDP) through two dedicated CLI bridges — one for the user's Chrome browser and one for the Electron app itself.

[Screenshot: Agent filling a web form through the browser bridge while the terminal shows CDP commands]

Bridge	Target	CLI	Use Case
Browser Bridge	Chrome / Chromium tabs	`aif-browser`	Web app interaction, form filling, scraping
Electron Bridge	AgentsInFlow renderer	`aif-electron`	App introspection, UI debugging, self-testing

Both bridges require the AgentsInFlow Browser Chrome extension to be installed and the relevant debug port to be open. See the MCP & Browser page for initial setup.

Chrome DevTools MCP

AgentsInFlow connects to Chrome through the Chrome DevTools Protocol using the chrome.debugger extension API. The browser bridge extension attaches to tabs on demand, exposing CDP domains like Page, DOM, Network, and Runtime to the agent.

[Screenshot: Chrome DevTools Protocol connection flow — extension attaches to tab, CDP messages flow to the agent CLI]

How It Works

1. The Chrome extension detects the active tab and attaches the debugger via `chrome.debugger.attach()`.
2. CDP commands from the agent CLI are forwarded to the tab through the extension's native messaging host.
3. Responses and events stream back to the agent, enabling real-time browser control.

The extension attaches with debugger protocol version 1.3. It reattaches automatically if the debugger disconnects (e.g., after a page navigation).
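
The command/response/event flow above can be sketched with the message shape CDP uses on the wire: commands carry an `id`, replies echo that `id`, and events have a `method` but no `id`. This is a minimal illustration of that convention only. How AgentsInFlow frames messages between the CLI and the extension is not documented here, and `CdpSession` is a hypothetical helper name.

```python
import itertools
import json
from typing import Any, Optional


class CdpSession:
    """Tracks in-flight CDP commands and sorts incoming messages (hypothetical helper)."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.pending: dict[int, str] = {}  # command id -> method name

    def command(self, method: str, params: Optional[dict[str, Any]] = None) -> str:
        """Serialize a CDP command and remember its id until the reply arrives."""
        msg_id = next(self._ids)
        self.pending[msg_id] = method
        return json.dumps({"id": msg_id, "method": method, "params": params or {}})

    def classify(self, raw: str) -> tuple[str, str, dict[str, Any]]:
        """Return (kind, method, payload) for a raw message from the browser."""
        msg = json.loads(raw)
        if "id" in msg:
            # Replies echo the id of the command they answer.
            method = self.pending.pop(msg["id"], "<unknown>")
            if "error" in msg:
                return ("error", method, msg["error"])
            return ("result", method, msg.get("result", {}))
        # Events carry a method name but no id.
        return ("event", msg["method"], msg.get("params", {}))
```

Matching replies by `id` lets the agent keep several commands in flight while events such as `Page.loadEventFired` arrive in between.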

Taking Screenshots

Capture the visible viewport or a full-page screenshot. The agent uses Page.captureScreenshot under the hood and returns the image as a base64-encoded PNG.

[Screenshot: Terminal output showing aif-browser screenshot command with a captured page thumbnail]

Command	Description
`aif-browser screenshot`	Capture the visible viewport of the active tab
`aif-browser screenshot --full-page`	Capture the entire scrollable page
`aif-browser screenshot --selector "#hero"`	Capture a specific DOM element by CSS selector

Full-page screenshots on long pages produce large images. Prefer viewport or selector-based captures when possible to keep token costs down.
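
As a rough sketch of what happens at the CDP level, the snippet below builds `Page.captureScreenshot` parameters for a full-page capture and decodes the base64 `data` field of the reply. It assumes the common approach of sizing the clip from `Page.getLayoutMetrics`; whether `aif-browser screenshot --full-page` works this way internally is not stated in this page, and the function names are hypothetical.

```python
import base64
from typing import Any

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def full_page_params(layout_metrics: dict[str, Any]) -> dict[str, Any]:
    """Build Page.captureScreenshot params from a Page.getLayoutMetrics result."""
    size = layout_metrics["cssContentSize"]
    return {
        "format": "png",
        # Render content that lies outside the visible viewport.
        "captureBeyondViewport": True,
        # Clip to the whole scrollable content area, in CSS pixels.
        "clip": {"x": 0, "y": 0, "width": size["width"],
                 "height": size["height"], "scale": 1},
    }


def decode_screenshot(result: dict[str, Any]) -> bytes:
    """Decode the base64 `data` field and verify that it is a PNG."""
    png = base64.b64decode(result["data"])
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("screenshot payload is not a PNG")
    return png
```

Base64 adds roughly a third to the image size, which is one reason long full-page captures are expensive to pass through the agent context.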

DOM Inspection

Query and traverse the live DOM tree without opening DevTools manually. The agent uses the CDP DOM domain to resolve nodes, read attributes, and extract text content.

Capability	CDP Method	Example
Query selector	`DOM.querySelector`	Find a login button by `#submit-btn`
Get outer HTML	`DOM.getOuterHTML`	Read rendered markup of a component
Set attribute	`DOM.setAttributeValue`	Toggle a `disabled` attribute
Get box model	`DOM.getBoxModel`	Measure element dimensions and position

The agent can also evaluate arbitrary JavaScript via Runtime.evaluate for DOM operations not covered by dedicated CDP methods.
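
The query methods in the table are usually chained: fetch the document root, resolve a selector against it, then read the node. The sketch below shows that sequence under the assumption that a `send(method, params)` callable returns the CDP result as a dict. `send` and `outer_html` are hypothetical names, not part of the AgentsInFlow CLI.

```python
from typing import Any, Callable, Optional

# Hypothetical transport: takes a CDP method and params, returns the result dict.
Send = Callable[[str, dict[str, Any]], dict[str, Any]]


def outer_html(send: Send, selector: str) -> Optional[str]:
    """Resolve a CSS selector and return the element's outer HTML, or None."""
    root = send("DOM.getDocument", {})["root"]
    found = send("DOM.querySelector",
                 {"nodeId": root["nodeId"], "selector": selector})
    node_id = found["nodeId"]
    if node_id == 0:
        # CDP reports "no match" as nodeId 0 rather than an error.
        return None
    return send("DOM.getOuterHTML", {"nodeId": node_id})["outerHTML"]
```

Node ids are only valid for the current document, so the root should be fetched again after a navigation.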

Network Monitoring

Enable the CDP Network domain to watch HTTP requests and responses in real time. This is useful for debugging API calls, checking response payloads, or verifying that a form submission reached the server.

[Screenshot: Network log output showing request URLs, methods, status codes, and response sizes]

Key Capabilities

Capture requests and responses, including headers, bodies, and timing
Filter by URL pattern or resource type (XHR, Fetch, Document)
Retrieve response bodies for JSON APIs or HTML documents
Detect failed requests (4xx/5xx) and connection errors

Network monitoring can produce large volumes of data. Use URL filters and limit capture duration to avoid flooding the agent context.
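
To illustrate the filtering advice, here is a small sketch that reduces a stream of Network domain events to just the failures matching a URL pattern. The event shapes follow CDP (`Network.requestWillBeSent`, `Network.responseReceived`, `Network.loadingFailed`); the function name and the glob-style filter are assumptions made for this example.

```python
import fnmatch
from typing import Any, Iterable


def failed_requests(events: Iterable[dict[str, Any]],
                    url_glob: str = "*") -> list[dict[str, str]]:
    """Collect HTTP 4xx/5xx responses and connection errors matching url_glob."""
    urls: dict[str, str] = {}  # requestId -> URL, needed for loadingFailed
    failures: list[dict[str, str]] = []
    for event in events:
        method, params = event["method"], event["params"]
        if method == "Network.requestWillBeSent":
            urls[params["requestId"]] = params["request"]["url"]
        elif method == "Network.responseReceived":
            response = params["response"]
            if response["status"] >= 400 and fnmatch.fnmatchcase(response["url"], url_glob):
                failures.append({"url": response["url"],
                                 "reason": f"HTTP {response['status']}"})
        elif method == "Network.loadingFailed":
            # loadingFailed carries no URL, so look it up by requestId.
            url = urls.get(params["requestId"], "<unknown>")
            if fnmatch.fnmatchcase(url, url_glob):
                failures.append({"url": url, "reason": params["errorText"]})
    return failures
```

Passing only this reduced list to the agent, rather than every event, keeps the context small while still surfacing the requests that need attention.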

Console Access

Read browser console output and evaluate JavaScript expressions in the page context. The agent subscribes to Runtime.consoleAPICalled events and can execute code via Runtime.evaluate.

Operation	Description
Read logs	Stream `console.log`, `warn`, `error` messages from the page
Evaluate expression	Run arbitrary JS and return the result (e.g., `document.title`)
Catch exceptions	Subscribe to `Runtime.exceptionThrown` to detect runtime errors

Combine console access with network monitoring to get full observability into a web application — the agent sees both the JavaScript runtime state and the HTTP traffic.
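
As an illustration of what the console events look like, the sketch below turns `Runtime.consoleAPICalled` and `Runtime.exceptionThrown` events into single log lines. The event fields follow CDP; the function names are hypothetical, and the output format is just one reasonable choice.

```python
from typing import Any, Optional


def _render(remote_object: dict[str, Any]) -> str:
    """Render a CDP RemoteObject: primitives have `value`, objects a `description`."""
    if "value" in remote_object:
        return str(remote_object["value"])
    return remote_object.get("description", remote_object["type"])


def format_console_event(event: dict[str, Any]) -> Optional[str]:
    """Return a one-line summary for console and exception events, else None."""
    method, params = event["method"], event["params"]
    if method == "Runtime.consoleAPICalled":
        # `type` is the console level, e.g. "log", "warning", "error".
        text = " ".join(_render(arg) for arg in params["args"])
        return f"[{params['type']}] {text}"
    if method == "Runtime.exceptionThrown":
        details = params["exceptionDetails"]
        exception = details.get("exception", {})
        return "[exception] " + exception.get("description", details["text"])
    return None
```

Note that CDP reports `console.warn` calls with the type `warning`, so filters on log level should match that spelling.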

Page Navigation & Interaction

The agent can navigate to URLs, click elements, type into inputs, and wait for page events. These operations use a combination of CDP's Page, Input, and DOM domains.

[Screenshot: Agent navigating to a URL, typing into a search field, and clicking a button in sequence]

Common Operations

Action	How
Navigate	`Page.navigate` — go to a URL and wait for load
Click	Resolve element via `DOM.querySelector`, get coordinates from `DOM.getBoxModel`, dispatch `Input.dispatchMouseEvent`
Type text	Focus the input, then dispatch `Input.dispatchKeyEvent` for each character
Wait for load	Listen for `Page.loadEventFired` or `Page.domContentEventFired`
Scroll	Evaluate `window.scrollTo()` via `Runtime.evaluate`
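
The click and type rows above can be sketched as plain data: compute the center of the element's content quad from a `DOM.getBoxModel` result, then emit the `Input` commands to dispatch. This is a simplified illustration with hypothetical function names. It assumes the element is already scrolled into view, since the coordinates are relative to the viewport.

```python
from typing import Any

Command = tuple[str, dict[str, Any]]


def quad_center(quad: list[float]) -> tuple[float, float]:
    """Center of a CDP quad, given as [x1, y1, x2, y2, x3, y3, x4, y4]."""
    xs, ys = quad[0::2], quad[1::2]
    return (sum(xs) / 4, sum(ys) / 4)


def click_commands(box_model: dict[str, Any]) -> list[Command]:
    """Press and release the left button at the center of the content box."""
    x, y = quad_center(box_model["model"]["content"])
    base = {"x": x, "y": y, "button": "left", "clickCount": 1}
    return [
        ("Input.dispatchMouseEvent", {"type": "mousePressed", **base}),
        ("Input.dispatchMouseEvent", {"type": "mouseReleased", **base}),
    ]


def type_commands(text: str) -> list[Command]:
    """One `char` key event per character; the target input must have focus."""
    return [("Input.dispatchKeyEvent", {"type": "char", "text": ch}) for ch in text]
```

Keeping the commands as data makes the sequence easy to log or replay before anything is sent to the browser.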

Electron DevTools for App Introspection

The Electron bridge (aif-electron) connects to the AgentsInFlow app itself using CDP over its remote debugging port. This lets an agent inspect the app's own UI, evaluate scripts in the renderer, and take screenshots of the application window.

[Screenshot: Electron DevTools bridge connected to the AgentsInFlow renderer showing page list and evaluation output]

Setup

1. Start the app in debug mode: `pnpm dev:electron:debug`
2. Use `aif-electron page list` to discover available pages.
3. Evaluate expressions, capture screenshots, or read console messages through the CLI.

Command	Description
`aif-electron page list`	List all Electron renderer pages
`aif-electron evaluate "document.title"`	Run JS in the renderer context
`aif-electron screenshot`	Capture the app window
`aif-electron console`	Stream console messages from the renderer
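
For context on what page discovery involves: a Chromium-based app started with a remote debugging port serves its debuggable targets as JSON at `/json/list`. The sketch below filters that list down to pages. It illustrates the general mechanism only. Whether `aif-electron page list` uses this endpoint is an assumption, the function names are hypothetical, and the port depends on how the app was launched.

```python
import json
import urllib.request


def fetch_targets(port: int) -> str:
    """Fetch the raw target list from a local remote debugging port."""
    with urllib.request.urlopen(f"http://127.0.0.1:{port}/json/list") as resp:
        return resp.read().decode("utf-8")


def list_pages(targets_json: str) -> list[dict[str, str]]:
    """Keep only page targets that can still accept a debugger connection."""
    pages = []
    for target in json.loads(targets_json):
        # Skip workers and other target types, and targets with no socket URL.
        if target.get("type") != "page" or "webSocketDebuggerUrl" not in target:
            continue
        pages.append({
            "id": target["id"],
            "title": target.get("title", ""),
            "url": target.get("url", ""),
            "ws": target["webSocketDebuggerUrl"],
        })
    return pages
```

Each entry's `ws` value is the WebSocket URL a CDP client connects to in order to evaluate scripts or capture screenshots for that page.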

Lighthouse Audits

Run Google Lighthouse audits through the browser bridge to measure performance, accessibility, best practices, and SEO. The agent triggers audits programmatically and parses the resulting scores, giving it data to suggest concrete improvements.

[Screenshot: Lighthouse audit results showing performance, accessibility, best practices, and SEO scores]

Audit Categories

Category	What It Checks
Performance	First Contentful Paint, Largest Contentful Paint, Total Blocking Time, Cumulative Layout Shift
Accessibility	ARIA attributes, color contrast, alt text, heading hierarchy
Best Practices	HTTPS usage, deprecated APIs, console errors, image aspect ratios
SEO	Meta tags, crawlability, structured data, mobile friendliness

Pair Lighthouse audits with the agent's code-editing capabilities: the agent can run an audit, identify the lowest-scoring category, then open the relevant source file and apply fixes in the same session.
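
The "identify the lowest-scoring category" step can be sketched against Lighthouse's JSON report, where each entry under `categories` has a `score` between 0 and 1 (or `null` when it could not be computed). The function name is hypothetical, and how the bridge hands the report to the agent is not covered here.

```python
from typing import Any


def lowest_category(report: dict[str, Any]) -> tuple[str, float]:
    """Return (category id, score) for the weakest scored Lighthouse category."""
    scored = {
        category_id: category["score"]
        for category_id, category in report["categories"].items()
        if category.get("score") is not None  # skip categories with no score
    }
    if not scored:
        raise ValueError("report contains no scored categories")
    worst = min(scored, key=lambda category_id: scored[category_id])
    return worst, scored[worst]
```

For example, a report scoring 0.62 on `performance` and 0.91 on `accessibility` would return `("performance", 0.62)`, pointing the agent at performance fixes first.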

Practical Use Cases

Form Filling & Submission

An agent navigates to a web form, fills in fields based on ticket data, and submits. It screenshots the confirmation page as proof.

Data Scraping

Extract structured data from web pages by querying the DOM, reading table rows, and returning JSON. Useful for competitive analysis or content migration.
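
One way to read table rows is a single `Runtime.evaluate` call that returns the cell text by value. The sketch below builds the expression and parameters, unwraps the reply, and converts rows into records keyed by the header row. All function names are hypothetical, and the approach assumes the first row of the table holds the column headers.

```python
import json
from typing import Any


def table_rows_expression(table_selector: str) -> str:
    """JavaScript that returns every row of a table as a list of cell strings."""
    selector = json.dumps(table_selector + " tr")  # safely quoted JS string
    return (f"Array.from(document.querySelectorAll({selector}))"
            ".map(tr => Array.from(tr.cells).map(c => c.innerText.trim()))")


def evaluate_params(expression: str) -> dict[str, Any]:
    """Runtime.evaluate params; returnByValue sends JSON instead of an object handle."""
    return {"expression": expression, "returnByValue": True}


def unwrap(result: dict[str, Any]) -> Any:
    """Extract the value from a Runtime.evaluate result, raising on page errors."""
    if "exceptionDetails" in result:
        raise RuntimeError(result["exceptionDetails"]["text"])
    return result["result"]["value"]


def rows_to_records(rows: list[list[str]]) -> list[dict[str, str]]:
    """Treat the first row as headers and map each remaining row to a dict."""
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]
```

Doing the extraction in one evaluated expression avoids a CDP round trip per cell, which matters on large tables.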

Web App Monitoring

Periodically navigate to a dashboard, screenshot it, check for error banners via DOM queries, and report status. Combine with assistant workflows to trigger on a schedule.

Visual Regression Testing

Take screenshots before and after code changes. The agent compares them to flag unintended visual shifts, complementing unit and integration tests.
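
A minimal sketch of the comparison step, assuming both screenshots have already been decoded into raw RGBA pixel buffers of the same dimensions (decoding the PNG is left to an image library). The function name and the per-channel tolerance are illustrative choices, not a description of how the agent compares images.

```python
def changed_fraction(before: bytes, after: bytes, tolerance: int = 0) -> float:
    """Fraction of RGBA pixels whose channels differ by more than `tolerance`."""
    if len(before) != len(after):
        raise ValueError("screenshots must have the same dimensions")
    if len(before) % 4 != 0:
        raise ValueError("buffers must contain whole RGBA pixels")
    total_pixels = len(before) // 4
    if total_pixels == 0:
        return 0.0
    changed = 0
    for offset in range(0, len(before), 4):
        # A pixel counts as changed if any of its four channels moved too far.
        if any(abs(before[offset + c] - after[offset + c]) > tolerance
               for c in range(4)):
            changed += 1
    return changed / total_pixels
```

A small tolerance absorbs anti-aliasing noise, so a threshold on the returned fraction flags real layout shifts rather than rendering jitter.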