Browser Automation
Control the user's browser through Chrome DevTools Protocol — take screenshots, inspect the DOM, monitor network traffic, and run Lighthouse audits, all from your AI agent.
Overview
Browser automation turns your AI agent from an advisor into an operator. Instead of telling you what to click, the agent clicks it. AgentsInFlow exposes Chrome DevTools Protocol (CDP) through two dedicated CLI bridges — one for the user's Chrome browser and one for the Electron app itself.
| Bridge | Target | CLI | Use Case |
|---|---|---|---|
| Browser Bridge | Chrome / Chromium tabs | aif-browser | Web app interaction, form filling, scraping |
| Electron Bridge | AgentsInFlow renderer | aif-electron | App introspection, UI debugging, self-testing |
Both bridges require the AgentsInFlow Browser Chrome extension to be installed and the relevant debug port to be open. See the MCP & Browser page for initial setup.
Chrome DevTools MCP
AgentsInFlow connects to Chrome through the Chrome DevTools Protocol using the chrome.debugger extension API. The browser bridge extension attaches to tabs on demand, exposing CDP domains like Page, DOM, Network, and Runtime to the agent.
How It Works
1. The Chrome extension detects the active tab and attaches the debugger via chrome.debugger.attach().
2. CDP commands from the agent CLI are forwarded to the tab through the extension's native messaging host.
3. Responses and events stream back to the agent, enabling real-time browser control.
The extension attaches with CDP protocol version 1.3. It automatically reattaches if the debugger disconnects (e.g., after a page navigation).
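The attach-then-forward flow above can be sketched as follows. This is a minimal illustration, not the extension's actual code: `DebuggerApi` is a hypothetical local interface covering only the two chrome.debugger calls used here, and `attachAndSend` is an illustrative name. Reattachment and native messaging are left out.

```typescript
// Hypothetical slice of the chrome.debugger API, declared locally so the
// sketch type-checks without extension typings. Inside an extension, an
// object shaped like chrome.debugger would be passed in.
interface Debuggee { tabId: number }
interface DebuggerApi {
  attach(target: Debuggee, requiredVersion: string): Promise<void>;
  sendCommand(target: Debuggee, method: string, params?: object): Promise<unknown>;
}

const CDP_VERSION = "1.3"; // protocol version named in this section

// Attach to a tab, forward a single CDP command, and return its result.
async function attachAndSend(
  api: DebuggerApi,
  tabId: number,
  method: string,
  params?: object,
): Promise<unknown> {
  const target: Debuggee = { tabId };
  await api.attach(target, CDP_VERSION);
  return api.sendCommand(target, method, params);
}
```

Injecting the API object keeps the function testable outside a browser; a real bridge would also listen for detach events to implement the reattach behavior.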
Taking Screenshots
Capture the visible viewport or a full-page screenshot. The agent uses Page.captureScreenshot under the hood and returns the image as a base64-encoded PNG.
| Command | Description |
|---|---|
| aif-browser screenshot | Capture the visible viewport of the active tab |
| aif-browser screenshot --full-page | Capture the entire scrollable page |
| aif-browser screenshot --selector "#hero" | Capture a specific DOM element by CSS selector |
Full-page screenshots on long pages produce large images. Prefer viewport or selector-based captures when possible to keep token costs down.
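As a rough sketch of how the three capture modes could map onto Page.captureScreenshot parameters: the `Mode` type and `screenshotParams` function are hypothetical names, and how aif-browser builds its parameters internally is an assumption. The sketch assumes the content size comes from Page.getLayoutMetrics and the element quad from DOM.getBoxModel, with the quad treated as viewport-relative so scroll offsets are added.

```typescript
interface Clip { x: number; y: number; width: number; height: number; scale: number }
interface ScreenshotParams { format: "png"; captureBeyondViewport?: boolean; clip?: Clip }

// Hypothetical description of the three capture modes in the table above.
type Mode =
  | { kind: "viewport" }
  | { kind: "fullPage"; contentWidth: number; contentHeight: number }
  | { kind: "element"; contentQuad: number[]; scrollX: number; scrollY: number };

function screenshotParams(mode: Mode): ScreenshotParams {
  switch (mode.kind) {
    case "viewport":
      return { format: "png" };
    case "fullPage":
      // Clip to the whole document and allow capture outside the viewport.
      return {
        format: "png",
        captureBeyondViewport: true,
        clip: { x: 0, y: 0, width: mode.contentWidth, height: mode.contentHeight, scale: 1 },
      };
    case "element": {
      // A quad is [x1, y1, x2, y2, x3, y3, x4, y4]; take its bounding box.
      const xs = mode.contentQuad.filter((_, i) => i % 2 === 0);
      const ys = mode.contentQuad.filter((_, i) => i % 2 === 1);
      const x = Math.min(...xs);
      const y = Math.min(...ys);
      return {
        format: "png",
        clip: {
          x: x + mode.scrollX, // assumption: quad is viewport-relative
          y: y + mode.scrollY,
          width: Math.max(...xs) - x,
          height: Math.max(...ys) - y,
          scale: 1,
        },
      };
    }
  }
}
```

The element case shows why selector-based captures stay small: the clip covers only the element's bounding box rather than the full document.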
DOM Inspection
Query and traverse the live DOM tree without opening DevTools manually. The agent uses the CDP DOM domain to resolve nodes, read attributes, and extract text content.
| Capability | CDP Method | Example |
|---|---|---|
| Query selector | DOM.querySelector | Find a login button by #submit-btn |
| Get outer HTML | DOM.getOuterHTML | Read rendered markup of a component |
| Set attribute | DOM.setAttributeValue | Toggle a disabled attribute |
| Get box model | DOM.getBoxModel | Measure element dimensions and position |
The agent can also evaluate arbitrary JavaScript via Runtime.evaluate for DOM operations not covered by dedicated CDP methods.
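The query-then-read pattern from the table can be sketched as a short chain of CDP calls. `CdpSend` is a hypothetical transport function standing in for whatever forwards commands to the tab, and `outerHtmlOf` is an illustrative helper, not part of the AgentsInFlow CLI.

```typescript
// Hypothetical transport: sends one CDP command and resolves with its result.
type CdpSend = (method: string, params?: Record<string, unknown>) => Promise<any>;

// Resolve a CSS selector to a node and return its rendered markup,
// or null when nothing matches.
async function outerHtmlOf(send: CdpSend, selector: string): Promise<string | null> {
  const { root } = await send("DOM.getDocument", { depth: 0 });
  const { nodeId } = await send("DOM.querySelector", { nodeId: root.nodeId, selector });
  if (nodeId === 0) return null; // a nodeId of 0 means no element matched
  const { outerHTML } = await send("DOM.getOuterHTML", { nodeId });
  return outerHTML;
}
```

DOM.querySelector needs a starting node, which is why the document root is fetched first.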
Network Monitoring
Enable the CDP Network domain to watch HTTP requests and responses in real time. This is useful for debugging API calls, checking response payloads, or verifying that a form submission reached the server.
Key Capabilities
- Capture requests and responses, including headers, bodies, and timing
- Filter by URL pattern or resource type (XHR, Fetch, Document)
- Retrieve response bodies for JSON APIs or HTML documents
- Detect failed requests (4xx/5xx) and connection errors
Network monitoring can produce large volumes of data. Use URL filters and limit capture duration to avoid flooding the agent context.
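One way to apply such filters is a small predicate over Network.responseReceived events. The `NetworkFilter` shape and `matchesFilter` function below are hypothetical; the event interface lists only the fields this sketch reads.

```typescript
// Subset of the Network.responseReceived event payload used by this sketch.
interface ResponseReceived {
  requestId: string;
  type: string; // resource type, e.g. "XHR", "Fetch", "Document"
  response: { url: string; status: number };
}

// Hypothetical filter options; every field is optional.
interface NetworkFilter {
  urlIncludes?: string;
  types?: string[];
  failedOnly?: boolean; // keep only 4xx/5xx responses
}

function matchesFilter(ev: ResponseReceived, f: NetworkFilter): boolean {
  if (f.urlIncludes !== undefined && !ev.response.url.includes(f.urlIncludes)) return false;
  if (f.types !== undefined && !f.types.includes(ev.type)) return false;
  if (f.failedOnly === true && ev.response.status < 400) return false;
  return true;
}
```

Events that pass the filter carry a requestId that can be handed to Network.getResponseBody when the payload is needed. Connection-level failures arrive separately as Network.loadingFailed events and are not covered by this predicate.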
Console Access
Read browser console output and evaluate JavaScript expressions in the page context. The agent subscribes to Runtime.consoleAPICalled events and can execute code via Runtime.evaluate.
| Operation | Description |
|---|---|
| Read logs | Stream console.log, console.warn, and console.error messages from the page |
| Evaluate expression | Run arbitrary JS and return the result (e.g., document.title) |
| Catch exceptions | Subscribe to Runtime.exceptionThrown to detect runtime errors |
Combine console access with network monitoring to get full observability into a web application — the agent sees both the JavaScript runtime state and the HTTP traffic.
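To give a feel for what arrives on the wire, here is a minimal sketch that turns a Runtime.consoleAPICalled event into one log line. `formatConsoleEvent` is a hypothetical helper, and the interfaces cover only the fields it reads.

```typescript
// Subset of CDP's Runtime.RemoteObject used by this sketch.
interface RemoteObject { type: string; value?: unknown; description?: string }

// Subset of the Runtime.consoleAPICalled event payload.
interface ConsoleApiCalled {
  type: string; // e.g. "log", "warning", "error"
  args: RemoteObject[];
}

function formatConsoleEvent(ev: ConsoleApiCalled): string {
  const parts = ev.args.map((a) => {
    if (typeof a.value === "string") return a.value;          // plain strings as-is
    if (a.value !== undefined) return JSON.stringify(a.value); // other primitives
    return a.description ?? a.type;                            // objects: fall back to description
  });
  return `[${ev.type}] ${parts.join(" ")}`;
}
```

Note that a console.warn call is reported with the event type "warning", so log-level filters should match on that string.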
Page Navigation & Interaction
The agent can navigate to URLs, click elements, type into inputs, and wait for page events. These operations use a combination of CDP's Page, Input, and DOM domains.
Common Operations
| Action | How |
|---|---|
| Navigate | Call Page.navigate with the target URL, then wait for Page.loadEventFired |
| Click | Resolve element via DOM.querySelector, get coordinates from DOM.getBoxModel, dispatch Input.dispatchMouseEvent |
| Type text | Focus the input, then dispatch Input.dispatchKeyEvent for each character |
| Wait for load | Listen for Page.loadEventFired or Page.domContentEventFired |
| Scroll | Evaluate window.scrollTo() via Runtime.evaluate |
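The click and type rows above can be sketched as builders that produce the CDP commands to send. `clickCommands`, `typeCommands`, and `CdpCommand` are hypothetical names; the sketch assumes the content quad comes from DOM.getBoxModel and that the input element is already focused before typing.

```typescript
interface CdpCommand { method: string; params: Record<string, unknown> }

// Average the four corners of a quad [x1, y1, ..., x4, y4] to get its center.
function quadCenter(quad: number[]): { x: number; y: number } {
  let x = 0;
  let y = 0;
  quad.forEach((v, i) => {
    if (i % 2 === 0) x += v;
    else y += v;
  });
  return { x: x / 4, y: y / 4 };
}

// A click is a move, a press, and a release at the element's center.
function clickCommands(contentQuad: number[]): CdpCommand[] {
  const { x, y } = quadCenter(contentQuad);
  const button = { x, y, button: "left", clickCount: 1 };
  return [
    { method: "Input.dispatchMouseEvent", params: { type: "mouseMoved", x, y } },
    { method: "Input.dispatchMouseEvent", params: { type: "mousePressed", ...button } },
    { method: "Input.dispatchMouseEvent", params: { type: "mouseReleased", ...button } },
  ];
}

// One "char" key event per character of the text.
function typeCommands(text: string): CdpCommand[] {
  return Array.from(text).map((ch) => ({
    method: "Input.dispatchKeyEvent",
    params: { type: "char", text: ch },
  }));
}
```

Building the command list separately from sending it makes the sequence easy to inspect or log before anything touches the page.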
Electron DevTools for App Introspection
The Electron bridge (aif-electron) connects to the AgentsInFlow app itself using CDP over its remote debugging port. This lets an agent inspect the app's own UI, evaluate scripts in the renderer, and take screenshots of the application window.
Setup
1. Start the app in debug mode: pnpm dev:electron:debug
2. Use aif-electron page list to discover available pages.
3. Evaluate expressions, capture screenshots, or read console messages through the CLI.
| Command | Description |
|---|---|
| aif-electron page list | List all Electron renderer pages |
| aif-electron evaluate "document.title" | Run JS in the renderer context |
| aif-electron screenshot | Capture the app window |
| aif-electron console | Stream console messages from the renderer |
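For context on page discovery: Chromium-based apps started with a remote debugging port serve a JSON target list over HTTP at /json/list on that port. Whether aif-electron page list uses this endpoint is an assumption, and the port depends on how the app is launched. The hypothetical `listPages` helper below filters such a list down to renderer pages.

```typescript
// Subset of one entry in the remote-debugging target list.
interface DebugTarget {
  id: string;
  type: string; // "page", "service_worker", ...
  title: string;
  url: string;
  webSocketDebuggerUrl?: string; // absent when the target cannot be attached
}

// Keep only attachable page targets and drop fields the caller does not need.
function listPages(targets: DebugTarget[]): { id: string; title: string; url: string }[] {
  return targets
    .filter((t) => t.type === "page" && t.webSocketDebuggerUrl !== undefined)
    .map(({ id, title, url }) => ({ id, title, url }));
}
```

The id of a selected page identifies which target later evaluate or screenshot commands should address.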
Lighthouse Audits
Run Google Lighthouse audits through the browser bridge to measure performance, accessibility, best practices, and SEO. The agent triggers audits programmatically and parses the resulting scores, giving it data to suggest concrete improvements.
Audit Categories
| Category | What It Checks |
|---|---|
| Performance | First Contentful Paint, Largest Contentful Paint, Total Blocking Time, Cumulative Layout Shift |
| Accessibility | ARIA attributes, color contrast, alt text, heading hierarchy |
| Best Practices | HTTPS usage, deprecated APIs, console errors, image aspect ratios |
| SEO | Meta tags, crawlability, structured data, mobile friendliness |
Pair Lighthouse audits with the agent's code-editing capabilities: the agent can run an audit, identify the lowest-scoring category, then open the relevant source file and apply fixes in the same session.
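The "identify the lowest-scoring category" step can be sketched against the Lighthouse result object, whose categories map holds a score between 0 and 1 (or null when a category could not be scored). `lowestCategory` is a hypothetical helper; how the agent actually parses reports is not specified here.

```typescript
// Subset of the Lighthouse result object used by this sketch.
interface LhrCategory { id: string; title: string; score: number | null }
interface LighthouseResult { categories: Record<string, LhrCategory> }

// Return the category with the lowest numeric score, skipping unscored ones.
function lowestCategory(lhr: LighthouseResult): LhrCategory | null {
  let lowest: LhrCategory | null = null;
  let lowestScore = Infinity;
  for (const cat of Object.values(lhr.categories)) {
    if (cat.score !== null && cat.score < lowestScore) {
      lowest = cat;
      lowestScore = cat.score;
    }
  }
  return lowest;
}
```

Scores are fractions, so a displayed score of 72 appears as 0.72 in the report data.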
Practical Use Cases
Form Filling & Submission
An agent navigates to a web form, fills in fields based on ticket data, and submits. It screenshots the confirmation page as proof.
Data Scraping
Extract structured data from web pages by querying the DOM, reading table rows, and returning JSON. Useful for competitive analysis or content migration.
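As a small sketch of the final shaping step, assuming the header and cell text have already been read from the page (for example via DOM queries or Runtime.evaluate), the hypothetical `rowsToRecords` helper turns table rows into JSON-ready records.

```typescript
// Pair each row's cells with the header names; missing cells become "".
function rowsToRecords(header: string[], rows: string[][]): Record<string, string>[] {
  return rows.map((cells) => {
    const record: Record<string, string> = {};
    header.forEach((name, i) => {
      record[name] = cells[i] ?? "";
    });
    return record;
  });
}
```

Filling short rows with empty strings keeps every record on the same set of keys, which simplifies downstream comparison or import.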
Web App Monitoring
Periodically navigate to a dashboard, screenshot it, check for error banners via DOM queries, and report status. Combine with assistant workflows to trigger on a schedule.
Visual Regression Testing
Take screenshots before and after code changes. The agent compares them to flag unintended visual shifts, complementing unit and integration tests.
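One simple way such a comparison could work is a per-pixel difference ratio. This is an illustrative technique, not necessarily what the agent does: the hypothetical `diffRatio` function assumes both screenshots were already decoded from PNG into same-size RGBA byte buffers, which needs an image library that is not shown.

```typescript
// Fraction of pixels whose RGBA channels differ by more than `tolerance`.
function diffRatio(before: Uint8Array, after: Uint8Array, tolerance = 0): number {
  if (before.length !== after.length || before.length % 4 !== 0) {
    throw new Error("buffers must be same-size RGBA data");
  }
  const pixels = before.length / 4;
  if (pixels === 0) return 0;
  let changed = 0;
  for (let p = 0; p < before.length; p += 4) {
    // A pixel counts as changed as soon as one channel exceeds the tolerance.
    for (let c = 0; c < 4; c++) {
      if (Math.abs(before[p + c] - after[p + c]) > tolerance) {
        changed++;
        break;
      }
    }
  }
  return changed / pixels;
}
```

A small tolerance absorbs anti-aliasing noise, and a threshold on the returned ratio decides whether a change is worth flagging.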