Browser and Computer Use

Browser and computer use is the capability that lets an AI model control a graphical interface (clicking buttons, filling forms, reading the screen) rather than calling a structured API. When no API exists, or when the task involves software designed for human eyes, it is the practical alternative.

What you will find in this section

Page	What it covers
Comparison	Side-by-side comparison of Anthropic, OpenAI, and browser-use library approaches
Anthropic computer use	Action types, coordinate system, sandbox guidance, and a worked example
OpenAI computer use	Responses API computer tool, batched actions, and differences from Anthropic
browser-use library	Python library for LLM-driven browser automation with Playwright
Operating boundaries	What must never run unsandboxed: personal browsers, payment accounts, healthcare

Why it matters for agent builders

Many digital workflows were never designed to be automated through an API. Legacy enterprise software, government portals, consumer web apps, and desktop applications often expose a GUI and nothing else. Computer use is what makes agents useful in these contexts.

The capability comes with commensurate risk. An agent that can click through your browser can also accept terms of service, trigger purchases, delete files, or send messages. The Operating boundaries page exists because the risk is not hypothetical.

The three main approaches

Model-native computer use (Anthropic and OpenAI) — the model provider exposes a tool that accepts a screenshot and returns actions. Your code executes the actions, captures a new screenshot, and feeds it back. The model drives the loop.

Library-based browser automation (browser-use) — an open-source Python library that wraps Playwright with an LLM agent loop. You specify a goal string. The library handles screenshot capture, DOM extraction, action execution, and context management. You supply the model via a LangChain-compatible interface.

Hosted browser agents — cloud services that provide a managed browser sandbox with an agent loop. These abstract away the infrastructure entirely but come with latency and data-sovereignty tradeoffs.
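
To make the model-native loop concrete, here is a minimal sketch that uses no provider SDK. Everything in it is hypothetical: `Action`, `run_action_loop`, `FakeScreen`, and `fake_model` are illustrative names, and the stubbed model stands in for a real API call. It shows only the shape of the cycle (capture, decide, execute, repeat), not any vendor's actual request or response format.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    """Hypothetical action record; real providers define their own schemas."""
    kind: str                                   # e.g. "click", "type", "done"
    payload: dict = field(default_factory=dict)


def run_action_loop(capture, decide, execute, max_steps=10):
    """Capture state, ask the model for actions, execute them, repeat.

    Stops when the model returns no actions, emits a "done" action,
    or the step budget runs out.
    """
    history = []
    for _ in range(max_steps):
        state = capture()                       # e.g. a screenshot
        actions = decide(state, history)        # stand-in for the model call
        if not actions:
            break
        for action in actions:
            if action.kind == "done":
                return history
            execute(action)                     # the harness performs the action
            history.append(action)
    return history


class FakeScreen:
    """Stub environment so the sketch runs without a real display."""

    def __init__(self):
        self.clicks = 0

    def capture(self):
        return f"screenshot-after-{self.clicks}-clicks"

    def execute(self, action):
        if action.kind == "click":
            self.clicks += 1


def fake_model(state, history):
    """Stub model: click twice, then report that the task is finished."""
    if len(history) >= 2:
        return [Action("done")]
    return [Action("click", {"x": 10, "y": 20})]


screen = FakeScreen()
history = run_action_loop(screen.capture, fake_model, screen.execute)
```

The `max_steps` budget is a design choice worth keeping in a real harness: it bounds how long an agent can act if the model never signals completion.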

Key terminology

Action loop — the cycle of: capture state → send to model → receive actions → execute → repeat.
Sandbox — an isolated environment (VM, container, dedicated browser profile) that limits what a computer-use agent can affect.
Coordinate system — the pixel coordinate space in which mouse clicks and moves are expressed. Different providers handle display scaling differently.
Harness — the code that bridges model outputs to actual OS or browser actions (mouse clicks, keystrokes, scroll events).
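
As an illustration of why the coordinate system matters to a harness, the sketch below maps a point the model chose on a screenshot to a pixel on the real display. It assumes the harness downscaled the screenshot before sending it, so model coordinates have to be scaled back up. Whether a given provider requires this is an assumption to check against its documentation. The function name `to_screen_coords` is hypothetical.

```python
def to_screen_coords(x, y, screenshot_size, screen_size):
    """Map a point in screenshot pixels to a point in screen pixels.

    screenshot_size and screen_size are (width, height) tuples.
    The result is clamped so it always lands inside the screen.
    """
    shot_w, shot_h = screenshot_size
    screen_w, screen_h = screen_size
    if shot_w <= 0 or shot_h <= 0:
        raise ValueError("screenshot dimensions must be positive")

    # Scale each axis independently in case the aspect ratios differ.
    scaled_x = round(x * screen_w / shot_w)
    scaled_y = round(y * screen_h / shot_h)

    # Clamp to valid pixel indices: 0 .. width-1 and 0 .. height-1.
    clamped_x = min(max(scaled_x, 0), screen_w - 1)
    clamped_y = min(max(scaled_y, 0), screen_h - 1)
    return clamped_x, clamped_y


# A 2560x1600 display captured as a 1280x800 screenshot:
center = to_screen_coords(640, 400, (1280, 800), (2560, 1600))
corner = to_screen_coords(1280, 800, (1280, 800), (2560, 1600))
```

Here `center` maps to the middle of the real display, and `corner`, which sits one pixel past the screenshot's edge, is clamped to the last valid screen pixel.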