Last verified: 2026-05-06 · Drift risk: medium · Official sources: browser-use repo, browser-use on PyPI
browser-use Python Library¶
browser-use is an open-source Python library that wraps browser automation with an LLM-driven agent loop. You provide a goal string and a model. The library handles screenshot capture, DOM extraction, action planning, and execution. It is the fastest way to get a working browser agent without building the action loop yourself.
Installation¶
Install from PyPI:
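```shell
pip install browser-use
```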
Then install the browser binary. The library currently uses a Chrome DevTools Protocol (CDP) client as its primary browser driver. You can install Chromium via Playwright's installer:
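```shell
# Requires the playwright package to be installed first
playwright install chromium
```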
Note: as of 2025, browser-use migrated from Playwright to a direct CDP implementation for lower-level browser control. The Playwright installer is still a convenient way to get a Chromium binary. If you already have Chrome or Chromium installed at a standard path, the library can use it directly.
With uv:
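```shell
uv add browser-use
# or, outside a uv-managed project:
uv pip install browser-use
```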
For the interactive CLI (similar to claude on the command line):
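```shell
# The [cli] extra name is taken from upstream docs; verify against your pinned version
pip install "browser-use[cli]"
browser-use
```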
Minimal example¶
```python
import asyncio
import os

from dotenv import load_dotenv

load_dotenv()  # load API keys before browser_use reads the environment

from browser_use import Agent
from browser_use.llm import ChatOpenAI


async def main():
    agent = Agent(
        task="Go to news.ycombinator.com, find the top story, and return its title and URL.",
        llm=ChatOpenAI(model=os.environ["OPENAI_MODEL"]),
    )
    result = await agent.run()
    print(result)


asyncio.run(main())
```
The task parameter is a natural-language goal string. The library translates this into a loop of: observe the browser state, plan the next action, execute, observe again. The result is a string containing whatever the agent decided to return when it determined the task was complete.
Store your API key in a .env file:
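The values below are placeholders; set them to your real key and preferred model ID:

```
# .env — read by load_dotenv()
OPENAI_API_KEY=sk-...
OPENAI_MODEL=your-model-id
```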
Using with other model providers¶
The llm parameter accepts any LangChain-compatible chat model. To use Claude:
```python
from langchain_anthropic import ChatAnthropic

agent = Agent(
    task="Find the current price of AAPL on Yahoo Finance and return it.",
    llm=ChatAnthropic(model="claude-opus-4-7"),
)
```
To use an open-source model via Ollama:
```python
from langchain_ollama import ChatOllama

agent = Agent(
    task="Open example.com and return the page title.",
    llm=ChatOllama(model="llama3.2"),
)
```
The quality of the browser agent degrades significantly with less capable models. For production use, choose a current frontier-class OpenAI or Claude model. Smaller models tend to get stuck in loops or fail to complete multi-step navigation.
How it works internally¶
The library maintains a BrowserSession that holds an open browser connection. On each agent step:
- The library captures the current page state: a screenshot plus structured DOM information (element labels, interactive element coordinates, page text).
- Both are sent to the LLM with a system prompt that defines the available actions.
- The LLM returns an action choice (click, type, go_to_url, extract_content, done, etc.).
- The library executes the action against the browser.
- The updated state is observed and the loop continues.
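The loop above can be sketched in plain Python. The `browser` and `plan` objects here are illustrative stand-ins, not browser-use's real interfaces:

```python
# Toy sketch of the observe → plan → act loop. `plan` stands in for the LLM
# call and `browser` for the BrowserSession; neither matches the real API.
def run_agent_loop(plan, browser, max_steps=10):
    for _ in range(max_steps):
        state = browser.observe()        # screenshot + DOM summary
        action = plan(state)             # LLM chooses the next action
        if action["name"] == "done":
            return action.get("result")  # agent decided the task is complete
        browser.execute(action)          # click / type / go_to_url / ...
    return None                          # step budget exhausted
```

The `done` action is what terminates the loop; everything else feeds back into the next observation.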
The DOM extraction step is a key differentiator from pure vision approaches. Even if a click target is visually ambiguous in the screenshot, the library can identify it precisely via its DOM position. This makes browser-use more reliable on text-heavy pages, single-page apps, and forms.
Running headless or with a visible browser¶
By default, the library runs with a visible browser window. For server deployments or automated testing, use headless mode:
```python
import os

from browser_use import Agent, BrowserSession
from browser_use.llm import ChatOpenAI

session = BrowserSession(headless=True)
agent = Agent(
    task="Check the status of api.example.com.",
    llm=ChatOpenAI(model=os.environ["OPENAI_MODEL"]),
    browser_session=session,
)
result = await agent.run()
```
In Docker containers, set IN_DOCKER=True in the environment. The official Docker image browseruse/browseruse includes the browser binary and all dependencies pre-configured.
MCP integration¶
browser-use can load MCP servers and expose their tools to the agent alongside browser actions:
```python
import asyncio
import os

from browser_use import Agent, Tools
from browser_use.llm import ChatOpenAI
from browser_use.mcp.client import MCPClient


async def main():
    tools = Tools()
    filesystem_client = MCPClient(
        server_name="filesystem",
        command="npx",
        args=["-y", "@modelcontextprotocol/server-filesystem", "/Users/me/documents"],
    )
    await filesystem_client.connect()
    await filesystem_client.register_to_tools(tools)

    agent = Agent(
        task="Find the latest PDF report in my documents and summarize its title.",
        llm=ChatOpenAI(model=os.environ["OPENAI_MODEL"]),
        tools=tools,
    )
    await agent.run()
    await filesystem_client.disconnect()


asyncio.run(main())
```
This combines browser navigation with local file access in a single agent. The agent decides whether to use a browser action or an MCP tool at each step.
Controlling the agent loop¶
The Agent.run() method accepts a max_steps parameter (default varies by version) that limits the number of actions before the agent is forced to return. Always set an explicit limit in production:
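A minimal sketch, assuming `max_steps` is passed directly to `run()` (confirm the signature against your pinned version):

```python
result = await agent.run(max_steps=25)  # hard cap on actions before forced return
```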
You can also hook into the loop with callbacks:
```python
async def on_step(agent_state):
    print(f"Step: {agent_state.n_steps}, Action: {agent_state.last_action}")


agent = Agent(
    task="...",
    llm=ChatOpenAI(model=os.environ["OPENAI_MODEL"]),
    on_step=on_step,
)
```
Persistent browser sessions¶
For tasks that require maintaining state across multiple agent runs (login sessions, multi-step workflows), reuse a BrowserSession:
```python
session = BrowserSession(headless=False)

# First task: log in
agent1 = Agent(
    task="Log into example.com with user@example.com and password from env.",
    llm=llm,
    browser_session=session,
)
await agent1.run()

# Second task: use the logged-in session
agent2 = Agent(
    task="Navigate to the settings page and return the account tier.",
    llm=llm,
    browser_session=session,
)
await agent2.run()

await session.close()
```
Version notes and stability¶
browser-use is under active development and releases frequently. The library's internal architecture changed significantly in 2025 when it migrated from Playwright to a CDP-first approach. For production deployments, pin to a specific version:
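```shell
# Replace X.Y.Z with the release you have actually tested
pip install "browser-use==X.Y.Z"
```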
Check the PyPI release history and the repository changelog before upgrading. The main branch frequently contains breaking changes; install from tagged releases.
Known limitations¶
- Browser-only: cannot automate desktop GUI applications outside the browser.
- The agent loop is opaque by default. If the LLM chooses a wrong action, it may take several steps to self-correct, consuming tokens and time.
- Dynamically rendered single-page apps with complex state can confuse the DOM extractor.
- Tasks that require handling CAPTCHAs, MFA, or complex auth flows often need human intervention at those steps.
- Do not use against your personal browser profile. See Operating boundaries.