Last verified: 2026-05-06 · Drift risk: medium · Official sources: browser-use repo, browser-use on PyPI

browser-use Python Library

browser-use is an open-source Python library that wraps browser automation with an LLM-driven agent loop. You provide a goal string and a model. The library handles screenshot capture, DOM extraction, action planning, and execution. It is the fastest way to get a working browser agent without building the action loop yourself.

Installation

Install from PyPI:

pip install browser-use

Then install the browser binary. The library currently uses a Chrome DevTools Protocol (CDP) client as its primary browser driver. You can install Chromium via Playwright's installer:

playwright install chromium --with-deps --no-shell

Note: as of 2025, browser-use migrated from Playwright to a direct CDP implementation for lower-level browser control. The Playwright installer is still a convenient way to get a Chromium binary. If you already have Chrome or Chromium installed at a standard path, the library can use it directly.

With uv:

uv pip install browser-use
uvx playwright install chromium --with-deps --no-shell

For the interactive CLI (similar to claude on the command line):

pip install "browser-use[cli]"
browser-use

Minimal example

import asyncio, os
from dotenv import load_dotenv
load_dotenv()

from browser_use import Agent
from browser_use.llm import ChatOpenAI

async def main():
    agent = Agent(
        task="Go to news.ycombinator.com, find the top story, and return its title and URL.",
        llm=ChatOpenAI(model=os.environ["OPENAI_MODEL"]),
    )
    result = await agent.run()
    print(result)

asyncio.run(main())

The task parameter is a natural-language goal string. The library translates it into a loop: observe the browser state, plan the next action, execute it, observe again. In recent versions, run() returns a history object (AgentHistoryList) rather than a plain string; call result.final_result() to get whatever the agent returned when it decided the task was complete.

Store your API key in a .env file:

OPENAI_API_KEY=sk-...
OPENAI_MODEL=REPLACE_WITH_CURRENT_MODEL
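If a required variable is missing, the failure otherwise surfaces deep inside the model client. A small helper (not part of browser-use; the name require_env is illustrative) fails fast with a clear message instead:

```python
import os

def require_env(name: str) -> str:
    """Return an environment variable's value, or raise a readable error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value
```

Call it before constructing the agent, e.g. ChatOpenAI(model=require_env("OPENAI_MODEL")).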

Using with other model providers

The llm parameter accepts the chat-model wrappers shipped in browser_use.llm (ChatOpenAI, ChatAnthropic, ChatOllama, and others); older releases accepted LangChain chat models instead. To use Claude (the model name is read from an environment variable, mirroring the OpenAI example):

from browser_use.llm import ChatAnthropic

agent = Agent(
    task="Find the current price of AAPL on Yahoo Finance and return it.",
    llm=ChatAnthropic(model=os.environ["ANTHROPIC_MODEL"]),
)

To use a local open-source model via Ollama:

from browser_use.llm import ChatOllama

agent = Agent(
    task="Open example.com and return the page title.",
    llm=ChatOllama(model="llama3.2"),
)

The quality of the browser agent degrades significantly with less capable models. For production use, choose a current frontier-class OpenAI or Claude model. Smaller models tend to get stuck in loops or fail to complete multi-step navigation.

How it works internally

The library maintains a BrowserSession that holds an open browser connection. On each agent step:

  1. The library captures the current page state: a screenshot plus structured DOM information (element labels, interactive element coordinates, page text).
  2. Both are sent to the LLM with a system prompt that defines the available actions.
  3. The LLM returns an action choice (click, type, go_to_url, extract_content, done, etc.).
  4. The library executes the action against the browser.
  5. The updated state is observed and the loop continues.

The DOM extraction step is a key differentiator from pure vision approaches. Even if a click target is visually ambiguous in the screenshot, the library can identify it precisely via its DOM position. This makes browser-use more reliable on text-heavy pages, single-page apps, and forms.
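The five-step cycle above can be sketched as a plain loop. This is an illustrative reconstruction, not the library's real internals; the observe/plan/execute callables and the action dictionary shape are assumptions:

```python
def run_agent_loop(observe, plan, execute, max_steps=10):
    """Observe-plan-act loop: repeat until the model emits 'done' or the budget runs out."""
    for _ in range(max_steps):
        state = observe()                # screenshot + structured DOM summary
        action = plan(state)             # the LLM chooses the next action
        if action["name"] == "done":
            return action.get("result")  # the agent's final answer
        execute(action)                  # click / type / navigate in the browser
    return None                          # step budget exhausted without finishing
```

In the real library, observe is the screenshot-plus-DOM capture, plan is the LLM call with the action-defining system prompt, and execute dispatches to the browser over CDP.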

Running headless or with a visible browser

By default, the library runs with a visible browser window. For server deployments or automated testing, use headless mode:

from browser_use import Agent, BrowserSession
from browser_use.llm import ChatOpenAI
import os

session = BrowserSession(headless=True)
agent = Agent(
    task="Check the status of api.example.com.",
    llm=ChatOpenAI(model=os.environ["OPENAI_MODEL"]),
    browser_session=session,
)
result = await agent.run()  # inside an async function, as in the earlier examples

In Docker containers, set IN_DOCKER=True in the environment. The official Docker image browseruse/browseruse includes the browser binary and all dependencies pre-configured.

MCP integration

browser-use can load MCP servers and expose their tools to the agent alongside browser actions:

from browser_use import Agent, Tools
from browser_use.mcp.client import MCPClient
from browser_use.llm import ChatOpenAI
import asyncio, os

async def main():
    tools = Tools()

    filesystem_client = MCPClient(
        server_name="filesystem",
        command="npx",
        args=["-y", "@modelcontextprotocol/server-filesystem", "/Users/me/documents"],
    )
    await filesystem_client.connect()
    await filesystem_client.register_to_tools(tools)

    agent = Agent(
        task="Find the latest PDF report in my documents and summarize its title.",
        llm=ChatOpenAI(model=os.environ["OPENAI_MODEL"]),
        tools=tools,
    )
    await agent.run()
    await filesystem_client.disconnect()

asyncio.run(main())

This combines browser navigation with local file access in a single agent. The agent decides whether to use a browser action or an MCP tool at each step.

Controlling the agent loop

The Agent.run() method accepts a max_steps parameter (default varies by version) that limits the number of actions before the agent is forced to return. Always set an explicit limit in production:

result = await agent.run(max_steps=25)
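Note that max_steps caps the number of actions, not wall-clock time. A hedged pattern for adding a time budget on top, using only the standard library (the wrapper name is illustrative):

```python
import asyncio

async def run_with_timeout(coro, seconds):
    """Cancel an agent run that exceeds a wall-clock budget."""
    try:
        return await asyncio.wait_for(coro, timeout=seconds)
    except asyncio.TimeoutError:
        return None  # treat a timeout as a failed run
```

Usage: result = await run_with_timeout(agent.run(max_steps=25), 120).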

You can also hook into the loop with callbacks. Recent versions accept on_step_start and on_step_end arguments to run(), each an async function that receives the Agent instance:

async def on_step_start(agent):
    print(f"Starting step {agent.state.n_steps}")

result = await agent.run(max_steps=25, on_step_start=on_step_start)

Persistent browser sessions

For tasks that require maintaining state across multiple agent runs (login sessions, multi-step workflows), reuse a BrowserSession:

llm = ChatOpenAI(model=os.environ["OPENAI_MODEL"])  # as in the earlier examples
session = BrowserSession(headless=False, keep_alive=True)  # keep the browser open between runs

# First task: log in
agent1 = Agent(task="Log into example.com with user@example.com and password from env.", llm=llm, browser_session=session)
await agent1.run()

# Second task: use the logged-in session
agent2 = Agent(task="Navigate to the settings page and return the account tier.", llm=llm, browser_session=session)
await agent2.run()

await session.close()

Version notes and stability

browser-use is under active development and releases frequently. The library's internal architecture changed significantly in 2025 when it migrated from Playwright to a CDP-first approach. For production deployments, pin to a specific version:

pip install browser-use==0.7.8

Check the PyPI release history and the repository changelog before upgrading. The main branch frequently contains breaking changes; install from tagged releases.

Known limitations

  • Browser-only: cannot automate desktop GUI applications outside the browser.
  • The agent loop is opaque by default. If the LLM chooses a wrong action, it may take several steps to self-correct, consuming tokens and time.
  • Dynamically rendered single-page apps with complex state can confuse the DOM extractor.
  • Tasks that require handling CAPTCHAs, MFA, or complex auth flows often need human intervention at those steps.
  • Do not use against your personal browser profile. See Operating boundaries.