欣淇
Published 2026-05-14

🌐 Browser Use: 93k Stars. Let your AI control a real browser directly, with nothing more than pip install

Let's be honest: whenever an AI agent claims it can "control a browser", nine times out of ten it turns out to be either a screenshot tool or something that only runs inside the vendor's own cloud setup. Want it to fill in a form or buy something? Budget three days for API configuration first.

Browser Use is different: 93,828 stars and 10,605 forks, written in pure Python, installed with a single pip install. Your AI directly controls your real browser: the local Chrome you use every day.

GitHub: https://github.com/browser-use/browser-use

Language: Python | License: MIT | Created: 2024-10


What can it do?

🔥 Form filling & job applications: "Fill in this job application with my resume and information." Browser Use opens the page, locates the input fields, fills them in, and submits. Full code: apply_to_job.py

🛒 Grocery shopping: "Put this list of items into my instacart." It understands a shopping list written in natural language, searches for each item, compares the results, and adds them to the cart.

💻 Personal assistant: "Help me find parts for a custom PC." It browses e-commerce sites, compares specs and prices, and lists the options for you.

The kicker: zero browser config required. No remote debugging port, no special flags.
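Each of these use cases boils down to a plain-language `task` string handed to the agent. As a sketch, you might assemble those strings from structured data before passing them as `task=` to `Agent`. The helpers `Resume`, `job_application_task`, and `grocery_task` are hypothetical and not part of browser-use; the wording of the generated prompts is likewise just an example.

```python
from dataclasses import dataclass


@dataclass
class Resume:
    """Hypothetical resume fields; adapt them to whatever the form asks for."""
    name: str
    email: str
    summary: str


def job_application_task(url: str, resume: Resume) -> str:
    """Build a task string for the form-filling use case (hypothetical helper)."""
    return (
        f"Open {url} and fill in the job application using this information: "
        f"name: {resume.name}; email: {resume.email}; summary: {resume.summary}. "
        "Submit the form once every required field is filled."
    )


def grocery_task(items: list[str], store: str = "Instacart") -> str:
    """Build a task string for the shopping use case (hypothetical helper)."""
    if not items:
        raise ValueError("shopping list is empty")
    listing = ", ".join(items)
    return f"Search {store} for each of these items and add them to my cart: {listing}."


if __name__ == "__main__":
    me = Resume(name="Ada", email="ada@example.com", summary="Python developer")
    print(job_application_task("https://example.com/jobs/123", me))
    print(grocery_task(["milk", "eggs", "bread"]))
```

Keeping the data separate from the prompt makes it easy to reuse one resume across many applications.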


Getting started

```bash
# Using uv (recommended, Python >= 3.11)
uv init && uv add browser-use && uv sync
# Install Chromium if you don't have it yet
uvx browser-use install
```
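The quickstart targets Python 3.11 or newer. A tiny pre-flight check you could run first; it uses only the standard library and nothing specific to browser-use, and `python_is_supported` is a hypothetical helper name:

```python
import sys

MIN_VERSION = (3, 11)  # minimum version stated in the quickstart


def python_is_supported(version_info=sys.version_info, minimum=MIN_VERSION) -> bool:
    """Return True if the (major, minor) part of version_info meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    if python_is_supported():
        print(f"Python {sys.version.split()[0]} meets the requirement")
    else:
        print(f"browser-use needs Python {MIN_VERSION[0]}.{MIN_VERSION[1]} or newer")
```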

Set up your API key:

```
# .env file
BROWSER_USE_API_KEY=***
# Or use another provider's model instead
# ANTHROPIC_API_KEY=***
# GOOGLE_API_KEY=***
```
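Before launching the agent, it can help to check which of these keys is actually set, so the script fails early with a clear message. A minimal sketch: the preference order below is an assumption, and the helper `first_configured_key` (a hypothetical name) only inspects a dict-like environment without importing browser-use. It reads the process environment, so if your keys live in `.env`, load them first (for example with python-dotenv's `load_dotenv()`).

```python
import os
from collections.abc import Mapping
from typing import Optional

# Keys from the .env example above; the preference order is an assumption.
KNOWN_KEYS = ("BROWSER_USE_API_KEY", "ANTHROPIC_API_KEY", "GOOGLE_API_KEY")


def first_configured_key(env: Mapping[str, str], keys=KNOWN_KEYS) -> Optional[str]:
    """Return the name of the first key set to a non-blank value, or None."""
    for name in keys:
        if env.get(name, "").strip():
            return name
    return None


if __name__ == "__main__":
    found = first_configured_key(os.environ)
    if found is None:
        print("No API key found; add one of these to .env:", ", ".join(KNOWN_KEYS))
    else:
        print(f"Using {found}")
```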

Write your first agent:

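```python
import asyncio

from browser_use import Agent, Browser, ChatBrowserUse


async def main():
    # Local browser session that the agent will drive
    browser = Browser()
    agent = Agent(
        task="Find the number of stars of the browser-use repo",  # plain-language goal
        llm=ChatBrowserUse(),  # expects BROWSER_USE_API_KEY (see .env above)
        browser=browser,
    )
    # Runs the agent until it finishes the task or gives up
    await agent.run()


if __name__ == "__main__":
    asyncio.run(main())
```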

Run it and watch it go. The agent opens the browser, navigates to GitHub, reads the star count, and reports back.
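Conceptually, that run is a loop: look at the page, ask the model for the next action, execute it, and repeat until the model says it is done. The toy below illustrates that observe-decide-act cycle with a fake page and a scripted stand-in for the LLM. It is a simplified illustration, not browser-use's actual internals, and every name in it (`FakePage`, `scripted_llm`, `run_agent`, `SITE`) is hypothetical.

```python
from dataclasses import dataclass

# Made-up site content; the star count is the figure quoted in this article.
SITE = {"https://github.com/browser-use/browser-use": "Stars: 93,828"}


@dataclass
class FakePage:
    """Hypothetical stand-in for a browser tab."""
    url: str = "about:blank"

    def observe(self) -> str:
        # A real agent would extract the page's interactive elements and text here.
        return f"url={self.url}"


def scripted_llm(task: str, observation: str) -> tuple[str, str]:
    """Hypothetical stand-in for the model: picks the next (action, argument)."""
    if "github.com/browser-use/browser-use" not in observation:
        return ("navigate", "https://github.com/browser-use/browser-use")
    return ("done", "")


def run_agent(task: str, max_steps: int = 5) -> str:
    """Observe, decide, act; stop when the model says 'done' or steps run out."""
    page = FakePage()
    for _ in range(max_steps):
        action, arg = scripted_llm(task, page.observe())  # observe + decide
        if action == "navigate":                          # act
            page.url = arg
        elif action == "done":
            return SITE.get(page.url, "nothing found")    # report back
    return "gave up after max_steps"


if __name__ == "__main__":
    print(run_agent("Find the number of stars of the browser-use repo"))
```

The step cap matters in practice: a real model can loop on a page it does not understand, so the run needs a hard stop.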


The CLI mode is handy too

Don't feel like writing code? It has a built-in CLI:

```bash
# Navigate to a URL
browser-use open https://example.com
# See clickable elements
browser-use state
# Click element by index
browser-use click 5
# Type text
browser-use type "Hello"
# Take screenshot
browser-use screenshot page.png
# Close browser
browser-use close
```

The CLI keeps the browser open between commands, so you can direct it one step at a time. That makes it great for debugging and quick tasks.
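Because the browser stays open between invocations, the same session can also be scripted. A minimal sketch that replays the commands shown above from Python via `subprocess`, assuming the `browser-use` executable is on your PATH. `run_session` is a hypothetical helper, and with `dry_run=True` (the default here) it only returns the argument lists without launching anything.

```python
import shlex
import subprocess

# The same session as the CLI example, one command per entry.
SESSION = [
    "open https://example.com",
    "state",
    "click 5",
    'type "Hello"',
    "screenshot page.png",
    "close",
]


def run_session(commands: list[str], dry_run: bool = True) -> list[list[str]]:
    """Turn each command into an argv list and, unless dry_run, execute them in order."""
    argvs = [["browser-use", *shlex.split(cmd)] for cmd in commands]
    if not dry_run:
        for argv in argvs:
            # check=True stops the script at the first failing command.
            subprocess.run(argv, check=True)
    return argvs


if __name__ == "__main__":
    for argv in run_session(SESSION):
        print(shlex.join(argv))
```

In a real script, `click 5` only makes sense after reading the element indices printed by `state`. You would capture that output (for example with `subprocess.run(..., capture_output=True, text=True)`) before choosing an index.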


Key points

  1. Custom tools are dead simple: decorate a function with @tools.action(description=...) to give the agent a new capability.
  2. The Cloud version is more powerful: stealth fingerprints, proxy rotation, and captcha solving are available as a paid service.
  3. Claude Code Skill: a one-line install, mkdir -p ~/.claude/skills/browser-use && curl -o ..., lets you use it directly inside Claude Code.
  4. ChatBrowserUse performs best: it is the team's own model trained for browser control, and they claim it is 3-5x faster than general-purpose models. Pricing: $0.20 per 1M input tokens, $2.00 per 1M output tokens.
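The `@tools.action(description=...)` pattern is a decorator that registers a plain function together with a description the model can read, so the agent can call it by name. The self-contained mock below imitates that pattern to show the mechanics. `ToolRegistry`, `describe`, `call`, and `save_note` are hypothetical names, and browser-use's real implementation will differ in its details.

```python
from typing import Callable


class ToolRegistry:
    """Hypothetical mock of a decorator-based tool registry."""

    def __init__(self) -> None:
        self._actions: dict[str, tuple[str, Callable[..., str]]] = {}

    def action(self, description: str):
        """Decorator: register fn under its own name with a model-facing description."""
        def register(fn: Callable[..., str]) -> Callable[..., str]:
            self._actions[fn.__name__] = (description, fn)
            return fn  # the function stays callable as normal
        return register

    def describe(self) -> dict[str, str]:
        """What a prompt would list for the model: tool name -> description."""
        return {name: desc for name, (desc, _) in self._actions.items()}

    def call(self, name: str, **kwargs) -> str:
        """Dispatch a tool call chosen by the model."""
        _, fn = self._actions[name]
        return fn(**kwargs)


tools = ToolRegistry()


@tools.action(description="Save a short note to the agent's scratchpad")
def save_note(text: str) -> str:
    return f"saved: {text}"
```

With the real library you hand your tools object to the `Agent`. Check the browser-use documentation for the exact class and parameter names.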


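To turn the per-token prices into a per-run estimate: cost = input tokens × $0.20 / 1M + output tokens × $2.00 / 1M. Below is a quick calculator using the prices quoted above; prices may change, so check the current pricing page. `run_cost_usd` is a hypothetical helper, and the token counts in the example are made up.

```python
INPUT_USD_PER_M = 0.20   # ChatBrowserUse input price quoted above, USD per 1M tokens
OUTPUT_USD_PER_M = 2.00  # ChatBrowserUse output price quoted above, USD per 1M tokens


def run_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the model cost of one agent run in US dollars."""
    return (input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M) / 1_000_000


if __name__ == "__main__":
    # Hypothetical run: 500k input tokens and 20k output tokens.
    print(f"${run_cost_usd(500_000, 20_000):.2f}")  # prints "$0.14"
```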