🌐 Browser Use: 93k Stars. Let your AI drive a real browser directly, one pip install away
Let's be honest — whenever you see an AI Agent that claims it can "control a browser", 9 times out of 10 it's either just taking screenshots or requires a proprietary cloud setup. Try to make it fill a form or buy something? Three days of API config ahead.
Browser Use is different. 93,828 Stars / 10,605 Forks. Pure Python. pip install and you're done. Your AI directly controls YOUR real browser — the local Chrome you use every day.
GitHub: https://github.com/browser-use/browser-use
Language: Python | License: MIT | Created: 2024-10
What can it do?
🔥 Form filling & job applications: "Fill in this job application with my resume and information." Browser Use opens the page, locates the fields, fills them in, and submits. The full code is in apply_to_job.py
🛒 Grocery shopping: "Put this list of items into my Instacart." It understands a shopping list written in natural language, searches for the items one by one, compares them, and adds them to the cart.
💻 Personal assistant: "Help me find parts for a custom PC." It browses e-commerce sites, compares specs and prices, and lists the options.
The kicker: none of this requires changing your browser configuration. No remote debugging port, no special flags.
Getting started
# Using uv (recommended, Python >= 3.11)
uv init && uv add browser-use && uv sync
# Install Chromium if needed
uvx browser-use install
Set up your API key:
# .env file
BROWSER_USE_API_KEY=***
# Or use other models
# ANTHROPIC_API_KEY=***
# GOOGLE_API_KEY=***
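Before running an agent, it can help to confirm that the .env file actually contains a usable key. The sketch below is a standard-library check, not part of browser-use: the helper names (parse_env, configured_keys, check_env_file) are hypothetical, and the only thing taken from the example above is the three key names. It makes no claim about how browser-use itself loads the file.

```python
from pathlib import Path

# Key names copied from the .env example; treating them as a fixed list is an assumption.
KNOWN_KEYS = ("BROWSER_USE_API_KEY", "ANTHROPIC_API_KEY", "GOOGLE_API_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def configured_keys(env: dict[str, str]) -> list[str]:
    """Return the known provider keys that have a non-empty value."""
    return [key for key in KNOWN_KEYS if env.get(key)]


def check_env_file(path: str = ".env") -> list[str]:
    """Report which known keys are set in the file; empty list if it is missing."""
    env_path = Path(path)
    if not env_path.exists():
        return []
    return configured_keys(parse_env(env_path.read_text(encoding="utf-8")))


# Demo on an in-memory sample so nothing is read from disk.
sample = """
# .env file
BROWSER_USE_API_KEY=your-key-here
# ANTHROPIC_API_KEY=your-other-key
"""
print(configured_keys(parse_env(sample)))  # ['BROWSER_USE_API_KEY']
```

Calling check_env_file() from your project directory returns an empty list when no key is set, which is a cheap signal to stop before the agent fails on authentication.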
Write your first agent:
from browser_use import Agent, Browser, ChatBrowserUse
import asyncio


async def main():
    # Browser session the agent will drive
    browser = Browser()
    agent = Agent(
        task="Find the number of stars of the browser-use repo",
        llm=ChatBrowserUse(),
        browser=browser,
    )
    # Run the agent until it finishes the task
    await agent.run()


if __name__ == "__main__":
    asyncio.run(main())
Run it and watch it go. The agent opens the browser, navigates to GitHub, reads the star count, and reports back.
CLI mode is handy too
Don't feel like writing code? It has a built-in CLI:
# Navigate to a URL
browser-use open https://example.com
# See clickable elements
browser-use state
# Click element by index
browser-use click 5
# Type text
browser-use type "Hello"
# Take screenshot
browser-use screenshot page.png
# Close browser
browser-use close
The CLI keeps the browser running between commands — great for debugging and quick tasks.
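Because the browser stays open between commands, the same steps can be scripted. The sketch below is one possible way to do that from Python with the standard library. It uses only the subcommands shown above (open, state, click, type, screenshot, close); the function names build_session and run_session are hypothetical, and the script defaults to a dry run that prints each command instead of executing it.

```python
import shlex
import subprocess


def build_session(url: str, click_index: int, text: str, screenshot: str) -> list[list[str]]:
    """Return the argv for each CLI step, in the order used in the walkthrough."""
    return [
        ["browser-use", "open", url],
        ["browser-use", "state"],
        ["browser-use", "click", str(click_index)],
        ["browser-use", "type", text],
        ["browser-use", "screenshot", screenshot],
        ["browser-use", "close"],
    ]


def run_session(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            # check=True stops the session at the first failing step.
            subprocess.run(argv, check=True)


# Dry run: prints the six commands without touching a browser.
run_session(build_session("https://example.com", 5, "Hello", "page.png"))
```

In a real session the element index would normally be chosen after reading the output of the state step, so a hardcoded index like 5 is only a placeholder here.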
Key points
1. Custom tools are simple: decorate a function with @tools.action(description=...) to give the agent a new capability.
2. The Cloud version adds more: stealth fingerprints, proxy rotation, and captcha solving are available as a paid service.
3. Claude Code Skill: a one-line install (mkdir -p ~/.claude/skills/browser-use && curl -o ...) makes it usable directly in Claude Code.
4. ChatBrowserUse is the recommended model: it is the project's own browser-control model, described as 3-5x faster than general-purpose models. Pricing is $0.20 per 1M input tokens and $2.00 per 1M output tokens.
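To get a feel for what the ChatBrowserUse prices quoted above mean per run, here is a small back-of-the-envelope estimator. The two rates come from this post; the function name estimate_cost and the token counts in the demo are made up for illustration, and real usage will vary by task.

```python
# Prices quoted in this post, in US dollars per one million tokens.
INPUT_USD_PER_MTOK = 0.20
OUTPUT_USD_PER_MTOK = 2.00


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost in USD of one run from its token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Hypothetical run: 500k input tokens and 20k output tokens (invented numbers).
print(f"${estimate_cost(500_000, 20_000):.2f}")  # $0.14
```

Output tokens cost ten times as much as input tokens at these rates, but browser tasks tend to send a lot of page state as input, so the input side may still account for most of the bill, as it does in the demo.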