> Project: [bytedance/UI-TARS-desktop](https://github.com/bytedance/UI-TARS-desktop) | ⭐ 34k+ | 🛠 TypeScript | ByteDance
---
Let's be real — most GUI Agent projects either only control browsers or require a mountain of setup. ByteDance's open-source **Agent TARS** (UI-TARS-desktop) is different: it's a full multimodal AI Agent stack with CLI, desktop app, and Web UI, all in one. One command and you're running.
## 🚀 One Command to Start
```bash
# One-liner, no install needed
npx @agent-tars/cli@latest

# Or install globally, then run with an explicit provider and model
npm install @agent-tars/cli@latest -g
agent-tars --provider anthropic --model claude-3-7-sonnet-latest --apiKey your-api-key

# Or use ByteDance's Doubao model
agent-tars --provider volcengine --model doubao-1-5-thinking-vision-pro-250428 --apiKey your-api-key
```
Requires Node.js >= 22. That's it.
## 🔧 Core Capabilities
**🖱️ Hybrid Browser Agent** — Supports both GUI Agent (visual grounding) and DOM-based browser control. Unlike Puppeteer's pure DOM approach, it can actually "see" the page, so dynamic canvas elements don't trip it up.
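The hybrid idea boils down to a per-action decision: use cheap, precise DOM operations when a stable selector exists, and fall back to visual grounding (screenshot + coordinates) when the target is invisible to the DOM. Here's an illustrative sketch of that decision; the types and `pickControlMode` function are hypothetical, not Agent TARS's actual API:

```typescript
// Hypothetical sketch of hybrid browser control: DOM when a stable
// selector is available, visual grounding otherwise.
type ControlMode = "dom" | "visual";

interface TargetInfo {
  hasStableSelector: boolean; // a unique CSS selector was found for the target
  insideCanvas: boolean;      // rendered inside <canvas>, so no DOM node exists
}

function pickControlMode(target: TargetInfo): ControlMode {
  // Canvas-rendered widgets have no DOM nodes to click, so only vision works.
  if (target.insideCanvas || !target.hasStableSelector) return "visual";
  return "dom";
}

// A canvas-drawn button forces the visual path:
console.log(pickControlMode({ hasStableSelector: false, insideCanvas: true }));  // "visual"
// A normal form field can take the cheaper DOM path:
console.log(pickControlMode({ hasStableSelector: true, insideCanvas: false }));  // "dom"
```

This is the same trade-off Puppeteer-style tools can't make: a pure DOM tool has no fallback when the page draws its UI on a canvas.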
**🌐 Remote Desktop Control** — Since v0.2.0, you can control a remote computer's desktop from your local machine. Zero config, completely free.
**🧰 MCP-Native Kernel** — The entire kernel is built on the MCP protocol, so it naturally supports any MCP Server — databases, APIs, file systems, you name it.
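Wiring in an extra MCP server typically looks like the standard MCP client convention: a named server entry with a launch command. The sketch below is an assumption, not verified Agent TARS config — the file name, `mcpServers` key, and option shape follow common MCP client practice (`@modelcontextprotocol/server-filesystem` is a real reference server), so check the project docs for the exact format:

```typescript
// agent-tars.config.ts — hypothetical config sketch, option names assumed
export default {
  mcpServers: {
    // Launch the reference filesystem MCP server over stdio,
    // scoped to a single directory.
    filesystem: {
      command: "npx",
      args: ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"],
    },
  },
};
```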
**📡 Event Stream Protocol** — Every step is streamed in real-time, so you see exactly what the AI is thinking, step by step.
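Conceptually, a streamed agent run is just an async sequence of typed events that the UI renders as they arrive. The event names and shapes below are illustrative, not the actual protocol — this sketch only shows the consumption pattern:

```typescript
// Hypothetical event shape; the real protocol's types will differ.
interface AgentEvent {
  type: "thought" | "tool_call" | "tool_result" | "final";
  payload: string;
}

// Stand-in for a real agent run: yields each step as it happens.
async function* runAgent(): AsyncGenerator<AgentEvent> {
  yield { type: "thought", payload: "Need to open the target page" };
  yield { type: "tool_call", payload: "browser.navigate(...)" };
  yield { type: "tool_result", payload: "page loaded" };
  yield { type: "final", payload: "done" };
}

async function main(): Promise<void> {
  // A UI would render each event the moment it arrives,
  // instead of waiting for the whole run to finish.
  for await (const ev of runAgent()) {
    console.log(`[${ev.type}] ${ev.payload}`);
  }
}
main();
```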
## 💻 Desktop Native App
UI-TARS Desktop is the native GUI agent application powered by the UI-TARS vision model. Download and go. Tell it "turn on autosave in VS Code and set the delay to 500ms" and it actually looks at your screen, finds the menu, and clicks through the options, just like a human would. The underlying Seed-1.5-VL/1.6 vision models deliver impressive recognition accuracy.
## ⚠️ Gotchas
Agent TARS and UI-TARS Desktop are two products in the same repo: Agent TARS is the general-purpose stack (CLI + Web UI), while UI-TARS Desktop is the native desktop app. If you just want terminal use, install the CLI; if you want full graphical control, grab the Desktop app. And don't run Node.js below 22 — I got errors on v18 right away.
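A quick guard for the Node version pitfall: parse the major version out of `node -v` before installing. The version string here is hardcoded for illustration; in practice you'd substitute the real `node -v` output as shown in the comment:

```shell
# Check the Node major version before installing (Agent TARS needs >= 22).
node_version="v18.19.0"          # in practice: node_version=$(node -v)
major="${node_version#v}"        # strip the leading "v" -> 18.19.0
major="${major%%.*}"             # keep only the major version -> 18
if [ "$major" -lt 22 ]; then
  echo "Node $major is too old; Agent TARS needs >= 22"
else
  echo "Node $major OK"
fi
```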
## Summary
- ByteDance's open-source multimodal AI Agent stack with 34k+ Stars — CLI, Web UI, and Desktop all-in-one
- `npx @agent-tars/cli` to start, hybrid browser control + remote desktop
- MCP-native with visual screen recognition; v0.3.0 supports streaming and multi-tool parallelism