Project: alibaba/page-agent | ⭐ 17.7k Stars | 🛠 TypeScript | 🏢 Alibaba
Let's be honest: web automation has usually meant installing the Selenium stack, spinning up a headless browser with Puppeteer, or adding a browser extension, and each option feels like setting up a whole project. Alibaba's open-source Page Agent takes a different route. It is pure JavaScript, and a single script tag gives your web page GUI agent capabilities without touching the backend.
🎯 What It Actually Does
Page Agent is a GUI agent library written in pure JavaScript that runs inside the page. Drop it into a web page and you can drive the interface in natural language (click buttons, fill in forms, extract data) without writing a single selector.
The slick part? No screenshots, no multimodal models, and no browser extensions required. It relies on text-based DOM manipulation, so an ordinary text-only LLM can drive it.
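To make "text-based" a bit more concrete, here is a minimal sketch of how a page could be flattened into indexed text lines that a text-only LLM can refer to. This is not Page Agent's actual implementation. `UiNode`, `INTERACTIVE_TAGS`, and `describeInteractive` are hypothetical names invented for illustration, and the sketch works on a plain object tree instead of the real DOM so it can run anywhere.

```typescript
// Hypothetical, simplified stand-in for a DOM element (not a Page Agent type).
interface UiNode {
  tag: string;
  text?: string;
  attrs?: Record<string, string>;
  children?: UiNode[];
}

// Tags treated as interactive in this sketch; a real agent would use richer rules.
const INTERACTIVE_TAGS: readonly string[] = ["a", "button", "input", "select", "textarea"];

// Walk the tree depth-first and emit one indexed line per interactive element.
function describeInteractive(root: UiNode): string[] {
  const lines: string[] = [];
  const visit = (node: UiNode): void => {
    if (INTERACTIVE_TAGS.indexOf(node.tag) !== -1) {
      // Prefer visible text, then fall back to placeholder or name attributes.
      const label = node.text ?? node.attrs?.placeholder ?? node.attrs?.name ?? "";
      const prefix = `[${lines.length}] <${node.tag}>`;
      lines.push(label ? `${prefix} ${label}` : prefix);
    }
    for (const child of node.children ?? []) visit(child);
  };
  visit(root);
  return lines;
}

const page: UiNode = {
  tag: "form",
  children: [
    { tag: "input", attrs: { name: "email", placeholder: "Email" } },
    { tag: "p", text: "Forgot your password?" },
    { tag: "button", text: "Log in" },
  ],
};

const snapshot = describeInteractive(page);
console.log(snapshot.join("\n"));
// [0] <input> Email
// [1] <button> Log in
```

With a listing like this, a model can answer "click the login button" by pointing at index 1, which is why no screenshot or vision model is needed in this style of agent.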
⚡ Try It in One Line
The fastest way to try it is to add one script tag to your page:
<script src="https://cdn.jsdelivr.net/npm/page-agent@1.8.1/dist/iife/page-agent.demo.js" crossorigin="true"></script>
⚠️ This demo build calls a free testing LLM API provided by Alibaba. It is meant for technical evaluation only.
🛠 Production Integration
Install via npm:
npm install page-agent
Then initialize in your code:
import { PageAgent } from 'page-agent'

// Configure the LLM that drives the agent
const agent = new PageAgent({
  model: 'qwen3.5-plus',
  baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1',
  apiKey: 'YOUR_API_KEY',
  language: 'en-US',
})

// Describe the task in natural language (top-level await needs an ES module)
await agent.execute('Click the login button')
That's it. No more hand-writing document.querySelector('#login-btn').click().
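Real tasks usually take more than one instruction, so here is a rough sketch of running several in sequence. It relies only on the `execute(task)` call shown above. The `TaskRunner` interface, `StepOutcome`, `runSteps`, and `fakeAgent` are hypothetical names for this example, and it assumes a failed task surfaces as a rejected promise. If the library reports failure through its resolved value instead, that value would need to be checked inside the loop.

```typescript
// Minimal shape this sketch relies on: only the execute(task) call shown above.
// The resolved value is typed as unknown because the article does not describe it.
interface TaskRunner {
  execute(task: string): Promise<unknown>;
}

interface StepOutcome {
  task: string;
  ok: boolean;
  error?: string;
}

// Run tasks one at a time and stop at the first failure, so later steps
// never act on a page that is in an unexpected state.
async function runSteps(agent: TaskRunner, tasks: string[]): Promise<StepOutcome[]> {
  const outcomes: StepOutcome[] = [];
  for (const task of tasks) {
    try {
      await agent.execute(task);
      outcomes.push({ task, ok: true });
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      outcomes.push({ task, ok: false, error: message });
      break;
    }
  }
  return outcomes;
}

// Stand-in agent for trying the helper without a browser or an API key.
const fakeAgent: TaskRunner = {
  async execute(task: string): Promise<unknown> {
    if (task.indexOf("submit") !== -1) throw new Error("button not found");
    return undefined;
  },
};

runSteps(fakeAgent, ["open the settings page", "submit the form", "log out"])
  .then((outcomes) => console.log(outcomes));
```

In a real page you would pass the `PageAgent` instance in place of `fakeAgent`. Stopping on the first failure is a design choice for this sketch; retrying or asking the user would be equally reasonable.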
🔧 More Use Cases
Beyond clicking buttons and filling in forms, you can build a SaaS AI copilot (add an AI assistant to your product in a few lines of code), run multi-page agents (paired with the Chrome extension), or let external agent clients control your browser through an MCP server.
Page Agent is deliberately scoped: it is designed for client-side page enhancement, not server-side automation. In practice, that means your ERP, CRM, or admin dashboard can gain a natural-language AI operator without a backend rewrite.
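For the ERP or CRM case, instructions often come from structured data rather than free typing. As a rough sketch, a small helper can turn field values into one natural-language task string to hand to `agent.execute`. `buildFillTask` and the field names below are hypothetical, and the exact wording that works best will depend on your page and model.

```typescript
// Hypothetical helper: turn structured field values into one instruction string.
function buildFillTask(formName: string, values: Record<string, string>): string {
  const steps = Object.keys(values).map(
    (field) => `set "${field}" to "${values[field]}"`
  );
  if (steps.length === 0) {
    throw new Error("buildFillTask needs at least one field");
  }
  // End with an explicit request not to submit, leaving that step to the user.
  return `In the ${formName} form, ${steps.join(", then ")}. Do not submit the form.`;
}

const task = buildFillTask("new customer", {
  "Company name": "Acme Ltd",
  "Contact email": "ops@acme.example",
});
console.log(task);
// In the new customer form, set "Company name" to "Acme Ltd", then set "Contact email" to "ops@acme.example". Do not submit the form.
```

The resulting string would then be passed along as `await agent.execute(task)`.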
Key Takeaways
Pure frontend JS GUI agent: no backend, no plugins, no headless browser
One script tag or one npm install to get started
Text-driven DOM manipulation works with any standard LLM
Supports custom LLMs, a Chrome extension, and an MCP server
A good fit for SaaS copilots, smart forms, and accessibility enhancements