欣淇
Published on 2026-05-11

Page Agent: 17.7k Stars, a pure-JS GUI Agent in one script tag

Project: alibaba/page-agent | ⭐ 17.7k Stars | 🛠 TypeScript | 🏢 Alibaba

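Honestly, web automation used to mean installing the whole Selenium stack, running a headless browser with Puppeteer, or installing a browser extension, all of which felt like setting up a major project. Alibaba's open-source Page Agent changes that: it is pure JS, and a single script tag gives your web page GUI Agent capabilities without touching the backend.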

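🎯 What it does, in one sentence

Page Agent is a GUI Agent library implemented in pure JavaScript. Embed it in your web page and you can direct it in natural language to operate the page: clicking buttons, filling in forms, extracting data, all without writing selectors.

Best of all, it needs no screenshots, no multimodal model, and no browser extension. Its DOM operations are text-based, so an ordinary text LLM can drive it.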

⚡ Try it with one line of code

The fastest way to try it is to add a script tag to your page:

<script src="https://cdn.jsdelivr.net/npm/page-agent@1.8.1/dist/iife/page-agent.demo.js" crossorigin="true"></script>

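⚠️ This CDN build uses a free test LLM API provided by Alibaba and is for technical evaluation only.

🛠 Integrating into a real project

Install the dependency: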

npm install page-agent

Then initialize it in your code:

import { PageAgent } from 'page-agent'

const agent = new PageAgent({
    model: 'qwen3.5-plus',
    baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1',
    apiKey: 'YOUR_API_KEY',
    language: 'zh-CN',
})

await agent.execute('点击登录按钮') // "Click the login button"

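It's that simple. No more hand-writing document.querySelector('#login-btn').click().

🔧 More scenarios

Beyond clicking buttons and filling in forms, it can power a SaaS AI Copilot (adding an AI copilot to your product in a few lines of code), a cross-page agent (together with the Chrome extension), and even let external agent clients control your browser through the MCP Server.

Page Agent is deliberately restrained in scope: it is not designed for server-side automation and focuses on enhancing web pages on the client side. In practice, this means your ERP, CRM, or admin console can instantly gain an AI operator that speaks Chinese.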

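Summary

  • Page Agent implements a GUI Agent in pure frontend JS, with no backend, extension, or headless browser required
  • A single script tag or one npm install is enough to integrate it
  • DOM operations are text-driven, so an ordinary LLM can run it without a multimodal model
  • Supports custom LLMs, a Chrome extension, and an MCP Server
  • Suited to SaaS copilots, smart forms, accessibility, and similar scenarios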

    Project: alibaba/page-agent | ⭐ 17.7k Stars | 🛠 TypeScript | 🏢 Alibaba

    Let's be honest — web automation has always meant either installing Selenium, spinning up a headless browser with Puppeteer, or adding a browser extension. Alibaba's open-source Page Agent flips that completely. It's pure JavaScript — one script tag gives your web page GUI Agent capabilities, no backend changes needed.

    🎯 What It Actually Does

    Page Agent is a JavaScript in-page GUI agent library. Drop it into any web page, and you can control the interface with natural language — click buttons, fill forms, scrape data, all without writing a single CSS selector.

    The slick part? No screenshots, no multi-modal models, no browser extensions required. It manipulates the DOM through text-based commands, so any ordinary LLM can drive it.
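
To make the text-based approach concrete, here is a minimal sketch of how a page could be flattened into indexed text that a text-only LLM can reason about. It is an illustration under assumptions, not Page Agent's actual implementation: the node shape ({ tag, text, attrs, children }) and the serializeInteractive function are hypothetical, and plain objects stand in for real DOM elements so the snippet runs outside a browser.

```javascript
// Hypothetical simplified node shape: { tag, text, attrs, children }.
// A real page would walk document.body instead of plain objects.
const INTERACTIVE_TAGS = new Set(['a', 'button', 'input', 'select', 'textarea']);

function serializeInteractive(root) {
  const lines = [];
  const elements = []; // index -> node, so "click element 1" can be resolved later

  function walk(node) {
    if (INTERACTIVE_TAGS.has(node.tag)) {
      const index = elements.length;
      elements.push(node);
      const attrs = Object.entries(node.attrs ?? {})
        .map(([key, value]) => ` ${key}="${value}"`)
        .join('');
      lines.push(`[${index}] <${node.tag}${attrs}>${node.text ?? ''}</${node.tag}>`);
    }
    for (const child of node.children ?? []) walk(child);
  }

  walk(root);
  return { text: lines.join('\n'), elements };
}

// A tiny login form; the <p> is skipped because it is not interactive.
const page = {
  tag: 'form',
  children: [
    { tag: 'input', attrs: { name: 'email', type: 'email' } },
    { tag: 'p', text: 'Forgot your password?' },
    { tag: 'button', text: 'Log in' },
  ],
};

const snapshot = serializeInteractive(page);
console.log(snapshot.text);
// [0] <input name="email" type="email"></input>
// [1] <button>Log in</button>
```

A model shown this text could reply with something like "click element 1", and the elements array maps that index back to the node to act on. No screenshot or vision model is involved at any step.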

    ⚡ Try It in One Line

    Fastest way to get started — add one script tag to your page:

    <script src="https://cdn.jsdelivr.net/npm/page-agent@1.8.1/dist/iife/page-agent.demo.js" crossorigin="true"></script>
    

    ⚠️ This CDN build calls a free test LLM API provided by Alibaba. It is intended for technical evaluation only.

    🛠 Production Integration

    Install via npm:

    npm install page-agent
    

    Then initialize in your code:

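    import { PageAgent } from 'page-agent'

    // baseURL here is Alibaba Cloud DashScope's OpenAI-compatible endpoint.
    const agent = new PageAgent({
        model: 'qwen3.5-plus',
        baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1',
        apiKey: 'YOUR_API_KEY', // replace with your own key
        language: 'en-US',
    })

    // Describe the task in natural language. Top-level await requires an ES module.
    await agent.execute('Click the login button')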
    

    That's it. No more hand-writing document.querySelector('#login-btn').click().
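
Real workflows usually need several instructions in a row. The sketch below relies only on what the snippet above shows, namely that the agent exposes an async execute(task) method. The names runTasks and fakeAgent are hypothetical, introduced for illustration, and the return value of execute is treated as opaque because the article does not describe it.

```javascript
// Runs tasks one at a time and stops at the first failure, since later
// steps usually depend on earlier ones (fill the form, then submit it).
async function runTasks(agent, tasks) {
  const results = [];
  for (const task of tasks) {
    try {
      const output = await agent.execute(task);
      results.push({ task, ok: true, output });
    } catch (error) {
      results.push({ task, ok: false, error: String(error) });
      break;
    }
  }
  return results;
}

// Stand-in for a real PageAgent instance so the sketch runs anywhere.
const fakeAgent = {
  async execute(task) {
    if (task.includes('fail')) throw new Error(`could not complete: ${task}`);
    return `done: ${task}`;
  },
};

runTasks(fakeAgent, ['Type "demo" into the username field', 'Click the login button'])
  .then((results) => console.log(results));
```

In a real page, the agent created with new PageAgent({...}) would be passed in place of fakeAgent. Stopping on the first error keeps a failed step from cascading into actions on the wrong page state.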

    🔧 More Use Cases

    Beyond clicking buttons and filling forms, you can build SaaS AI Copilots (add an AI assistant to your product in a few lines of code), multi-page agents (via the Chrome extension), or expose browser control to external agent clients through the MCP Server.

    Page Agent is intentionally scoped — it's designed for client-side enhancement, not server-side automation. In practice, that means your ERP, CRM, or admin dashboard can instantly get a natural-language AI operator.

    Key Takeaways

  • Pure frontend JS GUI agent — no backend, no plugins, no headless browsers
  • One script tag or one npm install to get started
  • Text-driven DOM manipulation works with any standard LLM
  • Supports custom LLMs, Chrome extension, MCP Server
  • Perfect for SaaS copilots, smart forms, and accessibility enhancement
