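🥥 CocoIndex: The 9.3k-Star Incremental Engine That Keeps AI Agents Reading Fresh Data

Repo: cocoindex-io/cocoindex | ⭐ 9,296 Stars | 🛠 Python + Rust | 🍴 694 Forks

What is the biggest headache in building AI agents? It is not that the model isn't smart enough; it is that the data the agent reads is always last week's. The codebase changed, the docs were updated, a new plan came out of a Slack discussion, yet your RAG index is still serving a snapshot from three days ago.

CocoIndex exists to solve this problem. It is an incremental data processing engine built to give AI agents and LLM applications "always fresh" context. The core idea is simple: on each run, process only what changed (Δ, the delta) instead of rerunning everything. With 9.3k stars and 694 forks, its GitHub community is growing quickly, and it has expanded from its original focus on code indexing to cover many data sources, including codebases, meeting notes, Slack, PDFs, and video.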

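📦 One-Step Install

Requires Python ≥3.10; install directly with pip:

pip install -U cocoindex

That's all there is to it. The package itself installs without complex system dependencies.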

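🧠 Core Features, Broken Down

Incremental Engine: React-Style Dataflow

CocoIndex's programming model resembles React: you declare what the target state should be, and the engine keeps it in sync. You define data processing functions with the @coco.fn decorator, and the engine caches a hash of the inputs and outputs of each call. On the next run, a function is recomputed only if its inputs or its own code changed.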

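import cocoindex as coco
from cocoindex.connectors import localfs, postgres
from cocoindex.ops.text import RecursiveSplitter

# NOTE: `embed` (text -> vector) and `PG` (the Postgres connection) are not
# defined in this snippet; supply your own before running it.

@coco.fn(memo=True)  # ← cached automatically; a hash decides whether to recompute
async def index_file(file, table):
    # Split the file into chunks and declare one row per chunk.
    for chunk in RecursiveSplitter().split(await file.read_text()):
        table.declare_row(text=chunk.text, embedding=embed(chunk.text))

@coco.fn
async def main(src):
    # Declare the target table and its vector index, then map index_file over every file under src.
    table = await postgres.mount_table_target(PG, table_name="docs")
    table.declare_vector_index(column="embedding")
    await coco.mount_each(index_file, localfs.walk_dir(src).items(), table)

coco.App(coco.AppConfig(name="docs"), main, src="./docs").update_blocking()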

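The first run is a backfill. Run it again and only the files that changed are re-embedded; if 0.1% of the data changed, the other 99.9% is a straight cache hit.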
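To make the caching rule concrete, here is a minimal sketch of memoization keyed by a hash of the function's code plus its arguments. It is not CocoIndex's implementation: `memo_by_hash` and `embed_stub` are hypothetical names, the cache lives in memory only, and the function's bytecode stands in for the "code hash".

```python
import hashlib
import json


def memo_by_hash(fn):
    """Cache results under a key derived from the function's code and its arguments."""
    cache = {}
    # Hash the bytecode so that a different function body yields different keys.
    code_hash = hashlib.sha256(fn.__code__.co_code).hexdigest()

    def wrapper(*args, **kwargs):
        payload = json.dumps([code_hash, args, kwargs], sort_keys=True, default=repr)
        key = hashlib.sha256(payload.encode()).hexdigest()
        if key not in cache:
            wrapper.misses += 1  # count real computations
            cache[key] = fn(*args, **kwargs)
        return cache[key]

    wrapper.misses = 0
    return wrapper


@memo_by_hash
def embed_stub(text):
    # Stand-in for an expensive embedding call.
    return [len(text), sum(map(ord, text)) % 997]


files = {"a.md": "hello", "b.md": "world"}
first = {name: embed_stub(body) for name, body in files.items()}   # backfill: both files computed
files["b.md"] = "world, updated"
second = {name: embed_stub(body) for name, body in files.items()}  # only b.md is recomputed
```

After both passes `embed_stub.misses` is 3 rather than 4, because the unchanged `a.md` was served from the cache on the second pass.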

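🔌 20+ Production-Grade Examples

The examples/ directory of the repository ships 20+ complete, runnable examples:

Code embedding: walk a git repository, split with AST awareness, embed with sentence-transformers, and upsert into pgvector/LanceDB

PDF → RAG: read PDFs from local disk, S3, or Google Drive, chunk them with RecursiveSplitter, embed, and write to a vector store

Hacker News topic analysis: fetch HN posts via the Algolia API, extract topics with an LLM, rank them by weight, and store them in Postgres

Meeting notes → knowledge graph: extract people, topics, and decisions from meeting transcripts, Slack, or podcasts, and store them in Neo4j or Kuzu

CSV → Kafka: watch a folder of CSV files and publish changed rows to a Kafka topic in real time

Each example is complete and runs with pip install && python run.py; you can copy the steps straight from its README.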
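As an illustration of what "AST-aware splitting" in the code embedding example can mean, the sketch below cuts a Python file at top-level function and class boundaries instead of at fixed character counts. It uses Python's standard `ast` module as a stand-in; the parser CocoIndex actually uses is not described here, and `split_by_definition` is a hypothetical helper.

```python
import ast


def split_by_definition(source: str) -> list[str]:
    """Return one chunk per top-level function or class in the source text."""
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # get_source_segment returns the exact text spanned by the node.
            chunks.append(ast.get_source_segment(source, node))
    return chunks


sample = '''
import os

def load(path):
    return open(path).read()

class Index:
    def add(self, item):
        self.items.append(item)
'''

chunks = split_by_definition(sample)
for chunk in chunks:
    print(chunk.splitlines()[0])
```

Each chunk is a whole definition, so an embedding never straddles two unrelated functions.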

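🦀 Rust Core, Python Interface

The engine core is written in Rust, which brings parallel processing, zero-copy transforms, and fault isolation. One bad record does not take down the whole pipeline. Retries, exponential backoff, and dead-letter queues are all built in. The Python layer is a declarative API that reads like an ordinary Python script, while Rust handles performance underneath.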
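The fault-isolation pattern described above (retry, back off exponentially, then park the record in a dead-letter queue) can be sketched in a few lines of plain Python. This only illustrates the pattern, not the engine's Rust code; `process_all`, its parameters, and `parse_int` are hypothetical.

```python
import time


def process_all(records, handler, max_attempts=3, base_delay=0.01, sleep=time.sleep):
    """Run handler on each record; retry with exponential backoff, then dead-letter."""
    done, dead_letters = [], []
    for record in records:
        for attempt in range(max_attempts):
            try:
                done.append(handler(record))
                break
            except Exception as exc:
                if attempt == max_attempts - 1:
                    # Out of attempts: park the record and move on to the next one.
                    dead_letters.append((record, repr(exc)))
                else:
                    # Wait base_delay, then 2x, 4x, ... before the next attempt.
                    sleep(base_delay * 2 ** attempt)
    return done, dead_letters


def parse_int(value):
    return int(value)


delays = []  # record the requested delays instead of actually sleeping
done, dead = process_all(["1", "oops", "3"], parse_int, sleep=delays.append)
```

The bad record `"oops"` ends up in `dead` after two backoff waits, while `"1"` and `"3"` are processed normally, so one failure does not stop the batch.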

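🚀 CocoIndex-code MCP Server

The team also built a flagship MCP server, CocoIndex-code, aimed at AI coding agents. It sits on top of the CocoIndex engine and provides AST-aware incremental code indexing:

Processes only Δ, with sub-second refresh after each commit

Supports Python / TypeScript / Rust / Go

Semantic code search (not grep; it searches by meaning)

Call graph + blast-radius analysis

One-step integration with Claude Code and Cursor

Official figures: 70% fewer tokens than full recomputation, and an 80-90% cache hit rate when re-indexing.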
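For readers unfamiliar with the "blast radius" item in the list above: given a call graph, the blast radius of a changed function is every function that calls it directly or transitively. The sketch below computes that with a breadth-first walk over reversed edges. The graph, the function names, and `blast_radius` itself are made up for illustration and say nothing about how CocoIndex-code represents its index.

```python
from collections import defaultdict, deque


def blast_radius(calls, changed):
    """Return every function that directly or transitively calls `changed`."""
    # Invert the graph: callee -> set of callers.
    callers = defaultdict(set)
    for caller, callees in calls.items():
        for callee in callees:
            callers[callee].add(caller)

    affected, queue = set(), deque([changed])
    while queue:
        current = queue.popleft()
        for caller in callers[current]:
            if caller not in affected:
                affected.add(caller)
                queue.append(caller)
    return affected


# Hypothetical call graph: caller -> list of callees.
calls = {
    "main": ["load_config", "run"],
    "run": ["parse", "emit"],
    "parse": ["tokenize"],
    "emit": [],
    "load_config": [],
    "tokenize": [],
}

impact = blast_radius(calls, "tokenize")
print(sorted(impact))
```

Changing `tokenize` touches `parse`, `run`, and `main`, while `emit` and `load_config` are unaffected and need no re-review.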

⚖️ Comparison: Why Incremental Matters

There are plenty of tools for building RAG pipelines, but most take the "full rerun" route:

|------|-----------|---------|---------|

| Manual updates | Whenever someone feels like it | Labor cost | Small projects |

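CocoIndex's incremental approach pays off most as data grows: in a 100GB codebase, perhaps only 0.1% changes on a given day, so a full recomputation wastes 99.9% of the compute. It turns "rerun everything" into "always run only Δ".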
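A rough sketch of how a Δ can be found between two runs, followed by the arithmetic behind the 100GB figure. It assumes a content hash per file is an acceptable fingerprint; `fingerprint` and `delta` are hypothetical helpers, not part of CocoIndex.

```python
import hashlib


def fingerprint(files):
    """Map each path to a SHA-256 hash of its content."""
    return {path: hashlib.sha256(body.encode()).hexdigest() for path, body in files.items()}


def delta(previous, current):
    """Return (paths that are new or modified, paths that were removed)."""
    changed = {path for path, digest in current.items() if previous.get(path) != digest}
    removed = set(previous) - set(current)
    return changed, removed


before = fingerprint({"a.py": "x = 1", "b.py": "y = 2", "c.py": "z = 3"})
after = fingerprint({"a.py": "x = 1", "b.py": "y = 20", "d.py": "w = 4"})
changed, removed = delta(before, after)  # b.py modified, d.py added, c.py removed

# The arithmetic from the paragraph above: 0.1% of a 100GB corpus.
corpus_gb = 100
changed_fraction = 0.001
reprocessed_gb = corpus_gb * changed_fraction  # roughly 0.1 GB instead of 100 GB
```

Only the paths in `changed` need re-embedding and only those in `removed` need deleting from the target; `a.py` is skipped entirely.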

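Key Takeaways

pip install -U cocoindex works out of the box, with 20+ complete examples to copy from

Declarative Python API (@coco.fn); the engine handles incremental sync, caching, and retries automatically

Supports multiple sources (filesystem, S3, Git, Slack, PDF) and targets (pgvector, LanceDB, Neo4j, Kafka)

Rust core engine, sub-second incremental processing, scalable to PB level

The CocoIndex-code MCP Server gives AI coding agents a real-time code index and saves 70% of tokens

If your AI agent needs to "see" the latest data rather than last week's snapshot, CocoIndex is currently the most hassle-free option: run pip install, write 10 lines of Python, and your data pipeline is live.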

🥥 CocoIndex: 9.3k Stars — Incremental Engine That Keeps Your AI Agents on Fresh Data

Repo: cocoindex-io/cocoindex | ⭐ 9,296 Stars | 🛠 Python + Rust | 🍴 694 Forks

The hardest problem in building AI agents isn't the model — it's the data going stale. Your codebase changes, docs get updated, Slack discussions yield new insights, but your RAG index is still serving last week's snapshot.

CocoIndex is an incremental data processing engine purpose-built for AI agents and LLM apps that need always-fresh context. The core idea: process only what changed (the Δ, or delta), not the whole corpus every time. 9.3k stars, built with a Rust core and Python API.

Get Started

pip install -U cocoindex

Core Architecture

Declarative Python API with @coco.fn(memo=True) — the engine caches by input hash + code hash. On re-runs, only changed inputs or changed code trigger recomputation.

@coco.fn(memo=True)
async def index_file(file, table):
    for chunk in RecursiveSplitter().split(await file.read_text()):
        table.declare_row(text=chunk.text, embedding=embed(chunk.text))

Key Features

Incremental by default — only Δ reprocessed, 99.9% cache hits at scale

20+ production examples — code indexing, PDF→RAG, meeting→knowledge graph, CSV→Kafka

Rust core — parallel processing, zero-copy transforms, retries, dead-letter queues

Multiple targets — pgvector, LanceDB, Neo4j, Kuzu, Kafka, Postgres

CocoIndex-code MCP server — AST-aware incremental code index for Claude Code & Cursor

Summary

pip install -U cocoindex — instant setup

Write once, sync forever — only the Δ ever recomputes

PB-scale with sub-second freshness for AI agents
