📖 rasbt/LLMs-from-scratch: A 94k-Star Hand-Coded GPT Tutorial for Building a ChatGPT-Style LLM from Scratch

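Project Overview

rasbt/LLMs-from-scratch is Sebastian Raschka's hands-on LLM tutorial. It teaches you to implement a GPT-like large language model from scratch in PyTorch. The project is the companion code for the Manning book of the same name, has 94,000+ stars, and is one of the most-watched LLM education projects on GitHub.

Unlike most tutorials that stop at API calls, the book has you hand-write the Transformer, multi-head attention, pretraining, and instruction finetuning line by line. The code does not need a GPU; an ordinary laptop is enough.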

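Key Highlights

94,238 stars: among the most popular educational projects on GitHub

7 chapters + 5 appendices: the full pipeline, from the tokenizer through instruction finetuning

No GPU required: the main chapter code is designed to run on an ordinary laptop (and uses a GPU automatically if one is available)

Companion video course: 17 hours of code-along instruction

Multiple architectures: from-scratch implementations of Llama 3.2, Qwen 3, Gemma 3/4, and more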

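Quick Start

# Clone the repository (a shallow clone with depth 1 is enough and saves bandwidth)
git clone --depth 1 https://github.com/rasbt/LLMs-from-scratch.git
cd LLMs-from-scratch

# Create and activate a virtual environment (on Windows: venv\Scripts\activate)
python -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Open Chapter 4 (implementing a GPT model from scratch) in JupyterLab
cd ch04/01_main-chapter-code
jupyter lab ch04.ipynb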
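To confirm the install worked before opening the notebook, a short script like the one below can report which key packages are present. It is a minimal sketch: the package list is an assumption for illustration, and the repository's requirements.txt remains the authoritative list.

```python
# Minimal sketch: report which of the tutorial's key dependencies are installed.
# ASSUMED_PACKAGES is an illustrative assumption; requirements.txt is authoritative.
from importlib.metadata import PackageNotFoundError, version

ASSUMED_PACKAGES = ["torch", "tiktoken", "jupyterlab", "matplotlib"]


def check_packages(names):
    """Return a dict mapping each package name to its installed version, or None if missing."""
    report = {}
    for name in names:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    for pkg, ver in check_packages(ASSUMED_PACKAGES).items():
        print(f"{pkg:12s} {ver if ver else 'NOT INSTALLED'}")
```

Run it inside the activated virtual environment; any package shown as NOT INSTALLED points to a failed or skipped `pip install` step.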

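Learning Path

Fundamentals (Chapters 1-3)

1. Understanding large language models: the basic principles of LLMs and an overview of their architecture

2. Working with text data: implementing a tokenizer, BPE tokenization, and embedding layers

3. Attention mechanisms: from a single attention head to multi-head attention, all implemented by hand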
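The byte pair encoding (BPE) idea from item 2 can be sketched in a few lines: repeatedly find the most frequent adjacent symbol pair and merge it into a new symbol. This is a minimal illustration under simplifying assumptions (characters as the initial symbols, no byte-level fallback, no special tokens, ties broken by first occurrence). GPT-2's tokenizer, which the book uses through tiktoken, is byte-level and has a 50,257-token vocabulary.

```python
# Minimal BPE training sketch on a toy, whitespace-split corpus.
# Assumptions: characters as initial symbols, no byte-level handling, no special tokens.
from collections import Counter


def most_frequent_pair(words):
    """Count adjacent symbol pairs across all words, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    # max() returns the first pair seen among ties, so merges are deterministic
    return max(pairs, key=pairs.get) if pairs else None


def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single concatenated symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def train_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules; stop early if no pairs remain."""
    words = Counter(tuple(word) for word in corpus.split())
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:
            break
        words = merge_pair(words, pair)
        merges.append(pair)
    return merges


print(train_bpe("low low low lower lowest", 3))  # [('l', 'o'), ('lo', 'w'), ('low', 'e')]
```

The learned merge list is the core of a BPE vocabulary: encoding new text applies the same merges in the same order.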

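# A simplified multi-head attention module implemented from scratch,
# with a causal mask so each token attends only to itself and earlier tokens (as a GPT requires)
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    def __init__(self, d_in, d_out, num_heads, dropout=0.0):
        super().__init__()
        assert d_out % num_heads == 0, "d_out must be divisible by num_heads"
        self.num_heads = num_heads
        self.head_dim = d_out // num_heads

        self.W_query = nn.Linear(d_in, d_out, bias=False)
        self.W_key = nn.Linear(d_in, d_out, bias=False)
        self.W_value = nn.Linear(d_in, d_out, bias=False)
        self.out_proj = nn.Linear(d_out, d_out)  # combines the outputs of all heads
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        b, num_tokens, _ = x.shape
        queries = self.W_query(x)
        keys = self.W_key(x)
        values = self.W_value(x)

        # Split into heads: (b, num_tokens, d_out) -> (b, num_heads, num_tokens, head_dim)
        queries = queries.view(b, num_tokens, self.num_heads, self.head_dim).transpose(1, 2)
        keys = keys.view(b, num_tokens, self.num_heads, self.head_dim).transpose(1, 2)
        values = values.view(b, num_tokens, self.num_heads, self.head_dim).transpose(1, 2)

        # Attention scores: (b, num_heads, num_tokens, num_tokens)
        attn_scores = queries @ keys.transpose(2, 3)

        # Causal mask: block attention to future positions (entries above the diagonal)
        mask = torch.triu(
            torch.ones(num_tokens, num_tokens, dtype=torch.bool, device=x.device), diagonal=1
        )
        attn_scores = attn_scores.masked_fill(mask, float("-inf"))

        # Scaled dot-product attention
        attn_weights = torch.softmax(attn_scores / self.head_dim**0.5, dim=-1)
        attn_weights = self.dropout(attn_weights)

        # Merge heads: (b, num_heads, num_tokens, head_dim) -> (b, num_tokens, d_out)
        context = (attn_weights @ values).transpose(1, 2).contiguous()
        context = context.view(b, num_tokens, -1)
        return self.out_proj(context)

# Usage: a batch of 2 sequences, each with 6 tokens and 3-dim embeddings
torch.manual_seed(123)
x = torch.randn(2, 6, 3)
mha = MultiHeadAttention(d_in=3, d_out=4, num_heads=2)
print(mha(x).shape)  # torch.Size([2, 6, 4])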
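The division by self.head_dim**0.5 in the attention code above keeps the attention logits at a manageable scale. The stdlib sketch below illustrates why, under the idealized assumption that query and key entries are independent standard-normal values: raw dot products have a variance close to head_dim, while scaled ones have a variance close to 1, which keeps softmax from saturating into near one-hot weights.

```python
import random
import statistics


def dot_product_variance(head_dim, num_samples=2000, scaled=False, seed=0):
    """Estimate the variance of q . k for random standard-normal q and k vectors."""
    rng = random.Random(seed)  # fixed seed so scaled and unscaled runs use identical samples
    scores = []
    for _ in range(num_samples):
        q = [rng.gauss(0.0, 1.0) for _ in range(head_dim)]
        k = [rng.gauss(0.0, 1.0) for _ in range(head_dim)]
        score = sum(a * b for a, b in zip(q, k))
        if scaled:
            score /= head_dim ** 0.5  # same scaling as attn_scores / self.head_dim**0.5
        scores.append(score)
    return statistics.pvariance(scores)


if __name__ == "__main__":
    d = 64
    print("unscaled variance:", round(dot_product_variance(d), 1))
    print("scaled variance:  ", round(dot_product_variance(d, scaled=True), 2))
```

With head_dim = 64, expect the unscaled estimate to land near 64 and the scaled one near 1.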

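Intermediate (Chapters 4-5)

4. Implementing a GPT model from scratch: the complete GPT architecture, with bonus material on KV caching, Grouped-Query Attention, and MoE

5. Pretraining on unlabeled data: the pretraining loop, plus bonus material on pretraining with the Project Gutenberg corpus, learning-rate scheduling, and hyperparameter tuning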
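Learning-rate scheduling, mentioned in item 5, often combines a linear warmup with cosine decay. The function below is a minimal sketch of that combination; the step counts and learning rates in the usage line are illustrative assumptions, not the book's settings.

```python
import math


def lr_at_step(step, total_steps, warmup_steps, peak_lr, min_lr=0.0):
    """Linear warmup up to peak_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        # Ramp linearly from peak_lr / warmup_steps up to peak_lr
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the post-warmup phase completed, clamped to [0, 1]
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))  # goes from 1 down to 0
    return min_lr + (peak_lr - min_lr) * cosine


# Illustrative settings (assumed): 1,000 steps, 100 warmup steps
schedule = [lr_at_step(s, total_steps=1000, warmup_steps=100, peak_lr=5e-4, min_lr=1e-5)
            for s in range(1000)]
print(f"start {schedule[0]:.2e}, peak {max(schedule):.2e}, end {schedule[-1]:.2e}")
```

In a PyTorch training loop, the value would typically be written into each entry of optimizer.param_groups (for group in optimizer.param_groups: group["lr"] = lr) before calling optimizer.step(); the built-in torch.optim.lr_scheduler classes are an alternative.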

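Finetuning (Chapters 6-7)

6. Finetuning for text classification: turning GPT into a spam classifier

7. Instruction finetuning: training the model to follow instructions, which gives it conversational ability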
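Instruction finetuning relies on a fixed prompt template so the model learns where the instruction ends and the response begins. The sketch below renders one record in an Alpaca-style template; the template wording, the function name format_example, and the record fields instruction, input, and output are assumptions for illustration rather than the repository's exact code.

```python
def format_example(entry):
    """Render one instruction-data record as an Alpaca-style training prompt (assumed template)."""
    prompt = (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request."
        f"\n\n### Instruction:\n{entry['instruction']}"
    )
    # The optional input section is omitted when the record has no input text
    if entry.get("input"):
        prompt += f"\n\n### Input:\n{entry['input']}"
    response = f"\n\n### Response:\n{entry['output']}"
    return prompt + response


example = {
    "instruction": "Convert the active sentence to passive.",
    "input": "The chef cooks the meal.",
    "output": "The meal is cooked by the chef.",
}
print(format_example(example))
```

During training the full string (prompt plus response) is tokenized; at inference time only the part up to "### Response:" is given and the model generates the rest.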

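# Load a pretrained model for instruction finetuning.
# Illustrative sketch: load_pretrained_model, InstructionDataset, and train_model are
# placeholder names, not necessarily the repo's exact API; the actual chapter code
# lives in ch07/01_main-chapter-code.
from gpt_instruction_finetuning import load_pretrained_model, InstructionDataset, train_model

model = load_pretrained_model("gpt2-medium")  # GPT-2 medium (355M parameters)
dataset = InstructionDataset("path/to/instruction_data.json")

# Training loop
train_model(model, dataset, num_epochs=5, learning_rate=5e-5)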
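Batching instruction data needs padding, because examples have different lengths. A common approach, sketched below in plain Python, pads each token-ID sequence with an end-of-text token, shifts by one position to build next-token targets, and replaces every padding target except the first with -100. That value is the default ignore_index of PyTorch's cross_entropy, so padded positions do not contribute to the loss. The pad ID 50256 (GPT-2's <|endoftext|>) and the function name collate_batch are assumptions for illustration.

```python
PAD_TOKEN_ID = 50256  # assumed: GPT-2's <|endoftext|> token ID
IGNORE_INDEX = -100   # default ignore_index of torch.nn.functional.cross_entropy


def collate_batch(batch, pad_token_id=PAD_TOKEN_ID, ignore_index=IGNORE_INDEX):
    """Pad token-ID lists to equal length and build shifted (inputs, targets) pairs."""
    max_len = max(len(seq) for seq in batch) + 1  # +1 leaves room for at least one pad
    inputs, targets = [], []
    for seq in batch:
        padded = seq + [pad_token_id] * (max_len - len(seq))
        inp = padded[:-1]  # model input: every token except the last
        tgt = padded[1:]   # target: the next token at each position
        # Keep the first pad as an end-of-text target; mask out the remaining pads
        first_pad = len(seq) - 1  # index in tgt where the first pad appears
        tgt = tgt[:first_pad + 1] + [ignore_index] * (len(tgt) - first_pad - 1)
        inputs.append(inp)
        targets.append(tgt)
    return inputs, targets


inputs, targets = collate_batch([[1, 2, 3], [4, 5]])
print(inputs)   # [[1, 2, 3], [4, 5, 50256]]
print(targets)  # [[2, 3, 50256], [5, 50256, -100]]
```

Keeping one end-of-text target teaches the model when to stop generating; wrapping the lists with torch.tensor() turns them into a batch ready for the model.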

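Extended Implementations

The project also includes from-scratch implementations of several modern architectures (in the Chapter 5 bonus directories):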

| Architecture | File | Key features |
|------|------|------|
| Llama 3.2 | ch05/07_gpt_to_llama/standalone-llama32.ipynb | RoPE, SwiGLU, Grouped-Query Attention |
| Qwen 3 | ch05/11_qwen3/ | Both dense and MoE versions |
| Gemma 3 | ch05/12_gemma3/ | Sliding Window Attention |
| Gemma 4 E2B/E4B | ch05/17_gemma4/ | Newest architecture, with multimodal support |
| Qwen 3.5 | ch05/16_qwen3.5/ | Newest version of the Qwen architecture |
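RoPE (rotary position embedding), listed for Llama 3.2 in the table, encodes position by rotating pairs of query/key dimensions through an angle proportional to the token's position. The stdlib sketch below is an illustration under stated assumptions: it pairs dimensions (2i, 2i+1) as in the original RoPE formulation and uses a base of 10000, whereas Llama-style code commonly pairs the first half of the dimensions with the second half and Llama 3 uses a larger base. The check at the end shows the property that makes RoPE useful: the rotated query-key dot product depends only on the relative distance between positions.

```python
import math


def apply_rope(vec, position, base=10000.0):
    """Rotate each pair (vec[2i], vec[2i+1]) by angle position * base**(-2i/d)."""
    d = len(vec)
    assert d % 2 == 0, "embedding dimension must be even"
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)  # i already equals 2 * pair_index
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x0, x1 = vec[i], vec[i + 1]
        out.extend([x0 * cos_t - x1 * sin_t, x0 * sin_t + x1 * cos_t])
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


q = [0.3, -1.2, 0.5, 0.8, -0.4, 1.1]
k = [1.0, 0.2, -0.7, 0.4, 0.9, -0.3]

# Same relative distance (2 positions) at two different absolute positions
score_a = dot(apply_rope(q, 5), apply_rope(k, 3))
score_b = dot(apply_rope(q, 12), apply_rope(k, 10))
print(abs(score_a - score_b) < 1e-9)  # True
```

Because each pair is rotated rather than shifted, vector norms are preserved and no learned position table is needed.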

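Companion Resources

Book: Build a Large Language Model (From Scratch), Manning Publications

Video course: 17 hours of code-along instruction

Sequel: Build A Reasoning Model (From Scratch), covering inference-time scaling, reinforcement learning, and distillation

Exercises: a free 170-page PDF with about 30 quiz questions per chapter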

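Why Is It Worth Learning?

Most LLM tutorials either stop at API calls or hand you finished code to run end to end. What makes this project rare is that you can genuinely follow what every line of code does: the matrix operations inside attention, how the training loss is computed and backpropagated, the merge logic of the tokenizer. All of it is written out in code and accompanied by detailed notebook commentary.

If you truly want to understand LLMs rather than just calling from_pretrained() in the transformers package, this is the best learning path available right now.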

Summary

rasbt/LLMs-from-scratch is the official code repository for Sebastian Raschka's book Build a Large Language Model (From Scratch). With 94,238 stars on GitHub, it's one of the most popular LLM education resources available.

The project takes you through implementing a GPT-like LLM step by step in PyTorch, from tokenization and attention mechanisms to pretraining and instruction finetuning. Unlike most tutorials that just show API calls, this one makes you write every line of code yourself. No GPU is required: the main chapters run on an ordinary laptop.

Key highlights:

7 chapters + 5 appendices covering the entire LLM pipeline

Multi-architecture implementations: Llama 3.2, Qwen 3, Gemma 3/4 from scratch

17-hour companion video course for code-along learning

Sequel book on reasoning models (inference-time scaling, RL, distillation)

Quick start:

git clone --depth 1 https://github.com/rasbt/LLMs-from-scratch.git
cd LLMs-from-scratch
pip install -r requirements.txt
jupyter lab ch04/01_main-chapter-code/ch04.ipynb

If you want to genuinely understand how LLMs work instead of just being an API consumer, this is the best path available.

