The Great Debate 大辩论

Three Roads to AGI

通往AGI的三条路 — 谁对谁错？

Leading AI researchers and lab heads disagree fundamentally on how to build Artificial General Intelligence (AGI). One camp bets on learning from action, one on understanding the world, and one on mastering language. Either one of these becomes the primary path, or all three turn out to be necessary.

顶尖的AI研究者和实验室负责人在如何构建通用人工智能（AGI）上存在根本分歧。一派押注于在行动中学习，一派押注于理解世界，一派押注于掌握语言。要么其中一条成为主路，要么三条都不可或缺。

■ Reinforcement Learning

■ World Models

■ Language Models

Road One

Reinforcement Learning

强化学习 — 在行动中学习智能

Champion: Demis Hassabis (Google DeepMind)

Core Belief 核心信念

Intelligence emerges from interaction with environments. An agent learns by trying actions, receiving rewards, and refining its strategy. Language is just one interface — the real intelligence is in the ability to act, plan, and achieve goals in complex environments.

智能从与环境的交互中涌现。代理通过尝试行动、接收奖励、优化策略来学习。语言只是一个接口——真正的智能在于在复杂环境中行动、规划和实现目标的能力。

Method 方法

Build agents that learn from scratch through trial and error in increasingly complex domains. Start with games (AlphaGo), move to science (AlphaFold), then generalize. Combine RL with neural networks, search, and planning. The agent discovers strategies no human ever conceived.

构建通过试错在日益复杂的领域中从零学习的代理。从游戏开始（AlphaGo），转向科学（AlphaFold），然后泛化。结合RL与神经网络、搜索和规划。代理发现人类从未构想过的策略。
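
The trial-and-error loop described above (try an action, receive a reward, refine the strategy) can be sketched with tabular Q-learning. This is a minimal illustration under stated assumptions, not DeepMind's method: the corridor environment, the function names, and every hyperparameter are hypothetical, and systems such as AlphaGo combine RL with deep networks and search rather than a lookup table.

```python
import random

def train_corridor_agent(n_states=5, episodes=500, alpha=0.5,
                         gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning on a toy corridor (hypothetical environment).

    The agent starts in cell 0 and receives reward 1.0 only when it
    reaches the last cell. Returns the learned table q[state][action].
    """
    rng = random.Random(seed)
    moves = (-1, +1)  # action 0 = step left, action 1 = step right
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1

    for _ in range(episodes):
        state = 0
        while state != goal:
            # Try an action: explore at random sometimes (and on ties),
            # otherwise exploit the current best estimate.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1

            # Receive the outcome from the environment.
            next_state = min(max(state + moves[action], 0), goal)
            reward = 1.0 if next_state == goal else 0.0

            # Refine the strategy: move the estimate toward
            # reward + discounted value of the best next action.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

def greedy_policy(q):
    """Best action per non-terminal cell according to the learned table."""
    return ["right" if right > left else "left" for left, right in q[:-1]]

if __name__ == "__main__":
    print(greedy_policy(train_corridor_agent()))
```

Nothing tells the agent that "right" is correct; the preference emerges from reward alone, and the greedy policy ends up heading right from every cell. The same property is what makes the approach sample-hungry: even this five-cell world takes many steps of wandering before the first reward is seen.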

"Intelligence is the ability to achieve goals in a wide range of environments."

"智能是在广泛环境中实现目标的能力。"

— Demis Hassabis

Strengths 优势

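Proven superhuman performance: AlphaGo defeated 18-time world champion Lee Sedol (2016), AlphaFold 2 reached near-experimental accuracy on many protein structures at CASP14 (2020), and AlphaGeometry solved olympiad-level geometry problems (2024). Not all of these are pure RL: AlphaFold is trained mainly with supervised learning, and AlphaGeometry pairs a language model with a symbolic deduction engine. Where RL is used, it can discover genuinely novel solutions rather than recombinations of training data.

已证明的超人表现：AlphaGo击败18次世界冠军得主李世石（2016），AlphaFold 2在CASP14上对许多蛋白质结构的预测达到接近实验的精度（2020），AlphaGeometry解出奥林匹克级别的几何题（2024）。其中并非全是纯RL：AlphaFold主要依靠监督学习训练，AlphaGeometry则将语言模型与符号推理引擎结合。在使用RL的地方，它能发现真正新颖的解决方案，而非训练数据的重组。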

Weaknesses 弱点

Extremely sample-inefficient: agents often need millions of trials. Reward design remains an unsolved problem, because agents exploit loopholes in the reward rather than doing the intended task (reward hacking). Skills transfer poorly across domains: AlphaGo can't make breakfast, and each new domain typically requires a new agent trained from scratch.

极其样本低效：代理往往需要数百万次试验。奖励设计仍是未解决的问题，因为代理会钻奖励的漏洞，而不是完成预期任务（奖励黑客，reward hacking）。跨领域迁移能力差：AlphaGo不会做早餐，每个新领域通常都需要从零训练新代理。

AlphaGo (2016), AlphaZero (2017), AlphaFold (2020), AlphaCode (2022), AlphaGeometry (2024), Gemini + RL (2025)

Road Two

World Models

世界模型 — 在头脑中模拟现实

Champion: Yann LeCun (Meta AI / NYU)

Core Belief 核心信念

Intelligence = building an internal model of how the world works, then using it to predict, plan, and imagine. A baby learns more about physics in 6 months than any LLM learns from the entire internet. Language is a thin layer on top of deep world understanding.

智能=构建世界运作的内部模型，然后用它来预测、规划和想象。婴儿在6个月内学到的物理知识比任何LLM从整个互联网学到的都多。语言只是深层世界理解之上的薄层。

Method 方法

JEPA (Joint Embedding Predictive Architecture) — learn representations by predicting the future state of the world from sensory input, not by generating pixels. Self-supervised learning from video and multi-modal data. Build an "inner simulator" that can imagine consequences before acting.

JEPA（联合嵌入预测架构）——通过从感觉输入预测世界的未来状态来学习表征，而非生成像素。从视频和多模态数据进行自监督学习。构建一个能在行动前想象后果的"内部模拟器"。
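
The idea of predicting in representation space rather than pixel space can be sketched in a few lines. This is a toy illustration under heavy assumptions, not Meta's JEPA code: the hand-written `encode` and `predict_next_embedding` functions are hypothetical stand-ins for networks that a real system would learn, and the one-dimensional "frames" are invented for the example.

```python
def encode(frame):
    """Hypothetical encoder: compress a 1-D frame into an abstract state.

    Returns (brightness-weighted position, mean brightness).
    Assumes the frame contains at least one non-zero pixel.
    """
    total = sum(frame)
    position = sum(i * v for i, v in enumerate(frame)) / total
    return (position, total / len(frame))

def predict_next_embedding(embedding, velocity=1.0):
    """Hypothetical predictor: the object keeps moving at constant velocity."""
    position, brightness = embedding
    return (position + velocity, brightness)

def embedding_loss(context_frame, target_frame):
    """Squared error between the predicted and the actual future embedding.

    No future frame is ever generated: the comparison happens entirely
    in the space of representations.
    """
    predicted = predict_next_embedding(encode(context_frame))
    actual = encode(target_frame)
    return sum((p - a) ** 2 for p, a in zip(predicted, actual))

if __name__ == "__main__":
    context = [0.0, 1.0, 0.0, 0.0, 0.0]  # bright object at index 1
    moved = [0.0, 0.0, 1.0, 0.0, 0.0]    # the object moved one step right
    print(embedding_loss(context, moved))    # prediction matches the future
    print(embedding_loss(context, context))  # the object did not move
```

The loss is zero when the future matches the predicted motion and grows when it does not, without the model having to reproduce any individual pixel. In an actual JEPA both the encoder and the predictor are trained, which is where the unresolved engineering lies.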

"Large language models are just playing with words. They have no understanding of the physical world. A cat understands more about physics than any LLM."

"大语言模型只是在玩文字游戏。它们对物理世界没有理解。一只猫对物理学的理解超过任何LLM。"

— Yann LeCun

Strengths 优势

Biologically plausible: animals appear to learn predictive models of their surroundings in a similar way. Sample-efficient in principle (babies learn from very few examples). Grounded in physical reality. Could enable common sense, intuitive physics, and the kind of understanding that LLMs lack.

生物学上合理：动物似乎也是以类似方式学习周围环境的预测模型。原则上样本高效（婴儿从极少的例子中学习）。扎根于物理现实。有望实现常识、直觉物理，以及LLM所缺乏的那种理解。

Weaknesses 弱点

Still largely theoretical: no world model has achieved anything close to LLM-level capabilities. The JEPA papers show promise, but there is no breakthrough product yet. The gap between vision and execution is enormous, and LeCun has argued for this approach for years without a landmark demonstration.

仍然大部分停留在理论层面：没有任何世界模型达到接近LLM级别的能力。JEPA论文展示了前景，但尚无突破性产品。愿景和执行之间的差距巨大，LeCun多年来一直倡导这一路线，却没有标志性的示范。

I-JEPA (2023), V-JEPA (2024), Self-supervised vision, Video prediction, Future: embodied AI

Road Three

Language Models

语言模型 — 语言即世界，语言即智能

Champions: Sam Altman (OpenAI) & Dario Amodei (Anthropic)

Core Belief 核心信念

Language is not just an interface — it IS intelligence. Only humans have language. Only humans have advanced intelligence. Language compresses all human knowledge, reasoning, and understanding into tokens. Mastering language is mastering thought itself.

语言不仅仅是接口——它就是智能。只有人类拥有语言。只有人类拥有高级智能。语言将所有人类知识、推理和理解压缩为token。掌握语言就是掌握思维本身。

Method 方法

Scale transformer models on internet-scale text. Add RLHF for alignment. Add chain-of-thought for reasoning. Add tool use for acting in the world. Add vision and audio for multimodality. The "scaling hypothesis" — intelligence emerges from sufficient scale and data.

在互联网规模的文本上缩放Transformer模型。添加RLHF进行对齐。添加思维链进行推理。添加工具使用在世界中行动。添加视觉和音频实现多模态。"缩放假说"——智能从足够的规模和数据中涌现。
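
The training objective underneath this recipe is next-token prediction. The sketch below shows that objective in its smallest possible form, a bigram counter. It is an assumption-laden toy, not a transformer: the corpus, the whitespace tokenizer, and the function names are all hypothetical, and the scaling hypothesis is precisely the claim that far larger models trained on far more text do qualitatively more than this.

```python
from collections import Counter, defaultdict

def train_bigram_model(text):
    """Count, for every token, which tokens follow it in the training text."""
    counts = defaultdict(Counter)
    tokens = text.split()  # naive whitespace tokenizer (an assumption)
    for current, following in zip(tokens, tokens[1:]):
        counts[current][following] += 1
    return counts

def predict_next(model, token):
    """Return the most frequent follower of `token`, or None if unseen."""
    followers = model.get(token)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

if __name__ == "__main__":
    corpus = "the cat sat on the mat . the cat ate ."
    model = train_bigram_model(corpus)
    print(predict_next(model, "the"))    # "cat" follows "the" most often
    print(predict_next(model, "zebra"))  # never seen in training: None
```

The model only ever sees text, and its single skill is guessing what comes next. Whether scaling that skill up forces a model of reality, as the quote below argues, is exactly what the other two camps dispute.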

"Language is the most compressed representation of intelligence that exists. When you predict the next word well enough, you are forced to model reality."

"语言是存在的最压缩的智能表征。当你足够好地预测下一个词时，你被迫建模现实。"

— Ilya Sutskever

Strengths 优势

It works today. GPT-4, Claude, and Gemini are among the most capable general-purpose AI systems ever built. Models in this family have posted passing scores on bar and medical licensing exams and strong results on PhD-level science questions. They generate code, write poetry, and reason about ethics, with hundreds of millions of users and billions of dollars in revenue. It is the only approach with mass-market traction.

今天就已经有效。GPT-4、Claude、Gemini位列有史以来最强大的通用AI系统。这类模型在律师资格考试和医学执照考试中取得了及格以上的成绩，在博士级科学问题上也表现出色。它们能生成代码、写诗、进行伦理推理，拥有数亿用户和数十亿美元收入。这是唯一获得大众市场牵引力的方法。

Weaknesses 弱点

Hallucinations. No physical grounding. No survival instinct. No drives. You can tell an LLM to make money — it will write text about making money but doesn't want money. A cockroach wants to survive more than any LLM wants anything. Is prediction really understanding?

幻觉。没有物理基础。没有生存本能。没有驱动力。你可以告诉LLM赚钱——它会写关于赚钱的文字但不"想要"钱。一只蟑螂比任何LLM更想生存。预测真的是理解吗？

GPT-3 (2020), ChatGPT (2022), GPT-4 (2023), Claude 3 (2024), o3 reasoning (2025), Claude 4 (2025)

The Deeper Question 更深层的问题

What All Three Are Missing

三条路都缺失了什么 — 生存本能、具身性与驱动力

Even cats and dogs know how to strive for survival — it's written into their genes. Large language models cannot learn this. Even if you prompt them to make money, they appear indifferent. To them, it's all just wordplay. Even if you shut them down, they feel no emotion.

即使猫和狗也知道如何为生存而努力——这写在它们的基因里。大语言模型学不会这一点。即使你提示它们赚钱，它们仍然显得漠不关心。对它们来说，一切都只是文字游戏。即使你关闭它们，它们也毫无情感。

🧠 Drives & Survival Instinct

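500 million years of evolution gave animals hunger, fear, and desire. No AI system has intrinsic motivation in this biological sense: even the curiosity bonuses used in RL are rewards that an engineer designed. Without drives, there is no genuine agency, only the appearance of it.

5亿年的进化赋予动物饥饿、恐惧、欲望。没有任何AI系统具有这种生物学意义上的内在动机：即便是RL中使用的好奇心奖励，也是工程师设计出来的。没有驱动力，就没有真正的能动性，只有它的表象。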

🌎 Embodiment

Intelligence evolved in bodies that move through physical space. Touch, proprioception, pain, pleasure — these aren't extras, they may be prerequisites. A brain in a vat may never truly understand.

智能在穿越物理空间的身体中进化。触觉、本体感受、疼痛、快乐——这些不是附加品，可能是前提。缸中之脑可能永远无法真正理解。

⏰ Temporal Experience

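Humans experience time. We remember the past, fear the future, and feel urgency. LLMs have no sense of time passing. Each prompt is stateless: the model carries nothing over between calls unless earlier text is fed back in as context. Without temporal continuity, can there be consciousness?

人类体验时间。我们记住过去，畏惧未来，感受紧迫。LLM没有时间流逝的感觉。每次提示都是无状态的：除非把之前的文本作为上下文重新输入，模型在两次调用之间不保留任何东西。没有时间连续性，能有意识吗？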
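
What "stateless" means in practice can be shown with a small sketch. Everything here is hypothetical: `stateless_model` is a hand-written stand-in for an LLM call, not a real API, and the `Conversation` wrapper only illustrates the common pattern of resending the transcript on every turn.

```python
def stateless_model(prompt):
    """Hypothetical stand-in for an LLM call.

    The reply depends only on the text of `prompt`; nothing is stored
    between calls.
    """
    marker = "user: my name is "
    name = None
    for line in prompt.splitlines():
        if line.lower().startswith(marker):
            name = line[len(marker):].strip(" .")
    if prompt.rstrip().lower().endswith("what is my name?"):
        return f"Your name is {name}." if name else "I don't know your name."
    return "Noted."

class Conversation:
    """The caller, not the model, supplies continuity by resending history."""

    def __init__(self):
        self.history = []

    def send(self, message):
        self.history.append(f"User: {message}")
        reply = stateless_model("\n".join(self.history))
        self.history.append(f"Assistant: {reply}")
        return reply

if __name__ == "__main__":
    # Two independent calls: the second has no access to the first.
    stateless_model("User: My name is Ada.")
    print(stateless_model("User: What is my name?"))

    # The same two messages with the transcript carried along.
    chat = Conversation()
    chat.send("My name is Ada.")
    print(chat.send("What is my name?"))
```

The apparent memory in the second case lives entirely in the transcript the caller keeps. Whether replaying a transcript amounts to the temporal continuity this section asks about is the open question.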

💜 Subjective Experience

The Hard Problem: why does physical processing give rise to "something it is like" to be conscious? None of the three roads even attempts to answer this. Perhaps AGI doesn't require consciousness — but perhaps it does.

困难问题：为什么物理处理产生了"作为某物的感觉"的意识？三条路都没有试图回答这个问题。也许AGI不需要意识——但也许需要。

Side by Side 对照表

The Three Roads Compared

三条路线对比

Dimension 维度	RL (Hassabis)	World Models (LeCun)	Language (Altman/Amodei)
Core idea 核心思想	Learn by doing	Learn by observing	Learn by reading
Analogy 类比	An athlete training	A baby watching the world	A scholar reading every book
Data source 数据来源	Environment interaction	Video, sensory streams	Text (internet-scale)
Grounding 基础	Action in environments	Physical world perception	Linguistic abstraction
Best result 最佳成果	AlphaFold (protein structure prediction)	V-JEPA (early research)	GPT-4 / Claude (general use)
Users today 当前用户	Researchers only	Researchers only	Hundreds of millions
Survival instinct 生存本能	Reward-shaped (artificial)	Not addressed	None
Language role 语言角色	One of many interfaces	Thin surface layer	The core of intelligence
Biggest bet 最大赌注	Generalization across domains	JEPA architecture works	Scale is all you need
Risk 风险	May never generalize	May never leave the lab	May hit a ceiling without grounding

Synthesis 综合

Perhaps AGI Needs All Three

也许AGI需要三者兼备

Language captures symbolic reasoning. World models capture physical intuition. Reinforcement learning captures goal-directed behavior. The human brain does all three — plus something we haven't identified yet. Perhaps embodiment. Perhaps drives. Perhaps consciousness itself.

语言捕获符号推理。世界模型捕获物理直觉。强化学习捕获目标导向行为。人脑三者兼具——再加上我们尚未识别的某些东西。也许是具身性，也许是驱动力，也许是意识本身。