Thread Easy

Your one-stop companion for Twitter threads

Explore

Newest first — browse tweet threads

A good chunk of people misunderstood this tweet btw, which is my bad. I am not suggesting people use the old-style prompting techniques of "you are an expert Swift programmer" etc. It's ok.

Building @EurekaLabsAI. Previously Director of AI @ Tesla, founding team @ OpenAI, CS231n/PhD @ Stanford. I like to train large deep neural nets.

Andrej Karpathy
Tue Dec 09 04:17:51
*the Chinese
failed edit
I'm an opponent of psychedelic cultures in general, they seem ngmi. Agriculturalists drink and build civilizations of unbounded potential, hippies become One With The Universe and live in squalor. Just how it is

We're in a race. It's not USA vs China but humans and AGIs vs ape power centralization. @deepseek_ai stan #1, 2023–Deep Time «C’est la guerre.»

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
Tue Dec 09 04:17:26
[Paper breakdown] DeepSeek-V3.2 technical report: through architectural innovation and efficient training strategies, it matches or even surpasses the leading closed-source models of its generation in reasoning and agentic performance, while sharply cutting compute cost

Architectural breakthrough: DeepSeek Sparse Attention (DSA)
This is the model's core architectural innovation. When a traditional large model processes long text, the amount of computation explodes as the text gets longer, making it slow and expensive.
· How it works: DeepSeek introduces a "sparse attention" mechanism. Instead of scanning every piece of information the way earlier models must, DSA lets the model identify and focus on the key fragments and ignore irrelevant noise.
· Why it matters: without hurting the model's comprehension, this brings the computational cost down from quadratic growth in sequence length to roughly linear. In short, the model stays fast and accurate on massive inputs, and the compute barrier drops sharply.
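
To make the "focus on key fragments" idea concrete, here is a minimal sketch of top-k sparse attention in plain NumPy. It is my own illustration of the general technique, not DeepSeek's DSA implementation; the function name is invented, and the dense score matrix in the selection step is kept only for readability.

```python
# Illustrative top-k sparse attention in NumPy -- a sketch of the idea,
# not DeepSeek's actual DSA code. Each query keeps only its top_k keys,
# so the softmax and the value aggregation touch top_k entries per query
# instead of the full sequence.
import numpy as np

def topk_sparse_attention(q, k, v, top_k=64):
    """q, k, v: arrays of shape (n, d). Returns an (n, d) output."""
    d = q.shape[-1]
    # Selection scores, computed densely here purely for clarity. In a real
    # system this selection step itself has to be cheap (a lightweight scorer),
    # otherwise it would dominate the cost.
    scores = q @ k.T / np.sqrt(d)                                  # (n, n)
    top_k = min(top_k, scores.shape[-1])
    idx = np.argpartition(scores, -top_k, axis=-1)[:, -top_k:]     # top_k key ids per query
    sel = np.take_along_axis(scores, idx, axis=-1)                 # (n, top_k)
    w = np.exp(sel - sel.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)                             # softmax over the kept keys only
    return np.einsum('nk,nkd->nd', w, v[idx])                      # weighted sum of selected values

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 1024, 64
    q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
    print(topk_sparse_attention(q, k, v, top_k=32).shape)          # (1024, 64)
```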

Training strategy: large-scale reinforcement learning and expert distillation
To raise the model's "IQ", especially its logical reasoning and mathematical problem-solving, the paper lays out a new training pipeline.
· Expert specialization and fusion: instead of training one all-round model directly, the team first trained several "expert models" pushed to peak performance in specific domains such as math, coding, and logical reasoning.
· Knowledge distillation: the high-quality data generated by those experts, combined with large-scale reinforcement learning, is then used to "teach" these abilities to the main DeepSeek-V3.2 model. This best-of-every-school strategy gives the general-purpose model deep domain-specific reasoning as well.
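
A hedged sketch of the data-collection side of that distillation step, with hypothetical names and interfaces (build_distillation_set and DistillExample are mine, not the paper's): domain experts answer prompts, a verifier filters the outputs, and the surviving prompt/response pairs become supervised data for the main model.

```python
# Hypothetical sketch of expert-to-generalist distillation data collection.
# Names and interfaces are my own, not DeepSeek's pipeline:
#   1. a domain expert model answers prompts from its specialty,
#   2. a verifier (answer checker, test runner, ...) filters the outputs,
#   3. verified pairs are kept as supervised fine-tuning data for the
#      general-purpose model.
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class DistillExample:
    domain: str
    prompt: str
    response: str

def build_distillation_set(
    prompts_by_domain: Dict[str, List[str]],
    experts: Dict[str, Callable[[str], str]],
    verifier: Callable[[str, str], bool],
) -> List[DistillExample]:
    data = []
    for domain, prompts in prompts_by_domain.items():
        expert = experts[domain]
        for prompt in prompts:
            response = expert(prompt)
            if verifier(prompt, response):          # keep only verified outputs
                data.append(DistillExample(domain, prompt, response))
    return data

if __name__ == "__main__":
    # Toy stand-ins: a "math expert" that evaluates arithmetic prompts,
    # and a verifier that checks the result independently.
    experts = {"math": lambda p: str(eval(p))}
    verifier = lambda p, r: r == str(eval(p))
    dataset = build_distillation_set({"math": ["1+1", "2+3"]}, experts, verifier)
    print(dataset)   # two verified pairs, ready to mix into SFT data
```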

Agentic capability: a training ground built from synthetic data
Because large models now need to "do things" (use tools, operate software) rather than just "talk", the paper proposes a novel data-generation method.
· Simulated practice: the team algorithmically synthesized more than 1,800 complex virtual task scenarios, covering everything from simple scheduling to intricate code debugging.
· Reinforcement training: the model repeatedly runs a "trial, feedback, optimize" loop inside these demanding simulated environments, which greatly strengthens its real-world robustness when calling tools and following complex instructions.
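
And a toy version of that "trial, feedback, optimize" loop over synthetic tasks, again with invented names (SyntheticTask, collect_rollouts) rather than the paper's framework. The point is only the shape of the loop: roll the agent out on many synthesized tasks, score each attempt, then hand the scored attempts to a policy update.

```python
# Toy sketch of agent training on synthesized task environments.
# Interfaces are hypothetical; the real environments are far richer
# (multi-step tool use, software manipulation), but the loop is the same:
# act in the environment, receive a reward signal, improve the policy.
import random
from typing import Callable, List, Tuple

class SyntheticTask:
    """Stand-in for one algorithmically generated task scenario."""
    def __init__(self, target: int):
        self.target = target
    def reward(self, action: int) -> float:
        return 1.0 if action == self.target else 0.0

def collect_rollouts(
    tasks: List[SyntheticTask],
    policy: Callable[[SyntheticTask], int],
) -> List[Tuple[SyntheticTask, int, float]]:
    # One action per task keeps the example tiny; real agent episodes are multi-step.
    return [(t, a, t.reward(a)) for t in tasks for a in [policy(t)]]

if __name__ == "__main__":
    random.seed(0)
    tasks = [SyntheticTask(target=random.randint(0, 3)) for _ in range(200)]
    policy = lambda t: random.randint(0, 3)          # untrained random policy
    rollouts = collect_rollouts(tasks, policy)
    success = sum(r for _, _, r in rollouts) / len(rollouts)
    print(f"success rate before any update: {success:.2f}")
    # A real training loop would now update the policy from the
    # (task, action, reward) tuples and repeat until success plateaus.
```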

Performance and industry evaluation
· Top-tier contest results: the model reaches gold-medal level at both the 2025 International Mathematical Olympiad (IMO) and the International Olympiad in Informatics (IOI), demonstrating real depth in hard quantitative domains.
· On par with the closed-source giants: across multiple authoritative benchmarks its overall reasoning matches Google's Gemini-3.0-Pro, and on some complex tasks it beats GPT-5.

Read the original paper

Shao Meng, middle-aged unemployed programmer 😂 Focus - Context Engineering, AI Agents. Sharing - AI papers, apps and OSS. ex Microsoft MVP. Collaboration - DM/email: shaomeng@outlook.com 📢 WeChat official account / Xiaohongshu: AI 启蒙小伙伴

meng shao
Tue Dec 09 04:06:34
Giant update for me, I thought China had ≈zero psychedelic culture. It would make sense for Daoists or Buddhists, though. But not for Legalists or Confucians. So they're basically rediscovering it from scratch now.

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
Tue Dec 09 04:05:06
It seems to me like Claude Code directly ignores queued messages. Does anyone have better luck with them? For instance, I'll queue up a /command, and it just won't execute it.

modeling language at @allen_ai

finbarr
Tue Dec 09 04:02:57
RT @mmbronstein: NeurIPS 2025 papers per 1 Million People

1. Singapore – 64.51
2. Switzerland – 22.13
3. Israel – 11.17
4. UAE – 9.47
5. U…

Artificial Intelligence @amazon, @awscloud Reinforcement Learning, OSS AI, Coding Agents, General Purpose Agents All views personal - I only represent myself!

GDP at NeurIPS 2025
Tue Dec 09 04:02:40