Explore | Thread Easy - Unroll Twitter threads | Read, summarize, and create

I gave Erdos problem #481 to LLMs, then had them rate proofs and deduce authorship. Ground truth: A = Gemini DeepResearch, B = Gemini 3.0 Preview, C = DeepSeek V3.2 (not Speciale!), D = GPT 5.1, E = Human. Everyone prefers E and C. Gemini almost nails labels. GPT is delusional.

It's remarkable that Opus, Gemini and DeepSeek all conclude that Proof C (DeepSeek) is either human-written or indeed from DeepSeek. GPT 5.1 labels it "human" and then assigns both its own output (rating it 2/10!) and the actual human proof to itself. Of note, Opus on DS-Math V2:

Teortaxes▶️ (DeepSeek Twitter🐋 die-hard fan 2023 – ∞)

Wed Dec 03 00:43:11

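[Paper walkthrough] From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence

The paper summarizes the current state of the art and walks step by step through building and applying code intelligence from scratch, from foundation model training all the way to AI agents that can write code on their own.

Core theme: a "full lifecycle" encyclopedia of code intelligence
Think of it as a "training manual for AI programmers." Rather than focusing on one specific algorithm, it systematically lays out the complete pipeline of a code LLM from creation to deployment:
· Data preparation: what the AI reads (how to clean and filter high-quality code data)
· Pretraining: laying the foundation (how to get the model to understand the syntax and logic of programming languages)
· Fine-tuning: learning skills (how to teach the model to answer programming questions and fix bugs)
· Reinforcement learning: refinement (how to use feedback to raise the quality of the code the model writes)
· Autonomous agents: the final form (how to get AI to plan, write, debug, and deploy code on its own, like a real engineer)

Key points and comparisons
The paper runs an in-depth comparative evaluation of the two broad classes of models on the market:
· General-purpose models: such as GPT-4, Claude, LLaMA. They know a bit of everything and also write decent code.
· Code-specialized models: such as StarCoder, Code LLaMA, DeepSeek-Coder, QwenCoder. They focus on programming and are often more cost-effective on specific programming tasks.
The conclusion: although general-purpose models are strong, specially optimized code models often give more precise help that better matches developer habits when handling complex engineering problems.

Pain point: the gap between academia and industry
This is the most down-to-earth part of the paper. It points out directly that a high leaderboard score does not mean a model is useful in practice:
· Academia likes benchmark scores on simple algorithm problems such as HumanEval (for example, "write a Fibonacci sequence").
· Industry (real development) faces large codebases, complex dependencies, code security, and integration with existing development workflows.
· The paper discusses in detail how to close this gap, so that AI is not just a "test-taker" but an "engineer" that can do real work.

Future trend: from "Copilot" to "Agent"
· Past/present: Copilot mode. You tell the AI step by step to "write a function" or "explain this code," and it responds passively.
· Future: Agent mode. You only say "add a CAPTCHA to the login page," and the AI reads the existing code -> plans the changes -> writes code -> runs tests -> fixes errors -> commits the code.
Representative tools this year, such as Github Copilot, Cursor, Trae, Claude Code, and OpenAI CodeX, are leading this shift from "assistant" to "agent."

Paper link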
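The Agent-mode loop described in this post (read -> plan -> write -> test -> fix -> commit) can be sketched roughly as follows. This is a minimal illustration, not how any of the named tools work: `run_agent`, `AgentRun`, `fake_write_code`, and `fake_run_tests` are all hypothetical names, and the model call and test runner are replaced by toy stand-ins.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AgentRun:
    """Record of one hypothetical agent session."""
    steps: List[str] = field(default_factory=list)
    committed: bool = False


def run_agent(task: str,
              write_code: Callable[[str, int], str],
              run_tests: Callable[[str], bool],
              max_attempts: int = 3) -> AgentRun:
    """Loop: read -> plan -> write -> test -> fix -> commit.

    `write_code` and `run_tests` stand in for a model call and a test
    runner; both are assumptions made for this sketch.
    """
    run = AgentRun()
    run.steps.append("read existing code")
    run.steps.append(f"plan changes for: {task}")
    for attempt in range(1, max_attempts + 1):
        patch = write_code(task, attempt)
        run.steps.append(f"write code (attempt {attempt})")
        run.steps.append("run tests")
        if run_tests(patch):
            # Tests pass: commit and stop.
            run.steps.append("commit")
            run.committed = True
            return run
        # Tests fail: note the fix step and try again.
        run.steps.append("fix errors")
    return run  # gave up without committing


# Toy stand-ins: the first patch fails its tests, the second passes.
def fake_write_code(task: str, attempt: int) -> str:
    return f"patch-v{attempt}"


def fake_run_tests(patch: str) -> bool:
    return patch != "patch-v1"


result = run_agent("add a CAPTCHA to the login page",
                   fake_write_code, fake_run_tests)
for step in result.steps:
    print(step)
```

The attempt cap (`max_attempts`) is a design choice of the sketch: without it, a loop that keeps failing its tests would never terminate.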

Shao Meng, middle-aged unemployed programmer 😂 Focus: Context Engineering, AI Agents. Sharing: AI papers, apps and OSS. ex Microsoft MVP. Collaboration: DM/email shaomeng@outlook.com 📢 WeChat official account/Xiaohongshu: AI 启蒙小伙伴

meng shao

Wed Dec 03 00:38:03

My buddy @star13tara (ex-Stripe, Mercury, Plaid, Gusto) just launched The Right Turn, a program for high-achieving women who are done waiting to feel “ready” and want to build their own consulting/solopreneur business. Check it out: https://t.co/SEgDyg7U79

Deeply researched product, growth, and career advice

Lenny Rachitsky

Wed Dec 03 00:37:08

Want to get a weekly curated list of top GitHub repos and similar posts like this? Join our newsletter and get them straight to your inbox 👇 https://t.co/fIQKe7W5O3

We're sharing/showcasing best of @github projects/repos. Follow to stay in loop. Promoting Open-Source Contributions. UNOFFICIAL, but followed by github

GitHub Projects Community

Wed Dec 03 00:31:16

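Mistral AI releases the Mistral 3 family of open-source models, ranging from small 3B, 8B, and 14B models to the 675B MoE Large version

Mistral Large 3 - Mistral's most capable model to date
· Architecture: a MoE design with 675B total parameters (41B active), which balances very high performance with inference efficiency.
· Capabilities: top tier among open-weight models in multilingual conversation, image understanding (multimodal), and general instruction following.
· Highlight: despite its size, collaboration with NVIDIA and other partners has made it easier to deploy (it supports single-node operation).

Ministral 3 series - cost-effective, on-device deployment
· Positioning: aimed at cost-effectiveness and on-device deployment (laptops, even robots).
· Sizes: 3B, 8B, and 14B parameters.
· Features: despite their small size, they are also multimodal (image understanding), and dedicated reasoning variants were released. For example, the 14B reasoning variant reaches 85% accuracy on the competition-level math benchmark AIME '25, which is striking for a small model.

Technical highlights and trends
· Fully embracing multimodal and multilingual: all Mistral 3 models natively support image understanding rather than being limited to plain text. Mistral also emphasizes strong performance in non-English (especially multilingual) settings, which matters for global businesses.
· Reasoning moves down to small models: usually only very large models have deep reasoning ability (such as chain-of-thought in the style of OpenAI o1), but Mistral has brought it to small models like Ministral 3. In many specialized scenarios, this means we no longer have to rely on expensive cloud-hosted large models.
· Deep ecosystem optimization: Mistral did not just ship the models and walk away. It worked with infrastructure heavyweights such as NVIDIA, Red Hat, and vLLM on deep integration. For example, they released specially optimized checkpoints so that these large models run faster on fewer hardware resources.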
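As a rough worked example of the MoE figures quoted in this post: with 675B total and 41B active parameters, only a small share of the model is used for any one token. The snippet below just does that arithmetic; it takes the two numbers at face value from the post, and the variable names are illustrative.

```python
# Figures as quoted in the post; not independently verified here.
total_params_b = 675   # total parameters, in billions
active_params_b = 41   # parameters active per token, in billions

# Share of the model's parameters used for any single token.
active_fraction = active_params_b / total_params_b
print(f"Active per token: {active_fraction:.1%} of all parameters")
```

This share is what lets a MoE model of this size have per-token compute closer to that of a much smaller dense model, although all 675B parameters still have to be stored.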

Shao Meng, middle-aged unemployed programmer 😂 Focus: Context Engineering, AI Agents. Sharing: AI papers, apps and OSS. ex Microsoft MVP. Collaboration: DM/email shaomeng@outlook.com 📢 WeChat official account/Xiaohongshu: AI 启蒙小伙伴

meng shao

Wed Dec 03 00:29:26

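This is a commercially viable project with good prospects. Even if you are not interested in crypto, the scheduled tasks, development tips, and pitfalls covered in this lesson are still worth learning. The hands-on work in this video is not hard, but I recommend watching the Supabase and n8n lessons first, so that you have some understanding of Edge Functions before you start. There are also some technical terms; if any are unfamiliar, you can ask an AI. Course purchase link: https://t.co/9ftg2pTRTj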

My AI coding course (https://t.co/HVZn3ItASW) | Bilibili creator | Sharing what I build + iterating endlessly

熠辉 Indie

Wed Dec 03 00:29:02

Explore

Newest first, browse threads as cards
