LogoThread Easy

Your all-in-one partner for Twitter threads

© 2025 Thread Easy All Rights Reserved.

Explore

Newest first — browse tweet threads


I strongly condemn dunking on Prime Intellect, they're doing the exact right thing.

Post-training Chinese base models to the frontier level is in fact *more important* right now than learning to pretrain our own bases. I basically don't care what PI, Arcee and others can pretrain, though I have reasonable expectations that they'll catch up soon. Compute is abundant in the West and we already see evidence of sufficient pretraining expertise with smaller models (these two + @ZyphraAI, @Dorialexander, @natolambert with Olmo…) in the Western open space; by all accounts it scales. But that's mostly of… geopolitical significance, of what you guys will be allowed to run on your patriotic servers plugged into agentic frameworks. I'm neither Western nor Chinese, and contrary to my posting, I don't care terminally about this dimension; it's a purely instrumental issue. Consult the bio: the race is not between the US/West and China, it's between humans and AGIs vs ape power centralization. And Prime Intellect is doing more than anyone to arrest the centralizing drive.

Consider and weep: HF is chock full of Celestial gifts that we're too inept to utilize, they just rot there until they become obsolete. Thousands to millions of downloads and nothing to show. Why is Qwen even doing antiquated, very expensive Llama-like dense models in the first place? Mostly because a) Alibaba has a KPI "monthly HF downloads" and b) academics and small labs can't figure out how to finetune modern architectures. Even were the infrastructure more mature and they less technically ngmi, what do they finetune it on? The narrative peak of open source finetuning was Nous-Hermes, and that paradigm was basically just distilling GPT-4, filtering according to "taste" and vague criteria, SFTing over a strong base, and hoping for the best. That angle of attack was scornfully dismissed in advance by OpenAI et al as a non-threatening dead end that rewards hallucinations and style mimicking, and it predictably fizzled out. What next, «RL»? What RL, how RL, what is the signal generator, how does it intersect with downstream tasks? Kimi-K2, an immaculate frontier-level base, has been available to all for many months. DeepSeek-V3, nearly a year now. V2, well over a year. Dozens of models in all sizes, periodically updated with longer context and other boons. And what have we built with all that? 
Anything that even approaches Chinese in-house Instructs, nevermind contemporary frontier? Hello? Can you point me to these derivatives? It's a complete profanation of the idea of open science. And not even the Chinese bother, they all just train their own models from scratch. I can think of a tiny number of exceptions (eg Rednote making DSV3-VL), but none of them made a big splash. Startups worth billions, whose moat is search or agentic coding and thus large post-training datasets, sneakily use DS/GLM/Qwen in their proprietary products, but they don't share alpha. That's… about it.

Enter Prime Intellect. They're solving training. They're solving environment generation. They're thinking in a principled manner about signals that shape general model cognition. They are, in effect, unlocking the immense store of inert value that had been accumulated. For the world, this is so much more than another me-too model. They're scary smart, they have good intentions, they've got a solid roadmap, and they're my friends. I won't stand for pooh-poohing their work, because it serves the Great Common Task. If you don't see it, you don't have a clue of what's really important at this stage.


We're in a race. It's not USA vs China but humans and AGIs vs ape power centralization. @deepseek_ai stan #1, 2023–Deep Time «C’est la guerre.» ®1

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
Fri Nov 28 00:51:02
RT @gumroad: Gumroad creators: Drop your Black Friday deals below. And if you set up a deal with the BLACKFRIDAY2025 discount code, we'll f…


Father. Formerly @Gumroad. Working on something old.

Sahil Lavingia
Fri Nov 28 00:49:13
hmmm except the music which i can keep permanently in the background anyway, this can be implemented relatively easily.

they should definitely add voice chat in codex and claude code.


making models learn • eXperiments lab • memes and training lores

tokenbender
Fri Nov 28 00:47:39
realising i don't want to look anywhere except the file and interact purely by voice, not just voice type.

and if the model goes for interleaved reasoning or web search, it should be playing something classical while i wait.



tokenbender
Fri Nov 28 00:42:27
NVIDIA's official response: congratulating Google on its AI progress while stressing its own lead

Google's rapid advances in AI (especially the Gemini 3 model and TPU optimizations) have sparked market debate about NVIDIA's dominance. NVIDIA responded in a positive yet confident tone, praising its rival on the surface while in substance reasserting the unmatched advantages of its GPU platform.

Tribute to Google: NVIDIA opens by saying it is "delighted" and acknowledges Google's "great advances" in AI, while emphasizing the two companies' continued cooperation, with NVIDIA still supplying Google with hardware. This reflects NVIDIA's strategic maturity: rather than zero-sum confrontation, it positions itself as an ecosystem partner and avoids being cast as a "monopolist."

NVIDIA's core advantage: the centerpiece is the claim that NVIDIA is "a generation ahead" of the industry. Its GPU platform is the only solution that "runs every AI model and does it everywhere computing is done." By contrast, ASICs (application-specific integrated circuits, such as Google's TPU) are optimized for particular AI frameworks or tasks but lack generality.

Performance comparison: NVIDIA highlights its across-the-board lead in "performance," "versatility," and "fungibility." ASICs are efficient but "designed for specific purposes," leaving them vulnerable to model iteration and framework changes, and thus less flexible. That matters greatly for AI training/inference, especially as today's models diversify (e.g., from Transformers to multimodal).

My takeaway: GPUs are the more general architecture, serving a wider range of scales and uses, workable for individuals and hyperscaler clusters alike. TPUs benefit from Google's dedicated system, architecture, and toolchain optimization and perform better for large clusters, but small users can't make use of them; only organizations at the scale of DeepMind or Anthropic realize the advantage.
So GPUs and TPUs aren't really competing in direct hardware sales; TPUs will be offered externally through Google Cloud, making this a competition in cloud compute.


Shao Meng, middle-aged laid-off programmer 😂 Focus: Context Engineering, AI Agents. Sharing: AI papers, apps and OSS. ex Microsoft MVP. Business inquiries: DM / email shaomeng@outlook.com 📢 WeChat Official Account / Xiaohongshu: AI 启蒙小伙伴

meng shao
Fri Nov 28 00:41:50
Thanks to 立青 for the carefully crafted share!!
There are still many details to polish. Next up, I'll tackle automatically extracting tweets after X login to generate a writing style!

Copilot currently does everything in one step,
but I'd rather have it act as a content coach that inspires creation, instead of directly rewriting things into cold, impersonal AI text.

Keep optimizing 💪


Believing is seeing

Yangyi
Fri Nov 28 00:39:25