Thread Easy
  • Explorer
  • Compose a thread

Your all-in-one partner for Twitter threads

© 2025 Thread Easy. All Rights Reserved.

Explorer

Newest first — browse tweet threads


I think Confucianism would get more respect if people understood that Confucius was a gigachad who advanced a prosocial version of vitalist Nietzscheanism in almost @bronzeagemantis style, MOGGING people into virtue, whereas "based" Legalists were bugmen terrified of excellence.

We're in a race. It's not USA vs China but humans and AGIs vs ape power centralization. @deepseek_ai stan #1, 2023–Deep Time «C’est la guerre.» ®1

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
Wed Dec 17 03:42:00
@heyglif Also - this is part of "25 Days of Consumer AI" 

@omooretweets and I are featuring some incredible products this month. 

Follow along, there are fourteen presents still to be unwrapped 👀

Partner @a16z AI 🤖 and twin to @omooretweets | Investor in @elevenlabsio, @krea_ai, @bfl_ml, @hedra_labs, @wabi, @WaveFormsAI, @ViggleAI, @MireloAI

Justine Moore
Wed Dec 17 03:41:05
Even if information is somehow the fundamental currency of reality, that information never comes without a cost. The "bit" and the "it" are inextricably linked. You cannot have a "bit" without a physical substrate to hold it, and you cannot update that "bit" without a physical interaction.

https://t.co/l5MgIkakoz

Michael Frank Martin
Wed Dec 17 03:31:26
two exciting patterns (among many) in agent engineering

1. Specialized capabilities distributed via Skills and SubAgents allow companies to pick one problem in agent building and go HAM on it. Agentic local search and web search are early leaders here

2. Open harnesses let you inspect, edit, and generally fully customize the operating environment for your agent. This makes it easy to plug in (1) to make your agent better.
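A minimal sketch of how pattern 2 enables pattern 1: an open harness that exposes a plug-in point where a specialized capability (say, agentic local search) can be registered. All names here (`Harness`, `register`, `local_search`) are illustrative, not from any particular framework.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class Harness:
    """An open agent harness: the operating environment is inspectable
    and editable, so specialized capabilities can be plugged in."""
    skills: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def register(self, name: str, skill: Callable[[str], str]) -> None:
        # Pattern 1 plugs into pattern 2 at this seam.
        self.skills[name] = skill

    def run(self, skill_name: str, query: str) -> str:
        return self.skills[skill_name](query)

# A vendor that went HAM on one problem ships it as a skill:
def local_search(query: str) -> str:
    return f"top local hits for: {query}"

harness = Harness()
harness.register("local_search", local_search)
print(harness.run("local_search", "agent evals"))
# top local hits for: agent evals
```

The point of the open-harness shape is that the harness owns orchestration while each skill stays a small, swappable function.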

agents, harnesses, and evals @LangChainAI, prev @awscloud, phd cs @ temple

Viv
Wed Dec 17 03:28:30
Luo Fuli, who left DeepSeek to join Xiaomi, has also registered on Twitter; it looks like she led the new model.

She walked through the technical details of the MiMo-V2-Flash model Xiaomi released last night.

Architecture: Hybrid SWA (sliding-window attention). It beats other linear-attention schemes on long-context reasoning, and its fixed-size KV cache is a better fit for current infrastructure. A window size of 128 works best; 512 actually hurts performance; the "sink values" must be kept and cannot be dropped.
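The two claims above (a fixed-size KV cache, and sink values that must be kept) can be illustrated with a toy attention mask: each query attends to a few always-visible "sink" positions plus a sliding window over the most recent keys. This is a generic sliding-window-with-sinks sketch, not MiMo-V2-Flash's actual implementation; parameter names are illustrative.

```python
import numpy as np

def swa_mask(seq_len: int, window: int = 128, n_sink: int = 4) -> np.ndarray:
    """Causal sliding-window attention mask with attention sinks.
    mask[q, k] is True where query q may attend to key k."""
    q = np.arange(seq_len)[:, None]
    k = np.arange(seq_len)[None, :]
    causal = k <= q
    in_window = (q - k) < window   # only the last `window` keys
    sink = k < n_sink              # sink positions are always visible
    return causal & (in_window | sink)

m = swa_mask(seq_len=512, window=128, n_sink=4)
# Query 400 sees sinks 0..3 plus keys 273..400: the KV cache it needs
# is bounded by window + n_sink regardless of sequence length.
print(int(m[400].sum()))  # 132
```

Because every query sees at most `window + n_sink` keys, the per-layer KV cache stops growing with context length, which is the deployment-friendliness the tweet points at.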

MTP (multi-token prediction): critical for efficient RL. Beyond the first layer, only light fine-tuning is needed to reach a high accept length. A 3-layer MTP achieves an accept length above 3 on coding tasks and roughly a 2.5× speedup, which fixes the GPU idling caused by long-tail samples in small-batch on-policy RL. It was not merged into the RL loop this time for schedule reasons, but it is a natural fit; the 3-layer MTP has been open-sourced for the community to build on.
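A back-of-envelope check on those numbers: in speculative decoding, each verification step emits roughly `accept_length` tokens for one target-model forward pass, discounted by the cost of running the draft heads. The overhead fraction below is an assumption, chosen only to show how an accept length above 3 lands near the quoted ~2.5× speedup.

```python
def mtp_speedup(accept_length: float, draft_overhead: float = 0.25) -> float:
    """Rough speculative-decoding speedup: tokens emitted per unit of
    target-model time, discounted by the relative cost of the MTP
    draft heads (draft_overhead is an assumed fraction, not measured)."""
    return accept_length / (1.0 + draft_overhead)

# An accept length of ~3.2 with modest draft cost lands near ~2.5x:
print(round(mtp_speedup(3.2), 2))  # 2.56
```

The same arithmetic explains the long-tail point: a straggler sample that decodes 3× faster stops holding the rest of the small batch (and its GPUs) hostage.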

MOPD post-training: uses Thinking Machines' On-Policy Distillation to merge multiple RL models, with large efficiency gains. Compared with the standard SFT+RL pipeline, it matches the teacher model's performance at less than 1/50 of the compute, and it hints at an evolutionary path where the student bootstraps itself into a stronger teacher.
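The core of on-policy distillation is that the student samples its own trajectories and is trained to minimize per-token reverse KL against the teacher's distribution on those samples, rather than imitating teacher-generated text. A toy sketch of that loss with NumPy; shapes and names are illustrative, and this is not MiMo's actual training code.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def reverse_kl(student_logits, teacher_logits):
    """Mean per-position reverse KL(student || teacher). Computed on
    tokens the *student* sampled (on-policy), which is what separates
    this from ordinary SFT on teacher outputs."""
    ps = softmax(student_logits)
    pt = softmax(teacher_logits)
    return float(np.mean(np.sum(ps * (np.log(ps) - np.log(pt)), axis=-1)))

rng = np.random.default_rng(0)
s = rng.normal(size=(5, 32))  # student logits: 5 positions, 32-token vocab
t = s.copy()                  # a teacher the student already matches
print(reverse_kl(s, t))       # 0.0
```

Reverse KL is mode-seeking: the student is pushed to put mass only where the teacher does, which is one reason this style of distillation can recover teacher behavior with far less compute than rerunning RL.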

The emphasis is on pragmatic engineering and production friendliness: Hybrid SWA plus a fixed KV cache improve long-context handling and deployment efficiency; MTP brings training/inference parallelism gains; MOPD replicates and merges RL capabilities at very low compute.

Interested in AI, LLMs, AI image/video, Stable Diffusion, and design | Editor of the AIGC Weekly | WeChat official account: 歸藏的AI工具箱

歸藏(guizang.ai)
Wed Dec 17 03:22:46
Here is a quick demo of pasting GitHub code snippets into my note app ✨
You can quickly paste code snippets from GitHub links into your Markdown notes 😎

Check out Inkdrop v6 canary https://t.co/vVwVTILQ4C

Takuya 🐾 devaslife
Wed Dec 17 03:20:53