Thread Easy

Your one-stop companion for Twitter threads

© 2025 Thread Easy All Rights Reserved.

Explore

Newest first — browse tweet threads


RT @_xjdr: # Why Training MoEs is So Hard

recently, i have found myself wanting a small, research focused training repo
that i can do smal…

ai agents @hud_evals | owned @AIHubCentral (1 million users,acq.) ex climate protester🦦 dont do the deferred life plan

Minh Nguyen✈️NeurIPS
Sun Dec 07 00:36:15
Very high alpha writeup on Modern MoE training
it also goes to show how valuable it is to use a large cluster (duh), if you can get your parallelisms working. Many problems become easier to solve at scale, it's not just about turning tokens to gradients faster

We're in a race. It's not USA vs China but humans and AGIs vs ape power centralization. @deepseek_ai stan #1, 2023–Deep Time «C’est la guerre.» ®1

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
Sun Dec 07 00:33:16
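[Open-source tutorial] Open-source model + mature agent framework + tools => a Claude Code-grade AI agent

Key technical components
· Open-source model: DeepSeek-V3.2
· Mature agent framework: Claude Agents SDK
· Tools and data: MongoDB MCP Server
Overall architecture: model → Agents SDK → MongoDB tools → database, forming a closed interaction loop.

Core idea: combining three strong technologies
The goal is an agent that understands natural language and operates a database on its own. It is built from three pieces:
1. Brain (DeepSeek v3.2):
Brain swap: by changing the API Base URL, the Claude Agents SDK is made to believe it is calling a Claude model while it is actually calling DeepSeek v3.2. After OpenAI-compatible endpoints, this has become another standard move for LLM APIs.

2. Skeleton (Claude Agents SDK):
Why it was chosen: instead of LangChain or the OpenAI SDK, the author picked the Claude Agents SDK because it provides the mature scaffolding needed to build complex agents (subagent management, MCP support, and so on), the same core technology that drives Claude Code.

3. Hands and eyes (MongoDB MCP Server):
Technical point: by using the MCP protocol through MongoDB's MCP server, the AI can run queries, analyze schemas, and even write data in a standardized way, without complex glue code.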
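
The "brain swap" in item 1 can be sketched as a small configuration helper. This is a minimal sketch, not the tutorial's code: it assumes the agent framework reads Anthropic-style environment variables (`ANTHROPIC_BASE_URL`, `ANTHROPIC_AUTH_TOKEN`, `ANTHROPIC_MODEL`), and the endpoint URL and model id shown are assumptions to check against the provider's documentation. The helper name `anthropic_compatible_env` is hypothetical.

```python
import os


def anthropic_compatible_env(base_url: str, api_key: str, model: str) -> dict:
    """Build environment overrides that point an Anthropic-style client
    at a different, API-compatible backend.

    The variable names are assumptions about what the client reads.
    """
    if not base_url.startswith(("http://", "https://")):
        raise ValueError("base_url must be an http(s) URL")
    return {
        "ANTHROPIC_BASE_URL": base_url.rstrip("/"),
        "ANTHROPIC_AUTH_TOKEN": api_key,
        "ANTHROPIC_MODEL": model,
    }


env = anthropic_compatible_env(
    base_url="https://api.deepseek.com/anthropic/",  # assumed endpoint
    api_key=os.environ.get("DEEPSEEK_API_KEY", "sk-placeholder"),
    model="deepseek-chat",  # assumed model id
)

# Print the overrides without leaking the key.
for name, value in env.items():
    shown = "***" if name == "ANTHROPIC_AUTH_TOKEN" else value
    print(f"{name}={shown}")
```

Passing `env` to the process that launches the agent (for example through `os.environ.update(env)` before the SDK is imported) keeps the swap in one place, so switching back to a Claude model is a one-line change.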

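Architectural essence: using subagents to fight "brain fog"
This is the most technically deep part of the tutorial. The author raises a key problem: Context Rot. Even when a model claims to support 200k+ tokens of context, feeding it too much information makes it less capable and prone to confusing its tools.

Solution: divide and conquer (Subagents)
Rather than one do-everything agent, the tutorial builds 3 specialized subagents, each responsible for only part of the MongoDB MCP toolset:
· Reader Agent: only reads (queries data).
· Writer Agent: only writes (inserts, deletes, updates).
· Query Agent: finds the relevant data from vague instructions.

Advantage: limiting each agent's view and toolbox greatly reduces the chance of DeepSeek making mistakes and keeps operations precise.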
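
The tool partitioning described above can be illustrated without any agent framework. The following is a minimal sketch under stated assumptions: the `Subagent` class, the helper functions, and the tool names are all hypothetical stand-ins, not the tutorial's definitions or the MongoDB MCP server's confirmed tool list.

```python
from dataclasses import dataclass

# Hypothetical tool names standing in for the MCP server's toolset.
ALL_TOOLS = frozenset({
    "find", "count", "aggregate", "collection-schema",
    "insert-many", "update-many", "delete-many",
})


@dataclass(frozen=True)
class Subagent:
    name: str
    description: str
    allowed_tools: frozenset


SUBAGENTS = {
    "reader": Subagent("reader", "Reads data only.",
                       frozenset({"find", "count"})),
    "writer": Subagent("writer", "Inserts, updates, and deletes data.",
                       frozenset({"insert-many", "update-many", "delete-many"})),
    "query": Subagent("query", "Locates relevant data from vague requests.",
                      frozenset({"aggregate", "collection-schema"})),
}


def visible_tools(agent: Subagent) -> list:
    """Tools this subagent is shown: its allow-list intersected with what exists."""
    return sorted(agent.allowed_tools & ALL_TOOLS)


def can_call(agent: Subagent, tool: str) -> bool:
    """Reject any tool call outside the subagent's allow-list."""
    return tool in agent.allowed_tools


for agent in SUBAGENTS.values():
    print(agent.name, visible_tools(agent))
```

Each subagent sees two or three tool descriptions instead of the full set, which is the mechanism the post credits for fewer tool mix-ups. It also means a read-only agent cannot issue a delete even if the model asks for one.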

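Practical value: from "toy" to "tool"
The tutorial goes beyond simple demos such as "count how many movies there are" and offers a realistic case:
· Data migration and analysis:
A script shows how to import real data from the Hugging Face Hub (model statistics, dataset popularity, and so on) into MongoDB.
· Complex queries:
After the import, you can ask the agent directly: "What are the 10 most popular models on Hugging Face?" The agent generates the aggregation query on its own and pulls the answer from the database.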
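
For the "10 most popular models" question, the aggregation the agent would need to produce might look like the pipeline below. This is a sketch, not the tutorial's query: the field names `modelId` and `downloads` are assumptions about the imported schema, and "popular" is taken to mean download count. The small in-memory runner is a hypothetical stand-in that supports only the three stages used here, so the example runs without a database.

```python
def top_models_pipeline(limit: int = 10) -> list:
    """MongoDB-style aggregation: sort by downloads descending, keep the top N."""
    if limit < 1:
        raise ValueError("limit must be positive")
    return [
        {"$sort": {"downloads": -1}},
        {"$limit": limit},
        {"$project": {"_id": 0, "modelId": 1, "downloads": 1}},
    ]


def run_in_memory(docs: list, pipeline: list) -> list:
    """Tiny emulator for $sort (one field), $limit, and inclusion $project."""
    rows = [dict(d) for d in docs]
    for stage in pipeline:
        (op, arg), = stage.items()
        if op == "$sort":
            (field, direction), = arg.items()
            rows.sort(key=lambda r: r[field], reverse=(direction == -1))
        elif op == "$limit":
            rows = rows[:arg]
        elif op == "$project":
            keep = [k for k, v in arg.items() if v == 1]
            rows = [{k: r[k] for k in keep if k in r} for r in rows]
        else:
            raise NotImplementedError(op)
    return rows


# Made-up sample documents, for illustration only.
sample = [
    {"modelId": "a", "downloads": 5, "likes": 1},
    {"modelId": "b", "downloads": 9, "likes": 2},
    {"modelId": "c", "downloads": 7, "likes": 0},
]
print(run_in_memory(sample, top_models_pipeline(limit=2)))
```

Against a real collection, the same list could be passed to an `aggregate` call; the emulator only exists to show what the three stages do.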

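Summary
· Model demystification: you do not need to rely on expensive closed-source models (such as Claude Opus 4.5); DeepSeek v3.2 paired with a good architecture is fully capable of complex tasks.
· MCP adoption: connecting to databases through the MCP protocol will become standard, greatly lowering the barrier to building AI applications.
· Architecture first: compared with chasing ever-longer context, a "main agent + specialized subagents" architecture is the stable way to solve complex problems.

Original tutorial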

Meng Shao, a middle-aged laid-off programmer 😂 Focus - Context Engineering, AI Agents. Sharing - AI papers, apps and OSS. ex Microsoft MVP Collaboration - DM/email: shaomeng@outlook.com 📢 WeChat official account/Xiaohongshu: AI 启蒙小伙伴

meng shao
Sun Dec 07 00:26:36
i used to have an open office policy - the deal was that you can book me for a cold chat but you have to opt in to me recording and posting them up

these things can have a 3 year impact cycle :)

achieve ambition with intentionality, intensity, & integrity - @dxtipshq - @sveltesociety - @aidotengineer - @latentspacepod - @cognition + @smol_ai

swyx 🔜 NeurIPS + #DevWritersRetreat
Sun Dec 07 00:26:09
Wandered into another restaurant
It’s a Falun Gong recruitment spot
They even have booklets in Russian
When will I find a straightforward CCP restaurant? The whole city is rife with rival Chinese sects offering delicious food, where is MSS looking. Can they cook at all

We're in a race. It's not USA vs China but humans and AGIs vs ape power centralization. @deepseek_ai stan #1, 2023–Deep Time «C’est la guerre.» ®1

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
Sun Dec 07 00:16:25
Per @JustinLin610 (Qwen 3 Coder), talking about synthetic data, RL, scaling, gated attention, and future direction.

- Thinking doesn't support coding use cases well.

- 256K context length, but even 128K would be sufficient for coding agents, as code is filtered before being added to the context. That said, the API supports 1M tokens.

- Qwen 2.5 Coder helped generate synthetic data for Qwen 3 Coder. This involved taking noisy data, then cleaning and rewriting it.

- Qwen Team has a large-scale environment for RL with the MegaFlow scheduler. They use multiple scaffolds/agents to generate trajectories (interesting).

- RL boosts performance significantly, so the effort is well worth it.

- Qwen3-Max made Alibaba scale pilled. It does not have much innovation, just a much larger scale. The gains are all due to scaling.

- Next generation of Qwen LLMs will have Gated Delta Attention given the success of Qwen3-Next. They might combine sparse attention tricks as well (DSA?).

- Future direction
1. New architecture for long context
2. Integrated search capabilities
3. Incorporating vision in coding models for computer use agent
4. Techniques for long-horizon tasks (24 hrs or longer)

Artificial Intelligence @amazon, @awscloud RL, OSS AI, Coding Agents, General Purpose Agents All views personal!

GDP at NeurIPS 2025
Sun Dec 07 00:13:18