Thread Easy

The all-purpose companion for Twitter threads

© 2025 Thread Easy All Rights Reserved.

Explore

Newest first — browse tweet threads

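I could cry: SOTA is just for show, and the real work still gets done by "workhorse models."

OpenRouter founder Alex Atallah just tweeted that his most-used model is still Kimi-K2-0711 (the July release of Kimi-K2-Instruct).

Next come openai-o4-mini-high, Claude-3.7-Sonnet, gpt-oss-120b, and openai-o3.

At first glance I wondered whether he had been offline and had not tried a new model in ages.

But on reflection, no, that is not it at all. This is how a real power user works, and it rings very true.

If, at this point in time, you want a model with a large enough context window (128K), good-enough performance (SWE-Bench Verified > 65), strong agent capability (Tau2-bench > 65), very broad knowledge (a very large parameter count), and fast responses (a non-thinking model), Kimi-K2-Instruct seems to be the only option.

Working backward from this, most of Alex Atallah's work is probably processing documents (long context, especially given the 13.4M tokens used) and using tools to analyze data and write reports (agent capability), all of which Kimi-K2-Instruct can handle. Then there is some script writing, with o4 and Claude-3.7-Sonnet as the fallback, or perhaps even wrapped in an agent so that Kimi-K2 calls those models to write the scripts.

Finally, Kimi-K2 also satisfies the most important requirement: data privacy. Because the model is open-weight, it can be deployed on your own servers, so no sensitive information leaks to OpenAI or Anthropic. That is probably also why GPT-OSS-120B appears further down the list.

I think I now understand why new models compete so hard on agent capability. People using AI directly is only an intermediate stage; advanced users already use AI to operate AI. An agent-specialized model that sends and receives the context for all the other AIs will inevitably top the usage charts.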
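The selection criteria in the post can be written down as a simple filter. This is only a sketch: the thresholds (128K context, SWE-Bench Verified > 65, Tau2-bench > 65, non-thinking, open weights) come from the post, while the `ModelProfile` type, the `is_workhorse` function, and the candidate entries are hypothetical placeholders, not real benchmark data. The "broad knowledge" criterion is left out because the post gives no number for it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    """Hypothetical record describing one candidate model."""
    name: str
    context_tokens: int
    swe_bench_verified: float
    tau2_bench: float
    is_thinking: bool
    open_weights: bool


def is_workhorse(m: ModelProfile) -> bool:
    """Apply the thresholds named in the post."""
    return (
        m.context_tokens >= 128_000      # large enough context (128K)
        and m.swe_bench_verified > 65    # good-enough coding performance
        and m.tau2_bench > 65            # strong agent capability
        and not m.is_thinking            # fast answers, no thinking mode
        and m.open_weights               # self-hostable for data privacy
    )


# Placeholder entries with made-up numbers, for illustration only.
CANDIDATES = [
    ModelProfile("model-a", 128_000, 66.0, 66.0, False, True),
    ModelProfile("model-b", 200_000, 75.0, 70.0, True, False),
    ModelProfile("model-c", 32_000, 50.0, 40.0, False, True),
]

workhorses = [m.name for m in CANDIDATES if is_workhorse(m)]
print(workhorses)
```

With these placeholder entries only `model-a` passes every check, which mirrors the post's point that the combined constraints leave very few options.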

Original post:

A coder, road bike rider, server fortune teller, electronic waste collector, co-founder of KCORES, ex-director at IllaSoft, KingsoftOffice, Juejin.

karminski-牙医
Fri Dec 19 12:22:26
Massive token efficiency gains from Seed.
But what's interesting is not that they've improved economics. More generally, the question is whether you can do this "high" thing for real, if you can trade compute for output quality without a ceiling on compute.
Some labs can.

We're in a race. It's not USA vs China but humans and AGIs vs ape power centralization. @deepseek_ai stan #1, 2023–Deep Time «C’est la guerre.» ®1

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
Fri Dec 19 12:21:58
> - PutnamBench: 87.9% (580 solved)
Putnam got saturated within 8 months.
@huajian_xin is on the team of course.

We're in a race. It's not USA vs China but humans and AGIs vs ape power centralization. @deepseek_ai stan #1, 2023–Deep Time «C’est la guerre.» ®1

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
Fri Dec 19 12:16:34
1300+ downloads in one week! 

npx create-8004-agent is rapidly becoming the starting point for 1000s of on-chain agents ✨

Note: aggregated create-trustless-agent + new create-8004-agent

Vitto Rivabella
Fri Dec 19 11:57:22
RT @getpy: Last Issue of DSPyWeekly for 2025 - Issue 16th

📚 Articles
Stop Writing Prompts Like a Medieval Alchemist: Why it's time to ditc…

Asst professor @MIT EECS & CSAIL (@nlp_mit). Author of https://t.co/VgyLxl0oa1 and https://t.co/ZZaSzaRaZ7 (@DSPyOSS). Prev: CS PhD @StanfordNLP. Research @Databricks.

Omar Khattab
Fri Dec 19 11:57:16
I love those business guru YouTubers selling you the idea that you can create a new product from zero and start earning from Day 1.

And then they "casually" mention that their product became successful because they posted on Instagram to their 300k followers.

~20 yrs in web-dev, now mostly Laravel. My Laravel courses: https://t.co/HRUAJdMRZL My Youtube channel: https://t.co/qPQAkaov2F

Povilas Korop | Laravel Courses Creator & Youtuber
Fri Dec 19 11:53:00