Thread Easy

The all-in-one partner for Twitter threads

© 2025 Thread Easy All Rights Reserved.

Explore

Newest first — browse tweet threads

To explain my dad joke: geometrically speaking, this is pretty much an exact translation of China's borders on the world map.
Nano-Banana took the "overlaps with…" line of the prompt and literally translated China's land borders to satisfy the overlap.

We're in a race. It's not USA vs China but humans and AGIs vs ape power centralization. @deepseek_ai stan #1, 2023–Deep Time «C’est la guerre.» ®1

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
Tue Dec 23 10:07:46
ULMFiT was really ahead of its time, complete with the pre-train -> mid-train -> SFT pipeline we use today

🤠 post-training @huggingface

Lewis Tunstall
Tue Dec 23 09:55:47
Sorry, we only come in extra-large! GLM-4.7 hands-on test!

This round of testing covers GLM-4.7's coding ability, Agent/ToolCall ability, and long-context recall. Here are the results for the just-released GLM 4.7:

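First, the Silicon Rider test, which probes Agent ability: put simply, the model uses tools to play a delivery rider who picks up and delivers takeout orders.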

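Over an extreme 24-hour delivery run totaling 300 rounds, GLM 4.7 earned 571.91 yuan and made 354 tool calls in total. The test used about 50% of the context space, and the model only stopped working after passing 100K.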
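
As a quick sanity check on those totals, the per-round averages can be worked out directly. These averages are simple derived figures, not numbers the test itself reports.

```python
# Totals as reported for GLM 4.7 in the Silicon Rider test.
revenue_yuan = 571.91
rounds = 300
tool_calls = 354

# Per-round averages derived from those totals (not reported by the test itself).
revenue_per_round = revenue_yuan / rounds
calls_per_round = tool_calls / rounds

print(f"{revenue_per_round:.2f} yuan per round, {calls_per_round:.2f} tool calls per round")
# prints "1.91 yuan per round, 1.18 tool calls per round"
```

So on average the model made slightly more than one tool call per round.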

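The Agent test set a new record this time, with especially high execution efficiency, because the model can issue multiple tool calls in a single turn, which saves time and lets it pick the highest-earning plan.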
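
The time saved by issuing several tool calls in one turn comes from running them concurrently instead of one after another. The sketch below illustrates the idea with Python's asyncio, assuming a harness that receives a list of calls from one model turn. The tool names (`get_orders`, `estimate_payout`), the call format, and the delays are hypothetical stand-ins, not the Silicon Rider test's actual harness or any vendor's API.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable

# Hypothetical rider tools; a real harness would call the delivery simulator instead.
async def get_orders(zone: str) -> list[str]:
    await asyncio.sleep(0.1)  # stand-in for tool latency
    return [f"{zone}-1", f"{zone}-2"]

async def estimate_payout(order_id: str) -> float:
    await asyncio.sleep(0.1)  # stand-in for tool latency
    return 5.0 if order_id.endswith("-1") else 8.0

TOOLS: dict[str, Callable[..., Awaitable[Any]]] = {
    "get_orders": get_orders,
    "estimate_payout": estimate_payout,
}

async def run_tool_calls(calls: list[dict[str, Any]]) -> list[Any]:
    """Run every tool call emitted in a single model turn concurrently; results keep call order."""
    return await asyncio.gather(*(TOOLS[c["name"]](**c["arguments"]) for c in calls))

if __name__ == "__main__":
    # One (hypothetical) model turn asking for two payout estimates at once.
    calls = [
        {"name": "estimate_payout", "arguments": {"order_id": "A-1"}},
        {"name": "estimate_payout", "arguments": {"order_id": "A-2"}},
    ]
    start = time.perf_counter()
    payouts = asyncio.run(run_tool_calls(calls))
    elapsed = time.perf_counter() - start
    # The two calls overlap, so this takes about 0.1 s instead of about 0.2 s sequentially.
    best = calls[payouts.index(max(payouts))]["arguments"]["order_id"]
    print(payouts, best, f"{elapsed:.2f}s")
```

With all results back in one step, picking the highest-earning option is a single `max` over the returned values.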

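Next is the Hogwarts test, which probes long-context recall: put simply, whether the model can remember what is in a long context and answer questions about it accurately.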

GLM 4.7's recall stays between 91% and 100% up to 192K and still reaches 95% at 200K, so its recall is very good as well.

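Finally, the coding test. My biggest impression is that particle effects, modeling, and lighting have all improved, and spatial ability in particular has improved enormously. Performance problems remain, of course; I hope the next version focuses on the performance of the code it generates.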

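In summary, GLM 4.7 shows clear improvements across the board and is perfectly capable as a primary coding model. It reached open-source SOTA on coding benchmarks such as LMArena and SWE-bench.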

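One more thing, though: during testing I found the API speed alternated between fast and slow. Is that because everyone is trying the new version? I hope they add more servers soon.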

#GLM47 #智谱AI #智谱GLM #AIAgent #ai编程 #大模型 #开源 #KCORES大模型竞技场

A coder, road bike rider, server fortune teller, electronic waste collector, co-founder of KCORES, ex-director at IllaSoft, KingsoftOffice, Juejin.

karminski-牙医
Tue Dec 23 09:53:43
RT @michaelfreedman: Replit's deep dive into their snapshotting infrastructure, posted Thursday, is worth a read. It highlights a key shift…

ceo @replit. civilizationist

Amjad Masad
Tue Dec 23 09:50:40