https://t.co/fwpAL9bbmU


Husband / Father of two / Founder @voidzerodev / Creator @vuejs & @vite_js. Chinese-only alt: @yuxiyou


Host of @dwarkeshpodcast https://t.co/3SXlu7fy6N https://t.co/4DPAxODFYi https://t.co/hQfIWdM1Un

🚧 building https://t.co/AJfZ3LMlgq https://t.co/606cFUoda3 https://t.co/s0m0tpQMDH https://t.co/UQ5vrrYdAG 🐣learning/earning while helping others ❤️making software, storytelling videos 🔙alibaba @thoughtworks

![Fiction.LiveBench has finally been updated. This time they added recall test results for deepseek-v3.2-exp [reasoning: high], deepseek-v3.2-exp, nemotron-nano-9b-v2:free, qwen-max, and qwen3-next-80b-a3b-instruct.
What surprised me most is that deepseek-v3.2-exp [reasoning: high] reached a near-top-tier level, with at least 83% recall within 32K. This is the best result DeepSeek has ever posted. Above 60K, though, the run apparently hit an error and produced no results.
The new Qwen3-Next architecture looks mediocre here, and I hope a later version improves. My favorite mid-sized local models right now are Qwen3-Next and Kimi-linear.](/_next/image?url=https%3A%2F%2Fpbs.twimg.com%2Fmedia%2FG7LrhcRaMAA43KO.jpg&w=3840&q=75)
A coder, road bike rider, server fortune teller, electronic waste collector, co-founder of KCORES, ex-director at IllaSoft, KingsoftOffice, Juejin.


GP @a16z — Building American Dynamism 🇺🇸 — Anthropologist — Formerly Founder/CEO @OpenDNS — Lokah Samastah Sukhino Bhavantu
