Explore | Thread Easy - Unroll Twitter Threads | Read, Summarize, and Create

Deep dive on RL LoRA rank size, which adds to my overall feeling that there are reasoning <=> compute effort scaling laws waiting to be found.

Now that I finally have controlled synthetic environments, I'm seeing a similar trade-off on the pretraining side. For example, stacking layers is even more beneficial for some tasks/domains (math) than for others.

Alexander Doria

Sun Nov 09 09:56:08

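Are Indian companies starting to use Chinese LLMs too? I just came across another pruned model: MiniMax-M2-THRIFT. It was pruned from 250B to 192B, with a performance drop of about 5%. I don't think the model's own performance matters much, but since Llama fell, there have already been two pruned models based on Chinese models as of this month (Kimi-Linear-REAP and MiniMax-M2-THRIFT). This modified model may not be especially impressive, but it is worth noting that its publisher is VibeStudio, whose main offering is a vibe environment that runs in the cloud: imagine VSCode plus an AI agent running in a web page, or ClaudeCode running in a web page. The biggest selling point is "Vibe Everywhere". Why introduce this company? Because I looked it up and it is an Indian company based in Chennai. They use Cerebras for inference and Kimi-K2 as the model. The advantage of being cheap and generous is starting to show. Now, apart from the companies that have to pick a side (Microsoft, NVIDIA, etc.) and are still modifying Llama 3, everyone else, whether startups or compute providers, is using Chinese open-weight models. The open-weight ecosystem is steadily being taken over by Chinese LLMs. Awesome. Model link:

Model data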

karminski-牙医

Sun Nov 09 09:43:54

RT @egrefen: Yorick Wilks used to like the quote "After Leibniz, a philosopher is a guy too lazy to work in a laboratory" (can't remember…

Building @SakanaAILabs 🧠

hardmaru

Sun Nov 09 09:33:40

Kimi K2 Thinking, thinking of how to write Pushkin (translated) this is so unbearably cute. This drama queen model is growing on me.

We're in a race. It's not USA vs China but humans and AGIs vs ape power centralization. @deepseek_ai stan #1, 2023–Deep Time «C’est la guerre.» ®1

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)

Sun Nov 09 09:31:21

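I coined a new term: "scale arbitrage". When growing accounts with AI, the real value lies in a swarm of AI accounts. They don't need many followers; they just blast out advertising DMs. Does that make money? In the past there were matrix accounts: dozens of accounts distributing video clips of the same face, relentlessly occupying users' minds. You don't even know who this person is, yet his face is already deeply imprinted. If you don't want to become a walking corpse, you need to set up a line of defense in your mind against this kind of mental intrusion. The videos you see are designed to target your psychological weak points: high-definition close-up head shots, language made up entirely of imperatives and affirmative statements. Be careful: you are his prey. Even what I said before, that if you want to grow an X account you should tweet more, is essentially a form of scale arbitrage. But as more and more people use this method, it stops working.

Programmer | Growth Coach | Helping creators build their personal brand on X. WeChat official account: PandaTalk8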

Mr Panda

Sun Nov 09 09:04:05

Exactly. Especially Americans. Which is why I'm visiting China after, because nobody ever goes to China. I was there last in 2018, in Qingdao and Beijing, and it's really interesting to visit.

🇪🇺https://t.co/NdorAWqJC3 📸https://t.co/lAyoqmSBRX $125K/m 🏡https://t.co/1oqUgfD6CZ $40K/m 🛰https://t.co/ZHSvI2wjyW $38K/m 🌍https://t.co/UXK5AFqCaQ $16K/m 👙https://t.co/RyXpqGuFM3 $14K/m 💾https://t.co/M1hEUBAynC $6K/m

@levelsio

Sun Nov 09 09:03:06

Explore

Newest first; browse threads as cards
