Thread Easy

Your all-in-one partner for Twitter threads


Explore

Newest first — browse tweet threads


Now that I finally have controlled synthetic environments, I'm seeing a similar trade-off on the pretraining side. Stacking layers, for instance, is even more beneficial to some tasks/domains (math) than others.

Reasoning models coming (very) soon. Co-founder @pleiasfr

Alexander Doria
Sun Nov 09 09:57:40
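
The depth claim above is testable in exactly the kind of controlled setup the tweet describes. As a minimal sketch (assuming a toy GPT-2 config from `transformers`; the sizes and the library choice are my illustration, not the author's actual synthetic environments), one axis of such a sweep is simply layer count at fixed width:

```python
# Hypothetical depth sweep: hold width fixed, vary the number of stacked
# layers, then compare the resulting models across task domains
# (e.g. math vs. retrieval). All sizes are illustrative.
from transformers import GPT2Config, GPT2LMHeadModel

for n_layer in (6, 12, 24):
    cfg = GPT2Config(n_embd=512, n_head=8, n_layer=n_layer)
    model = GPT2LMHeadModel(cfg)
    n_params = sum(p.numel() for p in model.parameters())
    print(f"layers={n_layer:2d}  params={n_params / 1e6:.1f}M")
```

Each model in the sweep would then be pretrained on the same synthetic corpus, so any per-domain gap can be attributed to depth rather than data.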
5 years of grind
1 depression
Nothing took off

This was three years ago, launching yet another small bet, hoping that someday one would take off.

🧑‍💻 https://t.co/Y30jsaHwz9 $20K/m
⚡️ https://t.co/vatLDmi9UG $17K/m
📈 https://t.co/3EDxln5mdi $16K/m
⭐️ https://t.co/MZc8tG9xWi $8K/m
🧬 https://t.co/SfrVXVtmdA $.5K/m
🍜 https://t.co/r07EpGSYJ2 $0K/m
🧾 https://t.co/7olaOzV8Xd $0/m
+18 https://t.co/4zCWHGJp1S

Marc Lou
Sun Nov 09 09:57:36
This is like the third methodology I've posted that MSL released as a paper shortly after.

This means MSL is trying ideas and executing with high entropy. Unexpected and interesting.

tokenbender
Sun Nov 09 09:57:16
Deep dive on RL LoRA rank size, which adds to my overall feeling that there are reasoning <=> compute-effort scaling laws waiting to be found.

Now that I finally have controlled synthetic environments, I'm seeing a similar trade-off on the pretraining side. Stacking layers, for instance, is even more beneficial to some tasks/domains (math) than others.

Alexander Doria
Sun Nov 09 09:56:08
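
For context on what "LoRA rank size" trades off: the rank bounds the dimensionality of the low-rank adapter update, so trainable parameters and adapter compute grow roughly linearly with it. A minimal sketch of a rank sweep, assuming the `peft` library and a GPT-2 placeholder base model (both my choices for illustration, not anything named in the thread):

```python
# Hypothetical LoRA rank sweep: higher rank = more adapter capacity and
# more compute, which is the knob the scaling-law intuition is about.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

for rank in (4, 16, 64):
    base = AutoModelForCausalLM.from_pretrained("gpt2")  # fresh copy per rank
    config = LoraConfig(
        r=rank,                     # rank of the low-rank update B @ A
        lora_alpha=2 * rank,        # common heuristic: alpha ~ 2r
        target_modules=["c_attn"],  # GPT-2's fused attention projection
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(base, config)
    model.print_trainable_parameters()  # compare adapter size across ranks
```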
Are Indian companies starting to use Chinese models too?

I just spotted another pruned model: MiniMax-M2-THRIFT.

Pruned from 250B down to 192B parameters, with roughly a 5% performance drop. The model's own performance doesn't matter much to me, but since Llama's decline, this month alone has seen two pruned models built on Chinese base models (Kimi-Linear-REAP and MiniMax-M2-THRIFT).

The modded model itself may not be especially impressive, but it's worth noting that its publisher is called VibeStudio. Their main product is a cloud-hosted vibe-coding environment: picture VSCode plus an AI agent running in a web page, or ClaudeCode running in the browser. The biggest selling point is Vibe Everywhere. Why bring this company up? Because a quick search shows it's an Indian company based in Chennai, and they run inference on Cerebras with Kimi-K2 as the model. The cheap-and-plentiful advantage is starting to show.

Nowadays, apart from the companies that have to pick a side (Microsoft, NVIDIA, etc.) and are still modding Llama 3, everyone else, whether startups or compute providers, is using Chinese open-weight models. The open-weight ecosystem is steadily being taken over by Chinese models. Impressive.

Model link:

Model data

karminski-牙医
Sun Nov 09 09:43:54
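
The THRIFT recipe itself isn't described in the thread, and pruning a 250B MoE likely removes whole experts or layers rather than individual weights (REAP, mentioned above, prunes experts). Still, as a toy illustration of the basic trade the tweet reports (drop ~23% of parameters, accept a small quality hit), here is plain magnitude pruning on a single layer using PyTorch's built-in `torch.nn.utils.prune`:

```python
# Toy magnitude pruning: zero out the smallest-magnitude ~23% of weights,
# roughly the 250B -> 192B reduction mentioned in the tweet.
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(1024, 1024)
prune.l1_unstructured(layer, name="weight", amount=0.23)  # mask smallest weights
prune.remove(layer, "weight")  # bake the mask into the weight tensor

sparsity = (layer.weight == 0).float().mean().item()
print(f"weight sparsity: {sparsity:.1%}")  # ~23%
```

Real LLM pruning follows this same mask-then-recover pattern at the structure level, usually with a calibration or fine-tuning pass afterward to claw back the lost ~5%.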
RT @egrefen: Yorick Wilks used to like the quote "After Leibniz, a philosopher is a guy too lazy to work in a laboratory" (can't remember…

Building @SakanaAILabs 🧠

hardmaru
Sun Nov 09 09:33:40