Thread Easy

Your all-in-one partner for Twitter threads


Explore

Newest first — browse tweet threads


Now that I finally have controlled synthetic environments, I'm seeing a similar trade-off on the pretraining side. Stacking layers, for instance, is even more beneficial to some tasks/domains (math) than others.

Reasoning models coming (very) soon. Co-founder @pleiasfr

Alexander Doria
Sun Nov 09 09:57:40
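
The depth claim above is testable in exactly the kind of controlled setup the tweet describes. As a minimal sketch (assuming a toy GPT-2 config from `transformers`; the sizes and the library choice are my illustration, not the author's actual synthetic environments), one axis of such a sweep is simply layer count at fixed width:

```python
# Hypothetical depth sweep: hold width fixed, vary the number of stacked
# layers, then compare the resulting models across task domains
# (e.g. math vs. retrieval). All sizes are illustrative.
from transformers import GPT2Config, GPT2LMHeadModel

for n_layer in (6, 12, 24):
    cfg = GPT2Config(n_embd=512, n_head=8, n_layer=n_layer)
    model = GPT2LMHeadModel(cfg)
    n_params = sum(p.numel() for p in model.parameters())
    print(f"layers={n_layer:2d}  params={n_params / 1e6:.1f}M")
```

Each model in the sweep would then be pretrained on the same synthetic corpus, so any per-domain gap can be attributed to depth rather than data.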
5 years of grind
1 depression
Nothing took off

This was three years ago, launching yet another small bet, hoping that someday one would take off.

🧑‍💻 https://t.co/Y30jsaHwz9 $20K/m
⚡️ https://t.co/vatLDmi9UG $17K/m
📈 https://t.co/3EDxln5mdi $16K/m
⭐️ https://t.co/MZc8tG9xWi $8K/m
🧬 https://t.co/SfrVXVtmdA $.5K/m
🍜 https://t.co/r07EpGSYJ2 $0K/m
🧾 https://t.co/7olaOzV8Xd $0/m
+18 https://t.co/4zCWHGJp1S

Marc Lou
Sun Nov 09 09:57:36
This is like the third methodology I've posted that MSL released as a paper shortly after.

This means MSL is trying ideas and executing with high entropy. Unexpected and interesting.

tokenbender
Sun Nov 09 09:57:16
Deep dive on RL LoRA rank size, which adds to my overall feeling that there are reasoning <=> compute-effort scaling laws waiting to be found.

Now that I finally have controlled synthetic environments, I'm seeing a similar trade-off on the pretraining side. Stacking layers, for instance, is even more beneficial to some tasks/domains (math) than others.

Alexander Doria
Sun Nov 09 09:56:08
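
For context on what "LoRA rank size" trades off: the rank bounds the dimensionality of the low-rank adapter update, so trainable parameters and adapter compute grow roughly linearly with it. A minimal sketch of a rank sweep, assuming the `peft` library and a GPT-2 placeholder base model (both my choices for illustration, not anything named in the thread):

```python
# Hypothetical LoRA rank sweep: higher rank = more adapter capacity and
# more compute, which is the knob the scaling-law intuition is about.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

for rank in (4, 16, 64):
    base = AutoModelForCausalLM.from_pretrained("gpt2")  # fresh copy per rank
    config = LoraConfig(
        r=rank,                     # rank of the low-rank update B @ A
        lora_alpha=2 * rank,        # common heuristic: alpha ~ 2r
        target_modules=["c_attn"],  # GPT-2's fused attention projection
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(base, config)
    model.print_trainable_parameters()  # compare adapter size across ranks
```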
Are Indian companies starting to use Chinese models too?

I just spotted another pruned model: MiniMax-M2-THRIFT.

Pruned from 250B down to 192B parameters, with roughly a 5% performance drop. The model's own performance doesn't matter much to me, but since Llama's decline, this month alone has seen two pruned models built on Chinese base models (Kimi-Linear-REAP and MiniMax-M2-THRIFT).

The modded model itself may not be especially impressive, but it's worth noting that its publisher is called VibeStudio. Their main product is a cloud-hosted vibe-coding environment: picture VSCode plus an AI agent running in a web page, or ClaudeCode running in the browser. The biggest selling point is Vibe Everywhere. Why bring this company up? Because a quick search shows it's an Indian company based in Chennai, and they run inference on Cerebras with Kimi-K2 as the model. The cheap-and-plentiful advantage is starting to show.

Nowadays, apart from the companies that have to pick a side (Microsoft, NVIDIA, etc.) and are still modding Llama 3, everyone else, whether startups or compute providers, is using Chinese open-weight models. The open-weight ecosystem is steadily being taken over by Chinese models. Impressive.

Model link:

Model data

karminski-牙医
Sun Nov 09 09:43:54
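
The THRIFT recipe itself isn't described in the thread, and pruning a 250B MoE likely removes whole experts or layers rather than individual weights (REAP, mentioned above, prunes experts). Still, as a toy illustration of the basic trade the tweet reports (drop ~23% of parameters, accept a small quality hit), here is plain magnitude pruning on a single layer using PyTorch's built-in `torch.nn.utils.prune`:

```python
# Toy magnitude pruning: zero out the smallest-magnitude ~23% of weights,
# roughly the 250B -> 192B reduction mentioned in the tweet.
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(1024, 1024)
prune.l1_unstructured(layer, name="weight", amount=0.23)  # mask smallest weights
prune.remove(layer, "weight")  # bake the mask into the weight tensor

sparsity = (layer.weight == 0).float().mean().item()
print(f"weight sparsity: {sparsity:.1%}")  # ~23%
```

Real LLM pruning follows this same mask-then-recover pattern at the structure level, usually with a calibration or fine-tuning pass afterward to claw back the lost ~5%.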
RT @egrefen: Yorick Wilks used to like the quote "After Leibniz, a philosopher is a guy too lazy to work in a laboratory" (can't remember…

Building @SakanaAILabs 🧠

hardmaru
Sun Nov 09 09:33:40