Thread Easy

Your all-in-one partner for Twitter threads


Explore

Newest first — browse tweet threads


elie
Training LLMs (now: @huggingface); anon feedback: https://t.co/JmMh7Sfvxd
Tue Dec 09 16:19:31

Nice that it's open weight, but comparing dense vs. MoE models while only looking at total params is pretty unfair. If you look at active params instead of total params, it's a different story:

- GLM 4.6 (32B active): 74% fewer
- MiniMax M2 (10B active): 92% fewer
- K2 Thinking (32B active): 74% fewer
- V3.2 (37B active): 70% fewer

Size (whether total or active!) is not the right metric here; we should have the same graph with speed on vLLM / SGLang.
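
The percentages above compare each MoE model's active parameter count against the total parameter count of a dense baseline. The thread never names that baseline; a value of roughly 123B is an assumption inferred from the listed figures. A minimal Python sketch of the arithmetic, under that assumed baseline:

```python
# Minimal sketch of the "% fewer" arithmetic in the thread above.
# DENSE_TOTAL_B is an assumption: the thread never states the dense
# baseline; ~123B simply makes the listed percentages come out right.

DENSE_TOTAL_B = 123  # assumed dense baseline, in billions of parameters

active_params_b = {
    "GLM 4.6": 32,
    "MiniMax M2": 10,
    "K2 Thinking": 32,
    "V3.2": 37,
}

for name, active_b in active_params_b.items():
    pct_fewer = (1 - active_b / DENSE_TOTAL_B) * 100
    print(f"{name}: {active_b}B active, {pct_fewer:.0f}% fewer than a {DENSE_TOTAL_B}B dense model")
```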