Thread Easy: Explore

nice that it's open weight, but comparing dense vs MoE models while only looking at total params is pretty unfair; if you look at active params instead, it's a different story:

- GLM 4.6 (32B): 74% fewer
- Minimax M2 (10B): 92% fewer
- K2 thinking (32B): 74% fewer
- V3.2 (37B): 70% fewer

size (whether total or active!) is not the right metric here; we should have the same graph with speed on vLLM / SGLang

elie (bio: Training LLMs, now @huggingface; anon feedback: https://t.co/JmMh7Sfvxd)
Tue Dec 09 16:19:31
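
For reference, the percentages in the list are just 1 - active / dense_total, expressed as a percentage. The thread never names the dense baseline it compares against; a ~123B-parameter dense model is an assumption here, chosen only because it reproduces all four quoted figures. A minimal Python sketch:

# Sketch: recompute the "% fewer" figures as (1 - active / dense_total) * 100.
# DENSE_TOTAL_B is an assumption; the thread does not name the dense baseline,
# and ~123B simply happens to match all four quoted percentages.
DENSE_TOTAL_B = 123.0

moe_active_b = {
    "GLM 4.6": 32,
    "Minimax M2": 10,
    "K2 thinking": 32,
    "V3.2": 37,
}

for name, active in moe_active_b.items():
    pct_fewer = (1 - active / DENSE_TOTAL_B) * 100
    print(f"{name}: {active}B active, ~{pct_fewer:.0f}% fewer params than the assumed dense baseline")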