Thread Easy: Explore

nice that it's open weight, but comparing dense vs MoE models while only looking at total params is pretty unfair; if you look at active params instead, it's a different story:

- GLM 4.6 (32B): 74% fewer
- Minimax M2 (10B): 92% fewer
- K2 thinking (32B): 74% fewer
- V3.2 (37B): 70% fewer

size (whether total or active!) is not the right metric here; we should have the same graph with speed on vLLM / SGLang

elie (bio: Training LLMs, now @huggingface; anon feedback: https://t.co/JmMh7Sfvxd)
Tue Dec 09 16:19:31
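
For reference, the percentages in the list are just 1 - active / dense_total, expressed as a percentage. The thread never names the dense baseline it compares against; a ~123B-parameter dense model is an assumption here, chosen only because it reproduces all four quoted figures. A minimal Python sketch:

# Sketch: recompute the "% fewer" figures as (1 - active / dense_total) * 100.
# DENSE_TOTAL_B is an assumption; the thread does not name the dense baseline,
# and ~123B simply happens to match all four quoted percentages.
DENSE_TOTAL_B = 123.0

moe_active_b = {
    "GLM 4.6": 32,
    "Minimax M2": 10,
    "K2 thinking": 32,
    "V3.2": 37,
}

for name, active in moe_active_b.items():
    pct_fewer = (1 - active / DENSE_TOTAL_B) * 100
    print(f"{name}: {active}B active, ~{pct_fewer:.0f}% fewer params than the assumed dense baseline")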