nice that it's open weight, but comparing dense vs MoE models while only looking at total params is pretty unfair. if you look at active params instead of total params it's a different story (rough math in the sketch below):
- GLM 4.6 (32B active): 74% fewer
- MiniMax M2 (10B active): 92% fewer
- K2 Thinking (32B active): 74% fewer
- V3.2 (37B active): 70% fewer
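a quick back-of-envelope sketch of where those percentages come from; the ~123B dense baseline here is an assumption back-solved from the numbers above, not something stated in the original chart:

```python
# Compare each MoE model's *active* params against a dense model's total params.
# DENSE_TOTAL_B is a hypothetical baseline (back-solved for illustration);
# the active-param counts are the ones listed above.
DENSE_TOTAL_B = 123  # assumed dense model size, in billions of params

moe_active_b = {
    "GLM 4.6": 32,
    "MiniMax M2": 10,
    "K2 Thinking": 32,
    "V3.2": 37,
}

for name, active in moe_active_b.items():
    fewer = (1 - active / DENSE_TOTAL_B) * 100
    print(f"{name}: {active}B active -> {fewer:.0f}% fewer params than the {DENSE_TOTAL_B}B dense baseline")
```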
size (whether total or active!) is not the right metric here anyway; we should have the same graph with speed on vLLM / SGLang
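if anyone wants to make that graph, here is a minimal sketch of measuring generation throughput with vLLM's offline API (the model id and prompt are placeholders, swap in whichever checkpoints you want to compare):

```python
import time
from vllm import LLM, SamplingParams

# Placeholder model id; replace with the checkpoint under test.
llm = LLM(model="your-org/your-model", tensor_parallel_size=1)
params = SamplingParams(max_tokens=512, temperature=0.0)

# A small batch of identical prompts is enough for a rough tok/s number.
prompts = ["Explain mixture-of-experts routing in two sentences."] * 32

start = time.perf_counter()
outputs = llm.generate(prompts, params)
elapsed = time.perf_counter() - start

generated = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"{generated} tokens in {elapsed:.1f}s -> {generated / elapsed:.1f} tok/s")
```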