Thread Easy


Explore

Newest first — browse tweet threads


nice that it's open weight, but comparing dense vs moe models and only looking at total params is pretty unfair, if you look at active params instead of total params it's a different story:

- GLM 4.6 (32B): 74% fewer
- Minimax M2 (10B): 92% fewer
- K2 thinking (32B): 74% fewer
- V3.2 (37B): 70% fewer

size (both total or active!) is not the right metric here, we should have the same graph with speed on vllm / sglang


Training llm's (now: @huggingface) anon feedback: https://t.co/JmMh7Sfvxd

elie
Tue Dec 09 16:19:31
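
The percentages in the thread follow from one piece of arithmetic: an MoE model with A billion active parameters has (1 - A / D) * 100 percent fewer parameters in play than a dense model with D billion total parameters. The minimal Python sketch below back-derives the dense baseline D that each quoted figure implies; the dense model itself is not named in the thread, so D is an inference from the quoted numbers, not something stated there.

```python
# Sanity-check of the "% fewer" figures quoted in the thread above.
# Assumption: each figure compares an MoE model's *active* parameter count
# against the *total* parameter count of the dense model in the original
# chart. That dense model is not named in the thread, so the baseline is
# back-derived from the quoted numbers rather than taken from the source.

moe_models = {
    "GLM 4.6":     {"active_b": 32, "pct_fewer": 74},
    "Minimax M2":  {"active_b": 10, "pct_fewer": 92},
    "K2 thinking": {"active_b": 32, "pct_fewer": 74},
    "V3.2":        {"active_b": 37, "pct_fewer": 70},
}

for name, m in moe_models.items():
    # active = baseline * (1 - pct_fewer / 100)
    # => baseline = active / (1 - pct_fewer / 100)
    implied_baseline_b = m["active_b"] / (1 - m["pct_fewer"] / 100)
    print(f"{name:12}: {m['active_b']}B active, {m['pct_fewer']}% fewer "
          f"-> implied dense baseline ~{implied_baseline_b:.0f}B total")
```

All four figures back out to a dense baseline of roughly 123 to 125B total parameters, so the quoted percentages are mutually consistent; as the thread itself notes, though, neither total nor active parameter count is the metric that matters as much as serving speed on vLLM / SGLang.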