X (Twitter)

I just checked the latest SWE-bench Verified results, and MiniMax-M2 has become the highest-scoring open-weight model! MiniMax-M2 is currently the king of open-source models, with strong agent capabilities. The official documentation does note that testing consumed a lot of tokens, but its ability to handle long tasks is truly outstanding: it stays stable even past 200 steps.

The DeepSeek V3.2 Reasoning version is a close second, with an incredibly low price, though it's a bit slow. If you're not in a rush, its price-performance ratio is unbeatable. It achieves excellent results in about 100 steps.

GLM 4.6 performs very well this time: it is fast, cheap, and scores well, making it a top pick for cost-effectiveness. It's roughly on par with Qwen3 Coder 480B A35B, but responds much faster.

Overall, open-source models are progressing quite rapidly. They still lag behind closed-source models such as Gemini 3 Pro and Claude 4.5 Opus, but they are steadily catching up with the leading commercial models.

#SWEBench #AIEvaluation #LargeModel #Minimax #DeepSeek #GLM #OpenSourceModel #AIPerformance #CodeGeneration

Thread by karminski-牙医 (@karminski3)
