
I'm surprised how small Opus feels sometimes. Just like with Sonnet, I can unintentionally collapse it into parroting (which it can be rescued from). I suspect it relies on CoT preservation though. Need to retest.


but also, that's partially a style/gimmick. This gimmick takes real brains to execute, but it's an arbitrary design choice, not a product of a raw capability jump overflowing into "huh, this teortaxes guy ain't that smart". It still sometimes objects inanely. A bit smaller step


Chatting about silicon-based AI, watching organic Orange.


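Coding: SWE-Bench Pro is a rigorous evaluation of real-world software engineering. Unlike SWE-bench Verified, which only tests Python, SWE-Bench Pro tests four languages and aims to be more contamination-resistant, more challenging, more diverse, and more industrially relevant. GPT‑5.2 Thinking sets a new state of the art on SWE-Bench Pro at 55.6%, ahead of Claude Opus 4.5 at 52% and Gemini 3 Pro at 43.3%.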


Download the latest versions of Windsurf and Windsurf Next to try it out: https://t.co/E6JgVot67u


Detailed performance /1
