Claude Opus 4.5 has been released. Its software engineering capabilities are indeed the strongest, and it is the first model to score above 80 in evaluations, but the official Anthropic chart is still quite controversial. It's understandable that the 0-70 range of the axis was deliberately collapsed to highlight the differences among the top scores; you can even see the axis-break markers if you look closely. From the standpoint of objective data visualization, though, this is still poor practice. Even when the chart is evaluated by Anthropic's own Sonnet 4.5, the problems are quite obvious.

