X (Twitter)

Claude Opus 4.5 has been released. Its software engineering capabilities are indeed the strongest, and it is the first model to score above 80 in benchmark evaluations, but Anthropic's official chart is still quite controversial. It is understandable that the 0-70 range of the axis was deliberately folded to highlight the differences among the top scores, and you can even see the fold markers if you look closely. From the standpoint of objective data visualization, however, this is still an undesirable practice. Even when the chart is evaluated with Anthropic's own Sonnet 4.5, the problems are quite obvious.

Thread by meng shao (@shao__meng)
