X (Twitter)

claude-opus-4.5 has been released! This time, Opus-4.5 pushed the top score on the Aider Polyglot Coding Leaderboard (which I find to be the most accurate programming benchmark in practice) to 89.4! It's finally going to break through! Keep in mind that at the beginning of the year DeepSeek-R1 could only complete 56.9% of the questions, and now this model completes about 90%. So, what's the cost? Of course, it'll drain your wallet. Output is priced at $25 per million tokens... The silicon-based delivery rider test I showed you yesterday basically uses 1 million tokens per run... that's a whopping $170... It's really unaffordable... On top of that, its results on all the other tests are also state-of-the-art (SOTA). I will bring you video test results later! Stay tuned!

Performance parameters / 1

Performance parameters / 2

Pricing

Summary

Thread by karminski-牙医 (@karminski3)
