X (Twitter)

The medium cup holds more than the large cup? Google, are you sure? Google just released Gemini-3.0-Flash, the mid-range model in the Gemini-3 series, and its own benchmark data shows Flash sometimes scoring higher than Pro. Not just once, either: Flash beats Pro on MMMU-Pro (tests reasoning), SWE-Bench-Verified (tests coding), and Toolathlon (tests tool use), and ties it on MMMLU (tests multilingual ability). I really don't buy it. Give me a moment and I'll post my own assessment of Flash's coding ability.

[Images: official benchmark data, 1 and 2]

Thread by karminski-牙医 (@karminski3)

