X (Twitter)

I like it because otherwise the struggle would seem hollow. GPT-5/5.1 clearly were more or less DS-V3.1, only with more compute, early mover advantage, multimodality and poorer cost discipline. What is there to exceed? A GPU pile? This, this is actually a big research problem.

To be clear, I overstate my case. It’s plausible that something like ParScale or PaCoRe or another consensus aggregation + V3.2 + an ugly prompt machine get you “kind of” 5.2, maybe even cheaper. But these are cheats. 5.2 can also use them. We need to do better.

Thread by Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) (@teortaxesTex)

Author details

Thread content