Nobody tries to replicate these papers. I find that insane. Nobody replicated Whale papers either, until they stomped the open source competition into the pavement with R1, proving superiority of their DS-MoE+DSMath+GRPO stack. Now it's default. But they're "small". GDM is GDM.
Loading thread detail
Fetching the original tweets from X for a clean reading view.
Hang tight—this usually only takes a few seconds.