V3 announced "We will consistently explore and iterate on the deep thinking capabilities of our models" 3 weeks before R1. Also: "infinite context" (still capped at 128k). V3.2 announces more compute. Also: more efficient CoTs. So, another Big Training Promise. Which is it this time: weeks or years?