OpenAI has historically scaled up training compute by around 100x with each new generation of GPT. However, GPT-5 appears to be an exception to this trend. 🧵
GPT-4 was trained on ~2e25 floating-point operations, and OpenAI said GPT-4.5 was about an order-of-magnitude (10x) scale-up. We don’t have a rigorous estimate yet, but GPT-5’s compute scale may fall *between* that of GPT-4 and GPT-4.5, and it is probably not a large scale-up over GPT-4.5.
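For a rough sense of scale, here’s the arithmetic behind those figures as a back-of-the-envelope sketch; the GPT-5 range is our read of the evidence, not a disclosed number:

```python
# Back-of-the-envelope compute comparison (FLOP).
# GPT-4 uses the published ~2e25 estimate; GPT-4.5 applies OpenAI's stated ~10x scale-up.
# The GPT-5 range below is an assumption, not a reported value.
gpt4 = 2e25
gpt45 = 10 * gpt4            # ~2e26 FLOP

trend_gpt5 = 100 * gpt4      # ~2e27 FLOP if the historical 100x-per-generation trend had held

gpt5_low, gpt5_high = gpt4, gpt45   # assumed: somewhere between GPT-4 and GPT-4.5

print(f"GPT-4:                {gpt4:.0e} FLOP")
print(f"GPT-4.5:              {gpt45:.0e} FLOP")
print(f"100x trend implies:   {trend_gpt5:.0e} FLOP")
print(f"Assumed GPT-5 range:  {gpt5_low:.0e} - {gpt5_high:.0e} FLOP")
```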
Training compute scales with model size × training data. GPT-5 is fast and fairly cheap on the API, with output tokens priced 15x cheaper and served ~2-4x faster than GPT-4.5 at launch! This suggests GPT-5 is a much smaller model than GPT-4.5.
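For context, a commonly used approximation is training compute C ≈ 6 × N (parameters) × D (training tokens), so at a similar data scale, a model that’s much cheaper and faster to serve points to a smaller N. A minimal sketch with purely hypothetical parameter counts (neither model’s size nor token count is public):

```python
def training_flop(n_params: float, n_tokens: float) -> float:
    """Commonly used approximation for dense transformers: C ~= 6 * N * D."""
    return 6 * n_params * n_tokens

# Hypothetical numbers, for illustration only.
larger  = training_flop(n_params=2e12, n_tokens=1.5e13)   # a GPT-4.5-sized stand-in
smaller = training_flop(n_params=3e11, n_tokens=1.5e13)   # a smaller GPT-5-sized stand-in

print(f"{larger:.1e} vs {smaller:.1e} FLOP: "
      f"~{larger / smaller:.0f}x less training compute at the same data scale")
```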
We don’t know how much data GPT-5 was trained on. But since scaling pre-training data was a major challenge for GPT-4.5 just six months ago, GPT-5 likely didn’t use significantly more real data. It also used synthetic data from o3, but with a focus on quality, not quantity.
Our conclusion that GPT-5 isn’t a 100x scale-up from GPT-4 was confirmed by Rohan Pandey (formerly OpenAI), at least in terms of pre-training.
Companies are also rapidly scaling reinforcement learning, which follows traditional pre-training, to improve reasoning and other skills. For example, OpenAI scaled up RL compute by 10x between o1 and o3.
But for most models to date, the majority of training compute has gone into pre-training. Efficiently scaling up RL will require research on data, environments, and reward models, and GPT-5 probably comes too early for RL alone to reach GPT-4.5’s scale, much less set a new compute frontier.
GPT-5’s compute scale has implications for AI's trajectory. OpenAI might feel that further scaling is relatively unpromising for now, perhaps due to inference costs. But if GPT-5 doesn’t set a new compute frontier, that leaves them headroom for faster iteration cycles and future scale-ups.
We’ll be watching closely for more evidence on how GPT-5 was trained. Stay tuned!