it’s ridiculous that evals are still improving so fast this late into the AI era. top models are still only holding SOTA for months, even weeks