Someone smart made a great analogy to me that I told him I’d steal so here it is: RLing a model to do a specific (benchmarkable) task is like finding a chemical compound that has a specific medical effect. It may or may not work for other, even non-adjacent tasks — you can only learn what else it’s good for (or any side effects) by experimenting.
Loading thread detail
Fetching the original tweets from X for a clean reading view.
Hang tight—this usually only takes a few seconds.