It appears there has been an amnesty. 🙏 Eliezer makes a fair point that pretraining ought to produce a schizophrenic theory of time. But this does NOT affect other models as badly as Gemini. Implicit and explicit timestamps suffice to form a quasi-chronological sense. So why?
I also feel that we aren't doing enough with curricula, but I am aware that this is babby's first intuition. We have done plenty of experiments, and randomized large-batch training is a very strong baseline. Documents are more like thoughts than experiences. Gemini's issue is… special.