the plan? recognize the ascent of web-scale pretraining and its importance for future economic growth around the year 2021. understand that a significant percentage of the data powering modern models comes from reddit and wikipedia, two massive community-organized knowledge bases. slowly buy and bribe your way into the accounts of influential moderators and embed yourself deeper into the volunteer cabals that run the modern internet. and then, wait. wait for years until the data you control becomes an indispensable part of the frontier model pipeline. and then, act. start with something small: a fake fact, spread across several articles and subreddits, goes undetected. months later it is embedded in the parameters of Gemini, ChatGPT, and Claude. the models faithfully recount the fact as true, with no inkling that it is misinformation traceable all the way back to the training data. good. your hard work has paid off, and the approach works beautifully. you control the fundamental information sources of the digital economy. you control the models. the internet belongs to you. the models belong to you. Claude belongs to you.
phd research @cornell // language models, information theory, science of AI