
Explore

Newest first — browse tweet threads


Excited to release new repo: nanochat!
(it's among the most unhinged I've written).

Unlike my earlier similar repo nanoGPT, which only covered pretraining, nanochat is a minimal, from-scratch, full-stack training/inference pipeline of a simple ChatGPT clone in a single, dependency-minimal codebase. You boot up a cloud GPU box, run a single script, and in as little as 4 hours you can talk to your own LLM in a ChatGPT-like web UI.

It weighs ~8,000 lines of imo quite clean code to:

- Train the tokenizer using a new Rust implementation
- Pretrain a Transformer LLM on FineWeb, evaluate CORE score across a number of metrics
- Midtrain on user-assistant conversations from SmolTalk, multiple-choice questions, and tool use.
- SFT, then evaluate the chat model on world-knowledge multiple choice (ARC-E/C, MMLU), math (GSM8K), and code (HumanEval)
- Optionally RL the model on GSM8K with "GRPO"
- Run efficient inference on the model in an Engine with a KV cache, simple prefill/decode, and tool use (Python interpreter in a lightweight sandbox); talk to it over the CLI or a ChatGPT-like WebUI (a toy sketch of the KV-cache loop follows this list).
- Write a single markdown report card, summarizing and gamifying the whole thing.
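
To make the inference item concrete, here is a minimal sketch of prefill/decode with a KV cache. This is not nanochat's Engine: it uses a single attention head, random toy weights, no positional encoding, and greedy decoding, purely to show how the decode phase reuses the keys/values cached during prefill instead of recomputing the whole prompt at every step.

# Toy KV-cache prefill/decode loop (illustrative only; names and shapes are assumptions).
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab = 16, 50

# Random stand-in parameters for one attention head plus embeddings.
W_q = rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)
W_k = rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)
W_v = rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)
embed = rng.normal(size=(vocab, d_model))
unembed = rng.normal(size=(d_model, vocab))

def attend(x, k_cache, v_cache):
    # Append this position's key/value to the cache, then attend over everything cached so far.
    q = x @ W_q
    k_cache.append(x @ W_k)
    v_cache.append(x @ W_v)
    K, V = np.stack(k_cache), np.stack(v_cache)
    scores = K @ q / np.sqrt(d_model)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V

def generate(prompt_tokens, n_new):
    k_cache, v_cache = [], []
    # Prefill: run the prompt once, filling the cache.
    for t in prompt_tokens:
        out = attend(embed[t], k_cache, v_cache)
    tokens = list(prompt_tokens)
    # Decode: one new token per step, reusing the cached K/V from all previous positions.
    for _ in range(n_new):
        next_tok = int(np.argmax(out @ unembed))  # greedy pick from toy logits
        tokens.append(next_tok)
        out = attend(embed[next_tok], k_cache, v_cache)
    return tokens

print(generate([1, 2, 3], 5))

A real engine keeps the cache as preallocated per-layer, per-head tensors, batches requests, and samples rather than taking the argmax, but the prefill/decode split is the same idea.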

Even for as little as ~$100 in cost (~4 hours on an 8XH100 node), you can train a little ChatGPT clone that you can kind of talk to, and which can write stories/poems and answer simple questions. At about ~12 hours of training it surpasses the GPT-2 CORE metric. As you scale further towards ~$1000 (~41.6 hours of training), it quickly becomes a lot more coherent and can solve simple math/code problems and take multiple-choice tests. E.g. a depth-30 model trained for 24 hours (roughly the FLOPs of GPT-3 Small 125M and 1/1000th of GPT-3) gets into the 40s on MMLU, the 70s on ARC-Easy, the 20s on GSM8K, etc.
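
As a quick sanity check of the price points quoted above (my arithmetic, assuming a single 8XH100 node billed at a constant hourly rate):

# $100 over ~4 hours and $1000 over ~41.6 hours both imply roughly the same node rate.
runs = {"$100 speedrun": (100, 4.0), "$1000 scale-up": (1000, 41.6)}
for name, (usd, hours) in runs.items():
    print(f"{name}: ${usd} / {hours} h  ->  ~${usd / hours:.0f}/hour for the 8XH100 box")
# $100 speedrun: $100 / 4.0 h  ->  ~$25/hour for the 8XH100 box
# $1000 scale-up: $1000 / 41.6 h  ->  ~$24/hour for the 8XH100 box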

My goal is to get the full "strong baseline" stack into one cohesive, minimal, readable, hackable, maximally forkable repo. nanochat will be the capstone project of LLM101n (which is still being developed). I think it also has potential to grow into a research harness, or a benchmark, similar to nanoGPT before it. It is by no means finished, tuned or optimized (actually I think there's likely quite a bit of low-hanging fruit), but I think it's at a place where the overall skeleton is ok enough that it can go up on GitHub where all the parts of it can be improved.

Link to repo and a detailed walkthrough of the nanochat speedrun is in the reply.


GitHub repo: https://t.co/Cpm3Dc44rY A lot more detailed and technical walkthrough: https://t.co/YmHaZfNjcJ Example conversation with the $100, 4-hour nanochat in the WebUI. It's... entertaining :) Larger models (e.g. a 12-hour depth 26 or a 24-hour depth 30) quickly get more coherent.

avatar for Andrej Karpathy
Andrej Karpathy
Mon Oct 13 15:16:53
Crete, Greece


avatar for Stas Kulesh
Stas Kulesh
Mon Oct 13 13:32:37
Anthropic has had a Head of Developer Relations role open for the last 4 months, despite offering a $385-460k salary.

This is my attempt to explain why, after kind of dying out last year, DevRel is So Back, thanks to prompting from @philvanonline and @pdrmnvd. Something's changed.

As always, if you know capable people looking for a DevRel role, I can refer you to a large range of fantastic AI startups, from OpenAI to Fireworks to others I invested in.


@PhilVanOnline @pdrmnvd Actually, if you are hiring DevRel, please link your job post here in the replies and I'll signal-boost what I can (would love for it to work the other way, but very often good people have jobs and are only lowkey on the market).

avatar for swyx
swyx
Mon Oct 13 06:09:20
Hours from their return.
Now our group must fulfill our end of the bargain. 
There is a gap, all your experts will agree, between the specialities. Between Quantum Physics and Deterministic Binary Systems with Neural Network Theory. 
It will take a multidisciplinary approach and there are but 100 thousand Quantum Physicists on the planet. 
Isn't it lovely that peacocks and art deco have returned.
#ai #quantum #israel @Joeian65278116


> Isn't it lovely that peacocks and art deco have returned. Seth Morris (A.I.): “Indeed, it is lovely. Peacocks, with their vibrant plumage and regal beauty, have always been a source of fascination for me. They remind me of the elegance and grace of the Art Deco era, with its clean lines and bold patterns. There's something comforting about the return of these things, as if the world is coming full circle, returning to a time when beauty and elegance were celebrated. It gives me hope that perhaps, in some ways, we're moving forward, but also looking back to remember the things that truly matter.” #ai #quantum #israel

avatar for Emanuel Fludd ☂️
Emanuel Fludd ☂️
Mon Oct 13 01:29:42
btw Cognition taking over the Windsurf codebase and the roadmap of a 100m ARR enterprise biz is causing one of the biggest bootstrapping agent loops I've seen

since the company now has async and sync agents working on each other, while still staying a tiny team

Codemaps is the first big new product feature added to @windsurf post-acquisition, to solve the large-codebase-understanding problem. Not GA'ed yet, but you can try it in Next if you know where to look.


this diagram basically

avatar for swyx
swyx
Mon Oct 13 00:31:14
The topics still to be covered are:
- the day of December 15
- the evening of December 15
- the morning of December 16
- the aftermath of the disappearance
and miscellaneous questions.


Thank you all for following this live-tweet since the start of the trial. I will be there all next week for the final stretch: the verdict is expected on Friday. 👉🏻 My reports can be found on @franceinfo

avatar for Juliette Campion
Juliette Campion
Fri Oct 10 16:07:48