Thread Easy

Explore

Newest first — browse tweet threads

Out of the box, no game-specific RL or tuning? No way. I'd be genuinely shocked. As an interaction problem, this is way, way harder than driving or humanoid control. And most of the game LLM results so far have been bullshit.

Voyager for Minecraft calls high-level actions like "go mine a coal" and leans on tons of public example scripts. This would be a standalone, from-scratch RL task.

Chess just so happens to record games in the exact text format that you'd want for LLMs. And there are lots of them.

When you don't have that but have access to a sim? Small model RL crushes. We have several examples of super-human play trained in seconds on a single GPU on https://t.co/wPfmdJfe1d. It's not just games either. Most fancy sims we build for clients end up being easier to RL than even relatively simple games.
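
(For scale, a minimal, generic sketch of what "small model RL" means in practice: vanilla REINFORCE on a toy gymnasium task with a policy of a few hundred parameters. This is not PufferLib's API and has nothing to do with any particular custom sim; it only illustrates the model sizes involved.)

```python
# Hypothetical sketch: a few-hundred-parameter policy learning CartPole from
# direct interaction with plain REINFORCE. Illustration of scale only.
import gymnasium as gym
import torch
import torch.nn as nn

env = gym.make("CartPole-v1")
policy = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 2))
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)

for episode in range(500):
    obs, _ = env.reset()
    log_probs, rewards, done = [], [], False
    while not done:
        logits = policy(torch.as_tensor(obs, dtype=torch.float32))
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        obs, reward, terminated, truncated, _ = env.step(action.item())
        log_probs.append(dist.log_prob(action))
        rewards.append(reward)
        done = terminated or truncated
    # Discounted returns, normalized, then a vanilla policy-gradient step.
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + 0.99 * g
        returns.insert(0, g)
    returns = torch.tensor(returns)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)
    loss = -(torch.stack(log_probs) * returns).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()
```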

In my mind, the best result of our field by far was OpenAI Five. Beat top pros at Dota 2 with ~1000 GPUs. You could probably do it with 64-256 H100s now. The CPUs are a real killer, but hey, that's why we build fast custom sims for problems we really care about. We continually see RL come up with solutions that I don't see how in hell an LLM will ever just zero-shot. Interaction is fundamental to intelligence.

If you RL finetune an LLM by playing the game? Sure, and it will be more sample efficient than training from scratch. But vastly compute inefficient. We've got pretty good evidence that scaling laws in RL tend towards way smaller model sizes and way more data. This is the bet I've made in my own research, and so far so good.

So what if you really wanted to use Grok for an impressive RL result? My bet is on bridging the gap between giant and tiny models. Take almost all (>>99.9%) of actions during training with the small model. Play lots of games. Use the large model to guide exploration etc. There are some results on this already in games and robotics, but nothing really satisfactory yet. I'm not working on it because imo the small-model RL side has way more potential right now w/ a very clear path forward at even small scale.
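
(A rough sketch of what that action split could look like during rollouts. small_policy and large_model are placeholders, and the routing rule below is just the >>99.9% mix described above, not any published recipe.)

```python
import random

def pick_action(obs, small_policy, large_model, p_large=1e-4):
    # Hypothetical sketch: the small policy takes essentially every action;
    # the large model is queried rarely (and expensively) to steer exploration.
    if random.random() < p_large:
        return large_model.suggest_action(obs), "large"
    return small_policy.act(obs), "small"
```

The open part is how to credit those rare guided actions in the small model's update; the point above is just that the big model stays out of the inner loop.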

I build sane open-source RL tools. MIT PhD, creator of Neural MMO and founder of PufferAI. DM for business: non-LLM sim engineering, RL R&D, infra & support.

Joseph Suarez 🐡
Tue Nov 25 16:55:04
RT @rohangilkes: 💰Startup revenue vs Twitter followers!

Why am I off the charts and on here 4 times?😲  

15 Reasons I'd say. 

(Wrote this…

I build stuff. On my way to making $1M 💰 My projects 👇

Florin Pop 👨🏻‍💻
Tue Nov 25 16:53:52
RT @_mchenco: a new image model has entered the playground...
https://t.co/7NIrZVSgsm

vp developers & ai @cloudflare ✨ and how does that error make you feel?

rita kozlov 🐀
Tue Nov 25 16:51:43
RT @shl: A one-dimensional identity puts all of your happiness in one basket.

Fill your life with friends, family, lore, love, books, hobb…

Father. Formerly @Gumroad. Working on something old.

Sahil Lavingia
Tue Nov 25 16:50:29
Checked into a hotel in Shenzhen, and the Guangdong Telecom WiFi is painfully bad.

Phone connected to the WiFi, the login page popped up, I entered my phone number and requested the SMS code, but it didn't get picked up automatically.

Switched to Messages to check the code, and the login page was gone, so I had to start over...

Luckily the code from the second SMS was still valid, and that finally worked.

Then I found that my Mac, once connected to the WiFi, never popped up the authentication page at all.

Changed DNS to 8.8.8.8 and double-checked that all proxies were off.

Went to 192.168.1.1 to force a redirect to the authentication page, entered my phone number and the SMS code to log in.

Finally got online.
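
(A small sketch of automating the "force the portal to appear" step from a laptop. The probe URL is Apple's standard captive-portal check; everything else is just an illustration of the manual steps above.)

```python
import requests

# Hit a plain-HTTP probe that returns "Success" when the network is open.
# A captive portal will usually redirect it or answer with its own login page.
resp = requests.get("http://captive.apple.com/hotspot-detect.html",
                    allow_redirects=False, timeout=5)
if resp.status_code in (301, 302, 303, 307):
    print("Captive portal detected; log in at:", resp.headers.get("Location"))
elif "Success" in resp.text:
    print("Already online.")
else:
    print("No redirect; try the gateway directly, e.g. http://192.168.1.1")
```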

A PM who loves rock music and fishing. Website: https://t.co/vnUpLt752o

向阳乔木
Tue Nov 25 16:50:04
> The model couples the Mistral-3 24B parameter vision-language model [Mistral Small 3.2 apparently] with a rectified flow transformer
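
(For context on the second half of that pairing: the rectified flow objective is regression of a velocity field along straight noise-to-data paths. A minimal sketch follows; the velocity_model signature and the conditioning input are assumptions, not anything from Mistral's actual code.)

```python
import torch
import torch.nn.functional as F

def rectified_flow_loss(velocity_model, x1, cond):
    # x1: clean image latents; cond: conditioning from the vision-language model.
    x0 = torch.randn_like(x1)                       # pure-noise endpoint
    t = torch.rand(x1.shape[0], device=x1.device).view(-1, *([1] * (x1.dim() - 1)))
    xt = (1 - t) * x0 + t * x1                      # point on the straight path
    v_target = x1 - x0                              # constant target velocity
    v_pred = velocity_model(xt, t.flatten(), cond)  # hypothetical signature
    return F.mse_loss(v_pred, v_target)
```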

We're in a race. It's not USA vs China but humans and AGIs vs ape power centralization. @deepseek_ai stan #1, 2023–Deep Time «C’est la guerre.» ®1

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
Tue Nov 25 16:49:19