Thread Easy
  • Discover
  • Create thread

Your all-purpose partner for Twitter threads

© 2025 Thread Easy All Rights Reserved.

Explore


Making decisions with imperfect information at the frontier AI labs

Please follow @zpysky1125, lead researcher at MiniMax AI, creators of M2, currently the leading OSS model and, to my knowledge, the first OSS interleaved-thinking model.

The blog below by @zpysky1125 is a beautiful read 💕 if you are interested in what goes on in the minds of people who train state-of-the-art (SOTA) LLMs.

It discusses what kinds of choices they face and how they make decisions with imperfect information. The issue is that you cannot run many experiments in LLM training, as each run is very expensive. This is unlike conventional ML.

Pengyu very honestly discusses why they had to discard, or rather shelve, their earlier innovation of 'Linear attention', which they used for the MiniMax M1 model, and go back to 'Full attention' for M2.
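For readers unfamiliar with the trade-off being discussed, here is a toy NumPy sketch (not MiniMax's actual implementation, and the feature map `phi` is an arbitrary illustrative choice): full attention materializes an n×n score matrix, so its cost grows quadratically with sequence length, while kernelized linear attention reassociates the matrix products so the cost grows only linearly.

```python
import numpy as np

def full_attention(Q, K, V):
    # Standard softmax attention: the (n, n) score matrix makes
    # the cost quadratic in sequence length n.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])                # (n, n)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                                     # (n, d_v)

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0) + 1):
    # Kernelized "linear attention": replace softmax with a positive
    # feature map phi, then reassociate (phi(Q) @ phi(K).T) @ V as
    # phi(Q) @ (phi(K).T @ V), which is linear in n.
    Qf, Kf = phi(Q), phi(K)
    KV = Kf.T @ V                                          # (d, d_v), no (n, n) term
    Z = Qf @ Kf.sum(axis=0, keepdims=True).T               # (n, 1) normalizer
    return (Qf @ KV) / Z

n, d = 8, 4
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, n, d))
print(full_attention(Q, K, V).shape)    # (8, 4)
print(linear_attention(Q, K, V).shape)  # (8, 4)
```

The appeal of the linear form is that the `(n, n)` matrix never exists, which matters at very long contexts; the catch, as the blog discusses, is that matching full attention's quality in practice is much harder than matching its shapes.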

They abandoned, with a heavy heart, a technology tree they themselves invented. They discuss it with great honesty. It is heartfelt.

Pengyu discusses the short-run advantages of the proven path, even if it may be less efficient. He also discusses in what situations they would revisit the decision on Linear Attention. You will learn so much!

This is a rare insight into the minds of decision makers at the frontier labs. American labs, please give us more of this kind of sharing.

TL;DR: Pick your battles wisely.

Thanks @Hailuo_AI and Pengyu (@zpysky1125 )

@dwarkesh_sp, @himanshustwts please have Chinese researchers (from Chinese labs) on your podcast 🇨🇳🇺🇸💕.


AI @amazon. Open Source AI enjoyer. GPU rich, but loves the GPU poor. Pied piper to AI agents. Hill climber with RL. All views personal!

GDP
Wed Oct 29 17:33:40
RT @ashvinair: Some exciting news to share - I joined Cursor! We just shipped a model 🐆 It's really good - try it out!

https://t.co/kc1gmT3…


@cursor_ai, created https://t.co/n8cSXZO4VH, started @SupermavenAI and @Tabnine, formerly @OpenAI

Jacob Jackson
Wed Oct 29 17:32:06
RT @sea_snell: It has been a joy working on composer with the team and watching all the pieces come together over the past few months

I ho…

Jacob Jackson
Wed Oct 29 17:31:52
RT @_awettig: We did a thing!

Jacob Jackson
Wed Oct 29 17:31:39
RT @srush_nlp: Composer is a new model we built at Cursor. We used RL to train a big MoE model to be really good at real-world coding, and…

Jacob Jackson
Wed Oct 29 17:31:26
It's my greatest joy to receive an email that says some version of "You are invited to this event but do not need to come," but when I send the email "You are invited but don't need to come" to people, they absolutely freak out that there is hidden meaning or guidance. 😇


GP @a16z — Building American Dynamism 🇺🇸 — Anthropologist — Formerly Founder/CEO @OpenDNS — Lokah Samastah Sukhino Bhavantu

David Ulevitch 🇺🇸
Wed Oct 29 17:31:13