
Though, thinking again, for my speculative Flash to have something like a 16/3200 expert pattern, those experts would have to be TINY!! and I don't think that's optimal. On the other hand: this meme paper, and the fact that Qwen3-Next already uses experts of that scale (if my math is right)
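
A quick back-of-envelope check of "if my math is right", as a minimal sketch. All numbers are assumptions for illustration: the total budget and layer count for the speculative 16/3200 "Flash" are hypothetical, and the Qwen3-Next figures are only an approximate public shape, not exact specs.

```python
# Rough per-expert parameter count for a fine-grained MoE.
# Assumes a fixed fraction of the budget goes to non-expert weights
# (attention, embeddings, shared experts) and the rest is split evenly
# across the experts in every layer.

def per_expert_params(total_params: float, non_expert_frac: float,
                      n_layers: int, experts_per_layer: int) -> float:
    expert_budget = total_params * (1.0 - non_expert_frac)
    return expert_budget / (n_layers * experts_per_layer)

configs = {
    # name: (total params, assumed non-expert fraction, layers, experts per layer)
    "speculative 'Flash' 16/3200": (400e9, 0.10, 60, 3200),  # hypothetical budget and shape
    "Qwen3-Next-80B-A3B (approx)": (80e9, 0.10, 48, 512),    # approximate public shape
}

for name, (total, frac, layers, experts) in configs.items():
    size = per_expert_params(total, frac, layers, experts)
    print(f"{name}: ~{size / 1e6:.1f}M params per expert per layer")

# Both land in the low single-digit millions of parameters per expert,
# i.e. the experts really would be tiny at that granularity.
```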









I also predict that granularity has a complex scaling law that depends on the specifics of the architecture and training, and that larger models (Ant stops at 28B total) have a higher optimal granularity than we use now
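
For concreteness, "granularity" here is meant in the sense used in fine-grained MoE work: how many slices a dense FFN of a given width is cut into. A minimal sketch with made-up widths, not taken from any specific model:

```python
# Granularity in the fine-grained-MoE sense: G = dense FFN width / expert width.
# Higher G means more, thinner experts for the same parameter budget.

def granularity(dense_ffn_width: int, expert_width: int) -> float:
    return dense_ffn_width / expert_width

print(granularity(dense_ffn_width=14336, expert_width=14336))  # G = 1  (coarse, classic few-expert MoE)
print(granularity(dense_ffn_width=14336, expert_width=512))    # G = 28 (fine-grained, tiny experts)
```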
