Thread Easy

The all-purpose partner for Twitter threads

Explore

Newest first — browse tweet threads

I also predict that granularity has a complex scaling law that is dependent on specifics of the architecture and training, and that larger models (Ant stops at 28B total) have higher optimal granularity than we use now

Though thinking again, for my speculative Flash to have like 16/3200 expert pattern, those experts would have to be TINY!!, and I don't think this is optimal on the other hand: this meme paper and the fact that Qwen3-Next already uses experts of that scale (if my math is right)

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
Wed Nov 05 20:17:57
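
A rough sanity check of the "16/3200 expert pattern" remark above, as a minimal sketch. Every number is an illustrative assumption (the total size uses the ~1.2T figure speculated later in this feed; the expert parameter share and layer count are hypothetical), but it shows why experts at that granularity would indeed be tiny:

```python
# Back-of-the-envelope: how small would experts be in a hypothetical ~1.2T-parameter
# MoE with 3200 experts per layer and 16 routed per token?
# Every number below is an assumption for illustration, not a known model spec.

total_params = 1.2e12   # assumed total parameter count (the feed's speculation)
expert_share = 0.9      # assume ~90% of parameters live in expert FFNs
n_layers = 60           # assumed number of MoE layers
n_experts = 3200        # experts per layer, from the "16/3200" pattern
n_active = 16           # experts routed per token

params_per_expert = total_params * expert_share / (n_layers * n_experts)
active_fraction = n_active / n_experts

print(f"params per expert: {params_per_expert / 1e6:.1f}M")  # ~5.6M parameters each
print(f"routed per token : {active_fraction:.2%}")           # 0.50%
```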
i truly do not care what koding agent you use, but if you dont have a slackbot that you can just @ and it converts your slack threads into PRs you are living in the caveman era and actively not even giving AI a chance out of spite

achieve ambition with intentionality, intensity, & integrity - @dxtipshq - @sveltesociety - @aidotengineer - @latentspacepod - @cognition + @smol_ai

swyx
Wed Nov 05 20:15:31
RT @Gradio: 🏆 GOOGLE GEMINI sponsoring a MASSIVE prize!

$15K in API credits for the best Gemini-powered agent 🤯[WOW]

Build with multimoda…

AI research paper tweets, ML @Gradio (acq. by @HuggingFace 🤗) dm for promo ,submit papers here: https://t.co/UzmYN5YmrQ

AK
Wed Nov 05 20:12:36
LLLFFFGGGGGGGGGG

If you’re a creator who shares this vision, DM me or apply to @joinbond and help shape the future of creator–fan relationships.

Building @joinbond | prev @a16zcrypto | programmer | magician

Michael Blau
Wed Nov 05 20:11:15
My kind of healthcare company

Author (Husk, Crypto Confidential): https://t.co/L2COrV5QA3 Building: https://t.co/HMcbuBhTP3 Teaching: https://t.co/Dy0FsZHQaz

Nat Eliason
Wed Nov 05 20:09:47
Very aggressive and possibly very retarded hunch:
it's Flash 3, and it's 1.2T total 12B active
Pro is like 30-3200
their systems allow that, Google is unmatched in penny-pinching, and we (thanks @AntLingAGI) know that >99% sparsity continues to deliver efficiency leverage.

I also predict that granularity has a complex scaling law that is dependent on specifics of the architecture and training, and that larger models (Ant stops at 28B total) have higher optimal granularity than we use now

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
Wed Nov 05 20:08:32
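
The sparsity implied by the "1.2T total, 12B active" hunch above is simple arithmetic; a minimal sketch, treating both figures as the tweet's speculation rather than confirmed specs:

```python
# Sparsity implied by a speculated "1.2T total, 12B active" configuration.
# Both numbers come from the tweet's guess, not from any confirmed model card.

total_params = 1.2e12   # speculated total parameters
active_params = 12e9    # speculated parameters activated per token

active_fraction = active_params / total_params  # 0.01
sparsity = 1.0 - active_fraction                # 0.99

print(f"active per token: {active_fraction:.1%}")  # 1.0%
print(f"sparsity        : {sparsity:.1%}")         # 99.0%
```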