
Though, thinking again, for my speculative Flash to have something like a 16/3200 expert pattern, those experts would have to be TINY!! and I don't think that's optimal. On the other hand: this meme paper, and the fact that Qwen3-Next already uses experts of that scale (if my math is right)
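
A quick back-of-envelope check of "if my math is right", as a minimal sketch. All numbers are assumptions for illustration: the total budget and layer count for the speculative 16/3200 "Flash" are hypothetical, and the Qwen3-Next figures are only an approximate public shape, not exact specs.

```python
# Rough per-expert parameter count for a fine-grained MoE.
# Assumes a fixed fraction of the budget goes to non-expert weights
# (attention, embeddings, shared experts) and the rest is split evenly
# across the experts in every layer.

def per_expert_params(total_params: float, non_expert_frac: float,
                      n_layers: int, experts_per_layer: int) -> float:
    expert_budget = total_params * (1.0 - non_expert_frac)
    return expert_budget / (n_layers * experts_per_layer)

configs = {
    # name: (total params, assumed non-expert fraction, layers, experts per layer)
    "speculative 'Flash' 16/3200": (400e9, 0.10, 60, 3200),  # hypothetical budget and shape
    "Qwen3-Next-80B-A3B (approx)": (80e9, 0.10, 48, 512),    # approximate public shape
}

for name, (total, frac, layers, experts) in configs.items():
    size = per_expert_params(total, frac, layers, experts)
    print(f"{name}: ~{size / 1e6:.1f}M params per expert per layer")

# Both land in the low single-digit millions of parameters per expert,
# i.e. the experts really would be tiny at that granularity.
```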









I also predict that granularity has a complex scaling law that depends on the specifics of the architecture and training, and that larger models (Ant stops at 28B total) have a higher optimal granularity than we use now
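
For concreteness, "granularity" here is meant in the sense used in fine-grained MoE work: how many slices a dense FFN of a given width is cut into. A minimal sketch with made-up widths, not taken from any specific model:

```python
# Granularity in the fine-grained-MoE sense: G = dense FFN width / expert width.
# Higher G means more, thinner experts for the same parameter budget.

def granularity(dense_ffn_width: int, expert_width: int) -> float:
    return dense_ffn_width / expert_width

print(granularity(dense_ffn_width=14336, expert_width=14336))  # G = 1  (coarse, classic few-expert MoE)
print(granularity(dense_ffn_width=14336, expert_width=512))    # G = 28 (fine-grained, tiny experts)
```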
