
How does Kimi-K2-Thinking compare to MiniMax M2 on size? 2/n 1. MiniMax M2 has 10B active and 230B total parameters, with full attention. 2. Kimi K2 has 35B active and 1 trillion total parameters. Both store most of their weights in 8 bits. That means M2 will be much easier to host, and its KV cache will be much more compact. Since MiniMax M2 uses full attention, it would be interesting to see whether Kimi-K2 has done something interesting to the attention layers. (For these calculations I am assuming Kimi-K2-Thinking is based on Kimi-K2-Base.)
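The hosting gap above can be sketched with back-of-the-envelope math. This is a rough estimate of weight memory only (no KV cache, activations, or serving overhead), using the parameter counts and 8-bit weights mentioned in the thread; the helper name and exact figures are illustrative, not official:

```python
def weight_memory_gb(total_params_billions: float, bits_per_param: int = 8) -> float:
    """Approximate memory needed just to hold the weights, in GB.

    total_params_billions: total parameter count in billions.
    bits_per_param: storage precision (8 for the 8-bit weights assumed here).
    """
    return total_params_billions * 1e9 * bits_per_param / 8 / 1e9


# Figures from the tweet: M2 ~230B total, Kimi K2 ~1T (1000B) total.
minimax_m2_gb = weight_memory_gb(230)    # ~230 GB at 8 bits per weight
kimi_k2_gb = weight_memory_gb(1000)      # ~1000 GB at 8 bits per weight

print(f"MiniMax M2 weights: ~{minimax_m2_gb:.0f} GB")
print(f"Kimi K2 weights:    ~{kimi_k2_gb:.0f} GB")
print(f"Ratio: ~{kimi_k2_gb / minimax_m2_gb:.1f}x")
```

At 8 bits per parameter, memory in GB roughly equals total parameters in billions, so K2 needs about 4.3x the weight memory of M2 before the KV cache (which scales with the 10B-vs-35B active-parameter gap, per-layer attention design, and context length) widens the gap further.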










