Thread Easy

Your all-in-one companion for Twitter threads

© 2025 Thread Easy All Rights Reserved.

Explore

Newest first — browse tweet threads

RT @nathanbarrydev: Added confidence-aware parallel decoding to my tiny text diffusion model!

Before, we had “scheduled iterative refineme…

RL and efficient distributed pretraining • eXperiments lab • memes and training lores

tokenbender
Mon Nov 10 17:59:25
Looking forward to chatting with @anthemos tomorrow, where we'll be diving into how @Zumper is leveraging AI as a leading housing marketplace. 

We'll also be discussing topics like AEO, so this is a group chat you won't want to miss. DM me if you'd like to join in!

Founder of Everything Marketplaces (@marketplaceshq). Always working with & investing in marketplaces at https://t.co/HgyZIpWIEQ

Yoroomie
Mon Nov 10 17:54:05
- “wen K3?”
- “before sam's trillion-dollar data center is built” 😂

AMA link: https://t.co/6yZSsjQXvM

Co-founder & CTO @hyperbolic_labs cooking fun AI systems. Prev: OctoAI (acquired by @nvidia) building Apache TVM, PhD @ University of Washington.

Yuchen Jin
Mon Nov 10 17:52:00
Kimi AMA on K2 Thinking:

1. $4.6M training cost is not an official number
2. Trained on H800s (nerfed H100s)
3. KDA (Kimi Delta Attention) hybrids with NoPE MLA perform better than full MLA with RoPE
4. Muon scales well to 1T parameters. “there are tens of optimizers and architectures that do not survive the grill.”
5. Kimi K2 will have vision
6. K2 Thinking is natively INT4 to be friendlier to non-Blackwell GPUs while leveraging the existing int4 inference marlin kernels.

Yuchen Jin
Mon Nov 10 17:49:32
Assuming that the model companies can touch every market and, more importantly, *do it well*, and that the advantage of being multi-model is limited, feels a lot like sitting out the last 20 years because Google/Amazon/Facebook could be the only winners.

partner @a16z // saas + b2b fintech // strong opinions on 🍕

Seema Amble
Mon Nov 10 17:48:10
RT @jamisonfox: Big milestone for @GammaApp today: we’ve raised a $68M Series B led by Sarah Wang at @a16z.

It’s been humbling to see this…

Growth investing @a16z

Steph Zhang
Mon Nov 10 17:37:00