Thread Easy

Your all-in-one partner for Twitter threads

© 2025 Thread Easy All Rights Reserved.

Explore

Newest first: browse tweet threads

Torch C++ & CUDA optimization dev all day today + tomorrow, streaming here/YT/Twitch. The goal is to
1) Make PufferLib do RL at 10M steps/second
2) Eliminate hard-to-profile sources of potential bottlenecks
3) See how simple we can make it
Some questions for GPU devs below

Q: I have small nets and need to reduce kernel launches. My options are 1) suffer through CUDA graph hell, 2) write some big fused kernels, or 3) both. Fused kernels seem cool, but NVIDIA's cuBLAS matmul isn't open source. What do?

Joseph Suarez 🐡
Fri Nov 07 13:09:05
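
One common way to cut launches for small networks is to fold the elementwise epilogue (bias add, activation) into the matmul's output loop, so a single kernel does what would otherwise take a GEMM launch plus one or two elementwise launches. The sketch below is a host-side C++ reference of that loop structure, not a CUDA kernel and not PufferLib code; `linear_unfused` and `linear_bias_relu_fused` are hypothetical names. On a GPU, each output element (or tile of elements) would map to a thread (or block), and the epilogue would run while the accumulator is still in a register.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// y = relu(x W^T + b), x: [batch, in], W: [out, in], b: [out], y: [batch, out].
// Unfused reference: three separate passes, analogous to three kernel launches.
std::vector<float> linear_unfused(const std::vector<float>& x, const std::vector<float>& W,
                                  const std::vector<float>& b, std::size_t batch,
                                  std::size_t in, std::size_t out) {
    assert(x.size() == batch * in && W.size() == out * in && b.size() == out);
    std::vector<float> y(batch * out, 0.0f);
    for (std::size_t i = 0; i < batch; ++i)      // pass 1: matmul
        for (std::size_t j = 0; j < out; ++j)
            for (std::size_t k = 0; k < in; ++k)
                y[i * out + j] += x[i * in + k] * W[j * in + k];
    for (std::size_t i = 0; i < batch; ++i)      // pass 2: bias add
        for (std::size_t j = 0; j < out; ++j)
            y[i * out + j] += b[j];
    for (float& v : y) v = std::max(v, 0.0f);    // pass 3: ReLU
    return y;
}

// Fused version: one pass over the output; bias and ReLU are applied to the
// accumulator right before the single write to y (the "epilogue").
std::vector<float> linear_bias_relu_fused(const std::vector<float>& x, const std::vector<float>& W,
                                          const std::vector<float>& b, std::size_t batch,
                                          std::size_t in, std::size_t out) {
    assert(x.size() == batch * in && W.size() == out * in && b.size() == out);
    std::vector<float> y(batch * out);
    for (std::size_t i = 0; i < batch; ++i) {
        for (std::size_t j = 0; j < out; ++j) {
            float acc = 0.0f;                    // would live in a register on a GPU
            for (std::size_t k = 0; k < in; ++k)
                acc += x[i * in + k] * W[j * in + k];
            y[i * out + j] = std::max(acc + b[j], 0.0f);
        }
    }
    return y;
}
```

Both functions add the products in the same order, so they return identical values; the fused one just touches `y` once per element instead of three times.
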
Q: My instinct is to avoid extra libraries unless absolutely necessary. Really, really don't like Triton from what I see, for instance (though I'd be less annoyed if it would generate the kernels once which I could then include statically in my project). I do need some level of tile size tuning. What do?

I build sane open-source RL tools. MIT PhD, creator of Neural MMO and founder of PufferAI. DM for business: non-LLM sim engineering, RL R&D, infra & support.

Joseph Suarez 🐡
Fri Nov 07 13:09:05
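
For tile-size tuning without a runtime code generator, one option is to make tile sizes compile-time template parameters, instantiate a handful of configurations statically, and time them once (at startup or offline) to pick one. The host-side C++ sketch below shows that pattern on a blocked matmul; `matmul_tiled`, `kConfigs`, and `pick_fastest` are hypothetical names, and the CPU loop nest stands in for what would be a CUDA kernel templated on its block and tile dimensions.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <iterator>
#include <vector>

// C[M,N] = A[M,K] * B[K,N], row-major. TM/TN/TK are compile-time tile sizes,
// so each instantiation is a separate, fully static function.
template <std::size_t TM, std::size_t TN, std::size_t TK>
void matmul_tiled(const float* A, const float* B, float* C,
                  std::size_t M, std::size_t N, std::size_t K) {
    std::fill(C, C + M * N, 0.0f);
    for (std::size_t i0 = 0; i0 < M; i0 += TM)
        for (std::size_t j0 = 0; j0 < N; j0 += TN)
            for (std::size_t k0 = 0; k0 < K; k0 += TK)
                // Clamp tile bounds so the tile sizes need not divide M, N, K.
                for (std::size_t i = i0; i < std::min(i0 + TM, M); ++i)
                    for (std::size_t k = k0; k < std::min(k0 + TK, K); ++k) {
                        const float a = A[i * K + k];
                        for (std::size_t j = j0; j < std::min(j0 + TN, N); ++j)
                            C[i * N + j] += a * B[k * N + j];
                    }
}

using MatmulFn = void (*)(const float*, const float*, float*,
                          std::size_t, std::size_t, std::size_t);

// The candidate set is fixed at compile time; nothing is generated at runtime.
constexpr MatmulFn kConfigs[] = {
    matmul_tiled<16, 16, 16>,
    matmul_tiled<32, 32, 8>,
    matmul_tiled<64, 16, 32>,
};

// Time each instantiation once on a representative shape; return the fastest index.
std::size_t pick_fastest(std::size_t M, std::size_t N, std::size_t K) {
    assert(M > 0 && N > 0 && K > 0);
    std::vector<float> A(M * K, 1.0f), B(K * N, 1.0f), C(M * N);
    std::size_t best = 0;
    auto best_time = std::chrono::steady_clock::duration::max();
    for (std::size_t c = 0; c < std::size(kConfigs); ++c) {
        auto t0 = std::chrono::steady_clock::now();
        kConfigs[c](A.data(), B.data(), C.data(), M, N, K);
        auto dt = std::chrono::steady_clock::now() - t0;
        if (dt < best_time) { best_time = dt; best = c; }
    }
    return best;
}
```

A single timing run is noisy; in practice one would repeat each candidate a few times and could cache the chosen index per shape so the selection happens only once.
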
Q: So far, FP32 kernels are pretty easy. Pretty much just writing C. What's the easiest way to do TF32, FP16, BF16 support without making a bloody mess?

Joseph Suarez 🐡
Fri Nov 07 13:09:05
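
A pattern often used to keep mixed precision manageable is to write each kernel body once, templated on the storage type, with small load/store helpers that convert to and from `float`, and with accumulation always done in FP32. TF32 does not need its own storage path in this scheme: it is a tensor-core math mode applied to FP32 inputs, so it changes how the matmul is computed rather than how data is stored. The sketch below is host-side C++ so it runs anywhere; the `bf16` struct and the `to_float`/`from_float` helpers are hypothetical stand-ins for CUDA's `__nv_bfloat16` with `__bfloat162float`/`__float2bfloat16` (and `__half` with `__half2float`/`__float2half` for FP16).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical bf16 storage type: the upper 16 bits of an IEEE-754 float.
struct bf16 { std::uint16_t bits; };

// Load helpers: widen any storage type to float.
inline float to_float(float x) { return x; }
inline float to_float(bf16 x) {
    std::uint32_t u = static_cast<std::uint32_t>(x.bits) << 16;
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// Store helpers: narrow float to the storage type.
template <typename T> T from_float(float f);
template <> inline float from_float<float>(float f) { return f; }
template <> inline bf16 from_float<bf16>(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    if ((u & 0x7fffffffu) > 0x7f800000u)      // NaN: keep it a quiet NaN
        return bf16{static_cast<std::uint16_t>((u >> 16) | 0x0040u)};
    u += 0x7fffu + ((u >> 16) & 1u);          // round to nearest, ties to even
    return bf16{static_cast<std::uint16_t>(u >> 16)};
}

// One kernel body for every storage type: y = W x with W: [rows, cols].
// Inputs are widened on load, the sum is accumulated in float,
// and the result is narrowed once on store.
template <typename T>
void matvec(const T* W, const T* x, T* y, std::size_t rows, std::size_t cols) {
    assert(W != nullptr && x != nullptr && y != nullptr);
    for (std::size_t i = 0; i < rows; ++i) {
        float acc = 0.0f;
        for (std::size_t k = 0; k < cols; ++k)
            acc += to_float(W[i * cols + k]) * to_float(x[k]);
        y[i] = from_float<T>(acc);
    }
}
```

Adding FP16 under this scheme means one more storage type and one more pair of helpers; the kernel bodies themselves stay untouched.
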
I just find it weird when people set up a camera to come online and then cry. I see TikToks of it; it just never sits well, and half the time, it's borderline psycho people.

Founder | Author | Speaker Building @beltstripe. Healtech/EdTech/Agric I'm Not The Man Of Your Dreams. Your Imagination Wasn't This Great.

Sani Yusuf
Fri Nov 07 13:08:43
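Can't believe 潮流周刊 (Trend Weekly) has already reached issue 243; as of this year it has been updated continuously for almost 5 years. It mainly covers handy tools for engineers that I come across and open-source products, plus whatever I've been casually reading and casually musing about. New friends are welcome to follow and subscribe via RSS. By the way, when did you first hear about 潮流周刊?
https://t.co/8abZ9vxSJk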

Father of Pake • MiaoYan • Mole • XRender

Tw93
Fri Nov 07 13:07:59
RT @ElKomnrs: @skominers Wordle 1,602 1/6
🙏

🟩🟩🟩🟩🟩

https://t.co/KvFo35FTy0

Market Design/Entrepreneurship Professor @HarvardHBS & Faculty Affiliate @Harvard Economics; Research @a16zcrypto; Editor @restatjournal; Econ @Quora; … | #QED

Scott Kominers
Fri Nov 07 13:05:15