
Q: I have small nets and need to reduce kernel launches. My options are 1) suffer through CUDA graph hell, 2) write some big fused kernels, or 3) both. Fused kernels seem cool, but NVIDIA's cuBLAS matmul isn't open source. What do?


I build sane open-source RL tools. MIT PhD, creator of Neural MMO and founder of PufferAI. DM for business: non-LLM sim engineering, RL R&D, infra & support.


Q: My instinct is to avoid extra libraries unless absolutely necessary. For instance, I really, really don't like Triton from what I've seen (though I'd be less annoyed if it generated the kernels once so I could include them statically in my project). I do need some level of tile size tuning. What do?
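
For what it's worth, one library-free way to get static kernels with some tile size tuning is to make the tile size a C++ template parameter, instantiate a few sizes at compile time, and time them once. The sketch below is only an illustration under assumptions: the names `fused_linear_relu` and `time_launch`, the layer shape, and the fused op (matmul + bias + ReLU in a single launch) are all made up for the example, and a naive shared-memory tiled matmul like this will not match cuBLAS throughput.

```cuda
#include <cmath>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical fused layer: y = relu(x @ w + b) in ONE kernel launch.
// x is [M,K], w is [K,N], b is [N], y is [M,N], all row-major.
// TILE is a compile-time constant, so each instantiation is a static kernel.
template <int TILE>
__global__ void fused_linear_relu(const float* x, const float* w, const float* b,
                                  float* y, int M, int K, int N) {
    __shared__ float xs[TILE][TILE];
    __shared__ float ws[TILE][TILE];
    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;
    for (int k0 = 0; k0 < K; k0 += TILE) {
        int kx = k0 + threadIdx.x;  // column of x this thread loads
        int kw = k0 + threadIdx.y;  // row of w this thread loads
        xs[threadIdx.y][threadIdx.x] = (row < M && kx < K) ? x[row * K + kx] : 0.0f;
        ws[threadIdx.y][threadIdx.x] = (kw < K && col < N) ? w[kw * N + col] : 0.0f;
        __syncthreads();
        for (int k = 0; k < TILE; ++k)
            acc += xs[threadIdx.y][k] * ws[k][threadIdx.x];
        __syncthreads();
    }
    // Bias and ReLU are applied in the epilogue instead of separate kernels.
    if (row < M && col < N) y[row * N + col] = fmaxf(acc + b[col], 0.0f);
}

// Launch one tile-size instantiation (after a warm-up) and return its time in ms.
template <int TILE>
float time_launch(const float* x, const float* w, const float* b, float* y,
                  int M, int K, int N) {
    dim3 block(TILE, TILE);
    dim3 grid((N + TILE - 1) / TILE, (M + TILE - 1) / TILE);
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    fused_linear_relu<TILE><<<grid, block>>>(x, w, b, y, M, K, N);  // warm-up
    cudaEventRecord(start);
    fused_linear_relu<TILE><<<grid, block>>>(x, w, b, y, M, K, N);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main() {
    const int M = 64, K = 128, N = 96;  // assumed "small net" layer shape
    std::vector<float> hx(M * K), hw(K * N), hb(N), hy(M * N);
    for (int i = 0; i < M * K; ++i) hx[i] = (i % 7) * 0.1f - 0.3f;
    for (int i = 0; i < K * N; ++i) hw[i] = (i % 5) * 0.1f - 0.2f;
    for (int i = 0; i < N; ++i) hb[i] = (i % 3) * 0.05f;

    float *x, *w, *b, *y;
    cudaMalloc(&x, hx.size() * sizeof(float));
    cudaMalloc(&w, hw.size() * sizeof(float));
    cudaMalloc(&b, hb.size() * sizeof(float));
    cudaMalloc(&y, hy.size() * sizeof(float));
    cudaMemcpy(x, hx.data(), hx.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(w, hw.data(), hw.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(b, hb.data(), hb.size() * sizeof(float), cudaMemcpyHostToDevice);

    // "Tuning" here is just timing a handful of static instantiations.
    printf("TILE=8:  %.4f ms\n", time_launch<8>(x, w, b, y, M, K, N));
    printf("TILE=16: %.4f ms\n", time_launch<16>(x, w, b, y, M, K, N));
    printf("TILE=32: %.4f ms\n", time_launch<32>(x, w, b, y, M, K, N));

    // Check the last result against a CPU reference.
    cudaMemcpy(hy.data(), y, hy.size() * sizeof(float), cudaMemcpyDeviceToHost);
    int bad = 0;
    for (int i = 0; i < M; ++i)
        for (int j = 0; j < N; ++j) {
            float ref = hb[j];
            for (int k = 0; k < K; ++k) ref += hx[i * K + k] * hw[k * N + j];
            ref = fmaxf(ref, 0.0f);
            if (fabsf(ref - hy[i * N + j]) > 1e-3f) ++bad;
        }
    printf("mismatches: %d\n", bad);
    cudaFree(x); cudaFree(w); cudaFree(b); cudaFree(y);
    return bad != 0;
}
```

Building with plain `nvcc` should be enough, with no dependencies beyond the CUDA runtime. The same pattern extends to fusing several layers into one kernel, which is where the launch count actually drops; the winning `TILE` can then be hard-coded per layer shape.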


