Thread Easy

Your All-in-One Twitter Thread Companion

© 2025 Thread Easy All Rights Reserved.

Explore

Newest first — browse tweet threads

There is a lot more to say about these beautiful visualizations. Please let me know of any errors or other cool things we should know.

More at my blog: 

How to Get and Interpret GPU Memory Profiling
https://t.co/3uPt0S6RIp

Mostly research and code. If universe is an optimizer, what is its loss function? All opinions are my own.

Shital Shah
Sun Dec 07 08:39:58
In the picture above, the green stripe is the storage needed to hold the sqrt of grad^2 for Adam. Fused Adam computes the updates to apply layer by layer, which causes the stripes. At the top it applies those updates in one shot. The spike is the temp buffer needed for the division before applying the update.
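
To make the two kinds of memory in that description concrete, here is a minimal sketch of a plain (non-fused) Adam step in pure Python. It is only an illustration under assumptions: the function name `adam_step` and the list-based state are hypothetical, and this is not the fused kernel the tweet refers to. The point is that `exp_avg` and `exp_avg_sq` persist across steps, while `denom` exists only for the duration of the division.

```python
import math


def adam_step(params, grads, exp_avg, exp_avg_sq, step,
              lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One textbook Adam update on flat Python lists (illustrative only)."""
    bias1 = 1 - beta1 ** step
    bias2 = 1 - beta2 ** step

    # Persistent optimizer state: running averages of grad and grad^2.
    # These live for the whole training run, one entry per parameter.
    for i, g in enumerate(grads):
        exp_avg[i] = beta1 * exp_avg[i] + (1 - beta1) * g
        exp_avg_sq[i] = beta2 * exp_avg_sq[i] + (1 - beta2) * g * g

    # Temporary buffer: sqrt of the bias-corrected second moment.
    # It is only needed for the division below and is freed afterwards.
    denom = [math.sqrt(v / bias2) + eps for v in exp_avg_sq]

    # Apply the update in place.
    for i in range(len(params)):
        params[i] -= lr * (exp_avg[i] / bias1) / denom[i]
    return params


params = [1.0]
exp_avg = [0.0]
exp_avg_sq = [0.0]
adam_step(params, [0.5], exp_avg, exp_avg_sq, step=1)
```

In a real framework the persistent state and the temporary would be GPU tensors, so the first shows up as long-lived allocations and the second as a short-lived one.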

Shital Shah
Sun Dec 07 08:39:58
For the past two days I ended up staring at several GPU memory profile plots. There is not a lot of info available on what to make of GPU memory profiles, so I wrote a script to do it in one line and wrote a blog post.

So what do all those strange shapes mean?

Quick tutorial🧵

First, to profile any block of code, just drop this Python file in your project and use code like below to get profiling data: https://t.co/iYTzEFksVR
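
The author's Python file is behind the link and is not reproduced here. As a rough sketch of the same idea, assuming PyTorch 2.1 or later, a block of code can be wrapped with PyTorch's memory snapshot calls. The helper name `gpu_memory_profile` and the default file name are hypothetical, and the `torch.cuda.memory` functions used are underscore-prefixed, so their behavior may change between releases.

```python
from contextlib import contextmanager


@contextmanager
def gpu_memory_profile(snapshot_path="memory_snapshot.pickle", max_entries=100_000):
    """Record CUDA allocator history around a block and dump a snapshot.

    Hypothetical helper, not the author's script. If PyTorch or CUDA is
    missing, the block still runs and nothing is recorded.
    """
    try:
        import torch
        cuda_ok = torch.cuda.is_available()
    except ImportError:
        torch, cuda_ok = None, False

    if cuda_ok:
        # Start recording allocation and free events (with stack traces).
        torch.cuda.memory._record_memory_history(max_entries=max_entries)
    try:
        yield cuda_ok  # True when a snapshot will be written
    finally:
        if cuda_ok:
            # Write the snapshot, then stop recording.
            torch.cuda.memory._dump_snapshot(snapshot_path)
            torch.cuda.memory._record_memory_history(enabled=None)


with gpu_memory_profile("train_step_snapshot.pickle") as recording:
    total = sum(range(10))  # stand-in for the code being profiled
```

When recording is active, the resulting pickle file can be loaded into PyTorch's memory visualizer to get plots like the ones discussed in this thread.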

Shital Shah
Sun Dec 07 08:39:54
What @OpenAI really needs is a "Feedback" button. Right now LaTeX rendering is broken in their UX. I am seeing strange unnecessary tool calls like in the image below. But there is no way to tell them. I'm literally being forced to move my prompts elsewhere. I guess they will notice a drop in traffic a few days later and scratch their heads about what happened.

Shital Shah
Tue Dec 02 05:30:00
When people say they are trying to find themselves, what they really mean is that they have been RLing out in the open world and are still developing their value function.

Shital Shah
Sat Nov 29 21:10:14
Another cool work from our team! Karpathy’s original quest of controlling a computer with pixels in, keyboard+mouse out can now be done with just a 7B model!

Shital Shah
Mon Nov 24 19:32:47