Thread Easy
  • Explore
  • Compose Thread

Your all-in-one Twitter thread assistant

Explore

Newest first; browse threads as cards

When enabled, preview images are blurred; when disabled, they display normally

all features from before are still there! and accessible from the cloud like shared space or shortcuts menu :)

curious guy creating things @ https://t.co/HXWladhJaA - up and coming wife guy

jack friks
Sat Nov 08 15:31:19
i wonder if turning my couples app into a pet pig simulator was a good idea or it was the most tragic pivot of my entire app development career

all features from before are still there! and accessible from the cloud like shared space or shortcuts menu :)

jack friks
Sat Nov 08 15:30:37
RT @indie_maker_fox: I bought this book too late! Why did I even wait for Double 11 (Singles' Day)?!

🚀 The best AI SaaS boilerplate - https://t.co/VyNtTs0jSX 🔥 The best directory boilerplate with AI - https://t.co/wEvJ1Dd8aR 🎉 https://t.co/zubXJCoY92 & https://t.co/tfQf8T7gGF & https://t.co/TqRkfQj41f

Fox@MkSaaS.com
Sat Nov 08 15:30:08
THIS IS NOT FINANCIAL ADVICE

curious guy creating things @ https://t.co/HXWladhJaA - up and coming wife guy

jack friks
Sat Nov 08 15:29:02
btw these payouts are 100% taxable income... remember that to avoid getting a $1000+ tax bill end of year and being like "wait but i spent that money"

been there, done that!

THIS IS NOT FINANCIAL ADVICE

jack friks
Sat Nov 08 15:28:52
A clever trick used by the impressive new Kimi K2 model: “quantization-aware training,” or QAT.

It’s philosophically similar to dropout. In dropout, you don’t want the model to rely on other neurons co-adapting, since it makes things brittle. So you intentionally blank some of them out during training to avoid that reliance.

Here, you don’t want the model relying on precision for inference that will be lost in the final quantization after training completes, so you intentionally lose the precision during training to avoid that reliance. 

The model is thus forced to never depend on critically important information being stored in the low-order bits of the weights.

But you need that precision to keep the gradients flowing well during optimization, so they fake it by keeping full-precision weights just for gradient computation while simulating INT4 effects in the forward pass.

Former Quant Investor, now building @lumera (formerly called Pastel Network) | My Open Source Projects: https://t.co/9qbOCDlaqM

Jeffrey Emanuel
Sat Nov 08 15:24:50
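
A minimal sketch of the QAT idea described in the thread above, assuming a PyTorch-style setup: weights are fake-quantized to INT4 in the forward pass via a straight-through estimator, while the optimizer keeps updating full-precision master weights. The symmetric per-tensor scaling and the names FakeQuantINT4 and QATLinear are illustrative assumptions, not the actual Kimi K2 training recipe.

```python
# Illustrative QAT sketch (assumed PyTorch setup), not Kimi K2's actual recipe.
import torch
import torch.nn as nn

class FakeQuantINT4(torch.autograd.Function):
    """Simulate INT4 rounding in the forward pass; pass gradients straight through."""

    @staticmethod
    def forward(ctx, w):
        # Symmetric per-tensor scale mapping weights onto the INT4 range [-8, 7].
        scale = w.abs().max().clamp(min=1e-8) / 7.0
        q = torch.clamp(torch.round(w / scale), -8, 7)
        return q * scale  # low-precision view of the weights used for computation

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: treat quantization as identity for gradients,
        # so the full-precision master weights keep receiving smooth updates.
        return grad_output

class QATLinear(nn.Module):
    """Linear layer that stores full-precision weights but computes with fake-INT4 weights."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        w_q = FakeQuantINT4.apply(self.weight)  # forward pass never sees the low-order bits
        return x @ w_q.t() + self.bias

# Tiny usage example: gradients update the full-precision master weights,
# while every forward pass only ever uses the INT4-rounded version.
layer = QATLinear(16, 4)
opt = torch.optim.SGD(layer.parameters(), lr=1e-2)
x = torch.randn(8, 16)
loss = layer(x).pow(2).mean()
loss.backward()
opt.step()
```

The key design choice is the backward pass: it returns the incoming gradient unchanged, so rounding never blocks gradient flow and training behaves as if the quantizer were the identity function.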