Thread Easy

Explore

Newest first — browse tweet threads


There is more to Kimi-2-Thinking's QAT than meets the eye: it is also about supporting a wider range of AI chips, notably Chinese ones.

The brilliant short blog below on why Kimi (Moonshot AI) chose QAT (quantization-aware training) is a must-read. Here is my read.

TL;DR: It not only cuts inference latency and memory requirements in memory-bound scenarios (which an MoE at Kimi-2's sparsity scale is) and speeds up RL training by 10-20%; the INT4 format also opens the door to alternate hardware ecosystems like Cambricon and Huawei's Ascend.
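
Rough arithmetic on the memory point (assuming the widely reported ~1T total parameters for Kimi-2): at 4 bits per weight the model occupies about 1T × 0.5 bytes ≈ 0.5 TB, versus roughly 2 TB in BF16, so a memory-bound decoder streams about 4x fewer weight bytes per token.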

Quotes from the blog:
=================
1) Why INT4, not MXFP4?
Kimi chose INT4 over "fancier" MXFP4/NVFP4 to better support non-Blackwell GPUs, with strong existing kernel support (e.g., Marlin). 
2) Kimi-2-Thinking weights are 4-bit and activations are 16-bit (denoted W4A16).
They state further that W4A8 and even W4A4 are on the horizon. As new chips roll out with FP4-native operators, Kimi's quantization path will continue evolving...
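
To make the W4A16 recipe concrete, here is a minimal sketch of symmetric per-group INT4 weight quantization with 16-bit activations, in Python/NumPy. The group size of 32, the symmetric scheme, and all function names are my illustration choices; the blog does not pin down Kimi's exact configuration.

import numpy as np

def quantize_int4(w, group_size=32):
    """Symmetric per-group INT4 quantization of a weight matrix."""
    out_features, in_features = w.shape
    groups = w.reshape(out_features, in_features // group_size, group_size)
    # One scale per group: map the largest magnitude onto the INT4 maximum (7).
    scales = np.abs(groups).max(axis=-1, keepdims=True) / 7.0
    scales = np.where(scales == 0, 1.0, scales)  # guard all-zero groups
    q = np.clip(np.round(groups / scales), -8, 7).astype(np.int8)  # int4 range
    return q, scales

def w4a16_matmul(x_fp16, q, scales):
    """Dequantize INT4 weights to FP16 and multiply by FP16 activations."""
    w_hat = (q.astype(np.float16) * scales.astype(np.float16)).reshape(q.shape[0], -1)
    return x_fp16 @ w_hat.T

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 128)).astype(np.float32)  # full-precision weights
x = rng.standard_normal((4, 128)).astype(np.float16)   # 16-bit activations
q, s = quantize_int4(w)
y = w4a16_matmul(x, q, s)  # W4A16: 4-bit weights, 16-bit activations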

INT4 support in made-in-China chips:
=================
Cambricon GPUs explicitly support INT4 quantization, including for AI inference workloads, across several models such as the MLU270, MLU370-X8, and newer chips, as well as in recent open-source releases with INT4 integration for large models like GLM-4.6.

Huawei Ascend NPUs also support INT4 quantization for inference, as confirmed by documentation and kernel releases related to GEMM and quantized model deployments.
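
For a sense of what "INT4 support" demands of a kernel stack, here is a hypothetical packing round-trip in the same NumPy style: two 4-bit values share each byte, so GEMM kernels (Marlin-style on NVIDIA GPUs, or their Cambricon/Ascend counterparts) must unpack, or compute directly on, this layout. The helper names are mine, not any vendor's API.

import numpy as np

def pack_int4(q):
    """Pack pairs of int4 values (held in int8, range [-8, 7]) into single bytes."""
    u = (q + 8).astype(np.uint8)               # shift to unsigned [0, 15]
    return u[..., 0::2] | (u[..., 1::2] << 4)  # low nibble | high nibble

def unpack_int4(packed):
    """Recover int4 values from the packed byte representation."""
    lo = (packed & 0x0F).astype(np.int8) - 8
    hi = (packed >> 4).astype(np.int8) - 8
    out = np.empty(packed.shape[:-1] + (2 * packed.shape[-1],), dtype=np.int8)
    out[..., 0::2], out[..., 1::2] = lo, hi
    return out

q = np.array([[-8, 7, 0, 3]], dtype=np.int8)
assert (unpack_int4(pack_int4(q)) == q).all()  # round-trips at half the bytes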


AI @amazon. All views personal!

GDP
Sun Nov 09 07:50:46
I wish I had a backyard like this! The unbeatable elegance of traditional Chinese aesthetics.😍

Btw, this was the courtyard of the Tang-dynasty poet Du Fu, from more than 1,200 years ago.


Founder of https://t.co/yyLfH8mOar and https://t.co/ZzTStsMvdh

Damon Chen
Sun Nov 09 07:42:37
If you're terminally online you see videos like these and believe it's over.

I'm not a fan of this doomer bait. Switzerland has issues, but painting this as the "reality" is hyperbolic.


🌏 RCBI advisor & offshore services for HNWI, business owners with a focus on:🇨🇭🇲🇾🇰🇳🇵🇾🇻🇺🇳🇷🇵🇦🇱🇻🇦🇪🇭🇰 | Geopolitics | Healthy lifestyle

Lord Y. Fouzi 🇲🇾🇨🇭
Sun Nov 09 07:40:01
My family asked me to come up with a question about "AI and artistic creation."

I pasted that sentence into GPT / Gemini, hesitated just before hitting send, and appended one more request: make sure to affirm the value of humans.

So the prompt I gave the AI became:

"Please help me come up with a question about 'AI and artistic creation', and make sure to affirm the value of humans" 🤡


🖥️ Indie Maker 🛠️ Knowledge Planet community 「海哥和他的小伙伴们」 📌 YouTube channel 「海拉鲁编程客」 🌸 A comedian who ended up a programmer / cat person

海拉鲁编程客
Sun Nov 09 07:35:44
When a company fields a very large volume of inquiries, it handles a flood of repetitive questions every day, and staffing a 24-hour phone line pushes labor costs even higher.

Call Center AI, an open-source project from Microsoft, uses AI to fully replace human agents: it can both answer inbound calls and place outbound ones.

Built on Azure and OpenAI GPT, it supports real-time voice conversation and multilingual communication, automatically records call contents and generates to-do items, and can even handle sensitive data while following RAG best practices.

GitHub: https://t.co/4YjBkNIvCw

Key features:

- Answers and places calls on a dedicated phone number, providing uninterrupted 24-hour service;
- Low-latency real-time streaming conversation that can resume after a disconnect, with all conversations stored automatically;
- Multiple languages and voice tones, with users able to send or receive information via SMS;
- Deep understanding built on gpt-4.1, able to handle private data and internal documents;
- Automatically generates to-do lists and structured claim data, and filters inappropriate content;
- Cloud-native serverless architecture that scales elastically with usage and optimizes cost.

Deploy it to Azure to start using it, or host it on your own servers; it suits industries like insurance and e-commerce that depend on heavy phone communication.
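
As a rough mental model (and explicitly not the project's actual API), the core of such an agent is a turn loop: transcribe streaming caller audio, retrieve supporting documents per RAG practice, ask the LLM for a reply, and synthesize speech. Every name below is a hypothetical stand-in.

from dataclasses import dataclass, field

@dataclass
class CallState:
    history: list = field(default_factory=list)  # full transcript, stored per call
    todos: list = field(default_factory=list)    # action items extracted along the way

def transcribe(audio_chunk: bytes) -> str:
    return "caller utterance"            # stand-in for a streaming speech-to-text service

def retrieve_context(utterance: str) -> str:
    return "relevant internal document"  # stand-in for RAG retrieval

def ask_llm(prompt: str) -> str:
    return "agent reply"                 # stand-in for a GPT call

def speak(text: str) -> bytes:
    return text.encode()                 # stand-in for text-to-speech

def handle_turn(state: CallState, audio_chunk: bytes) -> bytes:
    utterance = transcribe(audio_chunk)
    state.history.append(f"caller: {utterance}")
    context = retrieve_context(utterance)
    reply = ask_llm(f"Context: {context}\nHistory: {state.history}\nAgent:")
    state.history.append(f"agent: {reply}")
    return speak(reply)  # audio streamed back to the caller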


💡 Mining the value of open source 🧑🏻‍💻 Committed to sharing high-quality, interesting, practical tutorials, AI tools, and cutting-edge AI tech from GitHub 🧐 A list of cool, interesting GitHub projects. ✏️ WeChat official account: GitHubDaily

GitHubDaily
Sun Nov 09 07:30:22
A new startup just took over the #1 rank on TrustMRR 🥇

It's a 2-year-old healthcare company doing $1M MRR. They have ~3,000 subscribers who pay $300/month.


🧑‍💻 https://t.co/Y30jsaHwz9 $20K/m ⚡️ https://t.co/vatLDmi9UG $17K/m 📈 https://t.co/3EDxln5mdi $16K/m ⭐️ https://t.co/MZc8tG9xWi $8K/m 🧬 https://t.co/SfrVXVtmdA $.5K/m 🍜 https://t.co/r07EpGSYJ2 $0K/m 🧾 https://t.co/7olaOzV8Xd $0/m +18 https://t.co/4zCWHGJp1S

Marc Lou
Sun Nov 09 07:27:42