Thread Easy
The all-in-one partner for Twitter threads

© 2025 Thread Easy. All Rights Reserved.

Explore: browse tweet threads, newest first

RT @proxy_vector: @shl lol we also replaced "zeitgeist" with "vibe shift" and somehow it carries more weight now

language evolves by getti…


Founder/CEO @Gumroad

Sahil Lavingia
Sun Nov 02 16:09:34
The Devil is in the Chat Template
=========================

Potatoes and potahtoes are not the same thing, contrary to what Phoebe Buffay would have you believe.

This is a must-read blog if you are losing sleep over AGI, or if you simply work on implementing open-source AI models.

If you are a public intellectual on "AI security" and you don't understand this blog, you are not qualified to comment on the subject. Read a book, as they say.

Now that my rant is done: this blog walks through all the things that can go wrong and quietly make your frontier model look "DUMB".

LLM inference is very fragile. The inference engine has to present the input to the LLM in a strict format (the chat template). Deviate even slightly and the outputs degrade. If nothing else, this should reduce your AGI anxiety: this technology is not going to be Skynet.
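To make the fragility concrete, here is a minimal illustrative sketch of what a chat template does. It uses ChatML-style markers purely as an example (real templates are model-specific Jinja files shipped with the model, and `render_chatml` is a toy function, not any library's API):

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render a list of {role, content} dicts into a ChatML-style prompt."""
    prompt = ""
    for m in messages:
        prompt += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    if add_generation_prompt:
        # Forgetting this trailer is a classic "small deviation" that
        # silently degrades outputs: the model never sees its turn begin.
        prompt += "<|im_start|>assistant\n"
    return prompt

msgs = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hi"},
]
good = render_chatml(msgs)
bad = render_chatml(msgs, add_generation_prompt=False)
# The two prompts differ only by the final assistant header, yet the
# model's behavior on them can differ dramatically.
```

A single missing or extra marker like this is exactly the class of bug the blog post is about.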

Thanks @vllm_project and Lilian Weng. Here they narrate how they worked on feedback from the Kimi team to bring the tool-call success rate for the Kimi K2 model running on vLLM to near 100%.

They turned the feedback around very fast. Kudos. Your community service is greatly appreciated 🧡💕

Key Lessons (quoting)
The Devil is in the Chat Template: The chat_template is the critical handshake between a model and its serving framework. When integrating a new model, meticulously validate every piece of its template logic against the framework’s specific behaviors and assumptions.

Peel Back the Abstraction Layer: High-level APIs like /chat/completions are convenient but can obscure root causes. When debugging, don’t hesitate to drop down to lower-level endpoints like /completions. Manually building the input is a powerful technique to isolate the problem.
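The "drop down to /completions" advice can be sketched roughly as follows. This is an assumption-laden illustration, not vLLM's documented client code: the base URL and model name are placeholders, and `build_payload`/`manual_completion` are hypothetical helpers. The point is that `/completions` accepts a raw prompt string you rendered yourself, so nothing re-templates it behind your back:

```python
import json
from urllib import request

def build_payload(model, rendered_prompt, max_tokens=64):
    # /completions takes a raw "prompt" string, not a "messages" list,
    # so the exact text you built is the exact text the server tokenizes.
    return {"model": model, "prompt": rendered_prompt, "max_tokens": max_tokens}

def manual_completion(base_url, model, rendered_prompt):
    # Requires a running OpenAI-compatible server (e.g. a local vLLM
    # instance); base_url is a placeholder like "http://localhost:8000".
    payload = build_payload(model, rendered_prompt)
    req = request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)
```

Comparing the output of this call against what `/chat/completions` produces for the "same" conversation is how you isolate a template-rendering bug to one side or the other.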

A Pro-Tip: Token IDs are the Ultimate Ground Truth: For the most subtle issues, inspecting the final sequence of token IDs sent to the model is the only way to be certain. While I didn’t need to resort to this for the issues above, it’s a critical tool in the toolbox. Techniques like using the OpenAI-compatible API to return token IDs can be a lifesaver. For those interested, we also highlighted this in our Agent Lightning post.
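The token-ID technique amounts to diffing the exact ID sequences two code paths send to the model. The sketch below uses a toy whitespace tokenizer (the vocab and both functions are invented for illustration; a real check would use the model's own tokenizer), but the diffing logic is the actual technique:

```python
# Toy vocabulary standing in for a real tokenizer's vocab.
TOY_VOCAB = {"<|im_start|>": 0, "<|im_end|>": 1, "user": 2, "Hi": 3, "assistant": 4}

def toy_encode(text):
    return [TOY_VOCAB[t] for t in text.split()]

def first_divergence(ids_a, ids_b):
    """Index of the first position where two token-ID sequences differ,
    or None if they are identical."""
    for i, (a, b) in enumerate(zip(ids_a, ids_b)):
        if a != b:
            return i
    return None if len(ids_a) == len(ids_b) else min(len(ids_a), len(ids_b))

a = toy_encode("<|im_start|> user Hi <|im_end|> <|im_start|> assistant")
b = toy_encode("<|im_start|> user Hi <|im_end|>")
print(first_divergence(a, b))  # 4: the second path dropped the assistant header
```

Once you know the first index where the sequences diverge, decoding the tokens around it usually points straight at the template clause responsible.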

Understand Framework Design Philosophy: vLLM’s strict handling of **kwargs is not a bug, but a deliberate security choice. Understanding these design decisions helps in quickly identifying the root cause rather than getting stuck on unexpected behavior.

The Open Ecosystem Challenge: Advanced features like a tool-call “Enforcer” are hallmarks of polished, proprietary services. Implementing these capabilities robustly and elegantly in open-source projects like vLLM is a vital challenge for the community to address.


AI @amazon. All views personal!

GDP
Sun Nov 02 16:09:28
RT @saltcod: When you see the PR in slack but it’s 4:50pm.


📈 Leading Growth @GroqInc 📌 Prev @a16z @HubSpot @TheHustle 💻 Chronically online: https://t.co/AkbwhoTr0K 📘 Wrote https://t.co/w1DBDrOZdI 🎙 Podcast at @sydlis 👇

Steph Smith
Sun Nov 02 16:08:31
just discovered that someone made an entire game about this dream https://t.co/nn2TGdocKm


I eat tornadoes for breakfast. i've been using this username for 15+ years and i will not give it to you. whatever/just dont call me late to dinner

the government man
Sun Nov 02 16:07:52
woman in the cafe talking about what chatgpt says about her texts with her boyfriend. “he needs therapy”. we are so cooked.


https://t.co/N3tfDNkGx4 | founder @trychroma

anton 🇺🇸
Sun Nov 02 16:06:09
The toilet clogged in the middle of the night 😂 and the water even overflowed. Such a pain to deal with; no idea if I'll get to sleep before 2.


Programmer. Have built 2 iOS apps, a few obscure browser extensions, and a few mini-programs. I enjoy anime, manga, light novels, and web fiction. Hoping to one day become a 自宅警備員 ("home security guard", i.e. a full-time shut-in).

avatar for Plusye
Plusye
Sun Nov 02 16:05:13