[On using Continuous Latent Space Vectors in the context windows of Transformers and LLMs] #SundayHarangue
There is a lot of chatter about how vectors from continuous latent space can make transformers solve problems efficiently. Some of these arguments run counter to conservation of computational complexity, IMHO.
The arguments/analogies revolve around viewing these tokens as "superposition" (think union) of discrete tokens.
As background, transformers operate in a latent space L s.t. every (linguistic) token corresponds to a vector in L. This mapping is however one sided: not every vector in L corresponds to a unique token.
You could, however, see these vectors (the ones without a unique token mapping) as linear combinations of token-corresponding vectors. In this way, they can be seen as a union/superposition of those tokens.
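To make that concrete, here is a minimal numpy sketch of the asymmetry (the toy vocabulary, the random embedding matrix, and the nearest-neighbor decoding convention are all illustrative assumptions of mine, not anything from the linked papers):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary of 5 tokens, each mapped to a vector in a 4-dimensional latent space L.
vocab = ["cat", "dog", "car", "bus", "tree"]
E = rng.normal(size=(len(vocab), 4))   # embedding matrix: one row (token vector) per token

# A "latent vector": a superposition (here, an equal-weight combination) of two token vectors.
v_latent = 0.5 * E[vocab.index("cat")] + 0.5 * E[vocab.index("dog")]

# The map token -> vector is well defined, but the reverse is not: v_latent is a perfectly
# valid point in L, yet it is not the embedding of any single vocabulary item.
def nearest_token(v):
    """Decode a vector to whichever token embedding is closest (one possible convention)."""
    return vocab[int(np.argmin(np.linalg.norm(E - v, axis=1)))]

print(nearest_token(E[vocab.index("cat")]))  # "cat": token vectors decode back exactly
print(nearest_token(v_latent))               # collapses to one token; the superposition is lost
```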
It should be rather obvious that the operations of the transformer see entities in the context window as just vectors from the embedding space. In particular, the forward pass operation doesn't really care whether the vectors being processed have unique tokens corresponding to them or not.
This means that, as far as the transformer operation is concerned, the context window can have both "token vectors" (i.e., embedding vectors that correspond to unique tokens) and "latent vectors" (i.e., embedding vectors that don't correspond to unique tokens). As mentioned above, these latent vectors can be seen as linear combinations of the token vectors.
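For a concrete (and hedged) illustration: Hugging Face causal LMs accept an `inputs_embeds` argument in place of `input_ids`, so you can hand the forward pass a context that mixes ordinary token vectors with an arbitrary latent vector. The choice of gpt2, the blended words, and the 50/50 weights below are arbitrary illustrative assumptions:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any small causal LM works here; "gpt2" is just a convenient example.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
emb = model.get_input_embeddings()                     # token id -> vector in L

ids = tok("the capital of France is", return_tensors="pt").input_ids
token_vectors = emb(ids)                               # ordinary "token vectors"

# Build one "latent vector" that is a blend of two token embeddings ...
paris, rome = tok.encode(" Paris")[0], tok.encode(" Rome")[0]
latent = 0.5 * emb.weight[paris] + 0.5 * emb.weight[rome]

# ... and append it to the context. The forward pass neither knows nor cares that the
# last position does not correspond to any single vocabulary item.
mixed_context = torch.cat([token_vectors, latent.view(1, 1, -1)], dim=1)
with torch.no_grad():
    out = model(inputs_embeds=mixed_context)
print(out.logits.shape)   # (1, sequence_length, vocab_size), just as with ordinary tokens
```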
One obvious use of this flexibility is that the intermediate tokens emitted by the transformer can well be these latent vectors; only the solution tokens (the ones passed on to the end users) need to be token vectors. Indeed, as we argue in https://t.co/f6E3c2j4dm (https://t.co/t4uYw5WTmD), since intermediate tokens don't seem to have any end-user semantics anyway, allowing them to be arbitrary vectors from the latent space provides significantly more flexibility for learning appropriate prompt augmentations (cf. https://t.co/jl0LyWJUys).
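A rough sketch of what "latent intermediate tokens" could look like mechanically, in the spirit of the continuous-thought style setups linked above; the loop below is my guess at the simplest version (an off-the-shelf gpt2 that was never trained for this, and an arbitrary three latent steps), so treat it as illustration rather than anyone's actual method:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
emb = model.get_input_embeddings()

# Start from ordinary token vectors for the user-visible prompt.
context = emb(tok("2 + 2 =", return_tensors="pt").input_ids)

with torch.no_grad():
    # "Latent" intermediate steps: append the model's own last hidden state (a vector in L
    # that need not correspond to any token) back into the context, instead of sampling a token.
    for _ in range(3):   # three latent steps is an arbitrary choice for illustration
        h = model(inputs_embeds=context, output_hidden_states=True).hidden_states[-1]
        context = torch.cat([context, h[:, -1:, :]], dim=1)

    # Only the final, user-facing answer has to be decoded back into an actual token.
    answer_id = model(inputs_embeds=context).logits[:, -1, :].argmax(dim=-1)

print(tok.decode(answer_id.tolist()))   # gpt2 wasn't trained for this, so don't expect "4"
```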
Another argument that has been made for using latent vectors as intermediate tokens is that they "improve the efficiency of solving the underlying problems."
Now, I am pretty skeptical about viewing LLMs as solving problems. Our work shows, for example, that there is little connection between the length of the intermediate-token sequence and the underlying complexity of the problem (cf. https://t.co/UKgCwgHKeQ), suggesting that the length is more indicative of attempts to bridge the training distribution and the test instance.
Nevertheless, if we are into looking at transformers as ways of "computing solutions" (even if that is not what is actually happening in pre-trained LLMs), then letting transformers operate on latent vectors vs. token vectors seems to correspond to doing computation on disjunctive representations of entities rather than on single entities.
Now, operating on disjunctive representations can improve average-case efficiency over specific distributions, but not the worst-case complexity. As a sanity test, abstraction and hierarchy can be viewed as operating on disjunctive representations, and neither changes the worst-case computational complexity of the problem; see https://t.co/aXreC5YKPN or https://t.co/UDzu2Qp7WK for arguments on planning.
This is why I am skeptical of claims that transformers with latent tokens can provably increase efficiency in all cases. For example, a recent paper https://t.co/4oQzEUIFPk argues that transformers with latent tokens can solve graph reachability in time proportional to the diameter of the graph (and throws in some citations to quantum superposition to boot!). This doesn't make sense--certainly not in the worst case--without violating conservation of complexity (or changing what it means to "solve" reachability; the paper's empirical results seem to be happy with less than 100% accuracy, for example).
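A small sanity check on that reachability claim (the toy graph and the function below are mine): breadth-first search phrased as repeated expansion of a whole frontier set is exactly computation over a "superposition" of nodes. The number of sequential frontier steps is indeed bounded by the diameter, but the total work summed over those steps still touches every edge, i.e., the usual O(V + E) worst-case bound, which is the conservation-of-complexity point.

```python
def reachable_via_frontiers(adj, source, target):
    """Reachability by 'projection over sets of nodes': each step replaces the current
    frontier (a disjunction of nodes) with the union of its unvisited successors."""
    visited = {source}
    frontier = {source}
    steps = 0
    while frontier:
        if target in frontier:
            return True, steps
        # one 'superposition' step: expand the whole set at once
        frontier = {w for v in frontier for w in adj.get(v, [])} - visited
        visited |= frontier
        steps += 1
    return False, steps

# At most diameter-many frontier steps, but every edge is still examined (at most once),
# so the total work across steps is the standard O(V + E) -- not something smaller.
adj = {0: [1, 2], 1: [3], 2: [3], 3: [4]}
print(reachable_via_frontiers(adj, 0, 4))   # (True, 3)
```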
When we were discussing this paper in our group meeting on Friday, I told my students about the analogy with the Graphplan planning algorithm--which speeds up STRIPS planning (a problem closely connected to reachability). Many years back, we showed that Graphplan's speedups can be understood in terms of doing projection over sets of states rather than individual states. However, if you operate directly over union representations, you can get to a point where the representation might look like it is reaching the goal state, but it may not be possible to actually extract a valid path! (In the case of Graphplan, this extraction involves a decoding step that is exponential in cost, and if it fails, the projection over disjunctive states continues.) This is illustrated in the figure below 👇 and in the original paper at https://t.co/s20cFEOfQk (or Figure 3 and the accompanying discussion in https://t.co/YqN0fh7vp6).
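Here is a deliberately crude toy of that failure mode (the STRIPS-style actions are invented, and the "union" projection below drops Graphplan's mutex bookkeeping entirely): the union of reachable propositions can contain the entire goal even though no actually reachable state does, which is why a separate, potentially expensive extraction/decoding step is unavoidable.

```python
from itertools import chain

# Toy STRIPS-style actions: (name, preconditions, add effects, delete effects).
actions = [
    ("A", {"s"}, {"p"}, {"s"}),
    ("B", {"s"}, {"q"}, {"s"}),
]
init, goal = {"s"}, {"p", "q"}

# (1) Disjunctive "union" projection: grow the set of propositions reachable in *some* state,
#     ignoring which propositions can actually co-occur.
union = set(init)
for _ in range(3):
    union |= set(chain.from_iterable(add for _, pre, add, _ in actions if pre <= union))
print(goal <= union)                     # True: the goal "looks" reached in the union

# (2) Exact projection over individual states (applying deletes), i.e. real reachability.
states = {frozenset(init)}
for _ in range(3):
    states |= {frozenset((s - dele) | add)
               for s in states
               for _, pre, add, dele in actions if pre <= s}
print(any(goal <= s for s in states))    # False: no reachable state satisfies the whole goal
```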
tldr; I do believe that latent tokens can considerably increase the flexibility of prompt augmentations that LLMs can learn in post-training, but I don't quite agree with the claim that "they reduce the complexity of the problems under consideration".
![Figure referenced in the post above](https://pbs.twimg.com/media/G4yh-s7b0AAH-4k.png)