Thread Easy

Explore

Newest first — browse tweet threads


The nature is healing.


We're in a race. It's not USA vs China but humans and AGIs vs ape power centralization. @deepseek_ai stan #1, 2023–Deep Time «C’est la guerre.» ®1

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
Thu Oct 30 15:51:18
someone tell me what i'm missing here, because the titled claim seems trivially false to me:

they define an LLM as a function that maps sequence s in V^k to vector in R^d

assume hidden state in n-bit precision.  at some point, there are more inputs possible than hidden states:

|V|^k > 2^{n * d}
k > n d log(2) / log |V|

let's take GPT-2: n=16, d=768, V≈50,000

then collisions *must* happen starting at a context window size of 214 tokens

this seems actually kind of bad, right?


phd research @cornell // language models, information theory, science of AI

Jack Morris
Thu Oct 30 15:50:00
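The counting argument in the tweet above is easy to sanity-check. The short Python sketch below is not part of the thread; it simply evaluates the pigeonhole bound k > n·d·log(2)/log|V| using the GPT-2 figures quoted in the tweet (n = 16-bit precision, d = 768, |V| ≈ 50,000), taking those figures as given.

import math

# Pigeonhole bound from the tweet above: once the number of possible inputs
# |V|^k exceeds the number of representable hidden states 2^(n*d), two
# different contexts must collide onto the same hidden state.

def collision_context_length(n_bits, d, vocab_size):
    """Smallest context length k with |V|^k > 2^(n_bits * d)."""
    bound = n_bits * d * math.log(2) / math.log(vocab_size)
    return math.floor(bound) + 1

# GPT-2 figures as quoted: 16-bit precision, d = 768, |V| ~ 50,000.
print(collision_context_length(16, 768, 50_000))  # prints 788

With those figures the bound works out to roughly k ≈ 788 rather than the 214 quoted in the thread, so the exact threshold evidently depends on assumptions not spelled out in the tweet.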
June - Container launch
Nov 6 - Containers live in prod 

Come to our TECH Talk: https://t.co/Jx2ayskkta


Have questions, or building something cool with Cloudflare's Developer products? We're here to help. For help with your account please try @CloudflareHelp

Cloudflare Developers
Thu Oct 30 15:46:53
RT @bigeagle_xd: i am honored to have witnessed this great work over the past year.  
linear attn has great potential in expressiveness but…


We're in a race. It's not USA vs China but humans and AGIs vs ape power centralization. @deepseek_ai stan #1, 2023–Deep Time «C’est la guerre.» ®1

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
Thu Oct 30 15:46:25
Wow. Feel seen.


AI @amazon. All views personal!

GDP
Thu Oct 30 15:44:30