Thread Easy

Your all-in-one companion for Twitter threads


Explore

Newest first — browse tweet threads


RT @elithrar: $5 @Cloudflare Workers plan + $5 @PlanetScale dev node sounding like a winning combination for building the next big thing 😎

vp developers & ai @cloudflare ✨ and how does that error make you feel?

rita kozlov 🐀
Thu Oct 30 15:52:31
The nature is healing.

We're in a race. It's not USA vs China but humans and AGIs vs ape power centralization. @deepseek_ai stan #1, 2023–Deep Time «C’est la guerre.» ®1

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
Thu Oct 30 15:51:18
someone tell me what i'm missing here, because the titled claim seems trivially false to me:

they define an LLM as a function that maps sequence s in V^k to vector in R^d

assume hidden state in n-bit precision.  at some point, there are more inputs possible than hidden states:

|V|^k > 2^{n * d}
k > n d log(2) / log |V|

let's take GPT-2: n=16, d=768, V≈50,000

then collisions *must* happen starting at a context window size of about 788 tokens

this seems actually kind of bad, right?

phd research @cornell // language models, information theory, science of AI

Jack Morris
Thu Oct 30 15:50:00
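
The pigeonhole arithmetic in the thread above is easy to check directly. Below is a minimal sketch in plain Python; the function name is ours, and the numbers are the tweet's own assumptions (16-bit hidden states, d = 768, |V| ≈ 50,000), not values verified against GPT-2's actual configuration:

```python
import math

def collision_context_length(n_bits: int, d: int, vocab_size: int) -> int:
    """Smallest context length k at which distinct inputs must collide.

    A map from V^k into a d-dimensional hidden state stored at n_bits
    precision cannot be injective once |V|^k > 2^(n_bits * d), i.e. once
    k > n_bits * d * ln(2) / ln(|V|)  (pigeonhole principle).
    """
    return math.floor(n_bits * d * math.log(2) / math.log(vocab_size)) + 1

# Numbers quoted in the tweet (assumed, not checked against the model):
# 16-bit hidden states, d = 768, |V| ~ 50,000.
print(collision_context_length(n_bits=16, d=768, vocab_size=50_000))  # -> 788
```

With GPT-2's real vocabulary of 50,257 tokens the bound comes out one token lower, at 787, so the conclusion is insensitive to the exact vocabulary figure.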
June - Container launch
Nov 6 - Containers live in prod 

Come to our TECH Talk: https://t.co/Jx2ayskkta

Have questions, or building something cool with Cloudflare's Developer products? We're here to help. For help with your account, please try @CloudflareHelp

Cloudflare Developers
Thu Oct 30 15:46:53
RT @bigeagle_xd: i am honored to have witnessed this great work over the past year.  
linear attn has great potential in expressiveness but…

We're in a race. It's not USA vs China but humans and AGIs vs ape power centralization. @deepseek_ai stan #1, 2023–Deep Time «C’est la guerre.» ®1

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
Thu Oct 30 15:46:25