Thread Easy
  • Explore
  • Create thread

Your complete partner for Twitter threads

© 2025 Thread Easy. All Rights Reserved.

Explore


RT @danielkoeth: Blue Skies Forever 💙


Market Design/Entrepreneurship Professor @HarvardHBS & Faculty Affiliate @Harvard Economics; Research @a16zcrypto; Editor @restatjournal; Econ @Quora; … | #QED

Scott Kominers
Thu Oct 30 15:57:42
What's going on there, how do you screw this up
we've seen DS-MoEs scaling gracefully and predictably from 3B to 1T with trivial allometric changes, half of Chinese tech reports is doing just that


We're in a race. It's not USA vs China but humans and AGIs vs ape power centralization. @deepseek_ai stan #1, 2023–Deep Time «C’est la guerre.» ®1

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
Thu Oct 30 15:56:44
When you run parallel agents in the same folder without git worktrees


Open-source ecosystem for high-leverage AI builders. GitHub's #1 AI Agents Repo with 72k+ stars. Join 200k+ active AI builders.

Unwind AI
Thu Oct 30 15:52:54
RT @elithrar: $5 @Cloudflare Workers plan + $5 @PlanetScale dev node sounding like a winning combination for building the next big thing 😎


vp developers & ai @cloudflare ✨ and how does that error make you feel?

rita kozlov 🐀
Thu Oct 30 15:52:31
The nature is healing.


We're in a race. It's not USA vs China but humans and AGIs vs ape power centralization. @deepseek_ai stan #1, 2023–Deep Time «C’est la guerre.» ®1

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
Thu Oct 30 15:51:18
someone tell me what i'm missing here, because the titled claim seems trivially false to me:

they define an LLM as a function that maps sequence s in V^k to vector in R^d

assume hidden state in n-bit precision.  at some point, there are more inputs possible than hidden states:

|V|^k > 2^{n * d}
k > n d log(2) / log |V|

let's take GPT-2: n=16, d=768, V≈50,000

then collisions *must* happen starting at a context window size of 214 tokens

this seems actually kind of bad, right?


phd research @cornell // language models, information theory, science of AI

Jack Morris
Thu Oct 30 15:50:00
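The pigeonhole bound in the tweet above is easy to check numerically. A minimal sketch (the figures n=16, d=768, |V|≈50,000 are the ones quoted in the tweet for GPT-2; the resulting threshold depends sensitively on the precision and vocabulary size you assume):

```python
import math

def collision_context_length(n_bits: int, d: int, vocab_size: int) -> int:
    """Smallest context length k at which |V|^k > 2^(n_bits * d), i.e.
    the point where there are more possible inputs than distinct hidden
    states, so collisions *must* occur (pigeonhole principle)."""
    # From |V|^k > 2^(n*d):  k > n * d * log(2) / log|V|
    bound = n_bits * d * math.log(2) / math.log(vocab_size)
    return math.floor(bound) + 1  # first integer strictly above the bound

# Figures quoted in the tweet: 16-bit precision, d=768, |V| ~ 50,000
print(collision_context_length(16, 768, 50_000))  # → 788
```

Note that plugging in these particular values gives a threshold in the high hundreds of tokens; the exact number shifts with whatever precision and vocabulary size one assumes.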