Thread Easy
Your all-in-one companion for Twitter threads

© 2025 Thread Easy All Rights Reserved.

Explore


can we appreciate the insanity of “Older hardware underperformed in evals and was removed” 

for what is conceptually a lot of parallel General Matrix Multiplications


achieve ambition with intentionality, intensity, & integrity - @dxtipshq - @sveltesociety - @aidotengineer - @latentspacepod - @cognition + @smol_ai

swyx
Sun Nov 02 07:22:42
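The "parallel GEMMs" framing above can be sketched with a batched matrix multiply; the shapes below are purely illustrative, not taken from any specific model:

```python
import numpy as np

# Illustrative shapes only: 8 independent GEMMs, e.g. one per attention head.
batch, m, k, n = 8, 64, 128, 256
A = np.random.rand(batch, m, k).astype(np.float32)
B = np.random.rand(batch, k, n).astype(np.float32)

# np.matmul broadcasts over the leading batch axis: `batch` independent
# (m, k) @ (k, n) multiplications, which hardware can run in parallel.
C = np.matmul(A, B)

# Same result as an explicit loop over the batch.
C_loop = np.stack([A[i] @ B[i] for i in range(batch)])
assert np.allclose(C, C_loop, atol=1e-3)
print(C.shape)  # (8, 64, 256)
```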
Safe spaces online, for humans.


Research Scientist @meta (FAIR), Prof. @Unige_en, co-founder @nc_shape. I like reality.

François Fleuret
Sun Nov 02 07:19:53
Yes, for AI coding a Mac really is easier. Roughly 90% of the students who hit environment-setup problems are on Windows.

For friends starting AI coding from zero, I strongly recommend pairing the Mac you pick up from the Apple Store with my course! Good tools ➕ a good course save you a lot of detours 🌝

My AI coding course (https://t.co/HVZn3ItASW) | Bilibili creator | Sharing what I build + iterating endlessly

熠辉 Indie
Sun Nov 02 07:19:31
or do we already do that because we are minimising KL with more recency bias rather than over the entire distribution?


RL and efficient distributed pretraining • eXperiments lab • memes and training lores

tokenbender
Sun Nov 02 07:17:02
*monotone* oh no, not the briar patch!


We're in a race. It's not USA vs China but humans and AGIs vs ape power centralization. @deepseek_ai stan #1, 2023–Deep Time «C’est la guerre.» ®1

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
Sun Nov 02 07:16:16
why do we accept approximating distribution from data/model with forward KL (left picture)?

why not work towards algos that look like (right picture)?


tokenbender
Sun Nov 02 07:14:46
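The tweet's pictures are not reproduced here, but reading "left picture" as forward KL, KL(p‖q) (mass-covering), and "right picture" as reverse KL, KL(q‖p) (mode-seeking), the contrast can be sketched numerically. The bimodal target and unit-variance Gaussian family below are my own toy setup, not from the thread:

```python
import numpy as np

# Toy example (an assumption, not the OP's setup): fit a unit-variance
# Gaussian q to a bimodal target p by scanning the mean mu under each objective.
x = np.linspace(-6, 6, 200)
p = 0.5 * np.exp(-0.5 * (x + 2) ** 2) + 0.5 * np.exp(-0.5 * (x - 2) ** 2)
p /= p.sum()

def gaussian(mu):
    q = np.exp(-0.5 * (x - mu) ** 2)
    return q / q.sum()

def forward_kl(mu):  # KL(p || q): penalizes q ~ 0 where p > 0 (mass-covering)
    q = gaussian(mu)
    return np.sum(p * np.log(p / q))

def reverse_kl(mu):  # KL(q || p): penalizes q > 0 where p ~ 0 (mode-seeking)
    q = gaussian(mu)
    return np.sum(q * np.log(q / p))

mus = np.linspace(-4, 4, 401)
best_fwd = mus[np.argmin([forward_kl(m) for m in mus])]
best_rev = mus[np.argmin([reverse_kl(m) for m in mus])]
print(best_fwd)  # ~0: splits the difference between the two modes
print(best_rev)  # ~±2: locks onto a single mode
```

Forward KL averages over the target and ends up between the modes, while reverse KL collapses onto one of them, which is the asymmetry the thread is poking at.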