Explore

Browse tweet threads, newest first.

RT @casper_hansen_: RL is slow and expensive while prompt optimization is fast and cheap. I'm not convinced yet that RL is the solution to…

We're in a race. It's not USA vs China but humans and AGIs vs ape power centralization. @deepseek_ai stan #1, 2023–Deep Time «C’est la guerre.» ®1

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
Fri Dec 05 04:44:33
“i told you so”
> We were surprised to find that Claude Code with Opus 4.5 dramatically outperformed the CORE-Agent scaffold, even without fixing incorrect test cases (78% vs 42%).
> We are unsure what led to this difference. One hypothesis is that the Claude 4.5 series of models is much better tuned to work with Claude Code.
> We think studying the coupling between models and scaffolds is an important research direction going forward

so many gigabrained takes at that time, people asking in posts and discussing in GCs about what’s the reason. but almost 9 months later, only one answer wins.

tokenbender
Fri Dec 05 04:42:26
This is unserious. V3.2-thinking, one of the strongest LLMs around, is below tons of relatively weak models and even older versions of itself, like V3.1, V3.2-exp, R1-0528. Maybe the clearest case of lmarena being cooked.

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
Fri Dec 05 04:40:52
RT @AlexGDimakis: Both GEPA and OpenThoughts got oral presentations at the FoRLM workshop in Neurips (This Sunday). Congratulations to the…

Asst professor @MIT EECS & CSAIL (@nlp_mit). Author of https://t.co/VgyLxl0oa1 and https://t.co/ZZaSzaRaZ7 (@DSPyOSS). Prev: CS PhD @StanfordNLP. Research @Databricks.

Omar Khattab
Fri Dec 05 04:39:59
RT @isaiah_p_taylor: Today, we took the Nova Core critical for the final time. 3 weeks, 10 configurations, and 36 different critical and su…

https://t.co/N3tfDNkGx4 | founder @trychroma

anton 🇺🇸
Fri Dec 05 04:35:59
DEEP is in effect the robotics wing of ZJU. A pilgrimage there makes sense for everyone interested in the future of robotics (+ Shenzhen and Shanghai)

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
Fri Dec 05 04:17:34