Thread Easy
  • Explore
  • Thread Creation

Your one-stop companion for Twitter threads

© 2025 Thread Easy. All Rights Reserved.

Explore

Newest first: browse tweet threads

RT @RobertMSterling: My wife: “You’re not really going to start eating pumpkin pie at 9:15 AM, are you?”

Me:

📈 Leading Growth @GroqInc 📌 Prev @a16z @HubSpot @TheHustle 💻 Chronically online: https://t.co/AkbwhoTr0K 📘 Wrote https://t.co/w1DBDrOZdI 🎙 Podcast at @sydlis 👇

Steph Smith
Thu Nov 27 18:15:08
RT @sainingxie: most of people didn’t know this we had been using TPUs at *Facebook* as far back as 2020. Kaiming led the initial developme…

🇦🇺 Co-founder: @AnswerDotAI & @FastDotAI ; Prev: professor @ UQ; Stanford fellow; @kaggle president; @fastmail/@enlitic/etc founder https://t.co/16UBFTX7mo

Jeremy Howard
Thu Nov 27 18:11:56
Man, we need much longer contexts… and much better refinement against overthinking. We probably will have to inference a lot of >1M chains before cutting it down where applicable. 
DSA in theory could support it without prohibitive compute costs

We're in a race. It's not USA vs China but humans and AGIs vs ape power centralization. @deepseek_ai stan #1, 2023–Deep Time «C’est la guerre.» ®1

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
Thu Nov 27 18:10:24
In case you wondered what «DeepSeek» and «Longtermism» are about, it's not that they're just spamming random English tokens.

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
Thu Nov 27 18:00:38
GRPO without updates is enough to fight Google for IMO Gold
Whales are not in the algorithmic fetishism paper mill business, they're in the «answer the essential question with long-termism» business.

In case you wondered what «DeepSeek» and «Longtermism» are about, it's not that they're just spamming random English tokens.

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
Thu Nov 27 17:52:54