Explore | Thread Easy - Unroll Twitter threads | Read, summarize, compose

What's the most careful analysis of the impact of capabilities CoT RL on the safety tuning of models? I'm interested in a study that takes a well safety- and instruct-tuned model like Llama 3.1, applies RL to it without any further safety tuning, and then rigorously analyzes the effects. I'm mostly interested in the qualitative aspect: What does the CoT look like? Does the CoT frequently talk about the safety principles it's supposed to follow? But also quantitative stuff, like which benchmarks decrease and by how much. Anyone know if a study like this has been done?

Interests: AI (Safety), meditation, philosophy, mathematics, algorithms. If I say something you disagree with, please DM or quote tweet. I love to argue!

William Wale

Thu Dec 11 21:50:56

For more information about the program, and to apply to the safety track: https://t.co/ajwKem4my2 We’re also adding a security track. Apply here: https://t.co/WZxjII7yJW

We're an AI safety and research company that builds reliable, interpretable, and steerable AI systems. Talk to our AI assistant @claudeai on https://t.co/FhDI3KQh0n.

Anthropic

Thu Dec 11 21:42:06

We’re opening applications for the next two rounds of the Anthropic Fellows Program, beginning in May and July 2026. We provide funding, compute, and direct mentorship to researchers and engineers to work on real safety and security projects for four months.

40% of fellows in our first cohort have since joined Anthropic full-time, and 80% published their work as a paper. Next year, we’re expanding the program to more fellows and more research areas. To learn more about what our fellows work on: https://t.co/HSQjGy90AZ

Anthropic

Thu Dec 11 21:42:05

40% of fellows in our first cohort have since joined Anthropic full-time, and 80% published their work as a paper. Next year, we’re expanding the program to more fellows and more research areas. To learn more about what our fellows work on: https://t.co/HSQjGy90AZ

For more information about the program, and to apply to the safety track: https://t.co/ajwKem4my2 We’re also adding a security track. Apply here: https://t.co/WZxjII7yJW

Anthropic

Thu Dec 11 21:42:05

RT @eliebakouch: wtf gpt 5.2 long context improvement over gpt 5.1 is actually crazy??

AI is cool i guess

Sam Altman

Thu Dec 11 21:38:11

RT @gothburz: Last quarter I rolled out Microsoft Copilot to 4,000 employees. $30 per seat per month. $1.4 million annually. I called it…

Deeply researched product, growth, and career advice

Lenny Rachitsky

Thu Dec 11 21:35:43

Explore

Newest first — browse tweet threads
