Thread Easy

Your all-purpose partner for Twitter threads

© 2025 Thread Easy All Rights Reserved.

Explore

Newest first — browse tweet threads

What's the most careful analysis of the impacts of capabilities CoT RL on safety-tuning of models?

I'm interested in a study that takes a well safety- and instruction-tuned model like Llama 3.1, applies RL to it without any further safety tuning, and then rigorously analyzes the effects.

I'm mostly interested in the qualitative aspect: What does the CoT look like? Does it frequently mention the safety principles it's supposed to follow?

But also quantitative stuff, like which benchmarks decrease and by how much.

Anyone know if a study like this has been done?

Interests: AI (Safety), meditation, philosophy, mathematics, algorithms. If I say something you disagree with, please DM or quote tweet. I love to argue!

William Wale
Thu Dec 11 21:50:56
For more information about the program, and to apply to the safety track: https://t.co/ajwKem4my2

We’re also adding a security track. Apply here: https://t.co/WZxjII7yJW

We're an AI safety and research company that builds reliable, interpretable, and steerable AI systems. Talk to our AI assistant @claudeai on https://t.co/FhDI3KQh0n.

Anthropic
Thu Dec 11 21:42:06
We’re opening applications for the next two rounds of the Anthropic Fellows Program, beginning in May and July 2026. 

We provide funding, compute, and direct mentorship to researchers and engineers to work on real safety and security projects for four months.

Anthropic
Thu Dec 11 21:42:05
40% of fellows in our first cohort have since joined Anthropic full-time, and 80% published their work as a paper. Next year, we’re expanding the program to more fellows and more research areas.

To learn more about what our fellows work on: https://t.co/HSQjGy90AZ

Anthropic
Thu Dec 11 21:42:05
RT @eliebakouch: wtf gpt 5.2 long context improvement over gpt 5.1 is actually crazy??

AI is cool i guess

Sam Altman
Thu Dec 11 21:38:11
RT @gothburz: Last quarter I rolled out Microsoft Copilot to 4,000 employees.

$30 per seat per month.

$1.4 million annually.

I called it…

Deeply researched product, growth, and career advice

Lenny Rachitsky
Thu Dec 11 21:35:43