
Thread Easy

Your all-in-one Twitter thread assistant



The idea behind significantly improving performance on hard real-world tasks is to train a value function, condition the model on advantages computed from that value function, and run an iterative improvement loop in which the model learns from its own data.
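The three steps named above (fit a value function, condition the model on advantages, iterate on self-collected data) can be illustrated with a small tabular toy. This is only a sketch under stated assumptions, not the author's method: the one-step task, the `reward` rule, the count-based "model", the binary "advantage is positive" label, and every function name here are hypothetical choices made for illustration.

```python
import random
from collections import defaultdict

N_STATES, N_ACTIONS = 4, 3  # hypothetical toy problem size

def reward(state, action):
    # Assumed toy task: exactly one correct action per state.
    return 1.0 if action == state % N_ACTIONS else 0.0

def fit_value(data):
    # Step 1: tabular value estimate, the mean observed return per state.
    total, count = defaultdict(float), defaultdict(int)
    for s, _, r in data:
        total[s] += r
        count[s] += 1
    return {s: total[s] / count[s] for s in count}

def fit_policy(data, value):
    # Step 2: count-based model of P(action | state, advantage > 0).
    # The advantage here is simply return minus the value estimate.
    counts = defaultdict(lambda: [1.0] * N_ACTIONS)  # Laplace smoothing
    for s, a, r in data:
        good = (r - value[s]) > 0.0
        counts[(s, good)][a] += 1.0
    return counts

def sample_action(policy, state, rng, epsilon):
    # Act by conditioning on "positive advantage"; explore with prob. epsilon.
    if policy is None or rng.random() < epsilon:
        return rng.randrange(N_ACTIONS)
    weights = policy[(state, True)]
    return rng.choices(range(N_ACTIONS), weights=weights)[0]

def collect(policy, rng, n, epsilon):
    # Roll out the current policy to gather (state, action, return) tuples.
    data = []
    for _ in range(n):
        s = rng.randrange(N_STATES)
        a = sample_action(policy, s, rng, epsilon)
        data.append((s, a, reward(s, a)))
    return data

def improvement_loop(iterations=5, n=2000, epsilon=0.1, seed=0):
    # Step 3: the model repeatedly learns from its own accumulated data.
    rng = random.Random(seed)
    policy, data, history = None, [], []
    for _ in range(iterations):
        batch = collect(policy, rng, n, epsilon)
        history.append(sum(r for _, _, r in batch) / n)
        data.extend(batch)
        value = fit_value(data)
        policy = fit_policy(data, value)
    return history
```

Calling `improvement_loop()` returns the average reward of each collected batch. In this toy setting the first batch is uniformly random, and later batches should score higher once the model is conditioned on positive advantage. A real system would replace the tables with learned networks and multi-step returns.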

Research Scientist @physical_int. Formerly Google DeepMind

Tue Nov 18 00:22:42
