
5/5 What is async RL that Customer Composer model training uses?

It uses asynchronous execution at multiple levels to avoid waiting on slow operations, e.g. a long roll-out generation.

As you know, in RL algorithms like GRPO we generate multiple trajectories for a given problem. However, some trajectories can take too long to complete.

So, once enough trajectories have finished, training runs on those.

Partial samples/roll-outs are resumed later with the updated model. As a result, some tokens in a trajectory are generated by the old model/policy and some by the new one.

However, this is acceptable. If you want to understand more about async RL, read APRIL, a project for async RL.
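To make the mixed-policy effect concrete, here is a toy sketch of the idea in Python. It is not the actual Composer or APRIL implementation: `Rollout`, `generate`, `async_step`, the token budget, and the "training bumps a version counter" shortcut are all hypothetical names and simplifications, and a real system would run generation and training concurrently rather than in lockstep rounds.

```python
from dataclasses import dataclass, field


@dataclass
class Rollout:
    """A trajectory under generation (hypothetical toy structure)."""
    target_len: int  # number of tokens this trajectory needs to finish
    # Policy version that produced each token generated so far.
    versions: list = field(default_factory=list)

    @property
    def done(self) -> bool:
        return len(self.versions) >= self.target_len


def generate(rollout: Rollout, policy_version: int, budget: int) -> None:
    """Produce up to `budget` more tokens, tagging each with the policy version."""
    remaining = rollout.target_len - len(rollout.versions)
    rollout.versions.extend([policy_version] * min(budget, remaining))


def async_step(pending, policy_version, budget, min_batch):
    """Advance every pending rollout by one generation round.

    If at least `min_batch` rollouts are finished, "train" on them
    (modelled here as incrementing the policy version) and carry the
    unfinished rollouts over to be resumed with the new policy.
    """
    for r in pending:
        generate(r, policy_version, budget)
    finished = [r for r in pending if r.done]
    unfinished = [r for r in pending if not r.done]
    if len(finished) >= min_batch:
        return finished, unfinished, policy_version + 1
    return [], pending, policy_version  # not enough yet: keep waiting


# Two short trajectories and one long one.
rollouts = [Rollout(3), Rollout(4), Rollout(10)]
batch, pending, version = async_step(rollouts, policy_version=0, budget=5, min_batch=2)
# The two short rollouts are trained on; the long one is carried over.
batch2, pending2, version2 = async_step(pending, version, budget=5, min_batch=1)
long_rollout = batch2[0]
print(long_rollout.versions)  # first half from policy 0, second half from policy 1
```

The long rollout ends up with tokens tagged by two different policy versions, which is the "some tokens by the old policy, some by the new" situation described above.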

AI @amazon. All views personal!

GDP
The first big tech company to be destroyed by AI will be Salesforce.

Professor of computer science at UW and author of '2040' and 'The Master Algorithm'. Into machine learning, AI, and anything that makes me curious.

Pedro Domingos
The Federal Reserve has announced a quarter percentage point rate cut, marking its second consecutive rate reduction.

The move brings the Fed’s benchmark interest rate down to a range of 3.75% to 4%.

The pulse of the nation in the palm of your hand.

USA TODAY
https://t.co/BmBeU9Iays names '6-7' as 2025 Word of the Year. Here's what it really means.

The pulse of the nation in the palm of your hand.

USA TODAY
Trump replaces members of arts commission reviewing White House ballroom plans

Top and breaking news, pictures and videos from Reuters. For breaking business news, follow @ReutersBiz. Our daily podcast is here: https://t.co/KO0QFy0d3a

Reuters
Maryland Senate President Bill Ferguson dashes Democrats' hopes the state would join the national redistricting battle, telling colleagues that the chamber would not try to redraw the state's congressional map.

News updates from around the 🌎, all day, every day.

NBC News