5/5 What is the async RL that Customer Composer model training uses? It runs execution asynchronously at multiple levels to avoid waiting on slow operations, e.g. a long rollout generation. In RL methods like GRPO, we generate multiple trajectories for a given problem, but some trajectories can take too long to complete. So once enough trajectories have finished, training runs on them. Partial samples/rollouts are resumed later with the updated model. As a result, some tokens in a rollout are generated by the old model/policy and some by the new one; this is considered acceptable. To understand more about async RL, read APRIL, a project for async RL.
AI @amazon. All views personal!
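The partial-rollout idea in the post above can be sketched as a toy simulation. This is only an illustration under stated assumptions, not the Composer or APRIL implementation: there is no real model, "tokens" are just records of which policy version produced them, and every name here (`Rollout`, `generate`, `async_rl`, `budget`, `batch_size`, `group_size`) is hypothetical.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Rollout:
    prompt_id: int
    target_len: int  # simulated number of tokens needed to finish
    # Policy version that produced each token (stands in for real tokens).
    policy_versions: list = field(default_factory=list)

    @property
    def done(self) -> bool:
        return len(self.policy_versions) >= self.target_len


def generate(rollout: Rollout, policy_version: int, budget: int) -> None:
    """Extend a rollout by at most `budget` tokens using the current policy."""
    remaining = rollout.target_len - len(rollout.policy_versions)
    rollout.policy_versions.extend([policy_version] * min(budget, remaining))


def async_rl(num_steps=3, group_size=8, batch_size=4, budget=16, seed=0):
    """Train as soon as `batch_size` rollouts finish; resume the rest later."""
    rng = random.Random(seed)
    policy_version = 0
    pending = []  # partial rollouts carried over between training steps
    trained = []
    next_id = 0
    for _ in range(num_steps):
        # Keep `group_size` rollouts in flight (assumption: fixed pool size).
        while len(pending) < group_size:
            pending.append(Rollout(next_id, rng.randint(4, 48)))
            next_id += 1
        finished = []
        while len(finished) < batch_size:
            for r in pending:
                generate(r, policy_version, budget)
            finished += [r for r in pending if r.done]
            pending = [r for r in pending if not r.done]
        # "Training" is simulated by bumping the policy version.
        trained += finished
        policy_version += 1
        # Unfinished rollouts stay in `pending` and resume under the new policy,
        # so a single rollout can mix tokens from several policy versions.
    return policy_version, trained


if __name__ == "__main__":
    version, trained = async_rl()
    mixed = sum(len(set(r.policy_versions)) > 1 for r in trained)
    print(f"policy version {version}: {len(trained)} trained, {mixed} mixed-policy")
```

Slow rollouts never block a training step in this sketch; they simply finish in a later step, which is where the mix of old-policy and new-policy tokens comes from.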


The first big tech company to be destroyed by AI will be Salesforce.
Professor of computer science at UW and author of '2040' and 'The Master Algorithm'. Into machine learning, AI, and anything that makes me curious.


The Federal Reserve has announced a quarter percentage point rate cut, marking its second consecutive rate reduction. The move brings the Fed’s benchmark interest rate down to a range of 3.75% to 4%.
The pulse of the nation in the palm of your hand.


https://t.co/BmBeU9Iays names '6-7' as 2025 Word of the Year. Here's what it really means.
The pulse of the nation in the palm of your hand.


Trump replaces members of arts commission reviewing White House ballroom plans
Top and breaking news, pictures and videos from Reuters. For breaking business news, follow @ReutersBiz. Our daily podcast is here: https://t.co/KO0QFy0d3a


Maryland Senate President Bill Ferguson dashes Democrats' hopes the state would join the national redistricting battle, telling colleagues that the chamber would not try to redraw the state's congressional map.
News updates from around the 🌎, all day, every day.
