but this feels like it has to be right in the end!

1. unsupervised learning on everything to understand the world: 1st person, 3rd person, car cameras, 2d animation, cctv, instructional videos, text, images, any and all robotics data, etc.
2. that should transfer downstream to a model you finetune with teleoperation data. your robotics model uses its deep latent understanding of "what" a coffee mug really is and what it is used for to make sense of your human demonstrations. also, finetuning in a motor-control/action head shouldn't be hard here if that data wasn't in pretraining.
3. a bit of real-world on-policy RL with your model deployed in the wild (or some in sim / in the lab) is what seals the deal. (rough sketch of the recipe after this list.)
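to make the three stages concrete, here is a minimal PyTorch-style sketch of the pipeline, not the author's actual method: every module, shape, and dataset below is a hypothetical placeholder, and stage 3 is only indicated in a comment.

```python
# Hypothetical three-stage recipe: (1) pretrained backbone,
# (2) behavior cloning on teleop demos via an action head,
# (3) on-policy RL on the deployed policy (not shown).
# All names, dims, and data here are made up for illustration.
import torch
import torch.nn as nn

class VideoBackbone(nn.Module):
    """Stage 1 stand-in: assume weights come from large-scale
    unsupervised pretraining on video/text/images."""
    def __init__(self, dim=256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Flatten(),
            nn.Linear(3 * 64 * 64, dim),  # toy encoder; a real one would be a video model
            nn.ReLU(),
        )

    def forward(self, frames):
        return self.encoder(frames)  # latent "world understanding"

class ActionHead(nn.Module):
    """Stage 2 add-on: small motor-control head bolted onto the backbone."""
    def __init__(self, dim=256, action_dim=7):
        super().__init__()
        self.head = nn.Linear(dim, action_dim)

    def forward(self, latent):
        return self.head(latent)

backbone = VideoBackbone()                 # pretend stage-1 weights are loaded here
policy = nn.Sequential(backbone, ActionHead())

# Stage 2: behavior cloning on teleoperation data (dummy batch).
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)
frames = torch.randn(8, 3, 64, 64)         # demo observations
actions = torch.randn(8, 7)                # teleoperated target actions
loss = nn.functional.mse_loss(policy(frames), actions)
opt.zero_grad()
loss.backward()
opt.step()

# Stage 3: swap the BC loss for an on-policy RL objective (e.g. a
# PPO-style loss) computed from rollouts of the deployed policy.
```

the design point the sketch tries to show: stages 2 and 3 reuse the same backbone, so the expensive world-understanding is learned once in stage 1 and only the cheap action head plus a light finetune carry the robotics-specific signal.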