nee1r|6 days ago
This is a preview of a very different type of computer-use model: we train on the internet. Specifically, we have 11 million hours of computer video stored on our storage cluster (previously shared at https://news.ycombinator.com/item?id=45438496 !), and the model runs at 30 FPS. Because the model matches the fundamental form factor of computer use, we can get it to do CAD, browse websites, and even drive a car using the arrow keys. I'm super excited to see what the model can do as we scale further; it's a fun frontier to work on (not language models :) ).
The team and I will be online responding to the comments, so drop any questions.
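For a sense of scale, here is the back-of-the-envelope frame count for 11 million hours of video. This assumes the footage is counted at 30 FPS, the rate the post gives for the model; the post does not say what frame rate the stored video actually uses, so treat the frame total as an estimate under that assumption.

```python
# Rough scale of the 11M-hour dataset mentioned in the post.
# ASSUMPTION: frames are counted at 30 FPS (the model's stated rate);
# the stored video's real frame rate is not given in the post.
HOURS_OF_VIDEO = 11_000_000
SECONDS_PER_HOUR = 3_600
FPS = 30

total_seconds = HOURS_OF_VIDEO * SECONDS_PER_HOUR
total_frames = total_seconds * FPS

print(f"{total_seconds:,} seconds of video")
print(f"{total_frames:,} frames (~{total_frames / 1e12:.2f} trillion)")
```

That works out to 39.6 billion seconds, or roughly 1.19 trillion frames if every second is sampled at 30 FPS.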
ilaksh|4 days ago
Any benchmark comparisons to Fara-7B, Sonnet 4.6, Qwen 3.5, etc.?
AndrewKemendo|4 days ago
In particular, the Forward rollout module is very important. It aligns your (effectively) world model with what it expects from the world, and I think keeping those two in sync is what gives this the power to generate the state-action pairs it needs to keep training itself in a semi-supervised way.
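One way to picture the idea in the comment above, as a toy: a forward model is used to attach action labels to unlabeled observations, and any transition the model cannot explain is discarded instead of being mislabeled. Everything here (the integer "cursor" world, the `ACTIONS` set, and the names `forward_model`, `label_transition`, `pseudo_label`) is hypothetical and is not the team's architecture. A real system would use learned models over video frames and would retrain the forward model on the new pairs, which this sketch leaves out.

```python
from typing import Callable, Optional

# Hypothetical toy world: the state is an integer cursor position and the
# actions are arrow-key presses that move it left, not at all, or right.
ACTIONS = {"left": -1, "none": 0, "right": 1}


def forward_model(state: int, action: str) -> int:
    """Stand-in for a learned forward model: predicts the next state."""
    return state + ACTIONS[action]


def label_transition(
    state: int, next_state: int, model: Callable[[int, str], int]
) -> Optional[str]:
    """Return the single action whose rollout reproduces the observed next
    state, or None if no action (or more than one) does."""
    matches = [a for a in ACTIONS if model(state, a) == next_state]
    return matches[0] if len(matches) == 1 else None


def pseudo_label(
    frames: list[int], model: Callable[[int, str], int]
) -> list[tuple[int, str]]:
    """Turn an unlabeled observation sequence into (state, action) pairs,
    keeping only the transitions the forward model can explain."""
    pairs = []
    for state, next_state in zip(frames, frames[1:]):
        action = label_transition(state, next_state, model)
        if action is not None:
            pairs.append((state, action))
    return pairs


# An unlabeled "video": the jump from 2 to 5 cannot be explained by one
# key press, so that transition is dropped rather than mislabeled.
video = [0, 1, 1, 2, 5, 4]
print(pseudo_label(video, forward_model))
```

The filtering step is the "keeping those in sync" part in miniature: only pairs the forward model agrees with become training data, so a drifting model would produce fewer (not wrong) labels.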
dangoodmanUT|4 days ago
nee1r|4 days ago
dr_dshiv|3 days ago
Must have been really hard. What was the breakthrough?
xianshou|3 days ago
arkmm|4 days ago