nee1r | 6 days ago

Hey guys! I’m Neel. I've been holed up in our South Park office for the past year working on model training. Excited to share our research!

This is a preview of a very different type of computer-use model: we train on the internet. Specifically, we have 11 million hours of computer video stored on our storage cluster (previously shared: https://news.ycombinator.com/item?id=45438496) and the model can run at 30 FPS. Since we match the fundamental form factor of computer use, we can get our model to do CAD, browse websites, and even drive a car using arrow keys. I’m super excited to see what our model can do as we scale further. It's a fun frontier to work on (not language models :) ).

The team and I will be online responding to the comments, so drop any questions.

ilaksh|4 days ago

How do I access this? Any HF or API coming?

Any benchmark comparisons to Fara-7B, Sonnet 4.6, Qwen 3.5, etc.?

AndrewKemendo|4 days ago

This looks like a really promising approach.

In particular, the Forward rollout module is very important. It aligns your (effectively) world model with what it expects from the world, and keeping those in sync, I think, gives this the power it needs to generate the state-action pairs for continuous semi-supervised training.

dangoodmanUT|4 days ago

11 million hours of data is a lot. Did you have to synthesize any of it, or was it purely collected?

nee1r|4 days ago

Collected! No synthetic data.

dr_dshiv|3 days ago

Cool! Isn’t this what Cursor initially tried to do before they pivoted? Hence "Cursor"?

Must have been really hard. What was the breakthrough?

xianshou|3 days ago

Great work! Why no benchmarks though?

arkmm|4 days ago

Get ready for the acquisition offers.