alach11 | 5 months ago

I'm really interested in the progress on computer use. These are the benchmarks to watch if you want to forecast economic disruption, IMO. Mastery of computer use moves us from the paradigm of task-specific AI integrations to a more generic interface that's way more scalable.

sipjca|5 months ago

Maybe this is true? But it's not clear to me that this methodology will ever be quite as good as native tool calling. Or maybe I don't know the benchmark well enough; I just assume it's vision-based.

Perhaps Tesla FSD is a similar example: in principle, self-driving with vision alone should be possible (humans manage it), but it is fundamentally harder and more error-prone than having better data. Using computer screens as the fundamental unit seems very error-prone and expensive in tokens to me.

But by the same token, I'm sure there are many tasks that could be automated this way, so shrug

simianwords|5 months ago

Looks like the RPA vs. API debate all over again

cantor_S_drug|5 months ago

Do you think a Genie-like model, trained specifically on data of interactions with application interfaces, would be good at computer use tasks?

mrshu|5 months ago

What are some standard benchmarks you look at in this space?