item 47054111

lukev | 12 days ago

This is being downvoted but it shouldn't be.

If the ultimate goal is having an LLM control a computer, round-tripping through a UX designed for bipedal bags of meat with weird jelly-filled optical sensors is wildly inefficient.

Just stay in the computer! You're already there! Vision-driven computer use is a dead end.


zmmmmm | 12 days ago

You could say that about natural language as well, but it seems like having computers learn to interface with natural language at scale is easier than teaching humans to interface using computer languages at scale. Even most qualified people who work as software programmers produce such buggy piles of garbage that we need entire software methodologies and testing frameworks to deal with how bad it is. It won't surprise me if visual computer use follows a similar pattern: we are so bad at describing what we want the computer to do that it's easier if it just looks at the screen and figures it out.

ashirviskas | 12 days ago

Someone ping me in 5 years, I want to see if this aged like milk or wine

JSR_FDED | 12 days ago

“Computer, respond to this guy in 5 years”

chasd00 | 12 days ago

I replied as much to a sibling comment, but I think this is a way to wiggle out of robots.txt, user-agent string identification, and other traditional ways for sites to filter out bots.

lukev | 12 days ago

Right, but those things exist to prevent bots. Which this is.

So at this point we're talking about participating in the (very old) arms race between scrapers & content providers.

If enough people want agents, then services should (or will) provide agent-compatible APIs. The video round-trip remains stupid from a whole-system perspective.

mvdtnz | 12 days ago

I mean, if they want to "wiggle out" of robots.txt they can just ignore it. It's entirely voluntary.
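To illustrate the point: robots.txt is purely advisory. The client has to explicitly fetch and consult it, and nothing enforces that step. A minimal sketch using Python's standard-library `urllib.robotparser` (the `example.com` URLs and bot name are hypothetical):

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that asks all crawlers to stay out of /private/.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A well-behaved client checks before fetching...
print(parser.can_fetch("MyBot", "https://example.com/public/page"))   # allowed
print(parser.can_fetch("MyBot", "https://example.com/private/data"))  # disallowed

# ...but a scraper that simply never calls can_fetch() faces no
# technical barrier at all; the file only constrains cooperative clients.
```

Server-side defenses (user-agent filtering, rate limits, challenges) exist precisely because this compliance is opt-in.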