
Why the simplest desktop agent abstraction wins

43 points | atupem | 8 months ago | bytebot.ai

16 comments

What are the biggest issues that the agent faces at the moment? I still find these general-purpose agents frustrating to use at times because people position them as if they could do anything, and then when you give one a reasonably complex task it breaks down.

I guess if someone figured out a way to minimize the impact of an error, like a way for the agent to handle it gracefully without it feeling like too much work, that would fix most of the problems.

atupem|8 months ago

Lots of interesting issues:

- The agent has a tool to set its task to 'completed', 'failed', or 'needs_help', with the last one being an option for human-in-the-loop scenarios. Sometimes the agent gets lazy and says it needs help prematurely.

- Additionally, the agent can create subtasks for itself, either to run immediately or to schedule in the future. Here again it can call that tool a bit too eagerly, creating duplicate subtasks for a task that involves repetitive work.

- Properly handling super long-running tasks that run for 1+ hours. The context window eventually hits its limit (this will be addressed this week).

Aside from those top-of-mind issues, there's a whole bunch of scaffolding issues - filesystem permissions, prompt injection security, i/o support, token cost - lots to improve!

We're still super early, but already these agents are showing flashes of brilliance, and we're gaining more and more conviction that this is the right form factor.
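To make the duplicate-subtask issue above concrete, here is a minimal sketch of one way a task object could reject repeated subtasks before they pile up. This is not Bytebot's actual implementation: the `Task` class, `add_subtask`, `set_status`, and the normalization rule are all hypothetical names chosen for illustration. Only the three status values come from the comment above.

```python
from dataclasses import dataclass, field
from typing import Optional

# Status values named in the comment above; everything else is assumed.
VALID_STATUSES = {"completed", "failed", "needs_help"}


def normalize(description: str) -> str:
    """Collapse case and whitespace so near-identical descriptions match."""
    return " ".join(description.lower().split())


@dataclass
class Task:
    """Hypothetical task record, not the real Bytebot schema."""
    description: str
    status: str = "pending"
    subtasks: list = field(default_factory=list)
    _seen: set = field(default_factory=set, repr=False)

    def set_status(self, status: str) -> None:
        """Accept only the three terminal statuses the agent tool exposes."""
        if status not in VALID_STATUSES:
            raise ValueError(f"unknown status: {status!r}")
        self.status = status

    def add_subtask(self, description: str,
                    scheduled_for: Optional[str] = None) -> bool:
        """Add a subtask unless an equivalent one exists; return True if added."""
        key = (normalize(description), scheduled_for)
        if key in self._seen:
            return False  # duplicate request from the agent, ignored
        self._seen.add(key)
        self.subtasks.append(Task(description))
        return True
```

Exact-match deduplication like this only catches trivially repeated calls. Subtasks that are worded differently but mean the same thing would need something fuzzier, which is outside this sketch.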

clbrmbr|7 months ago

Does anyone have experience getting agents to understand terminal applications? Like, in general an arbitrary ncurses application.

A more specific case I’ve struggled with is output from a long-running program like ping. You’ve got to know when to terminate.

soulofmischief|7 months ago

I wrote a terminal-based falling sand game in Rust and incrementally fed the entire screen output to a multimodal LLM (for better generalization) and also got it to attempt to generate interesting initial conditions by spitting out raw characters.

noman-land|7 months ago

Instead of telling the agent to wait for something like ping, have it write a script to do it and then have it run the script.
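A rough sketch of what such a script could look like, assuming the goal is simply to bound how long the command may run and hand back whatever it printed. The helper name `run_bounded` and its return shape are made up for illustration; `subprocess.run` with `timeout` is standard library behavior.

```python
import subprocess


def run_bounded(cmd, timeout_s):
    """Run cmd for at most timeout_s seconds.

    Returns (finished, output): finished is False if the process had to be
    killed because it hit the timeout.
    """
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
        return True, result.stdout
    except subprocess.TimeoutExpired as exc:
        # Partial output may be None or bytes depending on the platform.
        out = exc.stdout or ""
        if isinstance(out, bytes):
            out = out.decode(errors="replace")
        return False, out


# Example call (not executed here, since it needs network access):
# finished, output = run_bounded(["ping", "-c", "4", "example.com"], timeout_s=10)
```

For ping specifically, the `-c` count flag on Linux and macOS already makes the program exit on its own, so the timeout mostly acts as a safety net. For programs without such a flag, the timeout is what decides when to stop.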