atupem | 8 months ago

Lots of interesting issues:

- The agent has a tool to set its task to 'completed', 'failed', or 'needs_help', with the last one being an option for human-in-the-loop scenarios. Sometimes the agent gets lazy and says it needs help prematurely.

- Additionally, the agent can create subtasks for itself, either to run immediately or to schedule for the future. Here again it can call that tool a bit too eagerly, filing duplicate subtasks for a task that involves repetitive work.

- Properly handling super long-running tasks that run for 1+ hours. The context window eventually hits its limit (this will be addressed this week).
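
To make the first two issues concrete, here is a minimal sketch of how a harness might guard those two tools. It is not Bytebot's actual implementation: `Task`, `set_status`, `create_subtask`, and the `MIN_STEPS_BEFORE_HELP` threshold are all hypothetical names chosen for illustration. The idea is to reject a premature 'needs_help' and to drop subtasks that duplicate one already filed.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class TaskStatus(Enum):
    COMPLETED = "completed"
    FAILED = "failed"
    NEEDS_HELP = "needs_help"


@dataclass
class Task:
    description: str
    status: Optional[TaskStatus] = None  # None while the task is still running
    steps_taken: int = 0
    subtasks: List[str] = field(default_factory=list)


# Hypothetical threshold: how many steps the agent must attempt
# before it is allowed to hand the task back to a human.
MIN_STEPS_BEFORE_HELP = 3


def set_status(task: Task, status: str) -> bool:
    """Apply a status requested by the agent; return False if rejected."""
    requested = TaskStatus(status)  # raises ValueError for an unknown status
    if requested is TaskStatus.NEEDS_HELP and task.steps_taken < MIN_STEPS_BEFORE_HELP:
        return False  # too early to ask for help, keep working
    task.status = requested
    return True


def _normalize(description: str) -> str:
    """Lowercase and collapse whitespace so near-identical text compares equal."""
    return " ".join(description.lower().split())


def create_subtask(task: Task, description: str) -> bool:
    """File a subtask unless an equivalent one already exists."""
    key = _normalize(description)
    if any(_normalize(existing) == key for existing in task.subtasks):
        return False  # duplicate, ignore it
    task.subtasks.append(description)
    return True
```

Exact-match deduplication only catches trivially repeated subtasks; an agent that rephrases the same work would need a fuzzier comparison, which this sketch does not attempt.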

Aside from those top-of-mind issues, there's a whole bunch of scaffolding issues - filesystem permissions, prompt injection security, i/o support, token cost - lots to improve!

We're still super early, but already these agents are showing flashes of brilliance, and we're gaining more and more conviction that this is the right form factor.


latexr|7 months ago

> showing flashes of brilliance

A “flash” of anything is also called a fluke, or a coincidence. The dumbest moron can have a flash of brilliance on occasion. So could a random word masher. Consistency is what matters.

> and we're gaining more and more conviction that this is the right form factor

Are we? Who’s “we”? Because it looks to me like the LLM approach is lacklustre if you care about truth and correctness (which you should) but the people and companies invested don’t really have a better idea and are shoving them down everyone’s throats in pursuit of personal profit.

atupem|7 months ago

Agreed, and the consistency has improved over time. I remember struggling only 9 months ago to get a browser agent to accurately click on a checkbox. The growth trajectory is what has us excited.

"We" are a YC-backed startup: https://www.ycombinator.com/companies/bytebot.

Re: truth and correctness, there are different tolerances depending on the type of task.

lelanthran|7 months ago

> We're still super early, but already these agents are showing flashes of brilliance, and we're gaining more and more conviction that this is the right form factor

Slow down, cowboy; we're seeing "flashes of brilliance" and "that this is the right form factor" for writing code only!

I'm still waiting for AI/LLMs to pose a danger to jobs other than those in software development and the arts.

furyofantares|7 months ago

This one isn't for coding; they mention in the post that coding agents thrive in custom tool-use environments.

lelanthran|7 months ago

See my comprehensive reply downthread (it's very long, you cannot miss it).

While I am skeptical due to already having explored this for SMME Line of Business applications, I wish you all the best of luck.

My approach is to simply build a new system from the ground up that can take advantage of structured IO.

[EDIT: send me a message with a link to a post about your product (or this blog), and I'll connect with you on LinkedIn and share your post with my network, meager though it may be]

atupem|7 months ago

Will do!

teruakohatu|7 months ago

What is your business model?

atupem|7 months ago

We're working with design partners as forward-deployed engineers, helping set up Bytebot on their infra and tackle use cases.

We'll be launching a self-serve cloud platform soon!