crawshaw | 21 days ago
We have built two of them now, and clearly the state of the art here can be improved. But it is hard to push too much on this while the models keep improving.
tiny-automates | 20 days ago
the hard part isn't the loop itself — it's everything around failure recovery.
when a browser agent misclicks, loads a page that renders differently than expected, or hits a CAPTCHA mid-flow, the 9-line loop just retries blindly. the real harness innovation is going to be in structured state checkpointing so the agent can backtrack to the last known-good state instead of restarting the whole task. that's where the gap between "works in a demo" and "works on the 50th run" lives.
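
To make the checkpointing idea concrete, here is a rough Python sketch. It is not taken from any real harness: `CheckpointingRunner`, `StepFailed`, and the three demo steps are all hypothetical names, and the "state" is just a dict. It assumes every step either succeeds or raises, and that restoring a deep copy of the dict is enough to undo a failed step. A real browser agent would also have to restore the browser itself (URL, cookies, form contents), which this does not attempt.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

State = Dict[str, Any]
Step = Callable[[State], None]  # mutates state in place, raises StepFailed on failure


class StepFailed(Exception):
    """Hypothetical error a step raises when the page is not in the expected shape."""


@dataclass
class CheckpointingRunner:
    max_retries_per_step: int = 3
    # Each checkpoint is (index of the next step to run, snapshot of the state).
    checkpoints: List[Tuple[int, State]] = field(default_factory=list)

    def run(self, steps: List[Step], initial: State) -> State:
        state = copy.deepcopy(initial)
        self.checkpoints = [(0, copy.deepcopy(state))]
        i = 0
        failures = 0
        while i < len(steps):
            try:
                steps[i](state)
            except StepFailed:
                failures += 1
                if failures > self.max_retries_per_step:
                    raise
                # Backtrack: drop any partial mutations and resume from the
                # last known-good snapshot instead of restarting the task.
                i, saved = self.checkpoints[-1]
                state = copy.deepcopy(saved)
                continue
            i += 1
            failures = 0
            self.checkpoints.append((i, copy.deepcopy(state)))
        return state


# Demo with a step that fails once after leaving a partial side effect.
calls = {"open_cart": 0, "fill_form": 0, "submit": 0}


def open_cart(state: State) -> None:
    calls["open_cart"] += 1
    state["cart_open"] = True


def fill_form(state: State) -> None:
    calls["fill_form"] += 1
    if calls["fill_form"] == 1:
        state["form"] = "half-typed"  # partial side effect before the failure
        raise StepFailed("unexpected modal")
    state["form"] = "complete"


def submit(state: State) -> None:
    calls["submit"] += 1
    state["submitted"] = state["form"] == "complete"


runner = CheckpointingRunner()
final = runner.run([open_cart, fill_form, submit], {"url": "https://example.com/checkout"})
print(final)
print(calls)
```

In the demo, `open_cart` runs once even though `fill_form` fails on its first attempt, and the half-typed form value is thrown away when the snapshot is restored. A blind retry loop would either redo `open_cart` or carry the partial form state forward.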