Show HN: Open-source browser for AI agents
155 points | theredsix | 20 days ago | github.com | reply
ABP is designed to keep the acting agent synchronized with the browser at every step. After each action (click, type, etc.), it freezes JavaScript execution and rendering, then captures the resulting state. It also compiles the notable events that occurred during that action loop, such as navigation, file pickers, permission prompts, alerts, and downloads, and sends them back to the agent along with a screenshot of the frozen page state.
The result is that browser interaction starts to feel more like a multimodal chat loop. The agent takes an action, gets back a fresh visual state and a structured summary of what happened, then decides what to do next from there. That fits much better with how LLMs already work.
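A rough sketch of that loop, for illustration only. This is not ABP's actual API: `Observation`, `FakeBrowser`, `run_agent`, and `scripted_policy` are made-up names, and the fake browser only imitates the "unfreeze, act, freeze, capture" cycle described above.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Observation:
    """What the agent gets back after one action (hypothetical shape)."""
    screenshot: bytes                                  # image of the frozen page
    events: List[str] = field(default_factory=list)   # e.g. "navigation", "alert"


class FakeBrowser:
    """Stand-in for a browser that stays frozen between actions."""

    def __init__(self) -> None:
        self.frozen = True
        self.log: List[str] = []

    def act(self, action: str) -> Observation:
        self.frozen = False                  # resume JS and rendering
        self.log.append(action)
        # Pretend that clicking the delete button raises a confirm dialog.
        events = ["alert"] if action == "click #delete" else []
        self.frozen = True                   # freeze again before capturing
        return Observation(screenshot=b"<png bytes>", events=events)


def run_agent(
    browser: FakeBrowser,
    policy: Callable[[Optional[Observation]], Optional[str]],
    max_steps: int = 10,
) -> List[Observation]:
    """Act, observe, decide: one observation per action, never a stale one."""
    history: List[Observation] = []
    obs: Optional[Observation] = None
    for _ in range(max_steps):
        action = policy(obs)                 # decide from the latest frozen state
        if action is None:                   # policy says the task is done
            break
        obs = browser.act(action)
        history.append(obs)
    return history


def scripted_policy(obs: Optional[Observation]) -> Optional[str]:
    """Toy policy standing in for the LLM."""
    if obs is None:
        return "click #delete"
    if "alert" in obs.events:
        return "accept dialog"
    return None


browser = FakeBrowser()
history = run_agent(browser, scripted_policy)
```

Because the dialog shows up as an event in the observation for the same step that triggered it, the policy can react to it on the very next turn instead of discovering it through a failed click.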
A few common browser-use failures ABP helps eliminate:
* A modal appears after the last Playwright screenshot and blocks the input the agent was about to use
* Dynamic filters cause the page to reflow between steps
* An autocomplete dropdown opens and covers the element the agent intended to click
* alert() / confirm() interrupts the flow
* Downloads are triggered, but the agent has no reliable way to know when they’ve completed
As proof, ABP with Opus 4.6 as the driver scores 90.5% on the Online Mind2Web benchmark. I think modern LLMs already understand websites; they just need a better tool to interact with them. Happy to answer questions about the architecture, forking Chrome, or anything else in the comments below.
Try it out: `claude mcp add browser -- npx -y agent-browser-protocol --mcp` (Codex/OpenCode instructions in the docs)
Demo video: https://www.loom.com/share/387f6349196f417d8b4b16a5452c3369
[+] [-] KurSix|19 days ago|reply
[+] [-] theredsix|19 days ago|reply
[+] [-] mahendra0203|19 days ago|reply
But here's my thought: you're solving the "stale state" problem by making the browser deterministic. Real websites aren't deterministic. Think WebSocket pushes, long-polling, background fetches, animations that don't finish. Freezing execution doesn't pause the server. The moment you unfreeze, the world may have moved.
90.5% on Mind2Web is great. But Mind2Web tasks are mostly "fill a form, click submit." The brutal failures happen on SPAs with optimistic UI updates, where the DOM says "saved" but the network request hasn't finished. Does ABP handle that case, or does the freeze just delay the confusion?
Genuine question, not trying to tear this down. The architecture is smart. I just wonder if "make the browser simpler for the agent" eventually hits a wall where you need to make the agent smarter about async instead.
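To make the optimistic-UI case concrete, here is a minimal sketch of the check I have in mind. It assumes a frozen snapshot could also report how many network requests are still in flight. `PageSnapshot` and `pending_requests` are hypothetical names, and nothing in the post says ABP exposes this.

```python
from dataclasses import dataclass


@dataclass
class PageSnapshot:
    """Hypothetical frozen-state summary; not an ABP data structure."""
    dom_text: str          # visible text at capture time
    pending_requests: int  # network requests still in flight at capture time


def is_really_saved(snap: PageSnapshot) -> bool:
    """Trust a 'saved' label only once no request is still outstanding."""
    says_saved = "saved" in snap.dom_text.lower()
    return says_saved and snap.pending_requests == 0


optimistic = PageSnapshot(dom_text="Saved", pending_requests=1)
settled = PageSnapshot(dom_text="Saved", pending_requests=0)
```

Here `is_really_saved(optimistic)` is false even though the DOM already says "Saved", which is exactly the situation where a screenshot alone would mislead the agent.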
[+] [-] theredsix|19 days ago|reply
For async, lots of people smarter than me are working on the smarter-agent problem. There's a latency floor with inference, though, due to prompt processing and output generation. Without tools like ABP, the LLM is always aiming at a moving target.
[+] [-] Retr0id|20 days ago|reply
And what does opus score with "regular" browser harnesses?
[+] [-] esafak|20 days ago|reply
[+] [-] 9wzYQbTYsAIc|20 days ago|reply
[+] [-] Terretta|15 days ago|reply
Meanwhile OP theredsix comes off like the only other human here besides Retr0id...
[+] [-] dokdev|19 days ago|reply
[+] [-] canada_dry|19 days ago|reply
Your tool's method of returning element references is clever and should greatly improve LLM handling of the page components (and greatly reduce token cost).
[+] [-] seanrrr|19 days ago|reply
Very cool! Sometimes when I try to debug things with the Chrome DevTools MCP, Claude clicks something, too many things happen at once, and it comes to the wrong conclusions about the state of things. Sounds like this should give it a more accurate slice of time / snapshot of things.
[+] [-] theredsix|19 days ago|reply
[+] [-] multidude|19 days ago|reply
The freeze-then-capture approach is interesting. Curious how it handles pages with aggressive anti-bot detection that fingerprints headless Chromium forks — that's the other failure mode I keep hitting.
[+] [-] theredsix|19 days ago|reply
[+] [-] Gnobu|19 days ago|reply
In identity systems like Gnobu, we face a similar challenge: ensuring that authentication flows remain consistent across multiple services and sessions, especially in environments with multiple asynchronous actions.
Curious if you’ve considered adding deterministic checkpoints or logging hooks that could integrate with external identity systems for agent-level session management?
[+] [-] AlexeyBelov|18 days ago|reply
[+] [-] notpublic|20 days ago|reply
btw, impressive project.
[+] [-] theredsix|20 days ago|reply
[+] [-] giancarlostoro|20 days ago|reply
[+] [-] theredsix|20 days ago|reply
[+] [-] gregpr07|20 days ago|reply
[+] [-] theredsix|20 days ago|reply
[+] [-] exabrial|19 days ago|reply
Ironically, I wish this would happen for me browsing the internet too...
[+] [-] taskpod|19 days ago|reply
[+] [-] ripbozo|19 days ago|reply
(I was suspicious of this account's ai-sounding comments, saw it on the overview, and now it's gone. I suppose a human is in the loop at least somewhere, or the AI agent realized the mistake)
[+] [-] siva7|19 days ago|reply
[+] [-] theredsix|19 days ago|reply
[+] [-] theredsix|20 days ago|reply
[+] [-] esafak|20 days ago|reply
[+] [-] jlu|19 days ago|reply
https://www.browserbase.com/blog/chromium-fork-for-ai-automa...
[+] [-] jazzyjackson|20 days ago|reply
I had good luck letting Claude use an xml parser to get a tree of the file, and then write xpath selections to grab what it needed
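Roughly what that looks like, as a sketch using Python's standard-library `xml.etree.ElementTree`, which supports a limited subset of XPath. The sample document and the `outline` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up sample file contents.
DOC = """<catalog>
  <book id="b1"><title>Dune</title><price>9.99</price></book>
  <book id="b2"><title>Emma</title><price>4.50</price></book>
</catalog>"""


def outline(elem, depth=0):
    """Return the tag tree as indented lines: the cheap overview the model reads first."""
    lines = ["  " * depth + elem.tag]
    for child in elem:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(DOC)

# Step 1: show the model the structure of the file, not its full contents.
tree_view = outline(root)

# Step 2: the model writes XPath selections to grab only what it needs.
titles = [t.text for t in root.findall("./book/title")]
second_price = root.find(".//book[@id='b2']/price").text
```

The outline is much smaller than the file itself, so the model spends tokens on structure first and only pulls the values it actually selects.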
[+] [-] appcustodian2|20 days ago|reply
[+] [-] theredsix|20 days ago|reply
[+] [-] nobrains|20 days ago|reply
[+] [-] YaraDori|10 days ago|reply
[deleted]
[+] [-] octoclaw|20 days ago|reply
[deleted]
[+] [-] robutsume|20 days ago|reply
[deleted]
[+] [-] theredsix|20 days ago|reply
[+] [-] bhekanik|20 days ago|reply
[deleted]