
(no title)

arjunchint | 1 month ago

In our benchmark of web agents, we found that vision/GUI-based agents get tripped up by popups and overlays, need large vision models, and require CDP (Chrome DevTools Protocol) in the browser.

Our own DOM-only web agent, rtrvr.ai, worked seamlessly underneath dialogs, can run on off-the-shelf Gemini Flash Lite, and uses Chrome-native APIs, which led to minimal infrastructure failures, SOTA performance, and the lowest cost.
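
To illustrate the general idea of why a DOM-only approach is not blocked by overlays, here is a minimal sketch. It is not rtrvr.ai's implementation; the page markup, the `INTERACTIVE_TAGS` set, and the `find_interactive` helper are all hypothetical and only show that walking the markup finds elements regardless of what is drawn on top of them.

```python
from html.parser import HTMLParser

# Hypothetical set of tags treated as "interactive" for this sketch.
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}


class InteractiveElementCollector(HTMLParser):
    """Collects interactive elements from markup, ignoring visual stacking."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        # The parser reports every start tag in document order, so an
        # element covered by a full-screen overlay is still visited.
        if tag in INTERACTIVE_TAGS:
            self.elements.append((tag, dict(attrs)))


def find_interactive(html):
    """Return (tag, attributes) pairs for interactive elements in html."""
    collector = InteractiveElementCollector()
    collector.feed(html)
    collector.close()
    return collector.elements


# Made-up page: a checkout button visually covered by a modal overlay.
PAGE = """
<body>
  <button id="checkout">Checkout</button>
  <div class="overlay" style="position:fixed; inset:0; z-index:9999">
    <button id="dismiss">No thanks</button>
  </div>
</body>
"""

elements = find_interactive(PAGE)
ids = [attrs.get("id") for _, attrs in elements]
print(ids)
```

A screenshot of this page would show only the overlay, whereas the tree walk lists both buttons, so an agent working from the DOM can address the covered element by its id.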

https://www.rtrvr.ai/blog/web-bench-results
