theredsix | 1 year ago

Awesome project, starred! Here are some other projects for agentic browser interactions:

* Cerebellum (TypeScript): https://github.com/theredsix/cerebellum

* Skyvern: https://github.com/Skyvern-AI/skyvern

Disclaimer: I am the author of Cerebellum

gregpr07|1 year ago

Thanks man, starred yours too, it's super cool to see all these projects getting spun up!

I see Cerebellum is vision-only. Did you try adding HTML + screenshot? I think that improves performance like crazy, and you wouldn't be limited to Claude.

Just saw Skyvern today on previous Show HNs haha :)

theredsix|1 year ago

I had an older version that used simplified HTML, and it got decent performance with GPT-4o and Gemini, but at the cost of 10x the token usage. You're right: identifying the interactable elements and pulling their values into a prompt structure that explicitly lists the allowed next actions can boost performance, especially when combined with grammar-constrained generation such as structured outputs or guidance-llm. However, I saw that Claude reached similar levels of performance with pure vision, and I felt that vision + more training would eventually beat a specialized DOM algorithm, per "the bitter lesson".
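To make the DOM-based idea concrete, here is a rough sketch of what "pulling interactable elements into a prompt structure" could look like. This is not how Cerebellum or any project in this thread does it; `Element`, `build_prompt`, `action_schema`, and `parse_action` are hypothetical names made up for illustration, and the element list is assumed to have been extracted from the page already. The schema is a plain JSON Schema dict of the kind a structured-output API might accept, restricting the model to indices that actually exist.

```python
import json
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical record for one interactable element taken from the DOM."""
    tag: str
    label: str
    value: str = ""


def build_prompt(elements):
    """Render the elements as a numbered list the model can refer to by index."""
    lines = ["Interactable elements:"]
    for index, el in enumerate(elements):
        value = f' value="{el.value}"' if el.value else ""
        lines.append(f"[{index}] <{el.tag}> {el.label}{value}")
    return "\n".join(lines)


def action_schema(elements):
    """JSON Schema that only permits actions on indices present in the prompt."""
    return {
        "type": "object",
        "properties": {
            "action": {"type": "string", "enum": ["click", "type"]},
            "element": {"type": "integer", "enum": list(range(len(elements)))},
            "text": {"type": "string"},
        },
        "required": ["action", "element"],
        "additionalProperties": False,
    }


def parse_action(raw, elements):
    """Parse the model's JSON reply and reject anything outside the allowed set."""
    action = json.loads(raw)
    if action.get("action") not in ("click", "type"):
        raise ValueError("unknown action")
    index = action.get("element")
    if type(index) is not int or not 0 <= index < len(elements):
        raise ValueError("element index out of range")
    return action


# Assumed example page state: a search box and a submit button.
elements = [
    Element("input", "Search", value="laptops"),
    Element("button", "Submit"),
]
print(build_prompt(elements))
```

The trade-off mentioned above shows up here too: every element added to the list costs tokens on every step, which is part of why a pure-vision approach can look attractive.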

BTW I really like your handling of browser tabs, I think it's really clever.

dbacar|1 year ago

I starred both of you