Cool project, guys! Just gave it a spin. One thing I wish it had is the option to run the browsers locally. Since the Smooth browser runs in prod rather than on my machine, it's harder for Claude to test apps that are still in dev.
That still may not be ideal, though. For example, right now I'm building a Counter-Strike clone, and its files are so large that tunneling them would be cumbersome.
The Claude --chrome command has a few limitations:
1. It exposes low-level tools that make your agent interact directly with the browser. This is extremely slow, VERY expensive, and less effective, because the agent ends up dealing with UI mechanics instead of thinking about the higher-level goal or intent.
2. It makes Claude operate the browser via screenshots and coordinate-based interaction, which doesn't work for tasks like data extraction, where the agent needs to attend to the whole page. Instead, it has to repeatedly scroll and read one small screenshot at a time, and it often misses critical context outside the viewport. It also makes the task harder, because the model has to figure out both what to do and how to do it, which means you need larger models to make this paradigm actually work.
3. Because it uses your local browser, it has full access to your authenticated accounts by default, which might not be ideal in a world where prompt injections are only getting started.
If you actively use the --chrome command, we'd love to hear about your experience!
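A toy way to see the difference described in point 2. This is only an illustration under assumed inputs (a hand-written HTML string and made-up page/viewport heights), not how Smooth or --chrome work internally: a DOM-level reader gets every text node in one pass, while a scroll-and-screenshot reader needs roughly page height / viewport height captures to see everything once. The names extract_all_text and screenshots_needed are hypothetical.

```python
import math
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects every visible text node, no matter where it would render on screen."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def extract_all_text(html: str) -> list[str]:
    """Hypothetical helper: one pass over the whole document, nothing is 'below the fold'."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return parser.chunks


def screenshots_needed(page_height_px: int, viewport_height_px: int) -> int:
    """Hypothetical helper: viewport-sized captures needed to see every pixel once (no overlap)."""
    return math.ceil(page_height_px / viewport_height_px)


if __name__ == "__main__":
    page = (
        "<html><body><h1>Prices</h1><p>Item A: $10</p>"
        "<script>x=1</script><p>Item B: $12</p></body></html>"
    )
    print(extract_all_text(page))           # ['Prices', 'Item A: $10', 'Item B: $12']
    print(screenshots_needed(12_000, 800))  # 15
```

The screenshot count ignores the overlap an agent usually needs between scrolls, so in practice it would be higher, and each capture is a separate model call.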
antves|23 days ago
You should be able to tell it to go to your localhost address, and it should be able to navigate to your local app from the remote browser.
Let us know if you have any questions!
stopachka|23 days ago
znnajdla|23 days ago
EMM_386|23 days ago
antves|23 days ago
stopachka|23 days ago