1. Yes, the extension uses an anonymous device ID and sends an analytics event when a tool call is used. You can inspect the network traffic to verify that zero personalized or identifying information is sent.
I collect anonymized usage data to get an idea of how often people are using the extension in the same way that websites count visitors. I split my time between many projects and having a sense of how many active users there are is helpful for deciding which ones to focus on.
2. The extension is completely written by me, and I wrote in this GitHub issue why the repo currently only contains the MCP server (in short, I use a monorepo that contains code used by all my extensions and extracting this extension and maintaining multiple monorepos while keeping them in sync would require quite a bit of work): https://github.com/BrowserMCP/mcp/issues/1#issuecomment-2784...
I understand that you're frustrated with the way I've built this project, but there's really nothing nefarious going on here. Cheers!
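Here is a minimal sketch of what "an anonymous device ID plus an event per tool call" can look like. The field names, the `tool_call` event name, the `browser_click` tool name and the whitelist are all hypothetical; this is not the extension's actual payload. The idea it shows is that a random UUID carries no identity, and a field whitelist keeps URLs or page content out of the event.

```python
import json
import uuid

# Hypothetical whitelist: the only keys an analytics event may carry.
ALLOWED_FIELDS = {"device_id", "event", "tool"}


def new_device_id() -> str:
    """A random version-4 UUID: generated locally, derived from nothing about the user."""
    return str(uuid.uuid4())


def build_event(device_id: str, tool: str, **extra) -> dict:
    """Assemble a usage event, dropping every key outside the whitelist."""
    candidate = {"device_id": device_id, "event": "tool_call", "tool": tool, **extra}
    return {key: value for key, value in candidate.items() if key in ALLOWED_FIELDS}


device = new_device_id()
# Extra context such as the page URL is discarded before anything is sent.
event = build_event(device, "browser_click", url="https://example.com/account")
print(json.dumps(event))
```

Inspecting network traffic, as suggested above, is how you would confirm that a real extension behaves like this.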
"Avoids bot detection and CAPTCHAs by using your real browser fingerprint."
Yeah, not really.
A few weeks back I used a similar system (one I wrote myself) to have AI control my browser through my logged-in session. I started getting CAPTCHAs during my own human sessions in that browser, and eventually I got blocked from a bunch of websites. Since I stopped using my browser session that way the blocks have gone away, but be warned: doing this can cost you your own access to websites. It isn't a silver bullet.
What do you think they might be looking for that could be detected so quickly? I'm wondering if they track mouse movement and flag a mouse that moves too cleanly, in which case adding more human-like noise to the movement might get past the system. Others have mentioned doing too many actions too fast, but what about the timing between actions? Even if no single click is that fast, a very consistent delay would be another non-human sign.
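From the detection side, the "very consistent delay" signal is easy to compute. The sketch below assumes per-action timestamps are available. It flags a session when the gaps between actions have a coefficient of variation (standard deviation divided by mean) below a threshold. The 0.05 threshold is arbitrary, and real bot-detection vendors combine many undisclosed signals, so treat this only as an illustration of the idea.

```python
import statistics


def interval_cv(timestamps: list[float]) -> float:
    """Coefficient of variation (stdev / mean) of the gaps between consecutive actions."""
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three action timestamps")
    return statistics.stdev(gaps) / statistics.mean(gaps)


def looks_scripted(timestamps: list[float], threshold: float = 0.05) -> bool:
    """Flag sessions whose actions are spaced almost perfectly evenly."""
    return interval_cv(timestamps) < threshold


scripted = [0.0, 1.0, 2.0, 3.0, 4.0]  # one action per second, like a fixed sleep()
human = [0.0, 0.8, 2.9, 3.4, 6.1]     # irregular pauses
print(looks_scripted(scripted), looks_scripted(human))
```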
There's also the whole issue of captchas being in place because people cannot be trusted to behave appropriately with automation tools.
"Avoids bot detection and CAPTCHAs" - Sure asshole, but understand that those are only in place because of people like you. If you truly need access to something, ask for an API. Maybe you need to pay for it, maybe you don't. Maybe you get it, maybe the site owner tells you to go pound sand, and you should take that as a sign that your behaviour and/or use case is not wanted.
It's just a way to provide a "library of methods" / API that the LLM models can "call", so basically giving them method names, their parameters, the type of the output, and what they are for,
and then the LLM model will ask the MCP server to call the functions, check the result, call the next function if needed, etc
Right now if you go to ChatGPT you can't really tell it "open Google Maps with my account, search for bike shops near NYC, and grab their phone numbers", because all it can do is reply in text or make images
with a "browser MCP" it is now possible: ChatGPT has a way to tell your browser "open Google maps", "show me a screenshot", "click at that position", etc
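Under the hood, that call/check/call loop is JSON-RPC 2.0. The sketch below shows one round trip, assuming MCP's `tools/call` request shape and a tool descriptor with `name`, `description` and `inputSchema` as returned by `tools/list`. The `browser_navigate` tool and the echoing "server" are hypothetical stand-ins, and the transport (stdio, SSE) is left out.

```python
import json

# A tool descriptor in the shape MCP servers return from tools/list.
# The tool itself is a hypothetical example.
NAVIGATE_TOOL = {
    "name": "browser_navigate",
    "description": "Open a URL in the current tab",
    "inputSchema": {
        "type": "object",
        "properties": {"url": {"type": "string"}},
        "required": ["url"],
    },
}


def make_call(request_id: int, name: str, arguments: dict) -> str:
    """Serialize the JSON-RPC 2.0 request a client sends to invoke a tool."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


def handle(raw: str) -> dict:
    """Toy server: dispatch tools/call and wrap the result as text content."""
    request = json.loads(raw)
    reply = {"jsonrpc": "2.0", "id": request["id"]}
    if request["method"] != "tools/call":
        reply["error"] = {"code": -32601, "message": "Method not found"}
        return reply
    params = request["params"]
    if params["name"] != NAVIGATE_TOOL["name"]:
        reply["error"] = {"code": -32602, "message": f"Unknown tool: {params['name']}"}
        return reply
    # A real server would drive the browser here; this one only echoes.
    text = f"Navigated to {params['arguments']['url']}"
    reply["result"] = {"content": [{"type": "text", "text": text}]}
    return reply


response = handle(make_call(1, "browser_navigate", {"url": "https://maps.google.com"}))
print(response["result"]["content"][0]["text"])
```

The model only ever sees the descriptor and the returned content. The client (ChatGPT, Claude Desktop, etc.) does the serializing and sends the results back to the model.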
MCP is a standard to plug useful tools into AI models so they can use them. The concept looks confusingly reversed and non-obvious to a normal person, although devs don't see this because it looks like their tooling.
I know what you mean. I think MCP is being widely adopted, but it's not grassroots: it's a quick entry into this market by an established AI company trying to dominate developers' mindshare and market share before developers can reach a consensus.
When I go to a shopping website I want to be able to tell my browser "hey please go through all the sideboards on this list and filter out for the ones that are larger than 155cm and smaller than 100cm, prioritise the ones with dark wood and space for vinyl records which are 31.43cm tall" for example.
Is there any browser that can do this yet? It seems extremely useful to be able to extract details from the page!
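Once the listings have been extracted into structured data (the hard part, which is what a browser agent would do), the request itself is a plain filter-and-sort. This sketch rests on several assumptions:

- "larger than 155cm" means width and "smaller than 100cm" means height.
- Each listing exposes the hypothetical fields below.
- The set of "dark woods" is invented for illustration.

```python
from dataclasses import dataclass

RECORD_HEIGHT_CM = 31.43                        # LP height quoted in the comment
DARK_WOODS = {"walnut", "wenge", "smoked oak"}  # hypothetical notion of "dark wood"


@dataclass
class Sideboard:
    name: str
    width_cm: float
    height_cm: float
    tallest_shelf_cm: float  # interior height of the tallest compartment
    wood: str


def shortlist(items: list[Sideboard]) -> list[Sideboard]:
    """Keep wide, low units that fit records; dark wood first, then widest first."""
    fits = [
        s for s in items
        if s.width_cm > 155 and s.height_cm < 100 and s.tallest_shelf_cm >= RECORD_HEIGHT_CM
    ]
    return sorted(fits, key=lambda s: (s.wood not in DARK_WOODS, -s.width_cm))


listings = [
    Sideboard("Oslo", 160, 80, 33.0, "oak"),
    Sideboard("Bergen", 180, 90, 35.0, "walnut"),
    Sideboard("Narrow", 150, 80, 35.0, "walnut"),
    Sideboard("Shallow", 170, 80, 30.0, "walnut"),
]
print([s.name for s in shortlist(listings)])
```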
Hey, we’re working on MatterRank which is pretty similar to this but currently works on web search. (e.g. I want to prioritize results that talk about X and have Y bias and I want to deprioritize those that are trying to sell me something). Feel free to try it out at https://matterrank.ai
Would also be interested in hearing more about what you’re envisioning for your use case. Are you thinking a browser extension that acts on sites you’re already on, or some sort of shopping aggregator that lets you do this, or something else entirely?
Well done, just tested on Claude Desktop and it worked smoothly, a lot less clunky than Playwright. This is the right direction to go in.
I don't know if you've done it already, but it would be great to pause automation when you detect a captcha on the page and then notify the user that the automation needs attention. Playwright keeps trying to plough through captchas.
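Here is a sketch of the suggested pause. It assumes the automation can read the page text (or an accessibility snapshot) before each action. The pattern list and the `NeedsHuman` exception are hypothetical. Keyword matching will also produce false positives, such as an article about CAPTCHAs.

```python
import re

# Hypothetical markers; real challenge pages vary by vendor and change over time.
CAPTCHA_PATTERNS = [
    r"captcha",  # also matches recaptcha / hcaptcha
    r"verify (that )?you are (a )?human",
    r"i'?m not a robot",
    r"challenges\.cloudflare\.com",
]


class NeedsHuman(Exception):
    """Raised to halt the automation loop and hand control back to the user."""


def check_for_captcha(page_text: str) -> None:
    lowered = page_text.lower()
    for pattern in CAPTCHA_PATTERNS:
        if re.search(pattern, lowered):
            raise NeedsHuman(f"possible CAPTCHA (matched {pattern!r}); solve it, then resume")


def run_step(page_text: str, action):
    """Check the current page before every action instead of ploughing through."""
    check_for_captcha(page_text)
    return action()


print(run_step("Search results for bike shops near NYC", lambda: "clicked result 1"))
try:
    run_step("Please verify you are human", lambda: "never runs")
except NeedsHuman as exc:
    print("Paused:", exc)  # a real tool would notify the user here
```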
Crazy: when I asked it to look up some info on the web and create a spreadsheet in Google Sheets to hold the results, it worked almost perfectly the first time and then failed completely on 8-10 subsequent tries.
Is there an issue with the lag between what is happening in the browser and the MCP app (in my case Claude Desktop)?
I have a feeling the first time I tried it, I was fast enough clicking the "Allow for this chat" permissions, whereas by the time I clicked the permission on subsequent chats, the LLM just reports "It seems we had an issue with the click. Let me try again with a different reference.".
Actions which worked flawlessly the first time (rename a Google spreadsheet by clicking on the title and inputting the name) fail 100% of subsequent attempts.
Same with identifying cells A1, B1, etc. and inserting into the rows.
Almost perfect on 1st try, not reproducible in 100% of attempts afterwards.
Kudos to how smooth this experience is though, very nice setup & execution!
EDIT 2:
The lag & speed to click the allow action make it seemingly unusable in Claude Desktop. :(
A UI as rich as Google Sheets seems like a bad use case for such a general "browser automation" MCP server. It would be cool to see an MCP server like this, but with specific tools that let the LLM read and write Google Sheets cells. I'm sure it would knock these tasks out of the park if it had a more specific abstraction instead of generically interacting with a webpage.
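Here is a sketch of what such a dedicated tool could take as input. The `sheets_write_range` tool name and argument layout are hypothetical. The point is that the model addresses cells in A1 notation (the scheme Google Sheets ranges use) instead of locating and clicking them.

```python
def column_letters(index: int) -> str:
    """1 -> 'A', 26 -> 'Z', 27 -> 'AA' (spreadsheet columns are bijective base-26)."""
    if index < 1:
        raise ValueError("columns are 1-based")
    letters = ""
    while index:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def a1(row: int, col: int) -> str:
    return f"{column_letters(col)}{row}"


def write_row_call(row: int, values: list) -> dict:
    """Arguments for a hypothetical 'sheets_write_range' tool: one call, no clicking."""
    last = a1(row, len(values))
    return {
        "name": "sheets_write_range",
        "arguments": {"range": f"{a1(row, 1)}:{last}", "values": [values]},
    }


print(write_row_call(1, ["Shop", "Phone", "Address"]))
```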
What you're experiencing is commonly referred to as "luck". It's the same reason people consistently think newer versions of ChatGPT are nerfed in some way. In reality, people just got lucky originally and have unrealistic expectations based on this originally positive outcome.
There's no bug or glitch happening. It's just statistically unlikely to perform the action you wanted and you landed a good dice roll on your first turn.
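The "luck" explanation can be put in numbers. Assume each run is an independent attempt with the same success probability p. That is a simplification, since page state, permission timing and model sampling all vary between runs. The chance of one success followed by n failures is p(1-p)^n, which peaks at p = 1/(n+1). Whether luck is a satisfying explanation therefore depends on how reliable the action really is. The sketch uses n = 9, from the "8-10 tries" in the report above.

```python
def first_success_then_failures(p: float, failures: int) -> float:
    """P(success on the first try, then `failures` misses), tries independent with rate p."""
    return p * (1 - p) ** failures


# Sweep p for the "worked once, then failed 9 times" pattern.
grid = [i / 100 for i in range(1, 100)]
best_p = max(grid, key=lambda p: first_success_then_failures(p, 9))
print(best_p, round(first_success_then_failures(best_p, 9), 4))
```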
Stuff like this makes me giddy about manual tasks like reimbursement requests. It's such a chore (and it doesn't help that our process isn't great).
Every month: go to the service providers, log in, find and download the statement, create a Google Doc with the details filled in, download it, write a new email and upload all the files. Maybe double-check that the attachments are right, but that requires downloading them again instead of being able to view them in the email.
Automating this is already possible (and a real expense tracking app can eliminate about half of this work), but I think AI tools have the potential to eliminate a lot of the nittier-grittier specification of it. This is especially important because these sorts of workflows are often subject to little changes.
Did something similar but controls a hardware synth, allowing me to do sound design without touching the physical knobs: https://github.com/zerubeus/elektron-mcp
Imagine it controlling plugins remotely: having an LLM do mastering and sound shaping with existing tools. The complex, overly graphical UIs of VSTs might be a barrier to performance there, but you could hook into their labeled MIDI mapping interfaces to control the knobs and levels.
I just view it as a relatively minor convenience; it's not some game-changer IMO.
The tool use / function calling thing far predates Anthropic releasing the MCP specification and it really wasn't that onerous to do before either. You could provide a json schema spec and tell the model to generate compliant json to pass to the API in question. MCP doesn't inherently solve any of the problems that come up in that sort of workflow, but it does provide an idiomatic approach for it (so there's a non-zero value there, but not much).
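Here is a sketch of that pre-MCP workflow: put a JSON Schema in the prompt, then validate whatever the model emits before calling the API. To stay stdlib-only it checks only a small subset of JSON Schema. A real project would likely use a full validator such as the `jsonschema` package. The `search_shops` function the schema describes is hypothetical.

```python
import json

# JSON Schema for a hypothetical "search_shops" function, as pasted into a prompt.
SCHEMA = {
    "type": "object",
    "properties": {"query": {"type": "string"}, "limit": {"type": "integer"}},
    "required": ["query"],
}

PY_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def schema_errors(args: dict, schema: dict) -> list[str]:
    """Check a tiny subset of JSON Schema: required keys and top-level scalar types."""
    errors = [f"missing {key!r}" for key in schema.get("required", []) if key not in args]
    for key, value in args.items():
        spec = schema["properties"].get(key)
        if spec is None:
            continue  # JSON Schema allows extra keys unless additionalProperties is false
        expected = spec["type"]
        # bool is a subclass of int in Python, so rule it out for numeric types
        wrong_bool = isinstance(value, bool) and expected in ("integer", "number")
        if wrong_bool or not isinstance(value, PY_TYPES[expected]):
            errors.append(f"{key!r} should be {expected}")
    return errors


def parse_tool_call(model_output: str) -> dict:
    """Parse the model's JSON and validate it before it reaches the real API."""
    args = json.loads(model_output)  # raises json.JSONDecodeError on malformed output
    problems = schema_errors(args, SCHEMA)
    if problems:
        # In practice you'd send this message back to the model and ask it to retry.
        raise ValueError("; ".join(problems))
    return args


print(parse_tool_call('{"query": "bike shops near NYC", "limit": 5}'))
```

MCP standardizes where the schema lives and how the call is transported. The parse, validate and retry loop is still the client's job.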
I would probably call it shipping containers for LLM tool integrations.
Containers are not a big deal when viewed in isolation. But when they're a common size/standard for all kinds of ships, cranes and trucks, they are a big deal.
In that sense it's more about gathering a community around one way to do things.
In theory there are REST APIs and the OpenAPI standard, but those were made for code, not LLMs.
So you usually need some kind of friendly wrapper (like for candy) on top of a REST API.
It really starts to feel like a big deal when you work on integrating LLMs with tools.
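Here is a sketch of the kind of "friendly wrapper" described above, assuming an OpenAPI 3 document. Each operation's path and query parameters become an MCP-style tool descriptor (`name`, `description`, `inputSchema`). Headers such as API keys stay inside the wrapper and are never shown to the model. The `searchShops` operation is hypothetical, and request bodies are ignored for brevity.

```python
import re


def operation_to_tool(path: str, method: str, operation: dict) -> dict:
    """Map one OpenAPI 3 operation's path/query parameters to an MCP-style tool descriptor."""
    properties, required = {}, []
    for param in operation.get("parameters", []):
        if param.get("in") not in ("path", "query"):
            continue  # headers and cookies (auth, etc.) stay in the wrapper, hidden from the model
        properties[param["name"]] = {
            "type": param.get("schema", {}).get("type", "string"),
            "description": param.get("description", ""),
        }
        if param.get("required"):
            required.append(param["name"])
    fallback_name = re.sub(r"[^A-Za-z0-9]+", "_", f"{method}_{path}").strip("_")
    return {
        "name": operation.get("operationId", fallback_name),
        "description": operation.get("summary", f"{method.upper()} {path}"),
        "inputSchema": {"type": "object", "properties": properties, "required": required},
    }


search_op = {
    "operationId": "searchShops",
    "summary": "Search bike shops near a location",
    "parameters": [
        {"name": "near", "in": "query", "required": True, "schema": {"type": "string"}},
        {"name": "X-Api-Key", "in": "header", "required": True},
    ],
}
print(operation_to_tool("/shops", "get", search_op))
```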
2025-04-07T18:43:26.537Z [browsermcp] [info] Initializing server...
2025-04-07T18:43:26.603Z [browsermcp] [info] Server started and connected successfully
2025-04-07T18:43:26.610Z [browsermcp] [info] Message from client: {"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"claude-ai","version":"0.1.0"}},"jsonrpc":"2.0","id":0}
node:internal/errors:983
const err = new Error(message);
^
Error: Command failed: FOR /F "tokens=5" %a in ('netstat -ano ^| findstr :9009') do taskkill /F /PID %a
at genericNodeError (node:internal/errors:983:15)
at wrappedFn (node:internal/errors:537:14)
at checkExecSyncError (node:child_process:882:11)
at execSync (node:child_process:954:15)
I just published a new version of the @browsermcp/mcp library (version 0.1.1) that handles the error better until I can investigate further so it should hopefully work now if you're using @browsermcp/mcp@latest.
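For anyone patching the port-killing code locally, one possible explanation follows; it is a guess, not a diagnosis of the package. Node's `execSync` throws whenever the shell command exits with a non-zero status. That means the `FOR /F ... findstr ... taskkill` one-liner can fail in more than one way, for instance when nothing matches, or possibly when `taskkill` is handed PID 0 from a TIME_WAIT row. `findstr :9009` also matches the foreign-address column, so it can pick up clients of the port.

Below is a Python sketch of a stricter approach:

1. Parse `netstat -ano` yourself.
2. Keep only TCP rows in the LISTENING state on exactly that port.
3. Treat "nothing found" as success.

It assumes English netstat output; the state column may be localized on some Windows installs.

```python
import subprocess


def listening_pids(netstat_output: str, port: int) -> list[int]:
    """Parse `netstat -ano` output and return PIDs LISTENING on exactly `port`."""
    pids = set()
    for line in netstat_output.splitlines():
        cols = line.split()
        # TCP rows: Proto, Local Address, Foreign Address, State, PID
        if len(cols) != 5 or cols[0] != "TCP" or cols[3] != "LISTENING":
            continue
        local_port = cols[1].rsplit(":", 1)[-1]  # works for 0.0.0.0:9009 and [::]:9009
        if local_port == str(port) and cols[4] != "0":
            pids.add(int(cols[4]))
    return sorted(pids)


def kill_port(port: int) -> list[int]:
    """Windows-only: kill listeners on `port`; finding nothing is not an error."""
    out = subprocess.run(["netstat", "-ano"], capture_output=True, text=True).stdout
    pids = listening_pids(out, port)
    for pid in pids:
        # check=False: a process that already exited should not abort startup
        subprocess.run(["taskkill", "/F", "/PID", str(pid)], capture_output=True, check=False)
    return pids


SAMPLE = """\
  Proto  Local Address          Foreign Address        State           PID
  TCP    0.0.0.0:9009           0.0.0.0:0              LISTENING       4242
  TCP    [::]:9009              [::]:0                 LISTENING       4242
  TCP    127.0.0.1:51000        127.0.0.1:9009         ESTABLISHED     777
  TCP    127.0.0.1:9009         127.0.0.1:51001        TIME_WAIT       0
"""
print(listening_pids(SAMPLE, 9009))
```

In the sample, the client row (PID 777) and the TIME_WAIT row (PID 0) would both have matched `findstr :9009`. The parser ignores them.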
I don't see how an MCP can be useful for browsing the net and doing things like shopping as has been suggested. Large companies such as CloudFlare have spent millions on, and made a business from, bot detection and blocking.
Do we suppose they will just create a backdoor to allow _some_ bots in? If they do that how long will it be before other bots impersonate them? It seems like a bit of a fad from my small mind.
Suppose it does become a thing, what then? We end up with an internet which is heavily optimised for bots (arguably it already is to an extent) and unusable for humans?
> Suppose it does become a thing, what then? We end up with an internet which is heavily optimised for bots (arguably it already is to an extent) and unusable for humans?
As opposed to the Web we now have, which is heavily optimized for... wasting human life.
What you're asking for, what "large companies such as CloudFlare have spent millions on", is verifying that on the other end of the connection is a web browser, and behind that web browser there is a human being that's being made to needlessly suffer and waste their limited lifespans, as they tediously work their way through the UI maze like a good little lab rat, watching ads at every turn of the corridor, while being constantly surveilled.
Or do you believe there is some other reason why you should care about whether you're interacting with a "human" (really: a user agent called "web browser") vs. "not human" (really: any other user agent)?
The relationship between the commercial web and its users is antagonistic - businesses make money through friction, by making it more difficult for users to accomplish their goals. That's why we never got the era of APIs and web automation for users. That's why we're dealing with tons of bespoke shitty SPAs instead of consistent interfaces - because no store wants to make it easy for you to comparison-shop, or skip their upsells, or efficiently search through the stock; no news service wants you to skip ads or make focused searches, etc.
As users, we've lost the battle for APIs and continue to be forced to use the "manual web" (with active cooperation of the browser vendors, too). MCP feels promising because we're in a moment in time, however brief, where LLMs can navigate the "manual web" for us, shielding us from all the malicious bullshit (ads, marketing copy, funneling, calls to action, confusing design, dark patterns, less dark patterns, the fact that your store is a bloated SPA instead of an endpoint for a generic database querying frontend, and so on) while remaining mostly impervious to it. This will not last long - the vendors de facto ruling the web have every reason to shut it down (or turn it around and use LLMs against us). But for now, it works.
Adversarial interoperability is the name of the game. LLMs, especially combined with tool use (and right tools), make it much easier and much more accessible than ever before. For however brief a moment.
Most things that do this kind of fingerprinting bot detection aren't looking for a browser that's pretending to be a human; they're looking for other programs that are pretending to be a browser.
> Do we suppose they will just create a backdoor to allow _some_ bots in?
That, and maybe they will as CF seem quite big on MCP.[0] Or people just bypass the bot detection. It's already not terribly difficult to do; people in the sneaker bot and ticket scalping communities have long had bypasses for all the major companies.
I mean, we can all imagine bad use cases for bots, but there are also pros: the internet wastes loads of human time. I still remember needing to browse marketplaces' real estate listings, with terrible search and notification functionality, to find a flat... shudders. An unbelievable number of hours wasted.
If fewer people are able to build bots that can index a larger number of sites and give better searching capabilities, for instance, where sites are unable to provide this, I'm personally all for it. For many sites, it's that they lack the in-house development expertise and probably they wouldn't even mind.
Ideally, shouldn't this be the native experience of most "sites" on the internet? We've built an entire user experience around serving users rich, two dimensional visual content that is not machine-readable and are now building a natural language command line layer on top of it. Why not get rid of the middleware and present users a direct natural language interface to the application layer?
The Puppeteer MCP server doesn't work well because it requires CSS selectors to interact with elements. It makes up CSS selectors rather than reading the page and generating working selectors.
The Playwright MCP server is great! Currently Browser MCP is largely an adaptation of the Playwright MCP server to use with your actual browser rather than creating a new one each time. This allows you to reuse your existing Chrome profile so that you don't need to log in to each service all over again and avoids bot detection which often triggers when using the fresh browser instances created by Playwright.
I also plan to add other useful tools (e.g. Browser MCP currently supports a tool to get the console logs which is useful for automated debugging) which will likely diverge from the Playwright MCP server features.
Browser MCP uses the Chrome DevTools Protocol (CDP) to automate the browser so it currently only works for Chromium-based browsers.
Unfortunately, Firefox doesn't expose WebDriver BiDi (the standardized version of CDP) to browser extensions AFAIK (someone please correct me if I'm mistaken!), so I don't think I can support it even if I tried.
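To make "uses CDP" concrete: CDP messages are JSON that name a `Domain.method` plus its params. The sketch below builds the two `Input.dispatchMouseEvent` commands that make up one left click, as they would be sent over a raw DevTools WebSocket (which is why each carries an `id`). An extension would instead pass the method and params to `chrome.debugger.sendCommand`, which matches responses to requests itself. Whether Browser MCP clicks exactly this way is an assumption.

```python
import itertools
import json

_next_id = itertools.count(1)


def cdp_command(method: str, **params) -> dict:
    """One CDP message: a unique id, a 'Domain.method' name, and its params."""
    return {"id": next(_next_id), "method": method, "params": params}


def click_at(x: float, y: float) -> list[dict]:
    """A left click is a press followed by a release at the same viewport point."""
    common = {"x": x, "y": y, "button": "left", "clickCount": 1}
    return [
        cdp_command("Input.dispatchMouseEvent", type="mousePressed", **common),
        cdp_command("Input.dispatchMouseEvent", type="mouseReleased", **common),
    ]


for message in click_at(120, 48):
    print(json.dumps(message))
```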
In the Task Automation demo, how does it know all of the attributes of the motorcycle he is trying to sell? Is it relying on the underlying LLM's embedded knowledge? But then how would it know the price and mileage? Is there some underlying document not referenced in the demo? Because that information is not in the prompt.
I just run into a bunch of errors on my Windows machine + Chrome when connected over remote-ssh. Extension installed, tab enabled, npx updated/installed, etc.
2025-04-07 10:57:11.606 [info] rmcp: Starting new stdio process with command: npx @browsermcp/mcp@latest
2025-04-07 10:57:11.606 [error] rmcp: No server info found
---
EDIT: Ended up fixing it by patching index.js. killProcessOnPort() was the problem. You can hit me up if you have questions; I cannot figure out how to put readable code in HN after all these years with the fake markdown syntax they use.
> I cannot figure out how to put readable code in HN after all these years with the fake markdown syntax they use.
Not that HN supports much in the way of markup, but code blocks are actually the same as Markdown: indent (by 2 spaces or more, in HN's syntax; Markdown calls for 4 or more, so they're compatible).
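HN's formatting help says that text after a blank line, indented by two or more spaces, is reproduced verbatim. Below is a tiny convenience helper (hypothetical, not an HN tool) that prepares a snippet for pasting. Leave a blank line before it in the comment box.

```python
def hn_code_block(source: str, indent: int = 2) -> str:
    """Indent every line so Hacker News renders the snippet verbatim."""
    pad = " " * indent
    return "\n".join(pad + line for line in source.splitlines())


snippet = "function add(a, b) {\n  return a + b;\n}"
print(hn_code_block(snippet))
```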
Thanks for the report and the update! I'd love to hear about what you changed — how can I get in touch? I didn't see anything in your HN profile. Feel free to email me at admin@browsermcp.io
I wonder if it's possible to add such plugins to Electron apps (e.g. Slack).
It would be such a nice experience if I could just connect my AI of choice to a local app.
Setting this up for claude desktop and cursor was alright.
Works well out of the box with little setup, and I like that it attached to my active browser tab. Keep up the good work.
I literally started working on the same exact idea last night haha. Great work OP. I'm curious, how are you feeding the web data to the LLM? Are you just passing the entire page contents to it and then having it interact with the page based on CSS selectors/xpath? Also, what are your thoughts on letting it do its own scripting to automate certain tasks?
What I don't like about LLMs is that people keep re-inventing the wheel over and over. For example, we've been able to control browsers using GPT for about 2 years now:
None of these have really stuck. And none of them work well enough that web dev agencies no longer have to worry about e2e testing (or do some of them? Maybe the market is simply that inefficient).
An extension is more user-friendly! I leave Chrome open basically 24/7 and having to create a new Chrome instance via the command line just to use Browser MCP just felt like too high of a barrier.
Thank you for this. Using my own browser helps me automate tasks on sites where I'd typically get detected when using automation. Works like a charm! Hope you continue to work on the repo.
We work on something similar and aim to be the huggingface hub for automations you can run in your browser[0], with built-in support for MCP SSE.
Use the pre-built Trails[1][2] as MCP servers or create and publish your own with a familiar puppeteer-like API, powered by your or your friends browsers.
From Claude I have connected to these MCP servers OK : @modelcontextprotocol/server-filesystem, @executeautomation/playwright-mcp-server.
I have connected to OP's extension (browsermcp.io) from VS Code (and clicked 1 tab button OK), but not from Claude Desktop so far (I get Cannot find module 'node:path', which is require-d in npm/lib/cli.js; tried node 18, 20, 22; some suggestions here: https://medium.com/@aleksej.gudkov/error-cannot-find-module-... ).
> that's a great use case! the aria snapshot that browser mcp generates is enough to write tests for playwright using its role-based locators, but i may add a get_page_html tool in the same way that they're considering: https://github.com/microsoft/playwright-mcp/issues/103
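The sketch below illustrates that idea. It reads the role/name pairs out of a Playwright-style ARIA snapshot (YAML-ish lines such as `- button "Continue"`) and emits `page.get_by_role(...)` locators as text for a test. It handles only the simple one-line form and is not Browser MCP's code. Real snapshots also contain nested children and attributes, which this ignores.

```python
import re

# Matches lines like:  - button "Continue"   or   - heading "Sign in" [level=1]
SNAPSHOT_LINE = re.compile(r'^\s*-\s+([a-z]+)\s+"((?:[^"\\]|\\.)*)"')


def locators_from_snapshot(snapshot: str) -> list[str]:
    """Turn named nodes of an ARIA snapshot into Playwright role-based locators (as text)."""
    locators = []
    for line in snapshot.splitlines():
        match = SNAPSHOT_LINE.match(line)
        if match:
            role, name = match.groups()
            locators.append(f'page.get_by_role("{role}", name="{name}")')
    return locators


snapshot = """\
- heading "Sign in" [level=1]
- textbox "Email"
- text: Forgot your password?
- button "Continue"
"""
for locator in locators_from_snapshot(snapshot):
    print(locator)
```

Role-based locators built this way come from what the page actually exposes. That avoids the made-up CSS selectors problem mentioned earlier.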
Of course, you're sending data to the AI model, but the "private" aspect is contrasting automating using a local browser vs. automating using a remote browser.
When you automate using a remote browser, another service (not the AI model) gets all of the browsing activity and any information you send (e.g. usernames and passwords) that's required for the automation.
With Browser MCP, since you're automating locally, your sensitive data and browser activity (apart from the results of MCP tool calls, which are sent to the AI model) stay on your device.
Cursor is currently stuck using an outdated snapshot of the VSCode Marketplace, meaning several extensions within Cursor remain affected by high-severity CVEs that have already been patched upstream in VSCode. As a result, Cursor users unknowingly remain vulnerable to known security issues.
This issue has been acknowledged but remains unresolved: https://github.com/getcursor/cursor/issues/1602#issuecomment...
Given Cursor's rising popularity, users should be aware of this gap in security updates. Until the Cursor team resolves the marketplace sync issue, caution is advised when using certain extensions.
I am surprised that the VSCode team hasn't gone after them for mirroring the marketplace, as the Visual Studio team made it very clear that they don't want anybody to do that -- it is their marketplace.
rmac|10 months ago
1) this projects' chrome extension sends detailed telemetry to posthog and amplitude:
- https://storage.googleapis.com/cobrowser-images/telemetry.pn...
- https://storage.googleapis.com/cobrowser-images/pings.png
2) this project includes source for the local mcp server, but not for its chrome extension, which is likely bundling https://github.com/ruifigueira/playwright-crx without attribution
super suss
namuorg|10 months ago
nlarew|10 months ago
arresin|10 months ago
bhouston|10 months ago
tempest_|10 months ago
Also, I assume this extension is pretty obvious, so it won't take long for CF bot detection to treat it the same as Playwright or whatever else.
unixfox|10 months ago
Hence why projects like this exist: https://github.com/Kaliiiiiiiiii-Vinyzu/patchright. They hide the debugging part from JavaScript.
DeathArrow|10 months ago
SkyBelow|10 months ago
mrweasel|10 months ago
StevenNunez|10 months ago
oulipo|10 months ago
jastuk|10 months ago
orbital-decay|10 months ago
hedgehog-ai|10 months ago
whalesalad|10 months ago
andy_ppp|10 months ago
mfkhalil|10 months ago
unixfox|10 months ago
bravura|10 months ago
neilellis|10 months ago
thenaturalist|10 months ago
otherayden|10 months ago
xingwu|10 months ago
example: https://x.com/xing101/status/1903391600040083488 set up: https://github.com/xing5/mcp-google-sheets
throwaway314155|10 months ago
lizardking|10 months ago
--Error: Cannot access a chrome-extension:// URL of different extension
nonethewiser|10 months ago
doug_life|10 months ago
namuorg|10 months ago
https://docs.browsermcp.io/setup-server#node-js
serverlessmania|10 months ago
dmix|10 months ago
Gehinnn|10 months ago
mgraczyk|10 months ago
amendegree|10 months ago
spmurrayzzz|10 months ago
wonderwhyer|10 months ago
ajcp|10 months ago
APA (Agentic Process Automation) is the new RPA, and this is definitely one example of it.
cadence-|10 months ago
namuorg|10 months ago
There was another comment that mentioned that there's an issue with port killing code on Windows: https://news.ycombinator.com/item?id=43614145
FWIW, Claude Desktop currently has a bug where it tries to start the server twice, which is why the MCP server tries to kill the process from a previous invocation: https://github.com/modelcontextprotocol/servers/issues/812
cadence-|10 months ago
1. Kill your Claude Desktop app
2. Click "Connect" in the browser extension.
3. Quickly start your Claude Desktop app.
It will work 50% of the time - I guess the timing must be just right for it to work. Hopefully, the developers can improve this.
Now on to testing :)
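The workaround above depends on human timing. One way the race could be hidden instead is for whichever side starts first to keep retrying the connection with exponential backoff. In this sketch, `connect` is a hypothetical stand-in for the real extension handshake, and injecting `sleep` keeps the example testable.

```python
import time


def connect_with_retry(connect, attempts: int = 6, base_delay: float = 0.5, sleep=time.sleep):
    """Call `connect()` until it succeeds, doubling the wait after each failure."""
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(delay)
            delay *= 2


attempts_seen = []


def flaky_connect():
    """Stand-in for "is the extension listening yet?": fails twice, then succeeds."""
    attempts_seen.append(len(attempts_seen) + 1)
    if len(attempts_seen) < 3:
        raise ConnectionError("extension not connected yet")
    return "connected"


waits = []
print(connect_with_retry(flaky_connect, sleep=waits.append), waits)
```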
josefrichter|10 months ago
"Go to https://news.ycombinator.com/upvoted?id=josefrichter, summarize what topics I am interested in, and then from the homepage pick articles I might be interested in."
Works like a charm.
washedDeveloper|10 months ago
makingstuffs|10 months ago
Wild.
kraftman|10 months ago
https://brightdata.com/pricing/web-unlocker https://2captcha.com/pricing
TeMPOraL|10 months ago
As opposed to the Web we now have, which is heavily optimized for... wasting human life.
What you're asking for, what "large companies such as CloudFlare have spent millions on", is verifying that on the other end of the connection is a web browser, and behind that web browser there is a human being that's being made to needlessly suffer and waste their limited lifespans, as they tediously work their way through the UI maze like a good little lab rat, watching ads at every turn of the corridor, while being constantly surveilled.
Or do you believe there is some other reason why you should care about whether you're interacting with a "human" (really: a user agent called "web browser") vs. "not human" (really: any other user agent)?
The relationship between the commercial web and its users is antagonistic - businesses make money through friction, by making it more difficult for users to accomplish their goals. That's why we never got the era of APIs and web automation for users. That's why we're dealing with tons of bespoke shitty SPAs instead of consistent interfaces - because no store wants to make it easy for you to comparison-shop, or skip their upsells, or efficiently search through the stock; no news service wants you to skip ads or make focused searches, etc.
As users, we've lost the battle for APIs and continue to be forced to use the "manual web" (with active cooperation of the browser vendors, too). MCP feels promising because we're in a moment in time, however brief, where LLMs can navigate the "manual web" for us, shielding us from all the malicious bullshit (ads, marketing copy, funneling, calls to action, confusing design, dark patterns, less dark patterns, the fact that your store is a bloated SPA instead of an endpoint for a generic database querying frontend, and so on) while remaining mostly impervious to it. This will not last long - the vendors de-facto ruling the web have every reason to shut it down (or turn it around and use LLMs against us). But for now, it works.
Adversarial interoperability is the name of the game. LLMs, especially combined with tool use (and right tools), make it much easier and much more accessible than ever before. For however brief a moment.
jedimastert|10 months ago
m11a|10 months ago
That, and maybe they will, as CF seems quite big on MCP.[0] Or people will just bypass the bot detection. It's already not terribly difficult to do; people in the sneaker bot and ticket scalping communities have long had bypasses for all the major companies.
I mean, we can all imagine bad use cases for bots, but there are also pros: the internet wastes loads of human time. I still remember having to browse marketplaces' real estate listings, with terrible search and notification functionality, to find a flat... shudders. An unbelievable number of hours wasted.
If fewer people are able to build bots that can index a larger number of sites and give better searching capabilities, for instance, where sites are unable to provide this, I'm personally all for it. For many sites, the problem is that they lack in-house development expertise, and they probably wouldn't even mind.
[0]: https://developers.cloudflare.com/agents/model-context-proto... etc
hliyan|10 months ago
buttofthejoke|10 months ago
namuorg|10 months ago
The Playwright MCP server is great! Currently Browser MCP is largely an adaptation of the Playwright MCP server to use with your actual browser rather than creating a new one each time. This lets you reuse your existing Chrome profile, so you don't need to log in to each service all over again, and it avoids the bot detection that often triggers when using the fresh browser instances created by Playwright.
I also plan to add other useful tools (e.g. Browser MCP currently supports a tool to get the console logs, which is useful for automated debugging), which will likely diverge from the Playwright MCP server's features.
Fernicia|10 months ago
namuorg|10 months ago
Unfortunately, Firefox doesn't expose WebDriver BiDi (the standardized version of CDP) to browser extensions AFAIK (someone please correct me if I'm mistaken!), so I don't think I can support it even if I tried.
DebtDeflation|10 months ago
pavelfeldman|10 months ago
https://github.com/microsoft/playwright-mcp/blob/main/src/to... https://github.com/BrowserMCP/mcp/blob/main/src/tools/tool.t...
namuorg|10 months ago
You’re right, this is an adaptation of Playwright MCP to automate the user’s local browser as mentioned in the GitHub README and here:
- https://github.com/BrowserMCP/mcp/blob/3e6824de6f36eba7d2d3b...
- https://news.ycombinator.com/item?id=43613905
Thanks for all your work to Playwright and Playwright MCP. I’m a big fan!
(For those not familiar, Pavel is the largest contributor to both Playwright and Playwright MCP: https://github.com/microsoft/playwright/graphs/contributors, https://github.com/microsoft/playwright-mcp/graphs/contribut...)
marifjeren|10 months ago
> Credits: Browser MCP was adapted from the Playwright MCP server
icelancer|10 months ago
2025-04-07 10:57:11.606 [info] rmcp: Starting new stdio process with command: npx @browsermcp/mcp@latest
2025-04-07 10:57:11.606 [error] rmcp: Client error for command spawn npx ENOENT
2025-04-07 10:57:11.606 [error] rmcp: Error in MCP: spawn npx ENOENT
2025-04-07 10:57:11.606 [info] rmcp: Client closed for command
2025-04-07 10:57:11.606 [error] rmcp: Error in MCP: Client closed
2025-04-07 10:57:11.606 [info] rmcp: Handling ListOfferings action
2025-04-07 10:57:11.606 [error] rmcp: No server info found
---
EDIT: Ended up fixing it by patching index.js; killProcessOnPort() was the problem. Hit me up if you have questions. I can't figure out how to put readable code in HN, even after all these years, with the fake Markdown syntax they use.
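The `spawn npx ENOENT` line in the log above is worth decoding on its own. When Node's child_process.spawn reports ENOENT, it generally means the command could not be found. Here that means the process launching the server had no `npx` on its PATH, and GUI apps often inherit a shorter PATH than your shell does. The sketch below shows one way to check what a given PATH resolves a command to. It uses Python's `shutil.which`, which accepts an explicit `path` argument. The helper name and the "minimal PATH" directories are illustrative assumptions, not what any particular client actually uses.

```python
import os
import shutil
from typing import Optional


def resolve_command(name: str, search_path: Optional[str] = None) -> Optional[str]:
    """Return the full path `name` resolves to, or None if it isn't found.

    `search_path` mimics the PATH a launcher hands to its child process;
    None means "use this process's own PATH".
    """
    return shutil.which(name, path=search_path)


if __name__ == "__main__":
    # Compare your shell's PATH with a minimal PATH a GUI app might inherit.
    minimal = os.pathsep.join(["/usr/bin", "/bin"])
    print("shell PATH:  ", resolve_command("npx"))
    print("minimal PATH:", resolve_command("npx", minimal))
```

If the second line prints None while the first finds npx, a common workaround is to give the MCP client config the absolute path to npx instead of the bare command name.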
deathanatos|10 months ago
Not that HN supports much in the way of markup, but code blocks are actually the same as Markdown: indent (by 2 spaces or more, in HN's syntax; Markdown calls for 4 or more, so they're compatible).
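Following that rule, you can turn a snippet into an HN code block by prefixing every line with at least two spaces. Here is a tiny sketch of that transformation. The function name is made up for illustration, and choosing indent=4 satisfies both HN's and Markdown's thresholds as described above.

```python
def hn_code_block(code: str, indent: int = 2) -> str:
    """Prefix every line of `code` with `indent` spaces so HN renders it as code.

    HN needs 2+ leading spaces; Markdown needs 4+, so indent=4 works for both.
    """
    if indent < 2:
        raise ValueError("HN requires an indent of at least 2 spaces")
    pad = " " * indent
    return "\n".join(pad + line for line in code.splitlines())
```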
namuorg|10 months ago
sdotdev|10 months ago
aryehof|10 months ago
esafak|10 months ago
BrandiATMuhkuh|10 months ago
I wonder if it's possible to add such plugins to Electron apps (e.g., Slack). It would be such a nice experience if I could just connect my AI of choice to a local app.
decayiscreation|10 months ago
chrisweekly|10 months ago
wifipunk|10 months ago
qwertox|10 months ago
ketzo|10 months ago
otherayden|10 months ago
metadat|10 months ago
Interesting research and reading via the HN search portal: https://hn.algolia.com/?q=bot+detection
behnamoh|10 months ago
- https://github.com/mayt/BrowserGPT
- https://github.com/TaxyAI/browser-extension
- https://github.com/browser-use/browser-use
- https://github.com/Skyvern-AI/skyvern
- https://github.com/m1guelpf/browser-agent
- https://github.com/richardyc/Chrome-GPT
- https://github.com/handrew/browserpilot
- https://github.com/ishan0102/vimGPT
- https://github.com/Jiayi-Pan/GPT-V-on-Web
ajcp|10 months ago
Just because the wheel exists doesn't mean we shouldn't strive to make it better by applying new knowledge and technologies to it.
dumansizsercan|10 months ago
darepublic|10 months ago
dimgl|10 months ago
webprofusion|10 months ago
rahimnathwani|10 months ago
namuorg|10 months ago
hannofcart|10 months ago
'Avoids bot detection and CAPTCHAs by using your real browser fingerprint.'
101008|10 months ago
handfuloflight|10 months ago
mgraczyk|10 months ago
knes|10 months ago
Also works flawlessly with augmentcode.com!
picardo|10 months ago
lxe|10 months ago
plessas|10 months ago
jngiam1|10 months ago
omneity|10 months ago
Use the pre-built Trails[1][2] as MCP servers, or create and publish your own with a familiar Puppeteer-like API, powered by your browser or your friends' browsers.
0: https://herd.garden
1: https://herd.garden/trails/@herd/browser
2: https://herd.garden/trails/@omneity/serp
revskill|10 months ago
mvdtnz|10 months ago
iDon|10 months ago
From Claude, I have connected to these MCP servers OK: @modelcontextprotocol/server-filesystem, @executeautomation/playwright-mcp-server.
I have connected to OP's extension (browsermcp.io) from VS Code (and clicked one tab button OK), but not from Claude Desktop so far. I get "Cannot find module 'node:path'", which is required in npm/lib/cli.js. I tried Node 18, 20, and 22; some suggestions here: https://medium.com/@aleksej.gudkov/error-cannot-find-module-...
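`Cannot find module 'node:path'` usually points to an older Node runtime. `require()` has only understood the `node:` prefix since Node v16.0.0, and the feature was backported to v14.18.0. Node 18 through 22 were already tried, so one possibility is that Claude Desktop launches a different, older `node` than the one on your shell's PATH. That is an assumption and is not confirmed in this thread. Below is a small sketch that checks a `node --version` string against that threshold. The function name is hypothetical.

```python
import re


def supports_node_prefix(version: str) -> bool:
    """Return True if a `node --version` string (e.g. "v18.19.0") can
    require("node:path"): v16.0.0+, or v14.18.0+ on the 14.x line.
    """
    m = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", version.strip())
    if m is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor = int(m.group(1)), int(m.group(2))
    if major >= 16:
        return True
    return major == 14 and minor >= 18
```

Running `node --version` from the same environment the client uses, rather than from your terminal, would show which side of the threshold the failing process is on.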
pknerd|10 months ago
rahimnathwani|10 months ago
xena|10 months ago
canogat|10 months ago
randunel|10 months ago
cadence-|10 months ago
tuananh|10 months ago
jayunit|10 months ago
jayunit|10 months ago
> that's a great use case! the aria snapshot that browser mcp generates is enough to write tests for playwright using its role-based locators, but i may add a get_page_html tool in the same way that they're considering: https://github.com/microsoft/playwright-mcp/issues/103
https://x.com/roadtoramen/status/1909356255866733044
mrwww|10 months ago
graiz|10 months ago
johnpaulkiser|10 months ago
I think this is bullshit. Isn't the DOM or whatever sent to the model API?
namuorg|10 months ago
When you automate using a remote browser, another service (not the AI model) gets all of the browsing activity and any information you send (e.g. usernames and passwords) that's required for the automation.
With Browser MCP, since you're automating locally, your sensitive data and browser activity (apart from the results of MCP tool calls, which are sent to the AI model) stay on your device.
throwaway81523|10 months ago
SparkyMcUnicorn|10 months ago
tntpreneur|10 months ago
justanotheratom|10 months ago
tigrezno|10 months ago
ndr|10 months ago
Cursor is currently stuck using an outdated snapshot of the VSCode Marketplace, meaning several extensions within Cursor remain affected by high-severity CVEs that have already been patched upstream in VSCode. As a result, Cursor users unknowingly remain vulnerable to known security issues. This issue has been acknowledged but remains unresolved: https://github.com/getcursor/cursor/issues/1602#issuecomment...
Given Cursor's rising popularity, users should be aware of this gap in security updates. Until the Cursor team resolves the marketplace sync issue, caution is advised when using certain extensions.
I've flagged it here, apologies for the repost: https://news.ycombinator.com/item?id=43609572
rs186|10 months ago