MCP-B: A Protocol for AI Browser Automation

336 points | miguelspizza | 8 months ago | mcp-b.ai | reply

184 comments

[+] jacquesm|8 months ago|reply
Prediction: this will go the same way as RSS. Companies don't like you to be in control of how you use their data.
[+] TeMPOraL|8 months ago|reply
Indeed. Though I guess a better example would be: it'll go the same way as REST APIs (which happen to be fundamentally the same thing as MCP anyway).

Remember the time when REST was the new hot thing, everyone started doing API-first design, and people thought it would empower users by letting programs navigate services on their behalf? Remember when "mashups" were the future?

It all died before it could come to pass, because businesses quickly remembered that all their money comes specifically from denying users those capabilities.

[+] SquareWheel|8 months ago|reply
Isn't RSS a smashing success? I changed readers after Google Reader died, but otherwise, my feeds have been working seamlessly for nearly 20 years. I rarely meet a site with updates that doesn't support RSS.
[+] latexr|8 months ago|reply
> Prediction: this will go the same way as RSS.

Meaning what? RSS remains ubiquitous. It’s rare to find a website which doesn’t support it, even if the owners don’t realise it or link to it on their page. RSS remains as useful as it ever was. Even if some websites only share partial post content via RSS, it’s still useful to know when they are available (and can be used as an automation hook to get the full thing).

RSS is alive and well. It’s like if you wrote “this will go the same way as the microwave oven”.

[+] wkat4242|8 months ago|reply
It doesn't matter. Soon the AI will be able to click and scroll like a normal user. It's going to be another arms race.
[+] theptip|8 months ago|reply
Maybe, but the market structure has inverted and the big guys now want to be in the intelligence layer, not content. (Content is being commoditized.)

Google can still sell ads as long as they own the eyeballs and the intelligence that’s engaging them.

Google did not want you using RSS because it cut out Google Search.

[+] worldsayshi|8 months ago|reply
Unless it becomes useful enough that customers will go through the hassle of switching to companies that are "AI-ready".
[+] fzysingularity|8 months ago|reply
The contributions graph for the GitHub project is quite intriguing: https://github.com/MiguelsPizza/WebMCP/graphs/contributors

MiguelsPizza | 3 commits | 89++ | 410--

claude | 2 commits | 31,799++ | 0--

[+] miguelspizza|8 months ago|reply
I did some git history rewriting when I closed-sourced the extension for a bit, so these numbers are not super accurate. Claude Code did write about 85% of the code though.
[+] efitz|8 months ago|reply
You’re going to see this pattern a lot more in the future.
[+] consumer451|8 months ago|reply
Claude's contributions graph is interesting. What is going on here? Does Claude Code commit as itself sometimes, but extremely rarely? I don't understand.

https://github.com/claude

[+] mehdibl|8 months ago|reply
From the blog post:

"The Auth problem At this point, the auth issues with MCP are well known. OAuth2.1 is great, but we are basically trying to re-invent auth for agents that act on behalf of the user. This is a good long term goal, but we are quickly realizing that LLM sessions with no distinguishable credentials of their own are difficult to authorize and will require a complete re-imagining of our authorization systems. Data leakage in multi-tenant apps that have MCP servers is just not a solved problem yet.

I think a very strong case for MCP is to limit the amount of damage the model can do and the amount of data it will ever have access to. The nice thing about client side APIs in multi-tenant apps is they are hopefully already scoped to the user. If we just give the model access to that, there's not much damage they can do.

It's also worth mentioning that OAuth2.1 is basically incompatible with internal Auth at Amazon (where I work). I won't go too much into this, but the implications of this reach beyond Amazon internal."

1. OAuth does not work inside Amazon ==> a solution is needed.

2. LLM sessions are difficult to authorize with OAuth.

3. Limit the amount of damage the model can do WHILE "multi-tenant apps is they are hopefully already scoped to the user".

I feel that, from a security standpoint, there is an issue in this logic.

OAuth for apps can be tuned far more finely than the current web user's permissions, since users usually have modification permissions that you may not want to provide.

OAuth not being implemented at Amazon is not really an issue.

Also, this means you backdoor the app through another app you have established trust with. ==> This is a major no-go for security, as all actions taken through the MCP app will be logged in the same scope as USER access.

You might as well just copy your session ID / cookie and do the same with an MCP.

I may be wrong, and the idea seems interesting, but from a security standpoint I feel it's a bypass that will cause a lot of compliance issues.

[+] miguelspizza|8 months ago|reply
Not sure I understand. The model has no more access than the user does. Proper security implementation still lies with the website owner.
[+] slt2021|8 months ago|reply
Could all of this be replaced simply by publishing an OpenAPI (Swagger) spec and using a universal Swagger MCP client???

This basically leaves it up to the user to establish an authenticated session manually.

Assuming Claude is smart enough to pick up an API key from the prompt/config and can use a Swagger-based API client, wouldn't that be the same?
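
As a rough illustration of the idea above, each operation in an OpenAPI document could be turned into one tool descriptor. This is a minimal sketch, not an existing "universal Swagger MCP client": the `ToolDescriptor` shape, the `openApiToTools` helper, and the sample spec are all hypothetical, and only the `paths`, `operationId`, and `summary` fields come from the OpenAPI format itself.

```typescript
// Trimmed-down view of an OpenAPI document: only the fields this sketch reads.
interface Operation {
  operationId?: string;
  summary?: string;
}

interface OpenApiDoc {
  paths: Record<string, Record<string, Operation>>;
}

// Hypothetical tool descriptor handed to a model; not part of any real client.
interface ToolDescriptor {
  name: string;
  description: string;
  method: string;
  path: string;
}

const HTTP_METHODS = ["get", "put", "post", "delete", "patch"];

// Produce one tool per (path, method) pair found in the spec.
function openApiToTools(doc: OpenApiDoc): ToolDescriptor[] {
  const tools: ToolDescriptor[] = [];
  for (const path of Object.keys(doc.paths)) {
    const item = doc.paths[path];
    if (item === undefined) continue;
    for (const method of HTTP_METHODS) {
      const op = item[method];
      if (op === undefined) continue;
      // Fall back to a name derived from method and path when operationId is absent.
      const fallbackName = method + path.replace(/[^a-zA-Z0-9]+/g, "_");
      tools.push({
        name: op.operationId !== undefined ? op.operationId : fallbackName,
        description: op.summary !== undefined ? op.summary : "",
        method: method.toUpperCase(),
        path: path,
      });
    }
  }
  return tools;
}

// Made-up spec, for illustration only.
const sampleSpec: OpenApiDoc = {
  paths: {
    "/todos": {
      get: { operationId: "listTodos", summary: "List todos" },
      post: { summary: "Create a todo" },
    },
    "/todos/{id}": {
      delete: { operationId: "deleteTodo" },
    },
  },
};

const sampleTools = openApiToTools(sampleSpec);
```

In this sketch every endpoint becomes its own tool, so the number of tools grows with the size of the API.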

[+] miguelspizza|8 months ago|reply
That was everyone's first thought when MCP came out. Turns out it doesn't work too well, since there are generally too many tools. People are doing interesting work in this space though.
[+] nilslice|8 months ago|reply
pls don't put an api key in a prompt
[+] loandbehold|8 months ago|reply
I found I can have Claude Code consume an API just by giving it a link to swagger.json in CLAUDE.md. It's very useful for ad hoc testing.
[+] SchemaLoad|8 months ago|reply
Not sure who the intended user is here? For frontend testing you actually do somewhat want the tests to break when the UI changes in major ways. And for other automation you'd be better off providing an actual API to use.
[+] nicman23|8 months ago|reply
scrapers and me buying milk with a VLM
[+] throwanem|8 months ago|reply
> If I asked you to build a table and gave you a Home Depot you probably would have a harder time than if I gave you a saw, a hammer and some nails.

I doubt that, first and not least because Home Depot stocks lumber.

[+] bobmcnamara|8 months ago|reply
Home Depot also sells tables.
[+] leptons|8 months ago|reply
You're supposed to "hallucinate" the lumber.
[+] latexr|8 months ago|reply
And, I imagine, Home Depot might have better and more precision tools available, plus professionals who know how to use them.
[+] Abishek_Muthian|8 months ago|reply
I haven’t used any MCP so far, but as a disabled person I see use cases in accessibility for MCPs doing browser/smartphone automation.

But any accessibility tool will be exploited by nefarious actors, so I wonder how many mainstream websites/apps would implement these MCPs.

Has anyone tried any MCP for improving accessibility?

[+] krashidov|8 months ago|reply
> But any accessibility tool will be exploited by nefarious actors, so I wonder how many mainstream websites/apps would implement these MCPs.

How so?

[+] orliesaurus|8 months ago|reply
I don't get it from the homepage. It feels like Selenium in the browser. Since you built it, can you explain?
[+] miguelspizza|8 months ago|reply
Similar but also very different. Playwright and Selenium are browser automation frameworks. There is a Playwright-MCP server which lets your agent use Playwright for browser automation.

MCP-B is a different approach. Website owners create MCP servers inside their websites, and MCP-B clients are either injected by browser extensions or included in the website's JS.

Instead of visual parsing like Playwright, you get standard deterministic function calls.

You can see the blog post for code examples: https://mcp-b.ai/blogs
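
To make the contrast concrete, here is a minimal sketch of what "deterministic function calls" could look like from the page's side. It does not use the real MCP-B packages: `ToolRegistry`, `registerTool`, `callTool`, and the `addToCart` tool are hypothetical names chosen for illustration, so treat the blog post as the source for the actual API.

```typescript
// Hypothetical shapes for illustration; this is not the MCP-B API.
type ToolArgs = Record<string, unknown>;

interface PageTool {
  name: string;
  description: string;
  handler: (args: ToolArgs) => string;
}

class ToolRegistry {
  private tools: PageTool[] = [];

  // The site owner registers functions the page already knows how to run.
  registerTool(tool: PageTool): void {
    this.tools.push(tool);
  }

  // What a client would see when it asks which tools the page offers.
  listTools(): { name: string; description: string }[] {
    return this.tools.map((t) => ({ name: t.name, description: t.description }));
  }

  // A call is a plain function invocation by name: no screenshots, no DOM guessing.
  callTool(toolName: string, args: ToolArgs): string {
    for (const tool of this.tools) {
      if (tool.name === toolName) return tool.handler(args);
    }
    throw new Error(`Unknown tool: ${toolName}`);
  }
}

// Stand-in for application state the site already manages.
const cart: string[] = [];

const pageTools = new ToolRegistry();
pageTools.registerTool({
  name: "addToCart",
  description: "Add a product to the cart by SKU",
  handler: (args) => {
    const sku = String(args["sku"]);
    cart.push(sku);
    return `Added ${sku}; cart has ${cart.length} item(s)`;
  },
});

const cartMessage = pageTools.callTool("addToCart", { sku: "MILK-1L" });
```

The same call with the same arguments always runs the same handler, which is the property a selector-driven or vision-driven automation script cannot guarantee when the UI changes.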

[+] lewisjoe|8 months ago|reply
This looks promising - thanks for open-sourcing this. This addresses the gap that most work happens in browsers while MCP assumes that work happens with AI clients.

I have a fundamental question though: how is it different from directly connecting my web app's JS APIs with tool calling functions and talking directly with an LLM server with tool-call support?

Is it the same thing, but with a protocol? Or am I missing the bigger picture?

[+] miguelspizza|8 months ago|reply
Np thanks for reading! The difference is with MCP-B you don't have to integrate or maintain any AI chat functionality yourself.

It's a protocol which allows the user to bring their own model to interact with the tools on your website.
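
One way to picture that split: the site only answers protocol messages, and whatever client the user runs decides when to send them. The sketch below handles the two MCP tool methods, `tools/list` and `tools/call`, as JSON-RPC 2.0 requests. The `handleRequest` function, the `siteTools` array, and the `searchDocs` tool are hypothetical, and the transport between page and extension is deliberately left out.

```typescript
interface RpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: { name?: string; arguments?: Record<string, unknown> };
}

interface RpcResponse {
  jsonrpc: "2.0";
  id: number;
  result?: unknown;
  error?: { code: number; message: string };
}

interface SiteTool {
  name: string;
  description: string;
  run: (args: Record<string, unknown>) => string;
}

// Hypothetical tool; a real site would wrap its existing client-side functions.
const siteTools: SiteTool[] = [
  {
    name: "searchDocs",
    description: "Search this site's documentation",
    run: (args) => `No results for "${String(args["query"])}"`,
  },
];

// The site never talks to a model directly; it only answers requests like these.
function handleRequest(req: RpcRequest): RpcResponse {
  if (req.method === "tools/list") {
    const tools = siteTools.map((t) => ({ name: t.name, description: t.description }));
    return { jsonrpc: "2.0", id: req.id, result: { tools: tools } };
  }
  if (req.method === "tools/call") {
    const params = req.params !== undefined ? req.params : {};
    const args = params.arguments !== undefined ? params.arguments : {};
    for (const tool of siteTools) {
      if (tool.name === params.name) {
        const text = tool.run(args);
        return { jsonrpc: "2.0", id: req.id, result: { content: [{ type: "text", text: text }] } };
      }
    }
    // -32602 is the JSON-RPC "invalid params" code.
    return { jsonrpc: "2.0", id: req.id, error: { code: -32602, message: "Unknown tool" } };
  }
  // -32601 is the JSON-RPC "method not found" code.
  return { jsonrpc: "2.0", id: req.id, error: { code: -32601, message: "Method not found" } };
}

const listResponse = handleRequest({ jsonrpc: "2.0", id: 1, method: "tools/list" });
const callResponse = handleRequest({
  jsonrpc: "2.0",
  id: 2,
  method: "tools/call",
  params: { name: "searchDocs", arguments: { query: "mcp" } },
});
```

Nothing in `handleRequest` knows which model, if any, is on the other end, which is why the site does not have to ship or maintain a chat integration of its own.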

[+] Flux159|8 months ago|reply
This is an interesting take, since web developers could add MCP tools into their apps rather than browser agents having to figure out how to perform actions manually.

Is the extension itself open source? Or only the extension-tools?

In theory I should be able to write a chrome extension for any website to expose my own custom tools on that site right (with some reverse engineering of their APIs I assume)?

[+] muratsu|8 months ago|reply
This puts the burden on the website owner. If I go through the trouble of creating and publishing an MCP server for my website, I assume that through some directory or method I'll be able to communicate that with consumers (browsers & other clients). It would be much more valuable for website owners if you can automate the MCP creation & maintenance.
[+] p0w3n3d|8 months ago|reply
I can see with my prophetic/logic eyes that free models will start to require captchas because people will start using MCP to automate browsers to use free LLMs. But captchas are ineffective against LLMs, so LLMs will fight to keep automated LLMs from using them...

Sounds like a very strange world of robots fighting robots

[+] falcor84|8 months ago|reply
In the stories, the robots eventually realize that they actually share common goals ...
[+] handfuloflight|8 months ago|reply
Would it be possible to do this with any arbitrary website since we can execute JS client side?
[+] abrookewood|8 months ago|reply
Looks similar to Elixir's Tidewave MCP server, which currently also supports Ruby: https://tidewave.ai/

Paraphrasing: Connect your editor's assistant to your web framework runtime via MCP and augment your agentic workflows and chats with: Database integration; Logs and runtime introspection; Code evaluation; and Documentation context.

Edit: Re-reading MCP-B docs, that is more geared towards allowing visitors to your site to use MCP, while Tidewave is definitely focussed on Developers.

[+] Johnny_Bonk|8 months ago|reply
So if I'm using Claude Code and developing a web app that's running on localhost:3000, can I use Claude Code to basically get UI information, browser console logs, and other useful web dev feedback? I ask because I installed it and added that file, but all I see is the 55 tools and 6 APIs when I open the browser extension, not the stuff I need. I also installed the extension tools, I think they were called.
[+] miguelspizza|8 months ago|reply
Ah, maybe I should make that more clear. The web app is an example of an MCP-B server and the extension is a client. When you visit mcp-b.ai with the extension, its tools will register.
[+] miguelspizza|8 months ago|reply
Hey HN,

This was an idea I had while trying to build MCP servers internally at Amazon. Today I am open sourcing it. TLDR it's an extension of the Model Context Protocol which allows you to treat your website as an MCP server which can be discovered and called by MCP-B compliant web extensions.

You can read a more detailed breakdown here (with gifs): https://mcp-b.ai/blogs

[+] xnx|8 months ago|reply
AI automation is exciting because it doesn't require any cooperation from the site.

It's nice when a site is user friendly (RSS, APIs, obvious JSON, etc.) but it is more powerful to be self sufficient.

[+] netrem|8 months ago|reply
The product seems interesting, but I found the landing page very chaotic and gave up reading it. The individual pieces of information are fine, I think, but the flow is poor and some info repeats. Was it AI generated?
[+] nurettin|8 months ago|reply
This gave me an idea. Instead of writing/maintaining servers and whatnot, why not just open the browser and give [$LLM] access to the development port and let it rip using the puppeteer protocol?