Closer to the Metal: Leaving Playwright for CDP

182 points | gregpr07 | 6 months ago | browser-use.com

118 comments

dataviz1000|6 months ago

I made this comment yesterday, but it really applies to this conversation.

> In the past 3 weeks I ported Playwright to run completely inside a Chrome extension without Chrome DevTools Protocol (CDP) using purely DOM APIs and Chrome extension APIs, I ported a TypeScript port of Browser Use to run in a Chrome extension side panel using my port of Playwright, in 2 days I ported Selenium ChromeDriver to run inside a Chrome Extension using chrome.debugger APIs which I call ChromeExtensionDriver, and today I'm porting Stagehand to also run in a Chrome extension using the Playwright port. This is following using VSCode's core libraries in a Chrome extension and having them drive a Chrome extension instead of an electron app.

The most difficult part is managing the lifecycle of Windows, Pages, and Frames and handling race conditions, in the case of automating a user's browser, where, for example, the user switches to another tab or closes the tab.

nikisweeting|6 months ago

Extensions are ok but they have limitations too, for example you cannot use extensions to automate other extensions.

We need the agent to be able to drive 1password, Privacy.com, etc. to request per-task credentials, change adblock settings, get 2fa codes, and more.

The holy grail really is CDP + control over browser launch flags + an extension bridge to get to the more ergonomic `chrome.*` APIs. We're also working on a custom Chromium fork.

wonger_|6 months ago

What is the benefit of porting all those tools to extensions? Have you run into any other extension-based challenges besides lifecycles and race conditions?

Tsarp|6 months ago

Wouldn't having chrome.debugger=true also flag your requests?

sandGorgon|6 months ago

is this open source? just curious to see this. sounds fascinating!

keepamovin|6 months ago

Yes, that is the most difficult part. But none of the frameworks adequately handle that and that was a major reason I’ve used CDP since the beginning.

From day one with BrowserBox, we have been using CDP, unadulterated by any higher abstractions. Despite the apparent risk that terrifying changes in the tip-of-tree protocol would lead to disastrous code migrations, none of that ever occurred. The most rewritten code in the application is consistently user interface and core features.

Over the nearly 8 years of BrowserBox’s existence, CDP-related changes due to domain and method deprecations, or subtle changes in parameters or behavior, have been only a very minor maintenance burden. A similar parallel could probably be drawn by examining the Chrome DevTools front-end, another gold-standard CDP-based application, and even digging into its commit history to see how often changes regarding CDP were actually due to protocol-breaking changes.

That was my sense when I began this project: that the protocol is not going to change that much, and we can handle it. My other reason for not choosing Puppeteer or Playwright was that I was dissatisfied with the abstractions they imposed atop CDP, and I found them insufficiently expressive or flexible for the actual demanding use cases of virtualizing a browser in all its aspects — including multiple tabs, managing and bookkeeping all of that state required to do that.
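To make the bookkeeping concrete, here is a minimal sketch (not BrowserBox's actual code) of tracking tabs from CDP `Target.*` events, which the browser emits once `Target.setDiscoverTargets` is enabled. The class names `TargetRegistry` and `TargetGone` and the notion of an "active page" are hypothetical, chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


class TargetGone(Exception):
    """Raised when code asks for a tab that has already been closed."""


@dataclass
class TargetInfo:
    target_id: str
    type: str          # e.g. "page", "iframe", "service_worker"
    url: str = ""
    title: str = ""


class TargetRegistry:
    """Keeps a local mirror of browser targets, updated from CDP events."""

    def __init__(self) -> None:
        self._targets: Dict[str, TargetInfo] = {}
        # Hypothetical notion of "the tab we are driving right now".
        self.active_page: Optional[str] = None

    def handle_event(self, method: str, params: dict) -> None:
        if method in ("Target.targetCreated", "Target.targetInfoChanged"):
            info = params["targetInfo"]
            self._targets[info["targetId"]] = TargetInfo(
                target_id=info["targetId"],
                type=info["type"],
                url=info.get("url", ""),
                title=info.get("title", ""),
            )
            if self.active_page is None and info["type"] == "page":
                self.active_page = info["targetId"]
        elif method == "Target.targetDestroyed":
            closed = params["targetId"]
            self._targets.pop(closed, None)
            if self.active_page == closed:
                # The user closed the tab we were driving: fall back to
                # any remaining page, or to None if there is none left.
                self.active_page = next(
                    (t.target_id for t in self.pages()), None
                )

    def pages(self) -> List[TargetInfo]:
        return [t for t in self._targets.values() if t.type == "page"]

    def require(self, target_id: str) -> TargetInfo:
        """Look up a target, failing loudly if it vanished in the meantime."""
        try:
            return self._targets[target_id]
        except KeyError:
            raise TargetGone(target_id) from None
```

Calling `require()` right before each action turns the "tab closed underneath me" race into an explicit `TargetGone` error instead of a hang. A real system would also need frames, sessions, and windows, which this sketch leaves out.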

The CDP protocol is still the gold standard for browser instrumentation. It would be nice if Firefox had not deprecated support, and it would be even nicer if WebDriver BiDi was a sufficient and adequate replacement for CDP, which for now it is not. The behavior, logic, and abstractions of CDP are well thought out and highly appropriate for its problem domain. It’s like separating a browser’s engine from its user interface, which is one of the core things BrowserBox accomplishes.

Working with CDP is apparently “difficult,” but that’s just another myth. It’s incredibly easy to write a hundred-or-so-line promise-resolving logic library to ensure you get responses. I’ve done this, and it works. I have used CDP alone in two major, thousands-of-stars, thousands-of-users, significant browser-related projects (the other is DiskerNet), and I have never regretted that choice, nor ever wished that I had switched to Puppeteer or Playwright.
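For readers wondering what such a promise-resolving layer looks like, here is a minimal Python sketch under stated assumptions. It is not the author's library. It relies only on the CDP wire format (commands carry an `id`, responses echo that `id` with a `result` or `error`, events carry a `method` and no `id`). The names `CDPSession`, `send_text`, and `feed` are hypothetical, and the transport (for example a websocket's send coroutine) is injected rather than implemented. Session ids for attached targets are left out.

```python
import asyncio
import itertools
import json
from typing import Awaitable, Callable, Dict, List, Optional


class CDPError(Exception):
    """Raised when the browser answers a command with an "error" object."""


class CDPSession:
    """Matches each CDP response to the command that caused it."""

    def __init__(self, send_text: Callable[[str], Awaitable[None]]) -> None:
        self._send_text = send_text          # injected transport (assumption)
        self._ids = itertools.count(1)       # monotonically increasing ids
        self._pending: Dict[int, asyncio.Future] = {}
        self._listeners: Dict[str, List[Callable[[dict], None]]] = {}

    def on(self, event: str, callback: Callable[[dict], None]) -> None:
        """Register a callback for an event such as "Page.loadEventFired"."""
        self._listeners.setdefault(event, []).append(callback)

    async def send(self, method: str, params: Optional[dict] = None,
                   timeout: float = 5.0) -> dict:
        """Send one command and wait for the response with the same id."""
        msg_id = next(self._ids)
        future = asyncio.get_running_loop().create_future()
        self._pending[msg_id] = future
        try:
            payload = {"id": msg_id, "method": method, "params": params or {}}
            await self._send_text(json.dumps(payload))
            return await asyncio.wait_for(future, timeout)
        finally:
            # Always forget the id, whether we succeeded, failed, or timed out.
            self._pending.pop(msg_id, None)

    def feed(self, raw: str) -> None:
        """Call this with every text frame received from the browser."""
        msg = json.loads(raw)
        if "id" in msg:
            future = self._pending.get(msg["id"])
            if future is None or future.done():
                return  # late or unknown response: ignore it
            if "error" in msg:
                future.set_exception(
                    CDPError(msg["error"].get("message", "unknown error"))
                )
            else:
                future.set_result(msg.get("result", {}))
        elif "method" in msg:
            for callback in self._listeners.get(msg["method"], []):
                callback(msg.get("params", {}))

    def close(self) -> None:
        """Fail every in-flight command so no caller waits forever."""
        for future in self._pending.values():
            if not future.done():
                future.set_exception(ConnectionError("CDP connection closed"))
        self._pending.clear()
```

In use, a reader loop would pass each incoming frame to `feed()`, while callers simply `await session.send("Page.navigate", {"url": ...})`. The timeout and `close()` are the parts that "ensure you get responses": every command either resolves, raises, or times out.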

That said, I think the sweet spot for Puppeteer and Playwright is quickly putting together not-overly-complex automation tasks, or other specific browser-instrumentation-related tasks with a fairly narrow scope. The main reason I used CDP was because I wanted the power of access to the full protocol, and I knew that would be the best choice — and it was.

So if your browser-related project is going to require deep integration with the browser and access to everything that’s exposed, don’t even think twice about using CDP. Just use it. The only caveat I would make to that is: keep an eye on whether WebDriver BiDi capabilities become sufficient for your use case, and seriously consider a WebDriver BiDi implementation, because that gives you a broader swath of browsers you’ll be able to use as the engine.

[1] https://chromedevtools.github.io/devtools-protocol/

[2] https://github.com/ChromeDevTools/devtools-frontend

[3] https://w3c.github.io/webdriver-bidi/

[4] https://pptr.dev/

[5] https://playwright.dev/

arm32|6 months ago

Ah, yes, the classic "Playwright isn't fast enough so we're reinventing Puppeteer" trope. I'd be lying if I said I hadn't seen this done a few times already.

Now that I got my snarky remark out of the way:

Puppeteer uses CDP under the hood. Just use Puppeteer.

haolez|6 months ago

I've seen a team implement Go workers that would download the HTML from a target, then download some of the referenced JavaScript files, then run these JavaScript files in an embedded JavaScript engine so that they could consume less resources to get the specific things that they needed without using a full browser. It's like a browser homunculus! Of course, each new site would require custom code. This was for quant stuff. Quite cool!

boredtofears|6 months ago

Is the case for playwright over puppeteer just in its cross-browser support?

We're currently using Cypress for some automated testing on a recent project and it's extremely brittle. Considering moving to playwright or puppeteer but not sure if that will fix the brittleness.

nikisweeting|6 months ago

sir we are a python library, puppeteer-python was abandoned, how exactly do you propose we use puppeteer?

wredcoll|6 months ago

Wait, does playwright not use cdp? What does it do?!

steveklabnik|6 months ago

Describing "2011–2017" as "the dark ages" makes me feel so old.

There was a ton of this stuff before Chrome or WebKit even existed! Back in my day, we used Selenium and hated it. (I was lucky enough to start after Mercury...)

hugs|6 months ago

selenium creator here. hi!

vasusen|6 months ago

2011 was definitely not the dark ages!! I used to use Selenium for everything back in the day. I was able to scrape all of Wikipedia in 2011 entirely on my laptop and pipe it to Stanford NLTK to create a very cool adjective recommender for nouns.

fzzzy|6 months ago

Lol I came here to write this exact comment about the dark ages and selenium. I, too, feel old.

benmmurphy|6 months ago

direct CDP has been used by the scraping community for a long time in order to have a cleaner browser environment that is harder to fingerprint. for example nodriver (https://github.com/ultrafunkamsterdam/nodriver) was started in Feb 2024 and I suspect this technique was popular before that project started.

gregpr07|6 months ago

I really like both nodriver and pydoll. I am definitely keeping the option of switching to them open, but we just wanted to have full control for now and see how painful CDP-use is to maintain first and then reconsider.

ipsum2|6 months ago

Nice, thorough write-up. I've had my share of annoyances with playwright when automating some menial tasks, due to being blocked by captchas or other WAFs (I'm just logging into my own accounts and scraping my account balance, nothing nefarious). I'll try out pydoll or your library next time.

spullara|6 months ago

this is exactly what I did when I wrote my first agent with scraping. later we switched to taking control of the user's browser through a browser extension.

johnsmith1840|6 months ago

Why not cdp snapshot?

nikisweeting|6 months ago

What do you mean? We use CDP page snapshots extensively to get full html across frames but it's not nearly enough on its own, there are lots of checks still needed for individual OOPIFs or elements.

Robdel12|6 months ago

All of the approaches of driving the browser outside of the browser are going to be slow (webdriver, playwright, puppeteer, etc).

Karma-like approaches are where I’m at (execute in the browser).

appcustodian2|6 months ago

> All of the approaches of driving the browser outside of the browser are going to be slow

Why? I would think any cross-process communication through the CDP websocket would have imperceptible overhead compared to what already takes long in the browser: a ton of HTTP I/O

What is Karma? What are you executing in the browser?

nikisweeting|6 months ago

CDP round-trip time on a local machine is 100µs (0.1ms), it's not slow haha

patrickhogan1|6 months ago

Selenium was very usable before 2011.

This post is like saying Grafana and not mentioning Nagios

nikisweeting|6 months ago

It was, but I feel like the advent of headless browsers marked a step function explosion in browser automation. Also any earlier than 2010 is when I was like 13yo, so it's more like "the dark ages in my own memory" than "objectively dark ages in automation history".

aitchnyu|6 months ago

Umm, will this run on Firefox too? They deprecated CDP and favor WebDriver BiDi.

hugs|6 months ago

i like that the post uses the phrase "time is a flat circle". it is indeed. once upon a time, most devs only cared about one browser -- internet explorer. then for a good chunk of time, cross-browser compatibility was highly valued. now, most devs only care about one browser -- google chrome.

it's a bummer, but also a market reality... the best way to get more devs to care about non-chrome browsers is to get more people to use non-chrome browsers. easier said than done, though.

nikisweeting|6 months ago

No, we are not planning to support Firefox. We do support Brave, Edge, and ungoogled-chromium though if you have a problem with Google.

saberience|6 months ago

Talk about "not built here" mentality. This is a project doomed to failure. Using VC money to re-write better built software which has been around for years.

Good luck guys!

johnsmith1840|6 months ago

From their blog post the value isn't obvious, but pure CDP as a framework is powerful for other reasons. If you have very high performance requirements, it makes sense.

I build something like an automation system on pure CDP to shave milliseconds off. But mine is a real-time user interaction system plus automation, not pure AI automation.

It doesn't make much sense to shave milliseconds when an LLM call takes hundreds of milliseconds and that's the only "user".
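A rough back-of-the-envelope version of that argument, as a sketch: the 0.1 ms round trip is the local-machine figure quoted elsewhere in this thread, while the 50 CDP calls per step and the 300 ms LLM call are assumed values picked only to illustrate the ratio.

```python
def cdp_share_of_step(cdp_calls: int, round_trip_ms: float,
                      llm_call_ms: float) -> float:
    """Fraction of one agent step spent waiting on CDP round trips."""
    cdp_ms = cdp_calls * round_trip_ms
    return cdp_ms / (cdp_ms + llm_call_ms)


# Assumed workload: 50 CDP calls and one 300 ms LLM call per step.
share = cdp_share_of_step(cdp_calls=50, round_trip_ms=0.1, llm_call_ms=300.0)
print(f"{share:.1%}")  # prints 1.6%
```

Under these assumptions the protocol round trips are under 2% of a step, so the LLM call dominates unless the per-call overhead is far higher than the local figure.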

Tostino|6 months ago

Exactly what I was thinking. Instead of attempting to contribute back to Playwright to fix those hangups, or even creating a private patch to do so as a POC, they went right to building their own framework from scratch.

That isn't how you launch a product.