Very interesting. I worry that if I use your cloud, and a lot of other people do, all of your IP addresses will get banned by all the big players. It will definitely be a fun cat and mouse game!
Related story: Way back in the day, PayPal was just getting started, and decided eBay transactions would be their perfect customer. The only problem is that eBay didn't allow scraping. So they built an entire proxy infrastructure to go around eBay's rules and scrape them.
It worked. It worked so well, eBay bought PayPal.
The side effect of this is that I got control of the PayPal proxy infrastructure since I was on the security team for both eBay and PayPal after the acquisition.
We used that proxy farm to scrape the rest of the web looking for fake eBay sites (because they would block traffic from eBay's IPs) and we had the guys who built it help us build proxy defense for eBay and PayPal.
So this could work in your favor, if you manage to constantly scrape a large target who might want to buy you. :)
That honestly sounds so fun! Who better to build your defences than your attackers ahaha
Tying into your concern, keeping IPs fresh and high quality will definitely be a balancing act as we get bigger. It'd be a balancing act today, too, if we tried to offer super granular location controls, because there are only so many proxies in a given state, let alone a given city. Currently, we get to aggregate & QA from multiple proxy providers, so our total pool is 300M+ IPs in the US, and so far we've had a 99.95% success rate at getting a fresh IP address in a session. So far so good :)
As for that last point, I guess we'll see what the future has in store for us :P
Hello Hacker News! We’re Nas and Huss, co-founders of steel.dev (http://steel.dev). Steel is an open-source browser API for AI agents and apps. We make it easy for AI devs to build browser automation into their products without getting flagged as a bot or worrying about browser infra.
Over the last year or so, we’ve built quite a few AI apps that interact with the web and noticed two things: (a) it was magical when you could get an LLM to use the web and it worked, and (b) our browser infra was the source of 80% of our development time. Maintaining our browser infrastructure became its own engineering challenge: keeping browser pools healthy, managing session state and cookies, rotating proxies, handling CAPTCHA solving, and ensuring clean process termination. We got really good at running browser infrastructure at scale, but maintaining it was still stealing time away from building our actual products. So we wanted to build the product we wish we had.
Steel allows you to run any automation logic on our hosted instances of Chromium. When you start a dedicated browser session, you get stealth, proxies, and CAPTCHA solving out of the box. We do this by exposing WebSocket and HTTP endpoints, so you can connect to these instances with Puppeteer, Playwright, Selenium (in beta), or raw CDP commands if you’re built like that.
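For example, attaching Playwright to one of those WebSocket endpoints over CDP looks roughly like this (the URL shape and session parameter here are hypothetical placeholders, not Steel's documented API):

```python
# Hypothetical sketch of connecting to a hosted browser session over CDP.
# The endpoint format below is illustrative; check the actual API docs for the real one.

def session_ws_url(base: str, session_id: str) -> str:
    """Build a WebSocket endpoint for a given session (hypothetical URL shape)."""
    return f"{base}?sessionId={session_id}"

def get_title(ws_endpoint: str) -> str:
    """Attach Playwright to the remote Chromium instance and read a page title."""
    # Deferred import: requires `pip install playwright`
    from playwright.sync_api import sync_playwright
    with sync_playwright() as p:
        # connect_over_cdp attaches to an already-running browser instead of launching one
        browser = p.chromium.connect_over_cdp(ws_endpoint)
        context = browser.contexts[0] if browser.contexts else browser.new_context()
        page = context.new_page()
        page.goto("https://example.com")
        title = page.title()
        browser.close()
        return title

# e.g. get_title(session_ws_url("wss://connect.steel.example", "my-session-id"))
```

The same endpoint works with Puppeteer's `connect({ browserWSEndpoint })` or a raw CDP client, since all three speak the same protocol.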
Behind the scenes, we host several browser instances and route incoming connection requests to one of them. Our core design principle was to give every session its own dedicated browser instance + resources (currently 2 GB of RAM and 2 vCPUs) while still allowing for quick session creation/connection times. Our first thought was to run separate nodes in a Kubernetes cluster, but keeping warm browser instances hosted that way would be expensive (which would be reflected in the pricing), and the boot times would be too slow to handle the scale some customers required. We got around this by deploying our browser instance image on Firecracker VMs, taking advantage of their lightning-fast boot times and the ability to share a root FS.
Today, we’re open-sourcing the code for the Steel browser instance, with plans to open-source the orchestration layer soon. With the open-source repo, you get backwards compatibility with our Node/Python SDKs, a lighter version of our session viewer, and most of the features that come with Steel Cloud. You can run it locally to test Steel out at the individual-session level, or one-click deploy to Render/Railway to run it remotely.
We're really happy we get to show this to you all, thank you for reading about it! Please let us know your thoughts and questions in the comments.
Very interesting. I’m not sure I immediately see your application either; however, I have been having similar thoughts.
After playing a popular indie game (Kenshi), I was wondering about the very simple automation interface the game relies upon. Why not a virtual world (with interfaces attaching any external source) in which business-logic agents interact through the available interfaces of the environment and of other agents? Though tbh, I imagine the entire environment implemented in layers of YAML-style schemas and profiles, so that all data, whether in a datastore, an active instance, streamed, or serialized, can be related to in the same way: an envelope with attributes and content specified by the type attribute. The only code would then be the rendering environment, plus whatever these agents call for stream processing.
Sort of a gamification of automation, though what can’t be beat is the dead-simple accounting of what any one thing is doing at a given time.
Can you say more about your product plans? It looks like this is directly competing with Browserbase — how will you differentiate? Looking forward to seeing how the product and company grows
Our end goal is to build the LLM OS. We think the best place to start is the hardest but most hairy problem: getting agents to use an internet designed for humans on our behalf. So, as we go, we want to keep removing blockers by handling auth securely, translating webpages, and creating the right toolset for driving sessions. And we want to do this in an open-source and communal way. That's also why we're attacking it bottom-up with infra, so we can build a community of people much smarter than us around getting there. Join the Discord if you want to stay on top of our journey :)
Happy to see there's a way to get browser automation for AI without building infrastructure to support it. Yet I don't see examples of connecting an LLM to drive a web session, just examples of using Puppeteer or Playwright or Selenium to drive a web session. Presumably your user base knows how to write custom code for an interface between Claude or OpenAI API and Puppeteer/Playwright/Selenium. Sadly, I don't know how to do that. Would it be fair to expect your documentation to help? What would you suggest to get started?
Is the interface between Steel, or Puppeteer/Playwright/Selenium, something that might be implemented in the new Anthropic Model Context Protocol, so there's less custom code required?
Good point! The space is so early, and it's 100% on us to help people get started building web agents. We're actually reworking this repo (plus a tutorial to go with it): https://github.com/steel-dev/claude-browser - it implements a web agent by reworking the Claude computer-use repo, using page screenshots for vision.
We also have more AI-specific examples, tutorials, and an MCP server coming out really soon (like really soon).
You can keep an eye out for our releases on Discord/Twitter, where we'll be posting a bunch of these example repos.
I’d recommend checking out Stagehand if you want to use something that’s more AI-first! It’s like the AI-powered successor to Playwright: https://github.com/browserbase/stagehand
We don't compete with Puppeteer/Playwright/Selenium; we're focused on solving the problem of hosting Chromium browsers in the cloud.
The workflow jump we've seen ourselves, and from customers, is that it's pretty straightforward to build these scripts locally, but the moment you want to run them in prod, you need to deal with a ton of overhead around packaging things up in Docker, resource management, scaling, etc. We want to make that trivial :)
Ah, we could definitely do better at communicating that. Most of the work is ahead of us, but we currently help AI devs in a few ways:
- Designing with sessions as the atomic unit -- we wanted them to be spun up and discarded in a serverless way with quick boot times. That way you have quick or long-running sessions that agents can jump in and out of, or use as a building block for custom tools. We also expose debug URLs so you can embed viewable sessions into your app in a Devin-esque UI.
- In the open-source API, we expose features that let you turn webpages into more LLM-friendly formats for consumption, like Markdown, Readability-extracted text, screenshots, and PDFs.
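As a rough sketch of what consuming those formats could look like from the client side (the base URL and endpoint paths below are illustrative assumptions, not the repo's documented routes):

```python
# Hypothetical sketch of requesting a page in LLM-friendly formats from the API.
# The paths and base URL are illustrative assumptions, not the documented routes.
import json
import urllib.request

# One endpoint per output format mentioned above (hypothetical mapping)
FORMAT_PATHS = {
    "markdown": "/v1/scrape?format=markdown",
    "readability": "/v1/scrape?format=readability",
    "screenshot": "/v1/screenshot",
    "pdf": "/v1/pdf",
}

def build_request(base_url: str, fmt: str, target: str) -> urllib.request.Request:
    """Assemble a POST request asking for `target` rendered in format `fmt`."""
    body = json.dumps({"url": target}).encode()
    return urllib.request.Request(
        base_url + FORMAT_PATHS[fmt],
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# e.g. req = build_request("http://localhost:3000", "markdown", "https://example.com")
# urllib.request.urlopen(req).read() would then return the page as Markdown
```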
It just so happens to be useful for non-AI apps as well.
Haven't tried to build one myself, but that would be a cool project.
If it's a login page you get redirected to when you try to access a page in your browser, then theoretically you should be able to do that with the open-source repo and custom AI logic.
Having to constantly fill out those login pages (especially on networks that expire) is such a pain, so I'd love to use this myself lol
A browser extension would have to run in an individual's personal browser; we're focused on providing infrastructure to run hundreds of browsers that you can connect to with code, whether for automation or information retrieval.
Curious, what made you think browser extension first? :)
Will definitely check this out. I saw the pricing model on the site; what's the motivation for being open-source here if you're providing the infra free of charge?
Yeah, good question! The reason we went open-source is really about transparency and flexibility. We want people to trust what they’re using and have the option to self-host if that’s what makes the most sense for them. Open-sourcing the browser API also lets us build a better product with input and contributions from the community—it’s a win-win.
As for the infra, it’s not totally free—our managed service is there for anyone who doesn’t want the hassle of hosting and scaling everything themselves (although it does have a generous free plan). Going open-source just gives people the choice: run it yourself if you want full control, or use our managed option for convenience and scale. It’s all about making browser automation more accessible without forcing people into one path.
You can use any model on top of us! We only provide browser infrastructure: multiple warm browsers that you can connect to and drive using Puppeteer, Playwright, Selenium, etc.
So you would just need to add some wrapper code and call those actions using a model of your choice.
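That wrapper can be quite small: loop, show the model the current page state, parse the action it picks, and execute it. Here's a minimal sketch with a stubbed model and a fake page standing in for a real LLM call and a real Playwright page (all names here are hypothetical):

```python
# Minimal agent-loop sketch: an LLM picks browser actions, a driver executes them.
# `call_model` is a stub; in practice you'd call the OpenAI/Anthropic API, and
# `page` would be a real Playwright page connected to a hosted session.

def call_model(page_text: str, goal: str) -> dict:
    """Stub for an LLM call that returns the next action as structured JSON."""
    if "Welcome" in page_text:
        return {"action": "done"}
    return {"action": "click", "selector": "#login"}

def run_agent(page, goal: str, max_steps: int = 5) -> list[dict]:
    """Drive `page` toward `goal`, one model-chosen action per step."""
    history = []
    for _ in range(max_steps):
        step = call_model(page.text(), goal)
        history.append(step)
        if step["action"] == "done":
            break
        if step["action"] == "click":
            page.click(step["selector"])
    return history

class FakePage:
    """Tiny in-memory stand-in for a browser page, just for illustration."""
    def __init__(self):
        self._text = "Please log in"
    def text(self) -> str:
        return self._text
    def click(self, selector: str) -> None:
        if selector == "#login":
            self._text = "Welcome back!"
```

The loop's shape is the point: page state in, structured action out, repeat until the model says it's done (or a step budget runs out).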
We're working on more examples and guides that show how you can do this, so keep an eye out on our releases or hop in the discord and you should see some specific examples very soon!
Hey! So we don't explicitly support PerimeterX, but some of our users have been able to consistently bypass PerimeterX using custom proxies (and some of our own as well).
We'll probably take a deeper look at this in the next few weeks and expand on what we work with vs what we don't.
(I'm one of the Stagehand authors!)
Btw, there is an inconsistency between the pricing page and the pricing in the docs: the pricing page lists the Developer plan at $59, while the docs list it at $99.
Great catch on the pricing mismatch, just fixed. Thank you :)
https://scalar.com https://github.com/scalar/scalar
(disclaimer: I'm the co-founder & CEO of Scalar)