Show HN: Finic – Open source platform for building browser automations
143 points| jasonwcfan | 1 year ago |github.com
This was our launch: https://news.ycombinator.com/item?id=36032081
We recently decided to revive and rebrand the project after seeing a sudden spike in interest from people who wanted to connect LLMs to data - but specifically through browsers. It's also a problem we've experienced firsthand, having built scraping features into Psychic and previously working on bot detection at Robinhood.
If you haven’t built a web scraper or browser automation before, you might assume it’s very straightforward. People have been building scrapers for as long as the internet has existed, so there must be many tools for the job.
The truth is that web scraping strategies need to constantly adapt as web standards change, and as companies that don’t want to be scraped adopt new technologies to try to block it. The old standards never completely go away, so the longer the internet exists, the more edge cases you’ll need to account for. This adds up to a LOT of infrastructure that needs to be set up and a lot of schlep developers have to go through to get up and running.
Scraping is no easier today than it was 10 years ago - the problems are just different.
Finic is an open source platform for building and deploying browser agents. Browser agents are bots deployed to the cloud that mimic the behaviour of humans, like web scrapers or remote process automation (RPA) jobs. Simple examples include scripts that scrape static websites like the SEC's EDGAR database. More complex use cases include integrating with legacy applications that don’t have public APIs, where the best way to automate data entry is to just manipulate HTML selectors (EHRs for example).
Our goal is to make Finic the easiest way to deploy a Playwright-based browser automation. With this launch, you can already do so in just 4 steps. Check out our docs for more info: https://docs.finic.io/quickstart
ghxst|1 year ago
jasonwcfan|1 year ago
This actually creates an evergreen problem that companies need to overcome, and our paid version will probably involve helping companies overcome these barriers.
Also I should clarify that we're explicitly not trying to build a playwright abstraction - we're trying to remain as unopinionated as possible about how developers code the bot, and just help with the network-level infrastructure they'll need to make it reliable and make it scale.
It's good feedback for us, we'll make that point more clear!
suriya-ganesh|1 year ago
If I remember correctly, Skyvern also has an implementation of scaling these browser tasks built in.
ps. Is it not called Robotic Process Automation? First time I'm hearing it as Remote process Automation.
[1]https://github.com/ProductLoft/arachne
[2]https://www.skyvern.com/
[3]https://github.com/reworkd/tarsier
mdaniel|1 year ago
ayanb9440|1 year ago
Based on the feedback in this thread, we're going to release an updated version that focuses more on tooling for the browser agents themselves than on scaling/scheduling, so stay tuned for that!
mdaniel|1 year ago
dataviz1000|1 year ago
If you don't already have this feature for your system, I would recommend it.
ghxst|1 year ago
ayanb9440|1 year ago
What does this check look like for you? Do you just diff the html to see if there are any changes?
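One way such a check could look, as a minimal sketch rather than anything Finic or the parent commenter is known to use: instead of diffing raw HTML (which changes whenever the data changes), hash only the tag and class skeleton of the page and compare fingerprints between runs. The names `structure_fingerprint` and `layout_changed` are hypothetical.

```python
import hashlib
from html.parser import HTMLParser


class _TagSkeleton(HTMLParser):
    """Collects tag names and class attributes, ignoring text content."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = dict(attrs).get("class") or ""
        normalized = " ".join(sorted(classes.split()))
        self.parts.append(f"<{tag} class={normalized}>")

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")


def structure_fingerprint(html: str) -> str:
    """Hash the page's tag/class skeleton so text changes do not matter."""
    parser = _TagSkeleton()
    parser.feed(html)
    parser.close()
    return hashlib.sha256("\n".join(parser.parts).encode()).hexdigest()


def layout_changed(old_html: str, new_html: str) -> bool:
    """True when the markup structure differs between two snapshots."""
    return structure_fingerprint(old_html) != structure_fingerprint(new_html)
```

A scraper could store the fingerprint from the last successful run and raise an alert before parsing when `layout_changed` returns True. Sites that randomize class names would need a looser skeleton (tags only).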
Oras|1 year ago
jasonwcfan|1 year ago
Thanks for the feedback! I just updated the repo to make it more clear that it's Playwright based. Once my cofounder wakes up I'll see if he can re-record the video as well.
mdaniel|1 year ago
I have never, ever understood anyone who goes to the trouble of booting up a browser, and then uses a python library to do static HTML parsing
Anyway, I was surfing around the repo trying to find what, exactly "Safely store and access credentials using Finic’s built-in secret manager" means
ayanb9440|1 year ago
0x3444ac53|1 year ago
msp26|1 year ago
krick|1 year ago
_boffin_|1 year ago
- connect to it remotely
- ghost cursor and friends
- save cookies and friends to data dir
- run from residential ip
- if you get served a captcha or Cloudflare challenge, direct to a solver and then route back
- mobile ip if possible
…can’t go into any more specifics than that
…I forget the site right now, but there’s a guy that gives a good rundown of this stuff. I’ll see if I can find it.
thealchemi1st|1 year ago
sebmellen|1 year ago
djbusby|1 year ago
It seems that some sites can detect when you're using a headless or WebDriver-enabled profile.
Sometimes I'm through a VPN.
The automation is the easy part.
_boffin_|1 year ago
One thing I’ve also been doing recently, when I find a site where I just want an API, is to use Python to execute a curl command. I populate the curl command from Chrome’s network tab. I also have a purpose-built extension in my browser that saves cookies to a LAN Postgres DB, and the script then uses those values.
Can even probably do more by automating the browser to navigate there on failure.
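A rough sketch of the curl-from-Python approach described above, not the commenter's actual script. The `cookies` dict stands in for values read from the Postgres table (the database read is left out), and `build_curl_command` and `fetch` are hypothetical names.

```python
import subprocess


def build_curl_command(url, cookies, headers=None):
    """Build an argv list for curl; no shell is involved, so no quoting issues."""
    cmd = ["curl", "--silent", "--show-error", "--fail", url]
    if cookies:
        # curl treats a --cookie value containing '=' as literal cookie data.
        cookie_data = "; ".join(f"{name}={value}" for name, value in cookies.items())
        cmd += ["--cookie", cookie_data]
    for name, value in (headers or {}).items():
        cmd += ["--header", f"{name}: {value}"]
    return cmd


def fetch(url, cookies, headers=None, timeout=30):
    """Run curl and return the response body; raises if curl exits non-zero."""
    result = subprocess.run(
        build_curl_command(url, cookies, headers),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout
```

Because `--fail` makes curl exit non-zero on HTTP errors, a `subprocess.CalledProcessError` from `fetch` is a natural place to hook in the "navigate there on failure" fallback.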
kfrzcode|1 year ago
bobbylarrybobby|1 year ago
iansinnott|1 year ago
This is not always possible, but if the product in question has a mobile app or a wearable talking to a server, you might be able to utilize the same API it's using:
- intercept requests from the device
- find relevant auth headers/cookies/params
- use that auth to access the API
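The middle step above can be sketched as follows, assuming the intercepted traffic was exported as a HAR file from a proxy or browser dev tools. The function name and the set of header names treated as auth-related are assumptions; real apps often use custom headers.

```python
from urllib.parse import urlsplit

# Header names treated as auth-related. This set is an assumption;
# extend it after inspecting the traffic of the app in question.
AUTH_HEADER_NAMES = {"authorization", "cookie", "x-api-key"}


def extract_auth_headers(har, host):
    """Collect auth-related request headers sent to `host` in a parsed HAR dict.

    Later entries overwrite earlier ones, so the most recent value wins.
    """
    found = {}
    for entry in har["log"]["entries"]:
        request = entry["request"]
        if urlsplit(request["url"]).hostname != host:
            continue
        for header in request["headers"]:
            if header["name"].lower() in AUTH_HEADER_NAMES:
                found[header["name"]] = header["value"]
    return found
```

The returned dict can be passed as headers to whatever HTTP client is used for the final step. Tokens captured this way usually expire, so the capture may need repeating.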
whilenot-dev|1 year ago
lambdaba|1 year ago
whatnotests2|1 year ago
I can see a few years from now almost all web traffic is agents.
jasonwcfan|1 year ago
I don't think the dead internet theory is true today, but I think it will be true soon. IMO that's actually a good thing, more agents representing us online = more time spent in the real world.
j0r0b0|1 year ago
Your sign up flow might be broken. I tried creating an account (with my own email), received the confirmation email, but couldn't get my account to be verified. I get "Email not confirmed" when I try to log in.
Also, the verification email was sent from accounts@godealwise.com, which is a bit confusing.
jasonwcfan|1 year ago
ayanb9440|1 year ago
skeptrune|1 year ago
ayanb9440|1 year ago
computershit|1 year ago
jasonwcfan|1 year ago
If you're trying to build an agent for a long-running job like that, you run into different problems:
- Failures are magnified as a workflow has multiple upstream dependencies and most scraping jobs don't.
- You have to account for different auth schemes (OAuth, password, magic link, etc.)
- You have to implement token refresh logic for when sessions expire, unless you want to manually log in several times per day
We don't have most of these features yet, but it's where we plan to focus.
And finally, we've licensed Finic under Apache 2.0 whereas Browserless is only available under a commercial license.
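On the token refresh point above, here is a minimal sketch of what such logic might look like. It is not Finic's implementation (the comment says most of these features don't exist yet). `SessionToken` is a hypothetical name, and `refresh` is any caller-supplied function that logs in again and returns a `(token, lifetime_seconds)` pair.

```python
import time


class SessionToken:
    """Caches a session token and refreshes it shortly before it expires."""

    def __init__(self, refresh, skew_seconds=60, clock=time.monotonic):
        self._refresh = refresh      # callable returning (token, lifetime_seconds)
        self._skew = skew_seconds    # refresh this many seconds before expiry
        self._clock = clock          # injectable so the logic can be tested
        self._token = None
        self._expires_at = 0.0

    def get(self):
        """Return a valid token, calling refresh() only when needed."""
        if self._token is None or self._clock() >= self._expires_at - self._skew:
            token, lifetime = self._refresh()
            self._token = token
            self._expires_at = self._clock() + lifetime
        return self._token
```

Every request in a long-running job would call `get()` rather than holding on to a token, so an expired session triggers a re-login instead of a failed run.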
ushakov|1 year ago
Also, curious why your unstructured idea did not pan out?
ayanb9440|1 year ago
Our approach is a bit different. With Finic you just write the script. We handle the entire job deployment and scaling on our end.
ilrwbwrkhv|1 year ago
ayanb9440|1 year ago
1. Developer tooling should be open source by default
2. Open source doesn't meaningfully affect revenue/scaling because developers that would use your self-hosted version would build in-house anyway.
yard2010|1 year ago
slewis|1 year ago
ayanb9440|1 year ago
sebmellen|1 year ago