
Show HN: CoolQLCool – Turn Websites into GraphQL Accessible APIs

124 points | gavino | 7 years ago | coolql.cool

22 comments


lachenmayer|7 years ago

I've previously written a very similar project called "graphql-scraper" (which is arguably a far less cool name...), you can check it out at http://github.com/lachenmayer/graphql-scraper

It works very similarly, with only superficial differences under the hood (e.g., I used jsdom, and this uses cheerio). The `waitForSelector` feature is very cool!

You can see a live demo of the HN example using graphql-scraper at https://graphqlbin.com/v2/lxNohP

This example is deployed on Glitch - you can easily spin up your own using https://github.com/lachenmayer/graphql-scraper-server (with 1-click deploys to Heroku, Now & Glitch)

Of course (as mentioned already) there is also https://github.com/syrusakbary/gdom which uses Python+Graphene.

gavino|7 years ago

I remember seeing GDOM a while back when I first started this project, but forgot to write it down as a source of inspiration. I'm gonna add all of these as alternatives, because they're all great :D

bryanrasmussen|7 years ago

Are you planning to build anything on top of this, like a service or a company? I was thinking it would be a good way to build an API for some projects I've been thinking of working on, although I would probably want to switch out cheerio for https://github.com/intoli/remote-browser/

gavino|7 years ago

Nah, I don't really plan on turning it into a company. I'd gladly accept any PR to swap out cheerio, I haven't touched that part in close to a year :D

canadev|7 years ago

This is a tangent but they link to a serverless deployment service where you upload your code as a function and they execute it. Pretty interesting.

pdxandi|7 years ago

I've been looking for something like this! I'm trying to play around with it but can't seem to get the selector right. How do I grab the nth `td` in a table (I tried `td:nth-of-type(n)` to no avail)?
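For reference, CSS `:nth-of-type()` takes an `an+b` expression, so a bare `n` matches every `td`. To pick one column you pass a literal 1-based index such as `td:nth-of-type(2)`, and only `td` siblings are counted, so a leading `th` does not shift the index. Whether CoolQLCool's selector engine behaves exactly like a browser here is an assumption. The sketch below mimics those semantics with Python's stdlib `html.parser` instead of cheerio; `TableCells` and `nth_of_type` are hypothetical helpers, and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableCells(HTMLParser):
    """Hypothetical helper: collects the text of every <td>, grouped by <tr>."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
            self._in_td = False  # an unclosed <td> ends at the next row
        elif tag == "td" and self.rows:
            self.rows[-1].append("")
            self._in_td = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        # Only text inside a <td> is kept; <th> text is ignored.
        if self._in_td:
            self.rows[-1][-1] += data.strip()


def nth_of_type(html: str, n: int) -> list[str]:
    """Mimic `td:nth-of-type(n)` for a literal 1-based n: return the n-th <td>
    of each row, counting only <td> siblings and skipping rows that are too short."""
    parser = TableCells()
    parser.feed(html)
    return [row[n - 1] for row in parser.rows if len(row) >= n]


table = (
    "<table><tr><th>Name</th><td>Alice</td><td>30</td></tr>"
    "<tr><th>Name</th><td>Bob</td><td>25</td></tr></table>"
)
print(nth_of_type(table, 2))  # ['30', '25']
```

Because the leading `th` is not a `td`, `nth_of_type(table, 1)` returns the names rather than the header cells, which is the same way `td:nth-of-type(1)` behaves in a browser.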

VMG|7 years ago

Awesome name, awesome project!

conceptpad|7 years ago

Great project! I can imagine this may greatly improve certain classes of web scraping. @gavino I'm curious what tooling and architecture you used to put this together?

gavino|7 years ago

Sure! The backend is actually pretty straightforward: it's a NextJS app deployed on Now with a few added endpoints to handle the incoming GraphQL queries.

Then, to actually turn the query into digestible output, I used a GraphQL schema builder that accepts HTML nodes from the requested page and grabs the right values.
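A rough sketch of the general technique described above, not CoolQLCool's actual code: parse the page into a node tree, then resolve each field of a nested selection against the nodes matched by its parent, so the result has the same shape as the query. To stay stdlib-only it assumes a plain nested dict in place of a parsed GraphQL query and supports only `tag`, `.class` and `tag.class` selectors; `Node`, `TreeBuilder`, `matches` and `resolve` are hypothetical names.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


@dataclass
class Node:
    tag: str
    classes: frozenset[str]
    children: list = field(default_factory=list)  # Node or str, in document order

    def descendants(self):
        """Yield every element below this one, depth-first."""
        for child in self.children:
            if isinstance(child, Node):
                yield child
                yield from child.descendants()

    def text(self) -> str:
        """Concatenated text of this node and its descendants."""
        return "".join(c if isinstance(c, str) else c.text() for c in self.children)


class TreeBuilder(HTMLParser):
    """Builds a simple Node tree from static HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.root = Node("#root", frozenset())
        self._stack = [self.root]

    def handle_starttag(self, tag, attrs):
        classes = frozenset((dict(attrs).get("class") or "").split())
        node = Node(tag, classes)
        self._stack[-1].children.append(node)
        if tag not in VOID:
            self._stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; stray end tags are ignored.
        for i in range(len(self._stack) - 1, 0, -1):
            if self._stack[i].tag == tag:
                del self._stack[i:]
                break

    def handle_data(self, data):
        self._stack[-1].children.append(data)


def matches(node: Node, selector: str) -> bool:
    """Support only `tag`, `.class` and `tag.class` selectors."""
    tag, _, cls = selector.partition(".")
    return (not tag or node.tag == tag) and (not cls or cls in node.classes)


def resolve(node: Node, selection: dict) -> dict:
    """Each field maps to (selector, subselection). None means "text of every
    match"; a dict means "resolve the subselection against every match"."""
    result = {}
    for name, (selector, sub) in selection.items():
        hits = [d for d in node.descendants() if matches(d, selector)]
        if sub is None:
            result[name] = [h.text().strip() for h in hits]
        else:
            result[name] = [resolve(h, sub) for h in hits]
    return result


page = """
<table>
  <tr class="athing"><td class="title"><a class="storylink">Show HN: CoolQLCool</a></td></tr>
  <tr class="athing"><td class="title"><a class="storylink">Another story</a></td></tr>
</table>
"""
builder = TreeBuilder()
builder.feed(page)
query = {"stories": ("tr.athing", {"title": ("a.storylink", None)})}
print(resolve(builder.root, query))
# {'stories': [{'title': ['Show HN: CoolQLCool']}, {'title': ['Another story']}]}
```

Resolving child fields against the parent's matches is what keeps each title attached to its own story; a real implementation would map these fields onto GraphQL resolvers instead of a dict.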

nurettin|7 years ago

Not sure what to make of this. How does it handle throttling or captchas?

halfjew22|7 years ago

If this is a Community reference, I’m going to be very happy.

gavino|7 years ago

Troy and Abed scraping websites!! :D

jarjar12|7 years ago

Sorry, dumb question. What are the use cases? Thx

ralusek|7 years ago

1.) You have a website with data you'd like to consume.

2.) That website doesn't expose an API, but returns statically rendered HTML.

3.) You don't like parsing statically rendered HTML for the data you're looking for, and you'd prefer getting the data through a GraphQL interface.
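On the consumer side, point 3 means sending a GraphQL query over HTTP instead of parsing HTML yourself. A minimal sketch using the common GraphQL-over-HTTP convention (a JSON POST body with `query` and `variables`); the endpoint URL and the `site`/`selectAll` fields are hypothetical placeholders, not CoolQLCool's real schema.

```python
import json
import urllib.request

# Hypothetical endpoint; substitute the service's real GraphQL URL.
ENDPOINT = "https://example.com/graphql"

# Hypothetical schema: the field names only illustrate the idea of passing
# a page URL and CSS selectors as GraphQL arguments.
QUERY = """
query ($url: String!) {
  site(url: $url) {
    stories: selectAll(selector: "a.storylink") { text }
  }
}
"""


def build_request(url: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST carrying the query and its variables."""
    body = json.dumps({"query": QUERY, "variables": {"url": url}}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it would be `urllib.request.urlopen(build_request(...))`, and per the GraphQL spec the JSON response mirrors the query's shape under a top-level `data` key, so there is no HTML left to parse on the client.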