top | item 41327515

markjgx | 1 year ago

"Surfer: The World's First Digital Footprint Exporter" is dubious—it's clearly not the first. Kicking off with such a bold claim while only supporting seven major platforms? A scraper like this is only valuable if it has hundreds of integrations; the more niche, the better. The idea is great, but this needs a lot more time in the oven.

I would prefer a CLI tool with partial-gather support: something I could easily set up to run on a cheap instance somewhere, have it scrape all my data continuously at set intervals, and then get the data in the most readable format possible through an easy access path. I've been thinking of building something like that, but with https://github.com/microsoft/graphrag at the center of it: a continuously rebuilt GraphRAG of all your data.
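That continuous-export loop could be sketched roughly like this. Everything here is hypothetical (the fetchers, `EXPORT_DIR`, the six-hour interval); a real tool would call each platform's API with auth, pagination, and rate limiting:

```python
import json
import time
from datetime import datetime, timezone
from pathlib import Path

EXPORT_DIR = Path("exports")  # hypothetical output directory

# Hypothetical per-platform fetchers standing in for real API clients.
def fetch_github() -> dict:
    return {"stars": [], "repos": []}  # placeholder payload

def fetch_reddit() -> dict:
    return {"saved": [], "comments": []}  # placeholder payload

FETCHERS = {"github": fetch_github, "reddit": fetch_reddit}

def run_once() -> list[Path]:
    """Run every fetcher, writing one timestamped JSON file per platform."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    written = []
    for name, fetch in FETCHERS.items():
        try:
            payload = fetch()  # partial gather: one failure skips only this platform
        except Exception as exc:
            print(f"{name}: fetch failed ({exc}); keeping previous exports")
            continue
        out = EXPORT_DIR / name / f"{stamp}.json"
        out.parent.mkdir(parents=True, exist_ok=True)
        out.write_text(json.dumps(payload, indent=2))
        written.append(out)
    return written

if __name__ == "__main__":
    # Cheap-instance mode: loop forever, exporting every six hours.
    while True:
        run_once()
        time.sleep(6 * 3600)
```

Writing one file per platform per run keeps the gather partial by design: a single flaky integration never blocks the rest, and old exports stay readable on disk.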

madamelic|1 year ago

Take a look at https://github.com/karlicoss/HPI

It builds an entire ecosystem around your data, where access is programmatic rather than just a dump of text files. The point of HPI is that you build your own stuff on top of it, and it all integrates seamlessly into one Python package.

The next stop after Karlicoss is https://github.com/seanbreckenridge/HPI_API which creates a REST API on top of your HPI without any additional configuration.

If you want to get fancier / more antithetical to HPI, you can use https://github.com/hpi/authenticated_hpi_api or https://github.com/hpi/hpi-graph so you can theoretically expose it to the web. (I am squatting the HPI org; I am not the creator of HPI.) I made the authentication method JWTs, so you can create tokens that grant access to only certain services' data. (Beware: hpi-graph is very out of date and I haven't touched it lately, but my HPI stuff has been chugging away downloading data.)
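The per-service token idea can be illustrated with a stdlib-only sketch. This is not the actual authenticated_hpi_api implementation (which uses real JWTs); it is a simplified HMAC-signed token whose only claim is the list of services the bearer may read:

```python
import base64
import hashlib
import hmac
import json

SECRET = b"change-me"  # hypothetical server-side signing secret

def _b64(data: bytes) -> str:
    # URL-safe base64 without padding, as JWTs use.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def mint_token(services: list[str]) -> str:
    """Sign a claim listing which services this token may access."""
    payload = _b64(json.dumps({"services": services}).encode())
    sig = _b64(hmac.new(SECRET, payload.encode(), hashlib.sha256).digest())
    return f"{payload}.{sig}"

def allowed_services(token: str) -> list[str]:
    """Verify the signature; return the granted services, or [] if invalid."""
    try:
        payload, sig = token.split(".")
    except ValueError:
        return []
    expect = _b64(hmac.new(SECRET, payload.encode(), hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expect):
        return []
    padded = payload + "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))["services"]
```

A request handler would then check `allowed_services(token)` against the service being queried, so a token minted for just `["reddit"]` can't read your GitHub data.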

Some of the /hpi stuff I made is a bit of a mish-mash because it was ripped out of a project I was building, so you'll see references to "Archivist" and things that aren't local-first and depend on Vercel applications.

bdcravens|1 year ago

The number of built-in platforms isn't necessarily the problem. The best systems are those that establish a plugin ecosystem.
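One common shape for such an ecosystem, sketched here in Python with hypothetical names (none of this is taken from Surfer), is a registry that third-party exporters add themselves to, so the core tool never hard-codes the platform list:

```python
from typing import Callable, Dict, List

# Registry mapping platform name -> export function returning raw records.
EXPORTERS: Dict[str, Callable[[], List[dict]]] = {}

def exporter(name: str):
    """Decorator a plugin uses to register its platform exporter."""
    def register(fn: Callable[[], List[dict]]):
        EXPORTERS[name] = fn
        return fn
    return register

# Built-in integrations and community plugins register the same way.
@exporter("github")
def export_github() -> List[dict]:
    return [{"type": "star", "repo": "example/repo"}]  # placeholder data

@exporter("mastodon")
def export_mastodon() -> List[dict]:
    return [{"type": "toot", "text": "hello"}]  # placeholder data

def export_all() -> dict:
    """Run every registered exporter; niche platforms come along for free."""
    return {name: fn() for name, fn in EXPORTERS.items()}
```

With this shape, supporting a niche platform is a third-party package that registers one function, rather than a feature request against the core tool.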

michaelmior|1 year ago

While I agree that it's not the first, I think it's unfair to say that it's not valuable without hundreds of integrations.

slalani304|1 year ago

Yeah, it was honestly more of a marketing statement lol, but we're removing it for sure. Adding daily/interval exporting is one of our top priorities right now; after that, and making the scraping more reliable, we'll add something similar to GraphRAG. Curious to hear what other integrations you'd want built into this system.