Can IPFS or torrents, plus large local databases hosted and decentralised by volunteers, be a solution to this? I personally have the resources to share and host TBs of data but haven't found a good use for them.
finnthehuman|11 months ago
For that to work, a website has to push a mirror into that alternate system, and the scraper has to know the associated mirror exists.
That's two big "ifs" for something I'm not aware of a standardized way of announcing. And the entire thing crumbles as soon as someone who wants every drop of data possible says "crawl their sites anyway to make sure they didn't forget to publish anything into the 2nd system."
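(For what it's worth, the closest thing to a convention on the IPFS side is a DNSLink TXT record at _dnslink.<domain>, but that still only helps if the site publishes one and the crawler goes out of its way to look for it. A rough sketch of such a check, assuming dnspython is installed and using a made-up domain:)

    # Sketch: look for an announced IPFS mirror via a DNSLink TXT record.
    # Assumes dnspython is installed; the domain below is hypothetical.
    import dns.resolver

    def find_ipfs_mirror(domain):
        try:
            answers = dns.resolver.resolve(f"_dnslink.{domain}", "TXT")
        except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
            return None  # no mirror announced, or none discoverable this way
        for record in answers:
            text = b"".join(record.strings).decode()
            if text.startswith("dnslink="):
                return text[len("dnslink="):]  # e.g. "/ipfs/<cid>" or "/ipns/<name>"
        return None

    print(find_ipfs_mirror("example.com"))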
GTP|11 months ago
I doubt it, as the article mentions scraping the same resource after just 6 hours. AI companies want to make sure they have fresh data, while it would be hard to keep such a database updated.
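(Concretely, the update burden is real: with content addressing, every change to the mirrored site produces a new root CID, so whoever runs the mirror has to keep republishing a mutable pointer such as IPNS or DNSLink, and crawlers have to keep re-resolving it. A rough sketch against a local Kubo node's RPC API, with a hypothetical CID:)

    # Sketch: republish an IPNS pointer after the mirrored content changes.
    # Assumes a local Kubo (go-ipfs) node with its RPC API on 127.0.0.1:5001;
    # the root CID passed in is hypothetical.
    import requests

    IPFS_API = "http://127.0.0.1:5001/api/v0"

    def republish_mirror(new_root_cid):
        # name/publish points this node's IPNS name at the latest snapshot;
        # a short lifetime means it must be refreshed regularly to stay valid.
        resp = requests.post(
            f"{IPFS_API}/name/publish",
            params={"arg": f"/ipfs/{new_root_cid}", "lifetime": "6h"},
            timeout=120,
        )
        resp.raise_for_status()
        return resp.json()["Name"]  # the IPNS name a crawler would re-resolve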
Self-Perfection|11 months ago
This way crawlers might contribute back by providing extra storage and bandwidth.
Though something like ZeroNet seems a better approach to allow dynamic content.
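(As a rough sketch of what "contributing back" could look like: a crawler that fetches a page through a local IPFS node could pin the CID afterwards, so its own node keeps storing and serving that content. Assumes a local Kubo node with its RPC API on the default port; the CID is made up:)

    # Sketch: a crawler fetches content by CID and pins it, so its own node
    # keeps the data and serves it to the rest of the network.
    # Assumes a local Kubo node with its RPC API on 127.0.0.1:5001.
    import requests

    IPFS_API = "http://127.0.0.1:5001/api/v0"

    def fetch_and_pin(cid):
        # /cat retrieves the content; /pin/add keeps it from being garbage-collected.
        content = requests.post(f"{IPFS_API}/cat", params={"arg": cid}, timeout=60)
        content.raise_for_status()
        pin = requests.post(f"{IPFS_API}/pin/add", params={"arg": cid}, timeout=60)
        pin.raise_for_status()
        return content.content

    data = fetch_and_pin("bafy...")  # hypothetical CID
    print(len(data), "bytes fetched and pinned")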
smashah|11 months ago