top | item 41513576


ktta | 1 year ago

I don't think so. Most of the web is behind a login and/or unlinkable, so you're left with the 'open web', which is much smaller. Archiving a meaningful part of it, especially the useful parts, is not impossible; it's a fairly tangible goal.


p0358|1 year ago

The meaningful part of the open web is small, yes. Sadly there are so many junk pages cluttering search results, nowadays partly generated by AI and previously produced by just copy-pasting random content from other pages. It all needs to be filtered out somehow, otherwise it'll end up taking up space in place of something more useful... So I really wonder how much of the open web is original content and how much is duplicate or auto-generated junk.

Etheryte|1 year ago

I'm not convinced. There are various estimates of how large the internet is, with varying confidence, but most I found land around a few hundred zettabytes. The Internet Archive seems to be in the ballpark of a hundred petabytes. So unless I got it wrong, the archive currently covers about 0.0001% of the whole thing at most (100 PB out of 100 ZB is one part in a million). How much we need in order to cover the useful bits is a separate discussion, of course.
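
For anyone who wants to check the ratio, here is a rough sketch of the arithmetic. It assumes decimal (SI) units and treats "a few hundred zettabytes" as somewhere between 100 ZB and 300 ZB; those bounds, and the names `internet_low`, `internet_high` and `coverage_percent`, are my own illustrative assumptions, not measured figures.

```python
# Back-of-the-envelope coverage estimate, assuming decimal (SI) byte units.
PB = 10**15  # bytes in a petabyte
ZB = 10**21  # bytes in a zettabyte

archive_bytes = 100 * PB   # "ballpark of a hundred petabytes"
internet_low = 100 * ZB    # assumed lower bound for "a few hundred zettabytes"
internet_high = 300 * ZB   # assumed upper bound, purely illustrative


def coverage_percent(archive: int, total: int) -> float:
    """Return the archive size as a percentage of the total size."""
    return archive / total * 100


for label, total in (("low estimate", internet_low), ("high estimate", internet_high)):
    print(f"{label}: {coverage_percent(archive_bytes, total):.6f}%")
```

With these assumptions the coverage comes out at roughly 0.0001% for the low estimate and smaller still for the high one, so the exact figure depends heavily on which size estimate you trust.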