top | item 19180797

redahs | 7 years ago

If the desire is to make online content more readable, it might be worth starting with the assumption that all content downloaded from the network will be read on a black-and-white ereader device with no persistent internet connection.

This assumption might require substantially reworking the hyperlink model of the internet, so that external references to content delivered by third parties are sharply distinguished from internal references to other pages within the same work.

ivansavz | 7 years ago

Your idea of hypermedia with an offline browsing assumption is very good! Imagine an "offline archive" format that contains a document D + a pre-downloaded copy of all referenced documents R1, R2, ..., Rn, along with the assets needed to render R1..Rn in some useful manner (e.g. save the html + main-narrative images from each page Ri, but skip everything else).
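The one-hop archive described above can be sketched in a few lines. This is an illustrative sketch, not an existing tool: `fetch` is a caller-supplied callable (url -> html), so the archiving logic itself stays testable offline.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkExtractor(HTMLParser):
    """Collect href targets of <a> tags while parsing HTML."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def build_archive(doc_url, fetch):
    """Archive document D plus one hop of referenced pages R1..Rn.

    Returns {url: html}. Pages that fail to download are recorded
    as None so the gap can be retried later.
    """
    archive = {doc_url: fetch(doc_url)}
    extractor = LinkExtractor()
    extractor.feed(archive[doc_url])
    for href in extractor.links:
        url = urljoin(doc_url, href)
        if urlparse(url).scheme in ("http", "https") and url not in archive:
            try:
                archive[url] = fetch(url)
            except OSError:
                archive[url] = None  # remember the gap for async retrieval
    return archive
```

In a real archiver `fetch` would wrap `urllib.request.urlopen` (whose failures are `OSError` subclasses), and the loop would also pull in the main-narrative images mentioned above.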

This "offline archive format" has numerous benefits:

(A) Cognitive: a limited/standard UI for information (e.g. "read on a black-and-white ereader device").

(B) Accessibility: standardizing on text would make life easier for people using screen readers.

(C) Performance: everything is accessed on localhost.

(D) Async access: reaching the "edge" of the subgraph of the internet you have pre-downloaded could be recorded and queued up for async retrieval by "opportunistic means", e.g. the next time you connect to free wifi somewhere you retrieve the content and resolve those queued "HTTP promises".

(E) Focus: the cognitive benefit of staying on task when doing research (read the actual paper you wanted to read, instead of getting lost reading the references, and the references' references).
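The queued "HTTP promises" of point (D) might look like the sketch below. The class and method names are made up for illustration, not an existing API: a request past the archive edge returns nothing but is remembered, and the queue is drained whenever a connection appears.

```python
from collections import deque

class HttpPromiseQueue:
    """Record links that fall past the edge of the local archive,
    to be resolved later (e.g. on the next free-wifi connection)."""

    def __init__(self):
        self.pending = deque()   # URLs awaiting retrieval
        self.resolved = {}       # url -> fetched content

    def request(self, url):
        """Return cached content, or queue the URL and return None
        as the 'promise' that content will arrive on a later sync."""
        if url in self.resolved:
            return self.resolved[url]
        self.pending.append(url)
        return None

    def flush(self, fetch):
        """Call when a connection is available; drain the queue."""
        while self.pending:
            url = self.pending.popleft()
            if url not in self.resolved:
                self.resolved[url] = fetch(url)
        return self.resolved
```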

I'm not sure what "standard" for offline media (A) we should target... Do we allow video or not? On the one hand, video is very useful as a communication medium; on the other, it's a very passive medium, often associated with entertainment rather than information. A hard choice, if you ask me.

I'm sure such "pre-fetched HTTP" of some sort exists already, no? Or is it just not that useful if you only have one hop in the graph? How hard would it be to crawl/scrape 2 hops? 3 hops? I think we could have a pretty good offline internet experience with a few hops. Personally, I think async interaction with the internet limited to 3 hops would improve my focus: I'm thinking of hckrnews crawled + 3 hops of linked web content, a clone of any github repo encountered (if <10MB), and maybe doi links resolved to the actual paper via sci-hub. Having access to this would deliver 80%+ of the daily "internet value" for me, and more importantly let me cut myself off from useless information like news and youtube entertainment.
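Crawling to a fixed number of hops is a bounded breadth-first traversal. A minimal sketch, with `fetch` and `extract_links` left as caller-supplied parameters so the hop-limiting logic is the only thing shown:

```python
from collections import deque

def crawl(seed, fetch, extract_links, max_hops=3):
    """Breadth-first crawl limited to `max_hops` links from the seed.

    seed:          starting URL (hop 0)
    fetch:         url -> page content
    extract_links: page content -> iterable of URLs
    Returns {url: content} for every page within the hop budget.
    """
    hops = {seed: 0}          # url -> distance from seed
    queue = deque([seed])
    pages = {}
    while queue:
        url = queue.popleft()
        pages[url] = fetch(url)
        if hops[url] < max_hops:          # only expand within budget
            for link in extract_links(pages[url]):
                if link not in hops:
                    hops[link] = hops[url] + 1
                    queue.append(link)
    return pages
```

Note the fan-out: with even 10 links per page, 3 hops can mean ~1000 pages, which is why the "<10MB" kind of cutoff above matters.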

update: found WARC https://en.wikipedia.org/wiki/Web_ARChive http://archive-access.sourceforge.net/warc/warc_file_format-...
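For a feel of what WARC framing looks like, here is a deliberately simplified round-trip sketch: each record is a block of CRLF-separated headers, a blank line, the payload, and a trailing blank line. Real WARC records carry more mandatory headers (WARC-Record-ID, WARC-Date, digests), so use a proper library such as warcio for actual archives.

```python
def write_warc_record(url, payload):
    """Serialize one response record in simplified WARC/1.0 framing.
    (Illustrative subset of the format, not spec-complete.)"""
    headers = (
        "WARC/1.0\r\n"
        "WARC-Type: response\r\n"
        f"WARC-Target-URI: {url}\r\n"
        f"Content-Length: {len(payload)}\r\n"
        "\r\n"
    ).encode()
    return headers + payload + b"\r\n\r\n"

def read_warc_records(blob):
    """Yield (url, payload) pairs from concatenated records."""
    pos = 0
    while pos < len(blob):
        header_end = blob.index(b"\r\n\r\n", pos)
        header_lines = blob[pos:header_end].decode().split("\r\n")
        fields = dict(line.split(": ", 1) for line in header_lines[1:])
        length = int(fields["Content-Length"])
        payload = blob[header_end + 4 : header_end + 4 + length]
        yield fields["WARC-Target-URI"], payload
        pos = header_end + 4 + length + 4  # skip trailing blank line
```

Because records are length-prefixed via Content-Length, a whole archive is just records concatenated in one file, which is what makes WARC a natural container for the "offline archive" idea above.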

ethbro | 7 years ago

The issue is that this thrashes caching at both the local and network levels, decreases the overall hit rate, and doesn't scale as links-per-page increases.

How many links from any given page are ever taken? And is it worth network capacity and storage to cache any given one?
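A back-of-envelope calculation makes the objection concrete. All three numbers below are illustrative assumptions, not measurements:

```python
# Cost of prefetching every link vs. content actually read.
links_per_page = 60   # links on a typical page (assumption)
avg_page_mb = 2.0     # average transfer size per page (assumption)
click_prob = 0.05     # chance any given link is followed (assumption)

prefetched_mb = links_per_page * avg_page_mb
useful_mb = links_per_page * click_prob * avg_page_mb

print(f"prefetched: {prefetched_mb:.0f} MB, actually read: {useful_mb:.0f} MB")
print(f"wasted fraction: {1 - click_prob:.0%}")
```

Under these assumptions, one hop of prefetch moves 120 MB to deliver 6 MB of value, and each extra hop multiplies the waste by the fan-out.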