
There’s a simple alternative to the current web

150 points | mgunes | 11 years ago | hapgood.us

143 comments

[+] idlewords|11 years ago|reply
Clearly I'm a biased observer, but I really think people should take steps to archive stuff that is important to them. Of course it's terrible when large sites go offline and take vast swaths of the Internet with them, and we should continue to shame the ones that do it. At the same time, if something is really important to you, you shouldn't store it in the form of links to random third-party servers.

One problem we need to solve as coders is giving people better tools for saving stuff. It's really hard right now to save a webpage (or worse, series of connected pages) with any confidence that you've captured everything you need to see it again if the original server disappears.
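A rough sketch of the easy part of such a tool, in Python; requests and beautifulsoup4 are assumed third-party dependencies, and content loaded dynamically by scripts (the genuinely hard part) is ignored:

    import os
    from urllib.parse import urljoin, urlparse

    import requests
    from bs4 import BeautifulSoup

    def snapshot(url, out_dir="snapshot"):
        """Save one page plus the assets it references into out_dir."""
        os.makedirs(out_dir, exist_ok=True)
        page = requests.get(url, timeout=30)
        soup = BeautifulSoup(page.text, "html.parser")

        for tag, attr in (("img", "src"), ("script", "src"), ("link", "href")):
            for el in soup.find_all(tag):
                if tag == "link" and "stylesheet" not in (el.get("rel") or []):
                    continue  # only mirror stylesheets, not canonical/rss links
                ref = el.get(attr)
                if not ref:
                    continue
                asset_url = urljoin(url, ref)
                name = os.path.basename(urlparse(asset_url).path) or "asset"
                try:
                    data = requests.get(asset_url, timeout=30).content
                except requests.RequestException:
                    continue  # a missing asset shouldn't abort the snapshot
                with open(os.path.join(out_dir, name), "wb") as f:
                    f.write(data)
                el[attr] = name  # point the saved page at the local copy

        with open(os.path.join(out_dir, "index.html"), "w", encoding="utf-8") as f:
            f.write(str(soup))

    snapshot("https://example.com/")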

A project that I think has struck a really good balance between permanence and retaining authors' control over their writing is the Archive of Our Own (AO3). A bunch of fanfic authors got tired of sites falling out from under them, and decided to implement their own system, along with sensible governance and a way to fund its ongoing operations. The only broken links I've ever seen to AO3 are ones where the author consciously decided to take the material offline.

[+] rando289|11 years ago|reply
It always seemed like such a regression to me when browsers stopped keeping full page caches and started only ever showing the freshest content. Before, if a page served a 404, I could easily see my local cached version from my previous visit. Now, I'm shit out of luck.
[+] Terretta|11 years ago|reply
> I really think people should take steps to archive stuff that is important to them

I understand why you didn't post a "self promoting" link, but I want others to know about this option:

https://pinboard.in/tour/#archive

Currently $10 for a lifetime bookmarking account, plus $25/year to archive every bookmarked page:

"Pinboard offers a bookmark archiving service for an annual fee of $25. The site will crawl and store a copy of every bookmark in your account, and display a special icon you can click to see the cached copy. If the page you bookmarked goes offline, you'll still be able to see the archived copy indefinitely."

[+] sroerick|11 years ago|reply
My second reply, but I think this is really important.

We only need to look at early film history to know how easy it is to lose massive parts of our history.

Going back to old pages, I frequently get 404 results. For politically sensitive documents, the problem is much more widespread.

I would like something that not only archives pages I visit, but also versions them and tracks changes. If there were a bookmarking tool that did this, you could easily have an opt-in feature that shared content. This type of system would be a huge boost to something like the Wayback Machine.
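The versioning half is straightforward; a minimal sketch in Python (requests is an assumed dependency) that only writes a new timestamped snapshot when the content hash changes:

    import hashlib
    import os
    import time

    import requests

    def archive_version(url, root="bookmark-archive"):
        """Store a new timestamped snapshot of url only if its content changed."""
        body = requests.get(url, timeout=30).content
        digest = hashlib.sha256(body).hexdigest()

        # One folder per URL, named by a hash of the URL itself.
        site_dir = os.path.join(root, hashlib.sha256(url.encode()).hexdigest()[:16])
        os.makedirs(site_dir, exist_ok=True)

        # Compare against the most recent snapshot before writing a new one.
        snapshots = sorted(os.listdir(site_dir))
        if snapshots:
            with open(os.path.join(site_dir, snapshots[-1]), "rb") as f:
                if hashlib.sha256(f.read()).hexdigest() == digest:
                    return "unchanged"

        stamp = time.strftime("%Y%m%dT%H%M%S")
        with open(os.path.join(site_dir, stamp + ".html"), "wb") as f:
            f.write(body)
        return "new version saved"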

[+] sroerick|11 years ago|reply
Slightly related to this, the other day I tried to download every youtube video in my watch history.

Turns out it's pretty much impossible. The YouTube API only delivers about 20 results, which is a bug that has existed for about 2 years. I tried manually loading the watch history page, and was only able to get about 1000 out of ~8000 results. When I selected those 1000 results and tried to add them to a playlist, the interface crashed.

Does anyone have a solution for this, or is my watch history just at the mercy of Google?
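(One avenue I haven't fully explored: youtube-dl documents a ":ythistory" shortcut for the authenticated watch history and can be scripted from Python. The credentials below are placeholders, and I haven't checked whether it hits the same ~1000-item wall:)

    import youtube_dl  # third-party package, assumed installed

    opts = {
        "username": "you@example.com",   # placeholder credentials
        "password": "hunter2",
        "outtmpl": "%(upload_date)s-%(title)s.%(ext)s",
    }
    with youtube_dl.YoutubeDL(opts) as ydl:
        ydl.download([":ythistory"])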

[+] MetaCosm|11 years ago|reply
> Clearly I'm a biased observer, but I really think people should take steps to archive stuff that is important to them.

I recently converted all my bookmarks to saved copies of the pages in Evernote. This means I have full text search over everything I "bookmark" -- still have the links -- and never lose it regardless of what happens.

In this process, I was shocked at the number of 404s I encountered from relatively recent (last 18 months) bookmarks.

Evernote is a bit of a wreck, but so far the combination of a decent copy of each page, browser plugins, tags, search, and accessibility (phone clients, web client, some native clients) has made it my best option.

[+] jaimebuelta|11 years ago|reply
This is the only real solution (even if it's through an intermediate agent, which may have the same problem; remember Google Reader). Of course, keeping that info in good shape (indexable, accessible, backed up, etc.) is another interesting problem. I'm afraid most people are simply not aware of any of this, on either the content supplier or the content consumer side.

This is also somewhat of a problem with printed books. There are tons of books that are out of print and cannot be easily found.

[+] oridecon|11 years ago|reply
Luckily, if Stack Overflow goes down, the one billion shady scraper bots will have (at least) the most viewed questions.
[+] DanBC|11 years ago|reply
Many people just don't have the know-how to make usable local archives.

You're also proposing mass copyright infringement, which (stupidly, in this example) is not legal.

[+] spindritf|11 years ago|reply
> It’s interesting that Andreessen can’t see the solution, but perhaps expected.

What a weird dig. It's neither expected nor established that he can't see a solution. I'm not as smart as Andreessen and I could come up with half a dozen solutions.

The author's favourite is fine but far, far from obvious. How viable is it to run your own federated wiki anyway? Are there packages for popular systems? Are there plugins for major browsers? Is there any federation actually happening? I skimmed the resources[1] and don't know. Does anyone here run one? That would be a solution; this seems more like an idea.

And it's not like no one's doing anything. There are services like Pocket or Readability to store an article until you want to read it, Evernotes, Google Keeps... Our very own 'idlewords will archive the contents of your bookmarks for a fee[2]. Finally, there's archive.org.

[1] https://github.com/WardCunningham/Smallest-Federated-Wiki#ho...

[2] https://pinboard.in/tour/#archive

[+] purplerails|11 years ago|reply
I've been thinking about this for a while now. Please check out my web app to solve this problem: https://www.purplerails.com/

The main idea is to use a browser extension to automatically save pages that you read to the cloud (including the images, stylesheets etc) in the background. Saved pages are searchable and sharable.

[+] zargon|11 years ago|reply
This sounded really great until I went to the website and saw that I can't use my own cloud storage, only purplerails'. As soon as purplerails disappears all my saved pages are gone. I already have this functionality with diigo and it makes me very uncomfortable not to have a copy of the data.
[+] idlewords|11 years ago|reply
An early design idea I had for Pinboard was as a browser plugin that just saved everything it saw in passing to an upstream server. But the problem that stumped me was that there's much more downstream bandwidth than upstream on a typical residential connection, so it was hard to push things to a server in anything like real time. How did you end up dealing with this issue?
[+] hollerith|11 years ago|reply
I'd prefer for the pages to be saved to the hard drive of the machine running the web browser.

But maybe browser extensions cannot obtain permission to do that?

[+] asaddhamani|11 years ago|reply
I tried giving purplerails a shot. I use LastPass for password management, and I am not going to type in a 25-30 character password every single time I want to log into an application. I think you're going way too hard on that part. This is the first time I've had this happen to me when using a web app, and it immediately made me close the page.
[+] lingben|11 years ago|reply
Sounds like Evernote; if I'm mistaken, please enlighten me :)
[+] dilap|11 years ago|reply
"The Tyranny of Print" has a nice ring to it, but mediums that give the creator more control over appearance+behavior are going to lend themselves to crafting more compelling experiences.

Sure, not disappearing in 10 years (or whenever the original server goes poof) would also be nice, but it's of little benefit if no one ever sees the thing in the first place.

And disappearing is the default, natural state of things.

If I see some people playing music on a corner and return the next night to see they've left, I may be wistful, but it would be silly to argue "playing live music is broken and we should fix it".

If you think of web sites as performances put on for a limited time by the server, it doesn't seem so terrible that they disappear after a while.

[+] mgunes|11 years ago|reply
> And disappearing is the default, natural state of things.

Books, clay tablets, scrolls, engraved stone, to which humans owe their entire knowledge of their premodern history, seem to have held up pretty well against entropy. The same is not the case for information disseminated in a controlled manner from privately owned servers.

> If I see some people playing music on a corner and return the next night to see they've left, I may be wistful, but it would be silly to argue "playing live music is broken and we should fix it".

> If you think of web sites as performances put on for a limited time by the server, it doesn't seem so terrible that they disappear after a while.

Thankfully, the generations who produced and preserved knowledge on paper, clay and stone before the onset of digital technology - that is, every generation of humans that has ever lived, except ours - did not think of books and libraries as throwaway pamphlets. And it would take more than an arbitrary interchange of modes of cultural production to argue that we should be doing otherwise in the technological circumstance we find ourselves in.

The "tyrants of the server" are not thinking of server-centric aggregation and dissemination of as a performance put on for a limited time: they are betting on it as the future of all human literary activity. Google doesn't want to read you a paragraph, take your money and say goodbye; it wants to swallow all the world's books and information, chop it to tiny pieces, store and own it forever, and extract the maximum profit from each tiny piece, without having you pay a penny. And it wants you to come back for more. The persistence of the server-centric model of content dissemination is not an accident; it is dictated by the political economy of the web brought about by the Googles of the world.

[+] seiji|11 years ago|reply
The mental model of web browsing and bookmarking is: "If I see it, I can get to it again." There's a partial feeling of ownership. "I've read it, so I should be able to refer back to it later."

Nobody (at least, no sane regular person) reads a webpage and thinks "I have a time-limited license from the originator of this content to consume the material and only use it for their expressly condoned purposes."

The vanishing content problem is like if books in your house randomly walked away just because it's the "natural state" of things to disappear.

[+] idlewords|11 years ago|reply
There should be room for a spectrum of stuff online, from evanescent to permanent. It's one thing for the musicians on that corner to be gone the next night, but you do expect the corner to still be there. Online, it's depressingly common for even large bits of infrastructure (like GeoCities) to just go poof.
[+] AnthonyMouse|11 years ago|reply
> mediums that give the creator more control over appearance+behavior are going to lend themselves to crafting more compelling experiences.

There is a difference between the creator and the server. Most of the content you consume is created by people who don't own the servers. Separating appearance+behavior from content source would help actual creators because they wouldn't have to worry about their host deciding one day to delete all their content because the service is being discontinued or the creator is competing with some business interest of the host.

> If you think of web sites as performances put on for a limited time by the server, it doesn't seem so terrible that they disappear after a while.

The problem is that the web is being used for everything, even things that can and should work like books rather than like live performances.

[+] BerislavLopac|11 years ago|reply
> If I see some people playing music on a corner and return the next night to see they've left, I may be wistful, but it would be silly to argue "playing live music is broken and we should fix it".

Actually, that's exactly what the inventors of (various) recording machines did. Something might have disappeared in one form, only to return in another. Just ask the Project Gutenberg people.

[+] hackaflocka|11 years ago|reply
I will pay good money for a Chrome extension that does the following:

1) I can select (or select all of) the Chrome bookmarks that I want to keep offline page backups/archives of (saved to Google Drive or Dropbox or some such).

2) Whenever I want, instead of seeing the current online version of that bookmarked page, I can look up the originally bookmarked archived page.

3) It allows me to choose how many levels of links from the bookmarked page to also back up/archive (e.g., every single page linked from that page, x links deep, is also automatically archived; think httrack or wget). A rough sketch of this part follows below.

As someone on Hacker News once said to me: my bookmarks are my knowledge graph. As important to me as any library.
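Item 3 is essentially a depth-limited crawl, which is most of what httrack and wget -r do; a rough sketch in Python (requests and beautifulsoup4 are assumed dependencies, and saving assets, staying on one domain, and robots.txt handling are all left out):

    import hashlib
    import os
    from urllib.parse import urljoin

    import requests
    from bs4 import BeautifulSoup

    def archive(url, depth, out_dir="archive", seen=None):
        """Save url and every page reachable within `depth` link hops."""
        seen = set() if seen is None else seen
        if depth < 0 or url in seen:
            return
        seen.add(url)
        os.makedirs(out_dir, exist_ok=True)
        try:
            html = requests.get(url, timeout=30).text
        except requests.RequestException:
            return
        name = hashlib.sha256(url.encode()).hexdigest()[:16] + ".html"
        with open(os.path.join(out_dir, name), "w", encoding="utf-8") as f:
            f.write(html)
        for a in BeautifulSoup(html, "html.parser").find_all("a", href=True):
            archive(urljoin(url, a["href"]), depth - 1, out_dir, seen)

    archive("https://example.com/", depth=1)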

[+] Detrus|11 years ago|reply
Pinboard archiving costs about $25 a year. Not sure it does the deep link archiving.
[+] TelmoMenezes|11 years ago|reply
Coming up with architectures to decentralize servers is the fun part. Convincing people outside of our bubble to use the new system is the very hard part. It has to be able to do something the regular person really wants that the previous system didn't allow. This is why Linux never caught up on the desktop.

Now excuse me while I go curate my socks collection.

[+] nb13|11 years ago|reply
This wouldn't work for any web page that has dynamic content stored in a database. If the database no longer exists a decade from now this doesn't solve that problem.

Also, wouldn't this break analytics and reporting for most websites too? It'll be much tougher to track user behavior to improve user experience. And debugging using log data? I get what the author is suggesting, but "fixing the web" this way would break many things that large websites and companies rely on.

[+] Houshalter|11 years ago|reply
Link rot is a serious problem: http://www.gwern.net/Archiving%20URLs#link-rot

>In a 2003 experiment, Fetterly et al. discovered that about one link out of every 200 disappeared each week from the Internet. McCown et al. (2005) discovered that half of the URLs cited in D-Lib Magazine articles were no longer accessible 10 years after publication [the irony!], and other studies have shown link rot in academic literature to be even worse (Spinellis, 2003, Lawrence et al., 2001). Nelson and Allen (2002) examined link rot in digital libraries and found that about 3% of the objects were no longer accessible after one year.

>Bruce Schneier remarks that one friend experienced 50% linkrot in one of his pages over less than 9 years (not that the situation was any better in 1998), and that his own blog posts link to news articles that go dead in days; the Internet Archive has estimated the average lifespan of a Web page at 100 days. A Science study looked at articles in prestigious journals; they didn’t use many Internet links, but when they did, 2 years later ~13% were dead. The French company Linterweb studied external links on the French Wikipedia before setting up their cache of French external links, and found - back in 2008 - already 5% were dead. (The English Wikipedia has seen a 2010-2011 spike from a few thousand dead links to ~110,000 out of ~17.5m live links.) The dismal studies just go on and on and on (and on). Even in a highly stable, funded, curated environment, link rot happens anyway. For example, about 11% of Arab Spring-related tweets were gone within a year (even though Twitter is - currently - still around).

[+] idlewords|11 years ago|reply
My own research (which I hope to publish soon) shows a slightly better link rot rate for bookmarked URLs (which are presumably ones people are most interested in keeping). The attrition rate I see so far is roughly linear and about 5% a year. Which is still shocking by any non-web standard, but a little better than the figures cited above.
[+] magila|11 years ago|reply
The fact that he thinks a federated wiki would be "simple" or "easy" leads me to believe he has not actually thought through the details of how it would work in practice.
[+] azakai|11 years ago|reply
Yes. 2 problems that immediately come to mind are

1. Copyright law.

2. Dynamic content.

[+] michaelchisari|11 years ago|reply
It would not be simple or easy, but crypto-currency blockchains make it more possible than ever.
[+] grmarcil|11 years ago|reply
I find Bret Victor's comparison between the internet and the LOC a little weird. I've always thought of the internet as a publishing/sharing medium, not an archive.

There are plenty of books that go out of print within ten years, we just happen to have infrastructure beyond publishers (libraries) that preserve published copies.

[+] idlewords|11 years ago|reply
I think it's significant that the Library of Congress has funding, an official mandate, employees, a clear legal status, and stores complete copies of the works in its catalog. A similar model would work great for the Internet (and archive.org is doing its best to fill the role).
[+] wyager|11 years ago|reply
This would not work with dynamic content.

We already have systems like this (bittorrent, freenet, etc.), and almost no one sees them as a viable replacement for the web because they can't do 99.9% of the things we want (social networks, forums, email, etc.)

[+] lutusp|11 years ago|reply
> There’s actually a pretty simple alternative to the current web. In federated wiki, when you find a page you like, you curate it to your own server (which may even be running on your laptop). That forms part of a named-content system, and if later that page disappears at the source, the system can find dozens of curated copies across the web.

This is a simple and very bad idea. If it were the norm, instead of one or no copies of a particular work online, you would have any number of "curated" copies of uncertain vintage, downloaded at different times in the lifetime of an original whose content might well have changed as time passed. You would have curations of curations, and curations of those, ad infinitum.

Pages that depended on remote Web content (increasingly common) and/or that linked to online references would gradually become unreadable or incomprehensible as their links vanished into other offline "curations".

Not to mention the copyright issues. And I'm not crazy about the term "curation" either -- it's obviously meant to try to elevate the practice of downloading anything we please, without regard to copyright.

[+] pjbrunet|11 years ago|reply
I'd rather a page go offline than have it taken out of context. As if plagiarism wasn't bad enough already. (Yahoo Answers cough)

These crooks will even steal your copyright notice. It's quite possible the original content producers are offline because scraper thieves stole so much content that it's no longer possible to earn a living.

As an artist, this reminds me of the condescending attitude that gave us fake Rolexes, Facebook & North Korea's 28 state-approved haircuts. Either it's "just content" to stuff in a database somewhere or you understand the medium is the message too.

[+] sroerick|11 years ago|reply
As someone with a teensy bit of film background, I have to disagree. The number of early Hollywood films that were lost is astounding. This is a massive part of our visual history that is completely gone. It will never be restored.

With the current environment on the internet, with DRM'd video, music and text, I have to assume that we will lose far more from this time period than we ever had before.

While I don't pirate things (I'd rather just consume Creative Commons and Public Domain content), I wholeheartedly support people who are trying to archive the things that are part of our collective culture. When I have kids, I'd like to be able to show them where they came from.

[+] samdroid|11 years ago|reply
The author is not suggesting taking others' content and calling it their own; he is suggesting something closer to keeping a backup with a bibliography. In fact, this already exists today: do you hate archive.org?
[+] mark_l_watson|11 years ago|reply
I like the idea of the federated wiki, but search engines rank copies of pages poorly, so it is not clear how visible copies would be after the original content disappears.

I used Evernote for years, but recently canceled the service because I spent too much time curating compared to reading old material.

One option that I am considering is archiving really good web content as web archive files and saving them locally in folders indicating the year of capture. Local file search would quickly find old stuff and if I stored the yearly web archive folders in Dropbox, I would have them available on different systems.
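A minimal sketch of that workflow in Python (requests is an assumed dependency, and the Dropbox path is just a hypothetical location): each saved page lands in a folder named for the year of capture, and a naive full-text search walks the whole archive:

    import os
    import time

    import requests

    ROOT = os.path.expanduser("~/Dropbox/web-archive")  # hypothetical location

    def save(url):
        """Save the page into a folder named for the year of capture."""
        year_dir = os.path.join(ROOT, time.strftime("%Y"))
        os.makedirs(year_dir, exist_ok=True)
        name = time.strftime("%Y%m%d-%H%M%S") + ".html"
        with open(os.path.join(year_dir, name), "w", encoding="utf-8") as f:
            f.write(requests.get(url, timeout=30).text)

    def search(term):
        """Naive full-text search over every archived page."""
        hits = []
        for dirpath, _, files in os.walk(ROOT):
            for fname in files:
                path = os.path.join(dirpath, fname)
                with open(path, encoding="utf-8", errors="ignore") as f:
                    if term.lower() in f.read().lower():
                        hits.append(path)
        return hits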

[+] CCs|11 years ago|reply
Hosting your own server might not be a scalable solution either. There's a reason why SaaS is popular: it's not that easy.

On the downside: stuff hosted by others might go away. Web pages, web services, apps requiring server side support...

Investing a lot in a service makes it more painful to lose, like the apparently discontinued Amazon Cloud Drive (supposed to be a cheaper Dropbox): https://news.ycombinator.com/item?id=8219257

[+] ilaksh|11 years ago|reply
Named data networking of some kind is likely to become popular at some point. This is that kind of idea but doesn't look like a really general protocol since he mentions a specific wiki.

I wonder if there are browser extensions that do p2p caching/distribution of content. Then you could standardize a protocol used for that type of communication.

I believe there are many efforts along these lines. The trick is as usual getting everyone on the same page or at least working together more.

[+] twoodfin|11 years ago|reply
I'd love it if browsers natively supported URIs derived from cryptographic hashes of content, by looking them up in a distributed store a la BitTorrent. Imagine if Chrome supported such a thing, for example. Perfectly reliable cache-ability (or archive-ability), P2P hosting, ... All the good stuff, for any web content that its creator wants to expose that way, albeit at the price of immutability.
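The core property is easy to illustrate; a toy sketch in Python, where the "hash://" scheme is made up for the example and stands in for whatever naming convention browsers would actually adopt:

    import hashlib

    def content_uri(data: bytes) -> str:
        # The address is derived purely from the bytes, not from any host.
        return "hash://sha256/" + hashlib.sha256(data).hexdigest()

    def verify(uri: str, data: bytes) -> bool:
        # Content is valid no matter who served it, as long as the hash matches.
        return content_uri(data) == uri

    page = b"<html><body>Hello, permanent web.</body></html>"
    uri = content_uri(page)
    assert verify(uri, page)              # any copy from any peer will do
    assert not verify(uri, page + b"!")   # immutability: changed content fails
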
[+] ottonomy|11 years ago|reply
Especially under the current copyright regime, finding some solution that preserves the intent of the creator to publish in a fixed format would be a great component of a distributed publishing system. I don't think this proposal has as good a fair use defense as the Internet Archive's Wayback Machine does.

In the world today, we often think of publishing online as providing access to something under our control. I think a technology that aims to solve these problems should embody a different spirit, one closer to "making public". The word "mine" doesn't need to imply ownership in the sense of exclusive control. I mean, "my children" is at least as meaningful a relationship as "my property". Some kind of copyright licensing ability built into a distributed document publishing system would be nice.

[+] bajsejohannes|11 years ago|reply
I think the federated wiki is a neat idea, but in its current incarnation, I find it exceedingly unlikely that a page I'm looking at there will still be around in 10 years.

Even if I'm making a copy of every page I see, I'm not sure I'll still run a federated wiki on my server in 10 years.

I don't think this is a real solution to the problem posed by Bret Victor.

[+] ricardolopes|11 years ago|reply
It's a great idea for a real problem that needs to be solved. Still, for dynamic pages, what would be the desired behaviour? Updating them whenever possible, which could lead to the specific info we wanted to save disappearing or changing? Leaving them outdated? It's really something I can't answer.