item 6508787

Show HN: Mummify – Preserve web content, fight link rot

90 points | zek | 12 years ago | mummify.it | reply

66 comments

[+] nine_k|12 years ago|reply
The approach is nice, the problem being solved is real. Hopefully paying customers will flock in.

The question: is mummify.it itself going to go under some day?

So I'll wait for an open-source analog of this service, to run on my own tiny server. (Or carve some time to write it myself, of course.)

[+] lesinski|12 years ago|reply
How is Mummify going to get around the copyright implications of ripping off the now-removed piece of content, which they don't own?

Also wondering: what happens if the publisher redirects the old URL to a new place -- maybe an "update"... or maybe a useless hub page?

[+] arb99|12 years ago|reply
Weird pricing plan. Even the most expensive plan ($15/mo) allows only 50 'mummies' a month. Seems like an arbitrarily low number.
[+] desireco42|12 years ago|reply
I don't get the number of free/paid mummifications. It seems very low; space and bandwidth are abundant.

I think, with respect to the original developer, this is more a feature than an app, and it would probably help to develop it further to target a more specific problem/group.

Having said that, I wish you the best.

[+] pasbesoin|12 years ago|reply
The OP depends upon JavaScript for any and all content delivery/rendering. Given the topic at hand, I find this more than a bit ironic.

I gather from the comments that this is some sort of online storage of a copy. That may serve some use cases over the short term.

If you really want to avoid loss or "link rot", maintain your own copy on your own equipment.

I've been around to observe everything from personal interest changes, death, corporate policy changes, ownership transfers, deliberate manipulation, etc. -- you get the idea -- affect the ability to pull even what were formerly considered very stable and long-standing, aka "permanent", resources.

If you want to ensure you have access, save your own copy onto hardware that you own. End of story.

[+] AznHisoka|12 years ago|reply
50 mummies per month for the highest plan is way, way too low. It should be 10,000 or even 50,000. Think about it this way: the people who are willing to pay you for this type of service are probably huge publishers who would use this service to share their URLs with their followers on Twitter, Facebook, etc.
[+] zackbloom|12 years ago|reply
Doesn't help them much if a huge publisher is paying them $15 a month. That type of sale would require an SLA too; they really just need a contact email address.
[+] alok-g|12 years ago|reply
I would be happy with 1,000 per month. Typically 10 from HN every day and 10 from elsewhere. But I would not pay $50 per month for this.
[+] MWil|12 years ago|reply
and if Mummify goes down...

Would it be a better option if the "permanents" were shared across p2p/bittorrent and every unique item had at least 10 shares distributed across the globe, maybe a max of 20. When one share host goes down, it just picks up a replacement.
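The replication scheme floated above (at least 10 copies, at most 20, with a replacement recruited when a host drops out) could be sketched roughly like this; everything here is hypothetical, names included:

```python
# Hypothetical sketch of the p2p replication idea: keep each archived item
# on at least MIN_REPLICAS hosts (capped at MAX_REPLICAS), recruiting a
# replacement whenever a holder goes offline.
import random

MIN_REPLICAS, MAX_REPLICAS = 10, 20

def rebalance(holders, available_hosts):
    """Return an updated set of hosts holding the item, within the bounds."""
    holders = set(holders) & set(available_hosts)   # drop hosts that went down
    candidates = [h for h in available_hosts if h not in holders]
    while len(holders) < MIN_REPLICAS and candidates:
        # recruit a random replacement host
        holders.add(candidates.pop(random.randrange(len(candidates))))
    while len(holders) > MAX_REPLICAS:
        holders.pop()                               # shed surplus copies
    return holders
```

The real problems, of course, are incentives and verification: why would 10-20 strangers host your copy, and how do you know they still have it?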

[+] subpixel|12 years ago|reply
I use Safari web archives for a similar task.

But I wonder...isn't it safe to assume that, eventually, browser rendering engines* will change to the degree that something I saved 4+ years ago is essentially unreadable?

And doesn't that same potential problem apply to a hosted service as well?

*I'm using that vague term to describe everything I don't understand about how browsers render pages, markup, and javascript, which is a lot.

[+] dangero|12 years ago|reply
I agree. I think the safest bet would be to keep a static image as well. Without a static image, you will always question whether the rendering engine has even slightly changed the look.
[+] jboynyc|12 years ago|reply
Interesting in light of this discussion: https://news.ycombinator.com/item?id=6504331

But to trust something like this to make a permanent copy of stuff I'm linking to, I'd need to know a bit more about them. Otherwise this is effectively like using a link shortener -- a single point of failure.

[+] zek|12 years ago|reply
Hey, creator here. We realize there is a bit of a trust issue; that's why we have the paid plans. Our costs are low enough that, even if you were our only paying customer left, we would be able to keep the service running just for you.
[+] vijucat|12 years ago|reply
I have been using the Scrapbook add-on (see screenshots and manual here : [1]) in Firefox [2] for many years for this; it saves the web page to your local hard disk, and there are several types of annotations that you can perform on the saved page. One trick I use is to first run Readability on the page to get a clean version, and then save that to Scrapbook. With full-text and comments-only-search, this add-on, all by itself, kept me with Firefox even during the dark period when Chrome came in and thrashed Firefox on performance :-)

I used to use diigo.com, which does the job quite well, too, before I discovered Scrapbook.

[1] http://amb.vis.ne.jp/mozilla/scrapbook/ [2] https://addons.mozilla.org/En-us/firefox/addon/scrapbook/

[+] lingben|12 years ago|reply
I use evernote premium ($45/year) which is much cheaper and includes a tonne of extra features.

Compare this to Mummify's pricing of $15/month, or $180/year. For that price I could buy a pair of HDs with several terabytes of capacity and copy the whole webpage, code, files and all, using httrack.

[+] junto|12 years ago|reply
One point to note is that a DMCA takedown targeted at Mummify.it will remove the content just as it would from nytimes.com.

If I manually save that content to disk, a DMCA takedown doesn't affect the content stored on my local hard disk.

[+] toomuchtodo|12 years ago|reply
If it's something personal, I use http://archive.is with their bookmarklet in Chrome (free; unlimited archiving). It immediately renders the page, saves a copy with a unique url, and gives me a .zip link to download the archive.

If it's something I want to submit to the Internet Archive, I use wget with WARC extensions (http://www.archiveteam.org/index.php?title=Wget_with_WARC_ou...), submit the archive to the IA and notify them it needs to be merged in, and keep the .tar.gz archive.
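The wget step described here could be driven from a small helper like the one below. The WARC flag names are real GNU wget options (1.14+), but the URL and file prefix are placeholders, not anything from the thread:

```python
# Sketch of a wget-with-WARC invocation, built as an argument list so it
# can be inspected (or passed to subprocess.run) without hitting the network.
from datetime import date

def warc_command(url, prefix="snapshot"):
    """Build a wget argument list that captures `url` into a WARC file."""
    return [
        "wget",
        "--page-requisites",   # also fetch images/CSS/JS the page needs
        "--convert-links",     # make the local mirror browsable offline
        f"--warc-file={prefix}-{date.today():%Y%m%d}",  # writes .warc.gz
        "--warc-cdx",          # emit a CDX index alongside the WARC
        url,
    ]

# To actually run it (requires wget >= 1.14 on PATH):
# import subprocess
# subprocess.run(warc_command("https://example.com/some-article"), check=True)
```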

Eventually I'll webapp/one-click the whole thing, with an archive to S3 and/or Glacier.

Disclaimer: archiveteam participant

[+] asciimo|12 years ago|reply
Good point. Perhaps a "download catacombs" feature?
[+] brianobush|12 years ago|reply
What if the server were not in the US? A DMCA takedown notice would not apply.
[+] gabemart|12 years ago|reply
Most of the time, one will not know which pages one links to will disappear in the future. This service therefore only seems really useful if you use it with every link you make. That would make the biggest plan short on "mummies" by a couple of orders of magnitude, at least.
[+] contextual|12 years ago|reply
Not every page one visits is worth storing. The whole point is to filter out (and save) the good from the rest.

An aside: Mummify needs browser plugins for all the major browsers to make saving as seamless as possible.

[+] newrenowhore|12 years ago|reply
Really like this, nice work. Is there a way to automatically mummify every link on an entire website or directory? Your method/service could be an interesting, easy to use way to recall a read-only version of a site after it goes down.
[+] d0m|12 years ago|reply
Do you rip video on the page too? If so, I would definitely use this service. I've got videos with my company brand all over the place, and I can't find a good way to download them or link to them so that they stay there permanently.
[+] bjackman|12 years ago|reply
Just a thought: the "Link Rot" link should really be a Mummified link. I know Wikipedia isn't disappearing any time soon, but it just seemed silly to me not to "realise" the use case!

Anyway, very sexy site design IMO.

[+] ChuckMcM|12 years ago|reply
This is pretty cool. I'd be happy with a service that mummified pages to storage I control, since I've had pages I pointed to at archive.org vanish after the site's robots.txt was changed.