top | item 43858919

(no title)

gibibit | 10 months ago

Is there any software that can provide verified, trusted archives of websites?

For example, we can go to the Wayback Machine at archive.org to not only see what a website looked like in the past, but prove it to someone (because we implicitly trust The Internet Archive). But the Wayback Machine has deleted sites when a site later changes its robots.txt to exclude it, meaning that old site REALLY disappears from the web forever.

The difficulty for a trusted archive solution is in proving that the archived pages weren't altered, and that the timestamp of the capture was not altered.

It seems like blockchain would be a big help, and would prevent back-dating future snapshots, but there seem to be a lot of missing pieces still.

Thoughts?

discuss

order

shrinks99|10 months ago

Webrecorder's WACZ signing spec (https://specs.webrecorder.net/wacz-auth/latest) does some of this — authenticating the identity of who archived it and at what time — but the rest of what you're asking for (legitimacy of the content itself) is an unsolved problem as web content isn't all signed by its issuing server.

In some of the case studies Starling (https://www.starlinglab.org/) has published, they've published timestamps of authenticated WACZs to blockchains to prove that they were around at a specific time... More _layers_ of data integrity but not 100% trustless.

gibibit|10 months ago

Very informative, thanks!

yencabulator|9 months ago

There's been attempts to standardize a way for a HTTPS server to say "Yes, this response really did come from me", but nothing has been really adopted.

https://www.rfc-editor.org/rfc/rfc9421.html

https://httpsig.org/

Without the server participating, best you can do is a LetsEncrypt-style "we made this request from many places and got the same response" statement by a trusted party.

Inspiration: roughtime can be used to piggyback a "proof of known hash at time" mechanism, without blockchain waste. That lets you say "I've had this file since this time".

https://www.imperialviolet.org/2016/09/19/roughtime.html

https://int08h.com/post/to-catch-a-lying-timeserver/

https://blog.cloudflare.com/roughtime/

https://news.ycombinator.com/item?id=12599705

dj0k3r|10 months ago

Take a look at singleFile - a project that lets you save the entire webpage. It has an integration for saving the hash if the page on a Blockchain. You can choose to set it up between parties who're interested in the provenance of the authenticity.