mfashby | 1 year ago

It's not what you're aiming for with this comment, but I bet git would actually make a pretty good storage tool/format for archival of mostly static sites.

Horribly simple hack: use `wget` with the `--mirror` option and commit the result to a git repository. Repeat with a `cron` job to keep an archive with change history.
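A rough sketch of that hack, in case it helps. The function names (`mirror_command`, `snapshot`) and the commit message format are made up for illustration, and it assumes `wget` and `git` are on the PATH and that git already has a user name and email configured:

```python
import subprocess
import sys
from datetime import datetime, timezone
from pathlib import Path


def mirror_command(url: str, repo: Path) -> list[str]:
    """Build the wget invocation that mirrors `url` into the `repo` directory."""
    return ["wget", "--mirror", "--no-verbose", f"--directory-prefix={repo}", url]


def git(repo: Path, *args: str) -> subprocess.CompletedProcess:
    """Run a git subcommand inside `repo`, raising if git reports an error."""
    return subprocess.run(
        ["git", "-C", str(repo), *args],
        check=True, capture_output=True, text=True,
    )


def snapshot(url: str, repo: Path) -> bool:
    """Mirror `url` into `repo` and commit. Returns True if a commit was made."""
    repo.mkdir(parents=True, exist_ok=True)
    if not (repo / ".git").exists():
        git(repo, "init")
    # wget exits non-zero if any single request fails (e.g. one 404),
    # so the exit code is ignored and whatever was fetched is kept.
    subprocess.run(mirror_command(url, repo), check=False)
    git(repo, "add", "--all")
    # Empty porcelain output means nothing changed since the last snapshot.
    if not git(repo, "status", "--porcelain").stdout.strip():
        return False
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    git(repo, "commit", "-m", f"Snapshot of {url} at {stamp}")
    return True


if __name__ == "__main__" and len(sys.argv) == 3:
    changed = snapshot(sys.argv[1], Path(sys.argv[2]))
    print("committed new snapshot" if changed else "no changes")
```

Saved under a hypothetical name like `snapshot.py`, it could be run from `cron` with the URL and the archive directory as arguments. Skipping the commit when nothing changed keeps the history limited to real changes to the site.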

breck | 1 year ago

I assume this is what the Wayback Machine uses?

Tomte | 1 year ago

Of course not. They have their own crawler (Heritrix, an open source Java crawler) and archive in WARC format. It's serious archiving: they want to preserve response codes, HTTP headers, etc.