stuffoverflow | 7 months ago

ArchiveTeam did a full site crawl[1] when AnandTech announced it was shutting down. You can browse the resulting warc.gz files like a regular website using https://replayweb.page

Alternatively, you could use solrwayback[2] to index and browse the WARC files.

1: https://archive.fart.website/archivebot/viewer/job/202409012...

2: https://github.com/netarchivesuite/solrwayback
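If you just want a quick look at what a warc.gz actually contains before loading it into a viewer, something like the following rough sketch might do. It uses only the Python standard library and assumes well-formed records with a correct Content-Length and no folded header lines; the function names and the file name in the usage note are made up for illustration. For anything serious, a dedicated WARC library is probably the safer choice.

```python
import gzip
from typing import BinaryIO, Iterator


def iter_warc_headers(stream: BinaryIO) -> Iterator[dict[str, str]]:
    """Yield the header fields of each WARC record in a decompressed stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if not line.startswith(b"WARC/"):
            continue  # blank separator lines between records
        headers: dict[str, str] = {}
        while True:
            raw = stream.readline()
            if raw in (b"\r\n", b"\n", b""):
                break  # a blank line ends the header section
            name, _, value = raw.decode("utf-8", "replace").partition(":")
            headers[name.strip()] = value.strip()
        # Skip the record body so payload bytes are never parsed as headers.
        stream.read(int(headers.get("Content-Length", "0")))
        yield headers


def response_urls(stream: BinaryIO) -> list[str]:
    """Collect the target URI of every 'response' record."""
    return [
        h["WARC-Target-URI"]
        for h in iter_warc_headers(stream)
        if h.get("WARC-Type") == "response" and "WARC-Target-URI" in h
    ]


def list_captured_urls(path: str) -> list[str]:
    """Open a .warc.gz file (gzip handles multi-member files) and list URLs."""
    with gzip.open(path, "rb") as stream:
        return response_urls(stream)
```

Usage would be along the lines of `list_captured_urls("crawl.warc.gz")`, where the file name is a placeholder for one of the downloaded files. It only reads record headers, so it should stay reasonably light on memory even for large crawls.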

seabass-labrax|7 months ago

Also Kiwix[1] is an excellent app for browsing websites offline. You can use warc2zim[2] to convert the WARC files to ZIM files for use with Kiwix.

I was pleasantly surprised to find that the DWDS (digital dictionary of the German language) app is actually Kiwix!

[1]: https://www.kiwix.org/

[2]: https://github.com/openzim/warc2zim

formerly_proven|7 months ago

> Kiwix

... I haven't heard this name in probably 15 years. Back then you could take Wikipedia offline on a laptop; it was only around 20-25 GB.

rapnie|7 months ago

This is a bit tangential, but is there a good way to archive Discourse forums and turn them into regular websites? Anyone have experience to share?