The first thing I set up when I started managing my own Kubernetes cluster more than a year ago was this Warrior. I completely forgot about it until this post.
I noticed from the Docker overlay filesystem that the container was spraying files all over the disk. (Ephemeral, destroyed on container shutdown, sure, but I wanted to reduce write wear on my SSD...)
Many of these sites are already captured and archived by proper entities as required by federal law. More is better, I guess, except when it isn't. Duplication of effort is a huge problem in the humanities in general and with archiving in particular.
How do I as a non-US citizen get access to information from those "proper entities"? Is it even possible for US citizens? This is often a surprise for some visitors of this fine website, but there's a large world outside the US where "federal law" does not apply.
WildGreenLeave|1 year ago
It has been active for over a year, steadily working the recommended project. It downloaded over 3TB in the last 6 days (the node rebooted, so the pod was restarted and the stats are not persistent), so a rough extrapolation is about 180TB over the year. Happy to help the good cause of the ArchiveTeam!
Edit: typo
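For anyone checking the math, the extrapolation above can be sketched as follows. It assumes the 3TB-in-6-days rate held steady for a full 365 days, which is a simplification (the comment only says "over a year" and "over 3TB").

```python
# Rough extrapolation of the download volume quoted above.
observed_tb = 3.0      # "over 3TB" since the last pod restart
observed_days = 6      # days since the node reboot
days_per_year = 365    # assumption: one year of steady, uninterrupted operation

tb_per_day = observed_tb / observed_days
yearly_tb = tb_per_day * days_per_year

print(f"{tb_per_day:.1f} TB/day, roughly {yearly_tb:.1f} TB/year")
```

That lands in the same ballpark as the "about 180TB" figure.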
ch71r22|1 year ago
https://github.com/ArchiveTeam/warrior-dockerfile/blob/maste...
NortySpock|1 year ago
I tried setting it up with /tmp as a tmpfs (ramdisk), but it then refused to start...
Anyone know any broad-spectrum Docker incantations to force all overlay writes to RAM for a container?
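For reference, one general approach to this kind of question is to combine Docker's `--read-only` flag (read-only root filesystem) with one `--tmpfs` mount per directory the container needs to write to. The sketch below only builds the command line. The image name and the list of writable paths are hypothetical placeholders, not taken from the Warrior image, and this is untested against it. As far as I know, `--tmpfs` mounts default to `noexec`, which may matter if an image executes anything from that path.

```python
import shlex

def tmpfs_run_command(image, tmpfs_paths, size="512m"):
    """Build a `docker run` argv whose only writable paths live in RAM."""
    # --read-only makes the container's root filesystem read-only, so writes
    # outside the tmpfs mounts fail instead of landing in the overlay on disk.
    argv = ["docker", "run", "--rm", "--read-only"]
    for path in tmpfs_paths:
        # One RAM-backed mount per writable directory, capped at `size`.
        argv += ["--tmpfs", f"{path}:rw,size={size}"]
    argv.append(image)
    return argv

# Hypothetical image name and paths, for illustration only.
cmd = tmpfs_run_command("example/warrior-image", ["/tmp", "/data"])
print(shlex.join(cmd))
```

A container that fails to start under `--read-only` usually reports which path it could not write to, and that path can then be added to the tmpfs list.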
crtasm|1 year ago
Havoc|1 year ago
tech234a|1 year ago
honestSysAdmin|1 year ago
[deleted]
badlibrarian|1 year ago
The whole concept needs to be rethought. Captures from these tools show up under "ArchiveTeam" which is currently pumping thousands of copies of the Google Home Page into the Wayback Machine every week. Or at least trying to.
https://web.archive.org/web/20250122000033/www.google.com
Like so many things about archive.org, when you dig in you start to find wonder and craziness at every turn.
myself248|1 year ago
What federal law do you suppose is guiding the mass deletions? That doesn't look like archiving to me. Now that the foxes are running the henhouse, how reliable do you suppose their own archives are?
homebrewer|1 year ago
jfkrrorj|1 year ago
[deleted]