jrwr | 4 years ago

Full archival to the standards required by the Internet Archive requires complete, unmodified headers and unmodified content. This tends not to work well with modern browsers; Chrome and Firefox both currently fail at it. Someone is looking into a modified Firefox to help with this, but that's just not how this system works. Archive.org does have an API of sorts where you say "hey, archive this URL," and a little worker on the backend goes and does it.
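For anyone curious what that "archive this URL" request can look like, here is a minimal sketch, assuming the public Save Page Now endpoint at `https://web.archive.org/save/<url>`. The helper names `build_save_url` and `request_capture` and the User-Agent string are made up for illustration, and this is only the simple unauthenticated form, not how Archive Team's own tooling works.

```python
import urllib.request
from urllib.parse import quote

# Assumed endpoint: the public "Save Page Now" URL form.
SAVE_ENDPOINT = "https://web.archive.org/save/"


def build_save_url(target_url: str) -> str:
    """Return the Save Page Now URL for target_url (hypothetical helper)."""
    # Keep URL delimiters intact; percent-encode spaces and other unsafe characters.
    return SAVE_ENDPOINT + quote(target_url, safe=":/?&=%")


def request_capture(target_url: str, timeout: float = 60.0) -> int:
    """Ask the backend to capture target_url and return the HTTP status code."""
    req = urllib.request.Request(
        build_save_url(target_url),
        headers={"User-Agent": "save-sketch/0.1"},  # illustrative value only
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


if __name__ == "__main__":
    # Only prints the request URL; call request_capture() to actually submit it.
    print(build_save_url("https://example.com/"))
```

Note that the capture itself happens on the Archive's side, which is why this route sidesteps the browser header problem but gives you no control over how the page is fetched.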

What Archive Team does is on a much more massive scale, like SETI@home-scale scraping of data across the internet. At almost every point we have had to build custom tools to meet the needs of our archival efforts.

cxr | 4 years ago

> standards required by the Internet Archive require that full unmodified headers are required

Sure, this would not be a solution for the Wayback Machine, but it would be adequate[1][2] for lots of non-Wayback collections (of the sort that Archive Team is associated with).

1. https://twitter.com/textfiles/status/970912494284779520

2. http://ascii.textfiles.com/archives/4285