TheTechRobo | 1 year ago

Browsertrix (and Webrecorder tools in general) also violate the standard by modifying response data. A WARC response record is supposed to contain the raw bytes as they are sent over the network (minus TLS).
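To make "raw bytes" concrete, here is a minimal sketch of what a response record looks like under that reading of the standard. It assumes a plain HTTP/1.1 exchange, and the function name `build_warc_response_record` is made up for illustration; it is not taken from Browsertrix or any other tool. The point is only that the record block is the HTTP message byte for byte, and that `Content-Length` counts exactly those bytes.

```python
import uuid
from datetime import datetime, timezone


def build_warc_response_record(target_uri: str, raw_http_response: bytes) -> bytes:
    """Wrap an HTTP response, unmodified, in a WARC/1.1 'response' record.

    Hypothetical helper for illustration only. raw_http_response is assumed
    to be the HTTP/1.1 status line, headers, and body exactly as received
    after TLS was removed.
    """
    warc_date = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    header_lines = [
        "WARC/1.1",
        "WARC-Type: response",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {warc_date}",
        f"WARC-Target-URI: {target_uri}",
        "Content-Type: application/http;msgtype=response",
        # Length of the record block in bytes, i.e. the untouched HTTP message.
        f"Content-Length: {len(raw_http_response)}",
    ]
    header_block = ("\r\n".join(header_lines) + "\r\n\r\n").encode("utf-8")
    # A record ends with two CRLFs after the block.
    return header_block + raw_http_response + b"\r\n\r\n"


# Example HTTP/1.1 response as it might appear on the wire (made-up data).
raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
)
record = build_warc_response_record("https://example.com/", raw)
```

Anything that re-encodes the body, drops a transfer encoding, or rewrites headers before this step would change the block, which is the kind of modification being objected to here.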

The entire WARC ecosystem is kind of a mess.

ikreymer|1 year ago

This isn't really true: our tools do not just modify response data for no reason!

Our tools do the best they can with an old format that is in use by many institutions. The WARC format does not account for H2/H3 data, which most sites use nowadays.

The goal of our (Webrecorder) tools is to preserve interactive web content with as much fidelity as possible and make it accessible/viewable in the browser. That means stripping TLS and H2/H3, sometimes forcing a certain video resolution, etc., while preserving the authenticity and interactivity of the site. It can be a tricky balance.

If the goal is to preserve 'raw bytes sent over the network' you can use Wireshark / packet capture, but your archive won't necessarily be useful to a human.

CorentinB|1 year ago

He didn't say you modify the data for no reason; he said you violate the standard, which is true. You could respect it, but you don't.