shwouchk | 9 months ago
turns out that, unlike for most webpages, the pdf version contains only a single page: whatever is visible on screen.
turns out also that opening the warc immediately triggers a js redirect that is planted in the page. i can still extract the text manually - it’s embedded there - but i cannot “just open” the warc in my browser and expect an offline “archive” version - i’m interacting with a live webpage! this sucks from all sides: usability, privacy, security.
Admittedly, i don’t use webrecorder - does it solve this problem? did you verify?
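for what it's worth, the "extract the text manually" part can be sketched roughly like this. this is only an illustration, assuming the html of the page has already been pulled out of the warc as a string; `TextOnly` and `extract_text` are names made up for this sketch, and it uses only python's standard `html.parser`. nothing in the page is executed, so the planted redirect never fires:

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect visible text, ignoring the contents of script/style/noscript."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html: str) -> str:
    """Return the text of an HTML document, one chunk per line (hypothetical helper)."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

this only recovers the text, of course. it does not give you a browsable offline copy, which is the actual complaint.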
weinzierl | 9 months ago
Unfortunately there are sites where it does not work.
eMPee584 | 9 months ago