(no title)
agamble | 8 years ago
The site will rewrite absolute image URLs as relative ones pointing to Tesoro. For example, in the Chicken Teryaki example on the homepage, the main image is sourced from the relative location "static01.nyt.com/.../28COOKING-CHICKEN-TERIYAKI1-articleLarge.jpg", which looks like it's coming from nytimes.com, but you can check in the Chrome dev console that it isn't.
Have you found an example where it isn't working correctly? If so would you mind posting it here and I'll fix it :).
ikreymer|8 years ago
Here is an example from archiving nytimes home page:
https://archive.tesoro.io/665dbeab57a4d57d8140f89cfedc69b5
If you look at network traffic (domain in devtools), you'll see that only a small % is coming from archive.tesoro.io -- the rest of the content is loaded from the live web. This can be misleading and possibly a security risk as well.
Not to discourage you, but this is a hard problem and I've been working on for years now. This area is a moving target, but we think live leaks are mostly eliminated in Webrecorder and pywb, although there are lots of areas to work on to maintain high-fidelity preservation.
If you want chat about possible solutions or want to collaborate (we're always looking for contributors!), feel free to reach out to us at support [at] webrecorder.io or find my contact on GH.
jdc0589|8 years ago