dvanduzer | 6 years ago
Most of our parser-based crawling is done by Heritrix (crawler.archive.org) and most of our render-based crawling is done by a proxy-based recorder similar to what you theorize (https://github.com/internetarchive/brozzler).
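Here is a minimal Python sketch of what "parser-based" means in practice, using only the standard library's `html.parser`. It is illustrative only and is not Heritrix's actual extractor. The `LinkExtractor` class, the `extract_links` function, and the tag/attribute subset are hypothetical. The parser reads the static HTML and never executes JavaScript, so links that a script would write into the page are missed. That gap is what render-based capture is meant to cover.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collect outlinks from static HTML without running any JavaScript."""

    # Hypothetical, minimal subset of tag -> attribute pairs treated as outlinks.
    LINK_ATTRS = {"a": "href", "link": "href", "img": "src", "script": "src", "iframe": "src"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        wanted = self.LINK_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative URLs and drop the #fragment, which the server never sees.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                if absolute not in self.links:  # keep first-seen order, no duplicates
                    self.links.append(absolute)


def extract_links(html, base_url):
    """Return the de-duplicated absolute outlinks found in `html`."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


page = """<a href="/about#team">About</a>
<img src="logo.png">
<script>document.write('<a href="/hidden">x</a>')</script>"""

# The link that only exists after the script runs ("/hidden") is not discovered.
print(extract_links(page, "https://example.com/index.html"))
```

A browser-driven recorder would load the same page, run the script, and capture the request for `/hidden` as it passes through the proxy.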
tpmx | 6 years ago