CommonCrawl: an open repository of web crawl data that is universally accessible

92 points | abhishektwr | 14 years ago | commoncrawl.org | reply

8 comments

[+] fungi|14 years ago|reply
If you're into this sort of thing, then maybe http://yacy.net/ (a p2p crawler and search engine) will be useful to you as well.
[+] Titanous|14 years ago|reply
The latest data available is from 2010-09-25, which seems to be too old to be useful for most things.
[+] rgrieselhuber|14 years ago|reply
It would be great to hear more about the tools they are using to crawl, and whether they could open it up to more people who want to contribute computing resources.
[+] Aloisius|14 years ago|reply
I hear a lot of people are crunching on CommonCrawl's data. It'll be interesting to see the type of stuff people come up with!