CommonCrawl: an open repository of web crawl data that is universally accessible
92 points | abhishektwr | 14 years ago | commoncrawl.org | 8 comments
[+] [-] abhishektwr|14 years ago|reply Just a pointer: the code for the CommonCrawl project is available on GitHub at https://github.com/commoncrawl/commoncrawl
[+] [-] pooyak|14 years ago|reply thread on HN from when common crawl was announced, interesting info there: http://news.ycombinator.com/item?id=3209690
[+] [-] fungi|14 years ago|reply If you're into such things, then maybe http://yacy.net/ (p2p crawler and search) will be useful to you as well.
[+] [-] Titanous|14 years ago|reply The latest data available is from 2010-09-25, which seems to be too old to be useful for most things.
[+] [-] rgrieselhuber|14 years ago|reply It would be great to hear more about the tools they are using to crawl and potentially open it up to more people who want to contribute computing resources.
[+] [-] emilis_info|14 years ago|reply This one may also be interesting for open data devs: http://scraperwiki.com/
[+] [-] Aloisius|14 years ago|reply I hear a lot of people are crunching on CommonCrawl's data. It'll be interesting to see the type of stuff people come up with!
[+] [-] unknown|14 years ago|reply [deleted]
[+] [-] nithinag|14 years ago|reply This looks really nice!