ccgreg|2 months ago
commoncrawl.org
Our public web dataset goes back to 2008, and is widely used by academia and startups.

pdimitar|2 months ago
I always wanted to ask:
- How often is that updated?
- How current is it at any point in time?
- Does it have historical / temporal access, i.e. can I check the history of a page à la The Internet Archive?

ccgreg|2 months ago
- monthly
- it's a historical archive; the concept of "current" is hard to turn into a metric
- not only is our archive historical, it is also included in the Internet Archive's Wayback Machine.
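To make "monthly" and "historical" concrete: each monthly crawl has its own URL index, so you can check a page's history by asking several crawls' indexes for captures of the same URL. The sketch below is not from the thread. It assumes Common Crawl's public CDX index server at index.commoncrawl.org, with one index per crawl named like `CC-MAIN-2024-10-index` (an example ID) and JSON-lines output via `output=json`. The function names and the injected `fetch` hook are hypothetical, chosen so the sketch runs without network access.

```python
import json
from urllib.parse import urlencode

# Assumption: Common Crawl's public CDX index server, one index per monthly crawl.
INDEX_HOST = "https://index.commoncrawl.org"


def cdx_query_url(crawl_id: str, page_url: str) -> str:
    """Build a query URL asking one crawl's index for captures of page_url."""
    params = urlencode({"url": page_url, "output": "json"})
    return f"{INDEX_HOST}/{crawl_id}-index?{params}"


def parse_captures(body: str) -> list[tuple[str, str]]:
    """Parse a JSON-lines index response into (timestamp, status) pairs, oldest first."""
    captures = []
    for line in body.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        captures.append((record["timestamp"], record.get("status", "")))
    return sorted(captures)


def page_history(crawl_ids: list[str], page_url: str, fetch) -> list[tuple[str, str]]:
    """Collect captures of page_url across several monthly crawls.

    `fetch` is a hypothetical hook: any callable that takes a URL and
    returns the response body as text.
    """
    history = []
    for crawl_id in crawl_ids:
        history.extend(parse_captures(fetch(cdx_query_url(crawl_id, page_url))))
    return sorted(history)
```

For real use, `fetch` could wrap `urllib.request.urlopen(url).read().decode()`. It should also tolerate the error status the index may return when a crawl holds no capture of the URL. Since the archive is also in the Wayback Machine, the Internet Archive's own lookup tools are another route to the same history.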