top | item 15636246

(no title)

yellowbkpk | 8 years ago

I want to give another example: I'm currently updating the Terrain Tiles dataset [0] and a very significant portion of my time was spent working around terrible government data download websites or incompatible distribution formats.

For example, the UK Government flew some excellent LIDAR data missions generating a very high resolution elevation model for most of the country and then put it behind a terrible website where you have to click 3 or 4 times to get a small piece of the data. After a couple hours I built a script to download all the pieces and put them back together into usable sized downloads [1]. Mexico's INEGI has a similar situation, so I had to dig through that to build a scraper [2]. USGS's EarthExplorer uses a terrible shopping cart metaphor for download [3].

All that is to say that the interesting piece with Earth on AWS is that this is public data that smart people are putting in a more easily accessible place for mass consumption and AWS is footing the bill. In return AWS is getting people more interested in AWS products and a set of customers that are more knowledgeable about how to process data "in the cloud".

[0] https://aws.amazon.com/public-datasets/terrain/

[1] https://github.com/iandees/uk-lidar

[2] https://github.com/iandees/mx-lidar

[3] https://earthexplorer.usgs.gov/

discuss

order

toomuchtodo|8 years ago

Would you mind if I stuck your collections of data in the Internet Archive? I appreciate AWS' efforts in this regard, but trust the Internet Archive for access and persistence more (and the Archive serves every object as a torrent).

yellowbkpk|8 years ago

It'd be great to have this data in Internet Archive. If you check out our GitHub repo [0] you can see where we coordinate finding data sources. The data I download gets composited into tiles for display on maps, so we're updating ~4 billion objects in S3.

The source data is probably the better thing to include in IA, and the GitHub repo is probably the best place to find how to mirror it. If you've got time to spend on it, you might post an issue in there and I can help point you in the right direction.

[0] https://github.com/tilezen/joerd/issues?q=is%3Aissue+is%3Aop...

sdoownek|8 years ago

Unrelated to AWS, but sites like you describe are what we used in "How not to do It" while justifying the (very minor) expense in building the Alaska Elevation Archive. http://elevation.alaska.gov/

The National Map, from the USGS, is another example of "ugh", at least it was when it was tile-by-tile download only.