thejefflarson's comments

thejefflarson | 14 years ago | on: Introducing Simple Tiles: ProPublica's new Mapping Library

Not really, though I'd say it's smaller than Mapnik. TileMill (backed by Mapnik) is an amazing GUI for styling maps, and TileStache is an awesome caching server. But really, at its heart, all SimpleTiles does is convert spatial data into an image, and it does that by relying on GDAL and Cairo as much as possible.

thejefflarson | 14 years ago | on: A Case Against Using CoffeeScript

I've seen terrible assembly, but I've seen much worse C. Most programmers have relatively little sense of organization, and when unleashed on C they create things far nastier than their assembly counterparts.

nasty: https://github.com/mirrors/gcc/blob/master/gcc/c-family/c-co...

nasty: https://github.com/mirrors/gcc/blob/master/gcc/c-family/c-le...

But you get the idea. I'm merely pointing out that compilers are tricky and complicated beasts. And especially in the rewriter, CS has to jump through hoops to disambiguate.

thejefflarson | 15 years ago | on: Announcing Google Refine 2.0, a power tool for data wranglers

I didn't grab the Flash content, but if I remember correctly, it was a Flash movie that wrapped a PDF, which Dan then OCRed and cleaned up with Refine. The coolest part was that the PDF was in grid form, so Dan wrote an ImageMagick script that split it into individual cells and then OCRed each cell (for better results).
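For anyone curious what the split-then-OCR step might look like, here is a rough sketch. It is not Dan's actual script: it assumes the page has already been rendered to an image with an evenly spaced grid, the file names are made up, and Tesseract is only my guess at an OCR tool. It only builds the ImageMagick `convert -crop` and `tesseract` command lines, so you can inspect them before running anything.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (width, height, x offset, y offset)


def grid_cells(width: int, height: int, rows: int, cols: int) -> List[Box]:
    """Return one crop box per cell of an evenly spaced rows x cols grid."""
    boxes = []
    for r in range(rows):
        # Integer edges, so the cells tile the page with no gaps or overlap.
        y0 = r * height // rows
        y1 = (r + 1) * height // rows
        for c in range(cols):
            x0 = c * width // cols
            x1 = (c + 1) * width // cols
            boxes.append((x1 - x0, y1 - y0, x0, y0))
    return boxes


def crop_command(src: str, box: Box, dest: str) -> List[str]:
    """Build an ImageMagick command that cuts one cell out of the page image."""
    w, h, x, y = box
    return ["convert", src, "-crop", f"{w}x{h}+{x}+{y}", "+repage", dest]


def ocr_command(image: str, out_base: str) -> List[str]:
    """Build a Tesseract command (assumed OCR tool) for a single cell image."""
    return ["tesseract", image, out_base]


if __name__ == "__main__":
    # Hypothetical 1000x600 page image holding a 3x4 table.
    for i, box in enumerate(grid_cells(1000, 600, rows=3, cols=4)):
        cell = f"cell_{i:03d}.png"
        print(" ".join(crop_command("page.png", box, cell)))
        print(" ".join(ocr_command(cell, f"cell_{i:03d}")))
```

OCRing one cell at a time means the OCR engine never has to work out the table layout, which is presumably where the better results came from.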

EDIT: We haven't had any contact with Wolfram|Alpha but maybe we should reach out.

thejefflarson | 15 years ago | on: Announcing Google Refine 2.0, a power tool for data wranglers

So I'm on ProPublica's web team -- the organization mentioned in the first video -- and we deal with the types of messy data Refine is made for on a day-to-day basis.

We've been using it pretty much daily for about 5 months now. Cleaning messy government data used to be time consuming and destructive; with Google Refine it's easy and fast to join, clean up and do rudimentary analysis on said data.

It especially shines when you have to merge many disparate data sets into one. My colleague, Dan Nguyen, did just that for our Dollars for Doctors app:

http://projects.propublica.org/docdollars/

and he scraped the data from reports like this:

http://www.pfizer.com/responsibility/working_with_hcp/paymen...

(one company even put the disclosures up as a Flash movie).
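To give a feel for the merging problem, here is a minimal sketch of what the hand-written script version might look like. Everything in it is hypothetical: the column headers, the alias table and the sample rows are invented for illustration and are not taken from any company's actual disclosure, and this is not how Refine works internally.

```python
from typing import Dict, List

# Hypothetical header variants, each mapped onto one shared schema.
ALIASES = {
    "physician": "name",
    "recipient name": "name",
    "name": "name",
    "amount": "amount",
    "total payment": "amount",
}


def normalize_row(row: Dict[str, str], source: str) -> Dict[str, str]:
    """Rename known columns, collapse stray whitespace, and tag the source."""
    out = {"source": source}
    for header, value in row.items():
        key = ALIASES.get(header.strip().lower())
        if key is not None:  # columns we don't recognize are dropped
            out[key] = " ".join(value.split())
    return out


def merge_sources(sources: Dict[str, List[Dict[str, str]]]) -> List[Dict[str, str]]:
    """Flatten several differently shaped reports into one list of rows."""
    merged = []
    for source, rows in sources.items():
        merged.extend(normalize_row(row, source) for row in rows)
    return merged


if __name__ == "__main__":
    # Made-up rows standing in for two differently formatted reports.
    reports = {
        "company_a": [{"Physician": " Jane  Doe ", "Amount": "100"}],
        "company_b": [{"Recipient Name": "John Roe", "Total Payment": "250"}],
    }
    for row in merge_sources(reports):
        print(row)
```

Every new report format means another round of editing the alias table by hand, which is the sort of chore Refine takes off your plate.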

Of course we could write scripts, use grep/awk/sed or import it into a database, but Refine is really the right tool for the job. I encourage you to give it a try if you have questionable data you need to clean.
