mrocklin | 10 years ago

Flat CSV or JSON files are expensive to parse. Fast CSV parsers and gzip decompression both run at around 100 MB/s. If you want to go faster than that, you'll need a better (ideally binary) format.

This notebook might interest you: http://nbviewer.ipython.org/gist/mrocklin/c16c5c483b2b9859de... , particularly the sections starting at "Eleven minutes is a long time." It compares CSV costs (minutes) to custom binary storage formats (seconds) on a 20 GB dataset.
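
If you want a feel for the gap without opening the notebook, here is a rough sketch using only the Python standard library. It is not the code from the notebook: the function names, the synthetic data, and the raw native-endian float64 layout are all assumptions made for illustration, and the timings will vary by machine.

```python
import array
import csv
import os
import tempfile
import time


def write_csv(path, rows):
    # Text format: every float is rendered as a decimal string.
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)


def read_csv(path):
    # Reading back means splitting each line and re-parsing every field.
    with open(path, newline="") as f:
        return [[float(x) for x in row] for row in csv.reader(f)]


def write_binary(path, rows):
    # Hypothetical "custom binary format": raw native-endian float64 values,
    # row-major, no header. Not portable across machines with other endianness.
    flat = array.array("d", (x for row in rows for x in row))
    with open(path, "wb") as f:
        flat.tofile(f)


def read_binary(path, ncols):
    # Bytes are copied straight into a typed array, with no text parsing.
    flat = array.array("d")
    with open(path, "rb") as f:
        flat.frombytes(f.read())
    return [list(flat[i:i + ncols]) for i in range(0, len(flat), ncols)]


def compare(nrows=100_000, ncols=4):
    # Synthetic data, made up for this sketch.
    rows = [[(i * ncols + j) / 7.0 for j in range(ncols)] for i in range(nrows)]
    with tempfile.TemporaryDirectory() as d:
        csv_path = os.path.join(d, "data.csv")
        bin_path = os.path.join(d, "data.bin")
        write_csv(csv_path, rows)
        write_binary(bin_path, rows)

        t0 = time.perf_counter()
        from_csv = read_csv(csv_path)
        t1 = time.perf_counter()
        from_bin = read_binary(bin_path, ncols)
        t2 = time.perf_counter()

    # Both readers should return exactly the data that was written.
    return from_csv == rows, from_bin == rows, t1 - t0, t2 - t1


if __name__ == "__main__":
    csv_ok, bin_ok, t_csv, t_bin = compare()
    print(f"csv read:    {t_csv:.3f}s (round-trip ok: {csv_ok})")
    print(f"binary read: {t_bin:.3f}s (round-trip ok: {bin_ok})")
```

The binary reader skips the per-field string-to-float conversion, which is where most of the CSV time tends to go. A real format would also store a header with the dtype and shape rather than relying on the caller to pass `ncols`.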