Csvkit - command line utilities for working with csv files

90 points | cpenner461 | 14 years ago | csvkit.readthedocs.org

15 comments

[+] knowtheory|14 years ago|reply
It's worth noting that this tool was built by Chris Groskopf (he's @onyxfish on Twitter) while he was working at the Chicago Tribune.

Chris is now working on a Knight Foundation-funded project called PANDA to build a FOSS search appliance for tabular data (especially CSV and spreadsheet-based files), intended for deployment in newsrooms (http://www.pbs.org/idealab/2011/11/panda-project-releases-a-... ).

You can test out the PANDA alpha online here: http://alpha.pandaproject.net/

(Chris is also a super nice guy)

[+] zvrba|14 years ago|reply
Mucking around with CSV files on the command line is painful. Been there, done that, got annoyed by the many limitations of that approach, bit the bullet and learned R. It is a somewhat weird language, but learning it was one of the best decisions I ever made in my professional career: now it is my go-to tool for data analysis and plotting.

CSV data is easily imported into R, where you can analyze it, transform it, and plot it, all from a unified command-line interface (GUIs also exist, but I haven't used them). The reshape and plyr packages are worth learning too. There's also an Emacs package for interacting with R (ESS) that makes working with it significantly easier; it also works under Windows, and it is what I use in my work with R.

TL;DR: nice project, but it's a toy compared to what you can get from R. (Re unix philosophy: I'm the type of person that likes to get the job done and I therefore very often choose pragmatism over idealism.)

[+] archangel_one|14 years ago|reply
I don't think the intention of this tool is for doing statistical analysis of data, just for manipulating it at a shell prompt. For example, csvcut is similar to Unix cut in that it's a binary you can pipe data through, which R isn't.
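To make the "binary you can pipe data through" idea concrete, here is a minimal Python sketch of a cut-like column selector. It is not csvkit's implementation; the function name `cut_columns` is made up for this illustration, and it assumes the first row is a header.

```python
import csv
import io


def cut_columns(text, names):
    """Return CSV text that keeps only the named columns, in the order given.

    Hypothetical helper for illustration; assumes the first row is a header.
    """
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    # extrasaction="ignore" silently drops the columns that were not asked for.
    writer = csv.DictWriter(
        out, fieldnames=names, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


print(cut_columns("a,b,c\n1,2,3\n4,5,6\n", ["c", "a"]), end="")
```

Feeding it `sys.stdin.read()` and writing the result to `sys.stdout` would turn it into a filter that can sit in the middle of a shell pipeline.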
[+] veyron|14 years ago|reply
Is there a CSV AWK?

By that I mean something that could read the header line in a CSV and automatically generate those variables for each line.

Demonstration:

    $ cat test.csv
    Field1,Field2
    1,2
    3,4
    5,6
    $ csvawk '{print Field1+Field2}' test.csv
    3
    7
    11
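
For illustration only, the behaviour in the demonstration could be approximated in Python along these lines. `csvawk` and `to_number` are hypothetical names, the expression uses Python syntax rather than awk syntax, and `eval` is not safe on untrusted input, so treat this as a sketch of the idea rather than a real tool.

```python
import csv
import io


def to_number(value):
    """Convert numeric-looking fields to int or float; leave anything else alone."""
    for convert in (int, float):
        try:
            return convert(value)
        except (TypeError, ValueError):
            continue
    return value


def csvawk(expression, text):
    """Evaluate `expression` once per data row, with header names bound as variables.

    Hypothetical sketch: header names must be valid Python identifiers.
    """
    results = []
    for row in csv.DictReader(io.StringIO(text)):
        env = {name: to_number(value) for name, value in row.items()}
        # Builtins are emptied to limit what the expression can reach.
        results.append(eval(expression, {"__builtins__": {}}, env))
    return results


for value in csvawk("Field1+Field2", "Field1,Field2\n1,2\n3,4\n5,6\n"):
    print(value)
```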
[+] ralph|14 years ago|reply
IIRC Aho, Weinberger, and Kernighan's excellent slim tome _The Awk Programming Language_ implements something similar; awk is used to generate the awk with Field1 replaced by $1, etc.
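A rough sketch of that rewriting step, in Python rather than awk and not taken from the book: the hypothetical function `to_awk` reads the header line and replaces each column name in the program with its positional field reference.

```python
import csv
import io
import re


def to_awk(program, header_line):
    """Rewrite column names in an awk program as positional fields ($1, $2, ...).

    Hypothetical sketch: assumes column names are plain words, and it will
    also rewrite a name that happens to appear inside a string literal.
    """
    names = next(csv.reader(io.StringIO(header_line)))
    positions = {name: index for index, name in enumerate(names, start=1)}
    alternatives = "|".join(re.escape(name) for name in names)
    # Word boundaries keep Field1 from matching inside Field10.
    pattern = re.compile(r"\b(" + alternatives + r")\b")
    return pattern.sub(lambda match: "$" + str(positions[match.group(1)]), program)


print(to_awk("{print Field1+Field2}", "Field1,Field2"))
```

The generated program could then be handed to awk with `-F,` and an `NR > 1` pattern to skip the header, though splitting on commas only works for simple files without quoted fields.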
[+] klochner|14 years ago|reply
It would probably be a pretty easy bash script to write.
[+] rwmj|14 years ago|reply
You might also want to look at csvtool (already in Fedora, Debian, RHEL, etc). It's a command line tool for doing the same thing, written as part of the OCaml CSV library.
[+] plasma|14 years ago|reply
MySQL has a CSV storage engine, just give it the file to load and you can read/write using SQL.
[+] dquigley|14 years ago|reply
Anyone know of a similar tool based on Ruby?
[+] brianobush|14 years ago|reply
These are command line utilities, why worry about the language? Just pipe data into and out of csvkit's tools and level up!
[+] lhm|14 years ago|reply
https://github.com/blambeau/alf is something close, although it does a lot more: "Alf is a commandline tool and Ruby library to manipulate data with all the power of a truly relational algebra approach."
[+] lelele|14 years ago|reply
Doesn't Ruby support pipelines?