dbro | 12 years ago

Thanks for bringing up csvquote. I wrote it last year, and am happy to hear that other people find it useful.

It is indeed a simple state machine (see https://github.com/dbro/csvquote/blob/master/csvquote.c), and it translates CSV/TSV files into files which follow the spirit of what's described in the original article in this thread.

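But instead of using control characters as separators, it substitutes them for the delimiters and newlines that appear INSIDE the quoted fields. After that, every remaining delimiter or newline is a real field or record boundary. This makes it easy to work with the standard UNIX text manipulation tools, which expect tabs (or commas) and newlines to be the field and record separators.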
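To make the idea concrete, here is a rough Python sketch of that kind of state machine. It is not a port of csvquote.c: the function names sanitize and restore are made up for illustration, and the choice of the ASCII unit separator (0x1F) and record separator (0x1E) as stand-ins is an assumption, as is the simplification that a doubled quote inside a field just toggles the state twice.

```python
# Assumed stand-in characters; the real tool's defaults may differ.
FIELD_SUB = "\x1f"   # ASCII unit separator, replaces a delimiter inside quotes
RECORD_SUB = "\x1e"  # ASCII record separator, replaces a newline inside quotes


def sanitize(text, delimiter=",", quote='"'):
    """Replace delimiters and newlines that occur inside quoted fields."""
    out = []
    in_quotes = False  # the only state: are we inside a quoted field?
    for ch in text:
        if ch == quote:
            # An escaped quote ("") toggles twice, so the state is unchanged.
            in_quotes = not in_quotes
            out.append(ch)
        elif in_quotes and ch == delimiter:
            out.append(FIELD_SUB)
        elif in_quotes and ch == "\n":
            out.append(RECORD_SUB)
        else:
            out.append(ch)
    return "".join(out)


def restore(text, delimiter=","):
    """Undo sanitize() once the line-oriented tools have done their work."""
    return text.replace(FIELD_SUB, delimiter).replace(RECORD_SUB, "\n")


if __name__ == "__main__":
    raw = 'id,note\n1,"a, b\nc"\n'
    safe = sanitize(raw)
    # Every remaining newline is now a real record boundary,
    # and every remaining comma is a real field boundary.
    for line in safe.splitlines():
        print(line.split(","))
    assert restore(safe) == raw
```

Between sanitize and restore, a naive split on newlines and commas is safe, which is the point: tools like cut, sort and awk can do that split without knowing anything about quoting.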

The motivation for writing the tool was to work with CSV files (usually from Excel) that were hundreds of megabytes. These files came from outside my organization, and often from nontechnical people - so it would have been difficult to get them into a more convenient format. That's the killer feature of the CSV/TSV format: it's readable by the large number of nontechnical information workers, in almost every application they use. I can't think of a file format that is more widely recognized (even if it's not always consistently defined in practice).
