dbro | 12 years ago
It is indeed a simple state machine (see https://github.com/dbro/csvquote/blob/master/csvquote.c), and it translates CSV/TSV files into a form that follows the spirit of what's described in the original article in this thread.
But instead of using control characters as the separators, it substitutes them for the delimiters and newlines that appear INSIDE quoted fields. This makes it easy to work with the standard UNIX text manipulation tools, which expect tabs and newlines to be the field and record separators.
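To make the idea concrete, here is a rough sketch of that kind of state machine in Python. It is not the code from csvquote.c; the function names `sanitize` and `restore`, and the choice of 0x1F and 0x1E as the stand-in characters, are assumptions made for illustration.

```python
# Assumed stand-in characters (ASCII unit separator and record separator).
FIELD_SUB = "\x1f"   # replaces a delimiter found inside a quoted field
RECORD_SUB = "\x1e"  # replaces a newline found inside a quoted field


def sanitize(text, delimiter=",", quote='"'):
    """Replace delimiters and newlines inside quoted fields.

    A two-state machine: every quote character flips between "inside
    quotes" and "outside quotes". An escaped quote ("") flips the state
    twice, so it leaves the state unchanged.
    """
    out = []
    in_quotes = False
    for ch in text:
        if ch == quote:
            in_quotes = not in_quotes
            out.append(ch)
        elif in_quotes and ch == delimiter:
            out.append(FIELD_SUB)
        elif in_quotes and ch == "\n":
            out.append(RECORD_SUB)
        else:
            out.append(ch)
    return "".join(out)


def restore(text, delimiter=","):
    """Undo sanitize() by putting the original characters back."""
    return text.replace(FIELD_SUB, delimiter).replace(RECORD_SUB, "\n")
```

After `sanitize`, every remaining delimiter and newline is a real field or record separator, so line-oriented tools can split the data safely, and `restore` puts the original characters back afterwards. If you experiment with this in Python, split on `"\n"` explicitly: `str.splitlines()` also treats 0x1E as a line boundary.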
The motivation for writing the tool was to work with CSV files (usually from Excel) that were hundreds of megabytes in size. These files came from outside my organization, and often from nontechnical people, so it would have been difficult to get them into a more convenient format. That's the killer feature of the CSV/TSV format: it's readable by the large number of nontechnical information workers, in almost every application they use. I can't think of a file format that is more widely recognized (even if it's not always consistently defined in practice).