top | item 43490859

(no title)

beautron | 11 months ago

I also love CSV for its simplicity. A key part of that love is that it comes from the perspective of me as a programmer.

Many of the criticisms of CSV I'm reading here boil down to something like: CSV has no authoritative standard, and everyone implements it differently, which makes it bad as a data interchange format.

I agree with those criticisms when I imagine them from the perspective of a user who is not also a programmer. If this user exports a CSV from one program, and then tries to load the CSV into a different program, but it fails, then what good is CSV to them?

But from the perspective of a programmer, CSV is great. If a client gives me data to load into some app I'm building for them, then I am very happy when it is in a CSV format, because I know I can quickly write a parser, not by reading some spec, but by looking at the actual CSV file.

Parsing CSV is quick and fun if you only care about parsing one specific file. And that's the key: It's so quick and fun, that it enables you to just parse anew each time you have to deal with some CSV file. It just doesn't take very long to look at the file, write a row-processing loop, and debug it against the file.

The beauty of CSV isn't that it's easy to write a General CSV Parser that parses every CSV file in the wild, but rather that its easy to write specific CSV parsers on the spot.

Going back to our non-programmer user's problem, and revisiting it as a programmer, the situation is now different. If I, a programmer, export a CSV file from one program, and it fails to import into some other program, then as long as I have an example of the CSV format the importing program wants, I can quickly write a translator program to convert between the formats.

There's something so appealing about to me about simple-to-parse-by-hand data formats. They are very empowering to a programmer.

discuss

MarceliusK|11 months ago

Totally agree that its biggest strength is how approachable it is for quick, ad hoc tooling. Need to convert formats? Join two datasets? Normalize a weird export? CSV gives you just enough structure to work with and not so much that it gets in your way.

dkarl|11 months ago

> I know I can quickly write a parser, not by reading some spec, but by looking at the actual CSV file

This is fine if you can hand-check all the data, or if you are okay if two offsetting errors happen to corrupt a portion of the data without affecting all of it.

Also I find it odd that you call it "easy" to write custom code to parse CSV files and translate between CSV formats. If somebody give you a JSON file that isn't valid JSON, you tell them it isn't valid, and they say "oh, sorry" and give you a new one. That's the standard for "easy." When there are many and diverse data formats that meet that standard, it seems perverse to use the word "easy" to talk about empirically discovering the quirks in various undocumented dialects and writing custom logic to accommodate them.

Like, I get that a farmer a couple hundred years ago would describe plowing a field with a horse as "easy," but given the emergence of alternatives, you wouldn't use the word in that context anymore.

beautron|11 months ago

> If somebody give you a JSON file that isn't valid JSON, you tell them it isn't valid, and they say "oh, sorry" and give you a new one. That's the standard for "easy."

But it isn't that reliably easy with JSON. Sometimes I have clients give me data that I just have to work with, as-is. Maybe it was invalid JSON spat out by some programmer or tool long ago. Maybe it's just from a different department than my contact, which might delay things for days before the bureaucracy gets me a (hopefully) valid JSON.

I consider CSV's level of "easy" more reliable.

And even valid JSON can be less easy. I've had experiences where writing the high-level parsing for some JSON file, in terms of a JSON library, was less easy and more time-consuming than writing a custom CSV parser.

Subjectively, I think programming a CSV parser from basic programming primitives is just more fun and appealing than programming in terms of a JSON library or XML library. And I find the CSV code is often simpler and quicker to write.

beautron|11 months ago

> When there are many and diverse data formats that meet that standard, it seems perverse to use the word "easy" to talk about empirically discovering the quirks in various undocumented dialects and writing custom logic to accommodate them.

But the premise of CSV is so simple, that there are only four quirks to empirically discover: cell delimiter, row delimiter, quote, escaped-quote.

I think it's "easy" to peek at the file and say, "Oh, they use semicolon cell delimiters."

And it's likewise "easy" to write the "custom logic", which is about as simple as parsing something directly from a text stream gets. I typically have to stop and think a minute about the quoting, but it's not that bad.

If a programmer is practiced at parsing from a text stream (a powerful, general skill that is worth exercising), than I think it is reasonable to think they might find parsing CSV by hand to be easier and quicker than parsing JSON (etc.) with a library.