Black616Angel | 1 year ago

I never liked articles about how you should replace CSV with some other format while pulling some absolutely idiotic reasons out of their rear...

1. "CSV is underspecified": Okay, so specify it for your use case and you're done? E.g. use RFC 3339 timestamps instead of the straw-man 1-1-1970, and define what "no value" looks like, which is mostly an empty string.

2. "CSV files have terrible compression and performance": Okay, who in their right mind uses a plain-text file to export 50 GB of data? Some file systems don't even support files that big. When you are at the stage of REGULARLY shipping around files this big, you should think about a database and not another file type to send via mail. Performance may be a point, but again, using CSV for gigantic files is wrong in the first place.

3. "There's a better way" (insert presentation of a file type I have never heard of): There are lots of better ways to do this, but CSV is extremely fast to implement, it is universally known, unlike Apache Parquet (or Pickle or ORC or Avro or Feather...), and it is human-readable.
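To make point 1 concrete, here is a minimal sketch of what "specify it for your use case" could look like, using only Python's standard library. The function name `write_rows` and the sample columns are made up for illustration; the convention itself (timezone-aware timestamps in RFC 3339 form, an empty field for a missing value) is just the one suggested above, not any official CSV profile.

```python
import csv
import io
from datetime import datetime, timezone


def write_rows(rows, fieldnames):
    """Serialize dicts to CSV text under a small, explicit house convention.

    Hypothetical helper: missing values become empty fields and datetimes
    are written with isoformat(), which is RFC 3339-compatible as long as
    the datetime is timezone-aware.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        out = {}
        for key in fieldnames:
            value = row.get(key)
            if value is None:
                out[key] = ""  # convention: no value is an empty field
            elif isinstance(value, datetime):
                out[key] = value.isoformat()
            else:
                out[key] = value
        writer.writerow(out)
    return buf.getvalue()


text = write_rows(
    [{"id": 1, "created": datetime(1970, 1, 1, tzinfo=timezone.utc), "note": None}],
    ["id", "created", "note"],
)
print(text)
```

The reader on the other end still has to know about this convention, since nothing in the file itself announces it.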

So in the end: use it for small data exports where you can specify everything you want, or anywhere you need to import data, because most software takes CSV as input anyway.

For lots of data use something else.

Friends don't let friends write one-sided articles.

HayBale|1 year ago

2. You would be surprised, especially at the science/university level in stats, health or bioinformatics. Unfortunately, a lot of people take the path of least resistance and use Excel's proprietary format or CSV for everything.

Like the NHS with their post-COVID data problems caused by Excel limitations, or the gene name conversion problems in scientific journals.

The same happens with a stupid number of laboratory management things and bioinformatics tools.

Honestly, the article is obviously biased, but we should at least think about moving away from CSV on non-customer-facing fronts.

Small files? JSON. Big files? SQLite or Parquet.

alserio|1 year ago

I agree with your other points, but the first one misses the mark. Even if you specify a format, you cannot use the file for exporting data between systems and organizations unless they all agree on that format. CSV has no reasonable way to declare that it is using a specific spec. I can open your data with my tools and silently misinterpret it. But if you are only exporting data for yourself, that's another story.

2devnull|1 year ago

You can use Excel as the lingua franca. Also give them row/column counts. Most problems solved in two easy steps.
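A rough sketch of the row/column count check, in Python with only the standard library. The helper name `csv_shape` is hypothetical, and it assumes the file has a single header row; the receiver would compare its result against the counts the sender gave them.

```python
import csv
import io


def csv_shape(text):
    """Return (data_rows, columns) for CSV text with one header row.

    Raises ValueError if rows have differing widths, which usually means
    a delimiter or quoting mismatch between sender and receiver.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return (0, 0)
    widths = {len(r) for r in rows}
    if len(widths) != 1:
        raise ValueError(f"ragged rows: widths {sorted(widths)}")
    return (len(rows) - 1, len(rows[0]))


print(csv_shape("a,b\n1,2\n3,4\n"))
```

This catches truncated files and wrong delimiters, but not values that parse fine and are misread anyway (dates, gene names), so it covers only part of the silent-misinterpretation problem.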

javcasas|1 year ago

For lots of data, zip the CSV. For REALLY lots of data, think about something different.
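For what it's worth, compressing the CSV needs very little code. A minimal sketch in Python, assuming gzip rather than a literal .zip archive (that choice is mine, not the comment's) and using made-up helper names `write_csv_gz` and `read_csv_gz`:

```python
import csv
import gzip


def write_csv_gz(path, header, rows):
    """Write a header and rows to a gzip-compressed CSV file."""
    # Text mode ("wt") lets csv.writer write straight into the gzip stream.
    with gzip.open(path, "wt", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


def read_csv_gz(path):
    """Read a gzip-compressed CSV file back into a list of rows."""
    with gzip.open(path, "rt", newline="", encoding="utf-8") as f:
        return list(csv.reader(f))
```

The file stays plain CSV once decompressed, so anyone who can read CSV can still read it.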