jasonpbecker | 3 years ago
In the case of text-delimited files, it is simply too easy and too common to generate, from the start, a malformed file that other systems cannot read. Because some data loss is inherent in a text-based format, folks don't even bother to check whether the files they generate can be successfully read back by their own system. PostgreSQL, Oracle, and MS SQL will all gladly produce CSV files that cannot be read back successfully. I'm not talking about some loss of metadata; I'm talking about files that cannot be read at all.
In the "real world", of course I run validations on the data I accept. A common one for me, since the files are essentially "append only" when they're updated, is to check for meaningfully fewer records than in previous data loads. That's my best way of determining that records were dropped or lost when the file was read, because of things like mangled quoting or an incomplete file transfer.
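The record-count check described above could be sketched roughly like this (a hypothetical illustration, not the commenter's actual code; the names `load_rows` and `looks_truncated` and the 5% tolerance are invented for the example):

```python
import csv

def load_rows(path):
    # Read every record from a CSV file into a list of rows.
    with open(path, newline="") as f:
        return list(csv.reader(f))

def looks_truncated(current_count, previous_count, tolerance=0.95):
    # For an append-only feed, a new file should have at least as many
    # records as the last load; flag anything meaningfully smaller.
    # The tolerance factor is an assumed threshold, not a standard.
    return current_count < previous_count * tolerance

# Stand-in counts instead of a real file: the prior load had 10,000
# records, the new one only 9,000 -- suspicious for an append-only feed.
previous_count = 10_000
current_count = 9_000
print(looks_truncated(current_count, previous_count))  # True
```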
It's still not great that a mismatched quote, which is quite common, doesn't even trigger a warning in these parsers' validation paths.
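For a concrete example of the silent-failure mode, here is a small sketch using Python's standard `csv` module (chosen just as one illustrative parser; the data is made up):

```python
import csv
import io

# The opening quote on the second line is never closed.
data = 'id,name\n1,"Smith, J\n2,Jones\n'

rows = list(csv.reader(io.StringIO(data)))

# The unterminated quote makes the parser read to the end of input, so
# the third record is swallowed into the second row's last field.
# No exception, no warning -- just fewer rows than the file contains.
print(len(rows))  # 2, not the 3 lines visible in the file
```

Exactly the failure described above: records quietly disappear, and only an out-of-band check like a record count would catch it.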
nuc1e0n | 3 years ago
JSON is much easier to validate and transfers data about as easily as CSV. Its overhead can be minimal too if the data is stored as arrays (or an array of arrays), since field names aren't repeated on every record.
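A minimal sketch of the array-of-arrays layout, using Python's standard `json` module (the `columns`/`rows` key names are an assumption for illustration): the point is that a malformed or truncated payload fails loudly at parse time rather than silently losing rows.

```python
import json

# Header stated once, data as an array of arrays -- minimal overhead
# compared to repeating keys on every record.
payload = json.dumps({
    "columns": ["id", "name"],
    "rows": [[1, "Smith, J"], [2, "Jones"]],
})

data = json.loads(payload)  # raises json.JSONDecodeError if malformed
print(len(data["rows"]))    # 2

# Simulate an incomplete file transfer: parsing fails outright instead
# of quietly returning a partial dataset.
try:
    json.loads(payload[:-5])
except json.JSONDecodeError:
    print("truncated payload rejected")
```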