fredguth | 1 year ago

What is the encoding of the text file: UTF-8 or windows-1252?

Is the decimal separator “.” or “,”?

Most CSV users don’t even know they need to be aware of all these differences.
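A minimal Python sketch of both problems, assuming a hypothetical semicolon-separated file written in windows-1252 with a decimal comma. The helpers read_rows and parse_number are made up for illustration, and parse_number assumes whichever character is not the decimal separator is a thousands separator. Nothing in the bytes says which encoding or separator was used, so the reader's guesses decide what comes out:

```python
import csv
import io
from decimal import Decimal

# Hypothetical file: a name containing "é" and a price with a decimal comma,
# semicolon-delimited, encoded as windows-1252.
raw = "Café;3,50\r\n".encode("windows-1252")

def read_rows(data: bytes, encoding: str, delimiter: str) -> list[list[str]]:
    """Decode the bytes with a guessed encoding, then split with a guessed delimiter."""
    text = data.decode(encoding, errors="replace")  # bad guesses become U+FFFD
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))

def parse_number(field: str, decimal_sep: str) -> Decimal:
    """Interpret a field as a number, given a guessed decimal separator.

    Assumption: the other character ("." or ",") is a thousands separator.
    """
    if decimal_sep == ",":
        normalized = field.replace(".", "").replace(",", ".")
    else:
        normalized = field.replace(",", "")
    return Decimal(normalized)

print(read_rows(raw, "windows-1252", ";"))  # [['Café', '3,50']]
print(read_rows(raw, "utf-8", ";"))         # [['Caf\ufffd', '3,50']]
print(read_rows(raw, "windows-1252", ","))  # [['Café;3', '50']]
print(parse_number("3,50", ","))            # 3.50
print(parse_number("3,50", "."))            # 350
```

None of the wrong guesses raise an error: the "é" silently becomes a replacement character, the row splits in the wrong place, and 3.50 quietly turns into 350.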

SAI_Peregrinus | 1 year ago

The main issue is that "CSV" isn't one format with a single schema. It's one format with thousands of schemas and no way to communicate them. Every program picks its own schema for the CSVs it produces, and some even change the schema depending on various factors (e.g. the presence or absence of a header row).
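To make the header-row point concrete, here is a small Python sketch using a hypothetical header-less export. The csv module splits the bytes the same way every time; what changes is the schema each reader assumes. Only the third reader, which is handed column names out-of-band (the names "date" and "count" are made up), gets the structure right, and even it still receives every value as a string:

```python
import csv
import io

# Hypothetical export from a program that writes no header row.
data = "2024-03-01,42\r\n2024-03-02,17\r\n"

# Reader A assumes a header: the first record silently becomes the column names.
assumed_header = list(csv.DictReader(io.StringIO(data, newline="")))
# [{'2024-03-01': '2024-03-02', '42': '17'}]  (one record lost, nonsense keys)

# Reader B assumes no header: every line is a record, but columns are unnamed.
no_header = list(csv.reader(io.StringIO(data, newline="")))
# [['2024-03-01', '42'], ['2024-03-02', '17']]

# Reader C is given the schema out-of-band via the fieldnames parameter.
with_schema = list(
    csv.DictReader(io.StringIO(data, newline=""), fieldnames=["date", "count"])
)
# [{'date': '2024-03-01', 'count': '42'}, {'date': '2024-03-02', 'count': '17'}]
```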

RFC 4180 provides a (mostly) unambiguous format for writing CSVs, but because it discards the (implied) schema, it's useless for reading CSVs that come from other programs. RFC 4180 fields have only one type: text strings in US-ASCII. There are no dates, no decimal separators, no letters outside the US-ASCII alphabet; you get nothing! It does let the MIME type's "charset" parameter specify a different text encoding, but that parameter isn't stored in the file itself, so it only helps when the file arrives with its MIME type attached, e.g. when downloading it from the internet.
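A rough Python illustration, using the csv module's default "excel" dialect as a stand-in for RFC 4180 (comma delimiter, embedded quotes doubled, CRLF record endings; I wouldn't claim it matches the RFC in every corner). Whatever types go in, only strings come back out. The helper fits_rfc4180_charset is hypothetical: it checks only the character set the RFC's ABNF admits (printable US-ASCII 0x20-0x7E, plus CR and LF inside quoted fields), not quoting or record structure:

```python
import csv
import io

rows = [["id", "price", "note"], [1, 3.5, 'says "hi", then leaves']]

# Write with the default "excel" dialect (minimal quoting, CRLF line endings).
buf = io.StringIO(newline="")
csv.writer(buf).writerows(rows)
text = buf.getvalue()
# 'id,price,note\r\n1,3.5,"says ""hi"", then leaves"\r\n'

# Read it back: the int and the float come back as plain strings.
parsed = list(csv.reader(io.StringIO(text, newline="")))
# [['id', 'price', 'note'], ['1', '3.5', 'says "hi", then leaves']]

def fits_rfc4180_charset(s: str) -> bool:
    """Character-set check only: printable US-ASCII plus CR and LF.

    Does not validate quoting, field counts, or where CR/LF may appear.
    """
    return all(ch in "\r\n" or 0x20 <= ord(ch) <= 0x7E for ch in s)

print(fits_rfc4180_charset(text))            # True
print(fits_rfc4180_charset("Café,3.5\r\n"))  # False
```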

averms | 1 year ago

> RFC 4180 provides a (mostly) unambiguous format for writing CSVs,

What are the ambiguities in RFC 4180?