
tfvlrue | 11 months ago

I remember that, in the past, Excel didn't properly handle UTF-8 encoded text in a CSV. It would treat it as raw ASCII (or possibly code page 1252), so if you opened and saved a CSV, it would corrupt any Unicode text in the file. It's possible this has been fixed in newer versions; I haven't tried in a while.


qw | 11 months ago

It's related to how older versions of Windows/Office handled Unicode in general.

From what I have heard, it's still an issue with Excel, although I assume that Windows may handle plain text better these days (I haven't used it in a while).

You need to write a UTF-8 BOM at the beginning (0xEF, 0xBB, 0xBF) if you want to make sure it's recognized as UTF-8.
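A minimal sketch of the trick described above, in Python: the standard library's "utf-8-sig" codec writes the BOM bytes (0xEF 0xBB 0xBF) automatically at the start of the file, which is the signal Excel looks for when deciding a CSV is UTF-8. The filename and row data are made up for illustration.

```python
import csv

# Writing with encoding="utf-8-sig" prepends the UTF-8 BOM for us,
# so Excel should open the file as UTF-8 instead of a legacy code page.
with open("example.csv", "w", encoding="utf-8-sig", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["name", "city"])
    writer.writerow(["Zoë", "Kraków"])

# Inspect the raw bytes: the first three should be the BOM.
with open("example.csv", "rb") as f:
    first_bytes = f.read(3)
print(first_bytes)  # b'\xef\xbb\xbf'
```

Reading the file back with encoding="utf-8-sig" strips the BOM again, so round-tripping is safe.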

darthwalsh | 11 months ago

Ugh, the UTF-8 BOM. Many apps can handle UTF-8 but will return those BOM bytes as content; maybe ours did in 2015 too.

I was on the Power Query team when we were improving the encoding sniffing. An app can scan ahead (e.g. 64 kB), but ultimately the user needs to just say what the encoding is. All the Power Query data import dialogs should let you specify the encoding.
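The approach described above can be sketched roughly like this (a hypothetical simplification, not the actual Power Query implementation): check for a BOM first, otherwise try decoding a 64 kB prefix as UTF-8, fall back to a legacy code page, and always let a user-supplied encoding win. The function name and fallback choice of cp1252 are assumptions for illustration.

```python
from typing import Optional

def sniff_encoding(data: bytes, user_override: Optional[str] = None) -> str:
    """Guess a text encoding from raw bytes; the user's choice always wins."""
    if user_override:                       # ultimately the user decides
        return user_override
    if data.startswith(b"\xef\xbb\xbf"):    # UTF-8 BOM
        return "utf-8-sig"
    if data.startswith(b"\xff\xfe"):        # UTF-16 little-endian BOM
        return "utf-16-le"
    if data.startswith(b"\xfe\xff"):        # UTF-16 big-endian BOM
        return "utf-16-be"
    # No BOM: scan ahead a bounded window (here 64 kB) and see if it
    # decodes cleanly as UTF-8. Note a real sniffer would avoid cutting
    # the window in the middle of a multi-byte sequence.
    prefix = data[:64 * 1024]
    try:
        prefix.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "cp1252"                     # legacy Windows fallback
```

Heuristics like this work most of the time because random cp1252 text is very unlikely to also be valid multi-byte UTF-8, but they can never be certain, which is why the import dialog still needs an explicit encoding picker.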