top | item 43488339

qw | 11 months ago

It's related to how older versions of Windows/Office handled Unicode in general.

From what I have heard, it's still an issue with Excel, although I assume Windows may handle plain text better these days (I haven't used it in a while).

You need to write a UTF-8 BOM (0xEF, 0xBB, 0xBF) at the beginning if you want to make sure it's recognized as UTF-8.
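A minimal sketch of this in Python (the file name "report.csv" is made up for illustration): the "utf-8-sig" codec prepends the three BOM bytes automatically, so Excel will recognize the file as UTF-8.

```python
import csv

# Write a CSV with a UTF-8 BOM. "utf-8-sig" emits 0xEF 0xBB 0xBF
# before the first character; everything else is plain UTF-8.
with open("report.csv", "w", encoding="utf-8-sig", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["name", "city"])
    writer.writerow(["José", "Zürich"])

# The first three bytes on disk are the BOM:
with open("report.csv", "rb") as f:
    assert f.read(3) == b"\xef\xbb\xbf"
```

Reading the file back with `encoding="utf-8-sig"` strips the BOM again, which avoids the problem (mentioned below) of the BOM leaking into the data.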

darthwalsh | 11 months ago

Ugh, UTF-8 BOM. Many apps can handle UTF-8 but will return those BOM bytes as content; maybe ours did in 2015 too.

I was on the Power Query team when we were improving the encoding sniffing. An app can scan ahead, e.g. 64 kB, but ultimately the user needs to just say what the encoding is. All the Power Query data import dialogs should let you specify the encoding.
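Not the actual Power Query code, but a sketch of how that kind of sniffing can work: check for a BOM first, then try decoding a bounded prefix as UTF-8, and fall back to asking the user when neither settles it. The function name and the 64 kB limit are assumptions for illustration.

```python
# BOMs checked before any content-based guessing.
BOMS = {
    b"\xef\xbb\xbf": "utf-8-sig",
    b"\xff\xfe": "utf-16-le",
    b"\xfe\xff": "utf-16-be",
}

def sniff_encoding(data: bytes, scan_limit: int = 64 * 1024) -> str:
    """Guess an encoding from the leading bytes of a file."""
    for bom, name in BOMS.items():
        if data.startswith(bom):
            return name
    try:
        # Note: truncating at scan_limit can split a multi-byte
        # character; real code would back off to a character boundary.
        data[:scan_limit].decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Ambiguous (could be any legacy 8-bit codepage): at this
        # point the user has to say what the encoding is.
        return "unknown"
```

Heuristics like this are why the import dialogs still expose an explicit encoding option: bytes that aren't valid UTF-8 give no reliable signal about which single-byte encoding they are.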

zzo38computer | 11 months ago

UTF-8 BOM is probably not a good idea for anything other than (maybe) plain text documents. For data, many (although not all) programs should not need to care about character encoding, and if they include something such as UTF-8 BOM then it will become necessary to consider the character encoding even though it shouldn't be necessary.