item 27143689

reom_tobit | 4 years ago

Tangent, but I’ve never really understood what the issue is with Unicode support in various languages.

Does anyone have any idea why it’s such a contentious issue?

tialaramex | 4 years ago

Unicode reflects a reality about human writing systems. They are very complicated. This is more or less guaranteed to result in Unicode being contentious.

After all, it's obvious that features my native language has are important and need to be first-class APIs in the standard library, while any features that language doesn't have aren't important and the standard library shouldn't be clogged up with anything so useless. Also, things that are easy to do for my preferred writing system must be supported; if the easy way to implement them doesn't work for some other widely used languages, just ignore that, since those people don't matter anyway.
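As one small, concrete example of that complexity (a minimal Python sketch; the character is just an arbitrary choice): the same visible "é" can be stored as one code point or as two, and the two forms only compare equal after Unicode normalization.

```python
import unicodedata

# Two ways to write "é": one precomposed code point,
# or a plain "e" followed by a combining acute accent.
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT

print(precomposed == decomposed)          # False: different code point sequences
print(len(precomposed), len(decomposed))  # 1 2

# NFC normalization composes the pair into the single precomposed code point,
# so the two strings compare equal afterwards.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```

A naive "compare the strings" or "count the characters" API gets this wrong unless it knows about normalization.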

chubot | 4 years ago

Basically, because there are two major ways to do it: the Windows way (UTF-16) and the Unix way (UTF-8). Unicode defines several encodings (UTF-8, UTF-16, UTF-32) and doesn't tell you which one to use.
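A minimal Python sketch of what "picking an encoding" means in practice: the same three code points become different byte sequences under UTF-8 and UTF-16, and nothing in the bytes says which encoding was used. The sample string is arbitrary, and "utf-16-le" is used here only to avoid a byte-order mark.

```python
text = "a\u20ac\U0001F600"  # "a", EURO SIGN, GRINNING FACE emoji: 3 code points

utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")  # little-endian, no byte-order mark

print(utf8.hex(" "))   # 61 e2 82 ac f0 9f 98 80
print(utf16.hex(" "))  # 61 00 ac 20 3d d8 00 de  (the emoji becomes a surrogate pair)

# Decoding with the wrong codec does not necessarily raise an error:
# here the UTF-8 bytes silently decode as four unrelated UTF-16 characters.
garbled = utf8.decode("utf-16-le")
print(garbled == text)  # False
```

This is why two systems that disagree on the encoding can exchange "valid" text and still end up with mojibake.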

The Unix way is winning on the web, and I think Microsoft has made some moves toward UTF-8, but I don't understand what they are exactly:

https://en.wikipedia.org/wiki/Unicode_in_Microsoft_Windows#W...

JavaScript and Java inherited the Windows way (strings made of 16-bit UTF-16 code units). Go and Rust use the Unix way (and apparently OCaml too). Python supports both, which some say is a needless source of complexity, but it is flexible if you know how to use it.
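The difference shows up in something as basic as string length. This Python sketch (the function name `length_report` is hypothetical, just for illustration) computes the three counts that the built-in length operations of the languages mentioned above return; it is a comparison, not an emulation of each language's full string API.

```python
def length_report(s: str) -> dict[str, int]:
    """Count the same string the way different languages' length operations do."""
    return {
        # JavaScript's String.length and Java's String.length() count UTF-16 code units.
        "utf16_code_units": len(s.encode("utf-16-le")) // 2,
        # Go's len(string) and Rust's str::len() count UTF-8 bytes.
        "utf8_bytes": len(s.encode("utf-8")),
        # Python 3's len(str) counts code points.
        "code_points": len(s),
    }

print(length_report("h\u00e9llo"))  # {'utf16_code_units': 5, 'utf8_bytes': 6, 'code_points': 5}
print(length_report("\U0001F600"))  # {'utf16_code_units': 2, 'utf8_bytes': 4, 'code_points': 1}
```

None of these three numbers is "the number of characters" a user sees, which is part of why the topic stays contentious.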

reom_tobit | 4 years ago

Awesome, thanks for the info. Sent me down a rabbit hole for a little bit.