osmsucks | 7 months ago

JavaScript is: https://mathiasbynens.be/notes/javascript-encoding

demurgos | 7 months ago

> The ECMAScript/JavaScript language itself, however, exposes characters according to UCS-2, not UTF-16.

The native JS semantics are UCS-2. Saying it's UTF-16 is misleading and conflates charset, encoding, and browser APIs.

Ladybird is probably implementing support properly but it's annoying that they keep spreading the confusion in their article.
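
To make the UCS-2 point concrete, here's a minimal sketch (assuming a runtime with `TextEncoder` and the ES2024 `String.prototype.isWellFormed` helper): a JS string happily holds an unpaired surrogate, which well-formed UTF-16 forbids.

```js
// A lone high surrogate is a perfectly legal JS string value.
const lone = "\uD800";
console.log(lone.length);         // 1
console.log(lone.isWellFormed()); // false (ES2024)

// APIs that must emit real, well-formed output cope in their own ways:
// encodeURIComponent(lone) throws URIError ("malformed URI sequence"),
// while TextEncoder silently substitutes U+FFFD.
console.log(new TextEncoder().encode(lone)); // Uint8Array(3) [239, 191, 189]
```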

dzaima | 7 months ago

It's not cleanly one or the other, really. It's UCS-2-y by `str.length` or `str[i]`, but UTF-16-y by `str.codePointAt(i)` or by iteration (`[...str]` or `for (x of str)`).

Generally, though, JS strings are just a sequence of 16-bit values, intrinsically neither UCS-2 nor UTF-16. But, practically speaking, UTF-16 is the description that matters for everything other than writing `str.length`/`str[i]`.
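
A quick sketch of that split using an astral character (standard ECMAScript behaviour, nothing engine-specific assumed):

```js
const s = "💩"; // U+1F4A9 sits outside the BMP, so it's stored as a surrogate pair

// UCS-2-y views: one 16-bit code unit at a time
console.log(s.length);                      // 2 (code units, not characters)
console.log(s.charCodeAt(0).toString(16));  // "d83d" (high surrogate)
console.log(s[1] === "\uDCA9");             // true (lone low surrogate)

// UTF-16-y views: surrogate pairs decoded into whole code points
console.log(s.codePointAt(0).toString(16)); // "1f4a9"
console.log([...s].length);                 // 1 (iteration yields code points)
for (const c of s) console.log(c);          // "💩"
```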

grishka | 7 months ago

And most mainstream GUI toolkits are, as well. It can be said that UTF-16 is the de facto standard in-memory representation of Unicode strings, even though some newer languages (e.g. Rust) prefer UTF-8.

0points | 7 months ago

> And most mainstream GUI toolkits are, as well.

No. Windows uses UTF-16 internally. Most GUI toolkits do not.

> It can be said that UTF-16 is the de-facto standard in-memory representation of unicode strings, even though some runtimes (Rust) prefer UTF-8.

No, that wouldn't be true at all.

Your technical perspective seems to be limited by your Windows experience, and even that is dated.

Microsoft has recommended UTF-8 over UTF-16 since 2019 [1].

1: https://learn.microsoft.com/en-us/windows/apps/design/global...