top | item 33229084

(no title)

erutuon | 3 years ago

A JavaScript string is conceptually an array of 16-bit unsigned integers (https://tc39.es/ecma262/multipage/ecmascript-data-types-and-...). String indexing and substring operations treat strings this way, and the model couldn't easily be changed without breaking existing code. If you index into a string containing an emoji whose code point is greater than U+FFFF, you get a string containing an unpaired surrogate code unit, so the result is not well-formed UTF-16; in that sense strings behave like UCS-2 rather than UTF-16. I don't know how strings are laid out in memory in the various JavaScript engines, and I'm not familiar with typed arrays.
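A minimal sketch of the indexing behavior described above, using only standard string methods. The sample string and the variable names (`s`, `unit`, `codePoints`) are just illustrative choices, not anything from the spec text:

```javascript
// U+1F600 is above U+FFFF, so it occupies two 16-bit code units
// (the surrogate pair D83D DE00).
const s = "a\u{1F600}b";

// length counts code units, not code points.
console.log(s.length); // 4

// Indexing returns a one-code-unit string: here an unpaired high surrogate.
const unit = s[1];
console.log(unit.charCodeAt(0).toString(16)); // "d83d"

// codePointAt reads the whole surrogate pair starting at that index.
console.log(s.codePointAt(1).toString(16)); // "1f600"

// String iteration walks code points, so spreading gives 3 elements.
const codePoints = [...s];
console.log(codePoints.length); // 3
```

So code that needs whole code points can iterate the string or use codePointAt instead of plain indexing, while the index-based operations keep their 16-bit view.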
