tn13 | 4 years ago
String length under definition #2 is also fairly complex for some languages, such as Hindi. Hindi has some symbols that are not characters in their own right and can never stand alone, but when placed next to a character they create a new character. So when you type them on a keyboard you have to hit two keys, but only one character appears on screen. Unicode also represents this as two separate characters, but to the human eye it is one.
त + या = त्या
The following code prints 4:
console.log("त्या".length);
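To see where the 4 comes from, here is a rough sketch that breaks the same string into its parts. The string is written with escape sequences on the assumption that it is encoded as the four code points U+0924, U+094D, U+092F, U+093E; the variable names are just for illustration.

```javascript
// The same string, written with escapes so every code point is visible:
// U+0924 (ta), U+094D (virama), U+092F (ya), U+093E (vowel sign aa).
const s = "\u0924\u094D\u092F\u093E";

// .length counts UTF-16 code units, not what a reader sees on screen.
const codeUnits = s.length;

// Spreading a string iterates by code point rather than by code unit.
const codePoints = [...s].map(
  (ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
);

// TextEncoder always encodes to UTF-8, so this is the UTF-8 byte count.
const utf8Bytes = new TextEncoder().encode(s).length;

console.log(codeUnits);  // 4
console.log(codePoints); // [ 'U+0924', 'U+094D', 'U+092F', 'U+093E' ]
console.log(utf8Bytes);  // 12
```

All four code points sit in the Basic Multilingual Plane, so here the code unit count and the code point count happen to agree; neither matches what the eye sees.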
DemocracyFTW|4 years ago
a.k.a. 'ligatures', as in f+f+i -> U+fb03 'ffi'
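For comparison, a small sketch of how the ffi ligature behaves in JavaScript. U+FB03 is a single code point with a compatibility decomposition into three ordinary letters, so normalization can take it apart. As far as I know the Devanagari sequence has no precomposed form, so normalization leaves it alone. Variable names are made up for the example.

```javascript
// U+FB03 LATIN SMALL LIGATURE FFI is one code point.
const ligature = "\uFB03";

// Compatibility decomposition (NFKD) expands it to the letters f, f, i.
const expanded = ligature.normalize("NFKD");

console.log(ligature.length); // 1
console.log(expanded);        // ffi
console.log(expanded.length); // 3

// The Devanagari conjunct is a sequence of four code points with no
// single-code-point equivalent, so NFC does not shorten it.
const conjunct = "\u0924\u094D\u092F\u093E";
const composed = conjunct.normalize("NFC");
console.log(composed.length); // 4
```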
nisegami|4 years ago
Edit: to further illustrate my point, in the ligatures I'm familiar with (including the ones in your link), the component characters exist standalone and can be used on their own, unlike GP's example.
signal11|4 years ago
"त्या".count // 1
"त्या".unicodeScalars.count // 4
"त्या".utf8.count // 12
JavaScript's minimal standard library is of course not great here, but there are libraries that can help, e.g. grapheme-splitter. It is not language-aware by design, though, so in this instance it returns 2.
graphemeSplitter.countGraphemes("त्या") // 2
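Newer runtimes also ship `Intl.Segmenter`, which can split by grapheme cluster without an extra library. A minimal sketch, assuming a runtime that provides it (recent browsers, Node 16+); `countClusters` is a hypothetical helper name, and the `"hi"` locale is only a hint. The result for this string depends on the Unicode/ICU version the runtime bundles, so I'm not stating a number for it.

```javascript
// Hypothetical helper: count grapheme clusters with the built-in segmenter.
function countClusters(text, locale = "hi") {
  const segmenter = new Intl.Segmenter(locale, { granularity: "grapheme" });
  let count = 0;
  // segment() returns an iterable with one entry per grapheme cluster.
  for (const _segment of segmenter.segment(text)) {
    count++;
  }
  return count;
}

// Same string as above, written as escapes for clarity.
const clusters = countClusters("\u0924\u094D\u092F\u093E");
console.log(clusters); // depends on the runtime's Unicode data

// Plain ASCII is unambiguous: one cluster per letter.
const asciiClusters = countClusters("abc", "en");
console.log(asciiClusters); // 3
```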
professoretc|4 years ago
int_19h|4 years ago
raiph|4 years ago