asabil | 1 year ago

Yes, but you don’t end up with different glyphs. Arabic script has letter shaping, which means a letter can have up to 4 shapes depending on its position within the word. If you chop off the last letter, the previous one, which used to have a “middle” position shape, suddenly changes into its “terminal” position shape.

CRConrad|1 year ago

I'm thinking even bog-standard European umlauts, cedillas, etc. go multi-byte in UTF-8? (Take a string of ÅÄÖåäöÜü and chop it off at various byte limits and see.)
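
A quick way to try that experiment, as a rough Python sketch. It assumes the string is stored in precomposed form and encoded as UTF-8; the helper name `truncate_bytes` is made up for illustration. Any cut that lands inside a two-byte sequence shows up as a U+FFFD replacement character.

```python
# Sample string from the comment, written as escapes so it is
# definitely precomposed (one code point per letter): ÅÄÖåäöÜü
text = "\u00c5\u00c4\u00d6\u00e5\u00e4\u00f6\u00dc\u00fc"
data = text.encode("utf-8")  # assumption: UTF-8 is the storage encoding


def truncate_bytes(raw: bytes, limit: int) -> str:
    """Cut raw at `limit` bytes, then decode.

    A sequence broken by the cut is decoded as U+FFFD instead of
    raising UnicodeDecodeError.
    """
    return raw[:limit].decode("utf-8", errors="replace")


# Try every possible byte limit and show what survives the cut.
for limit in range(len(data) + 1):
    print(limit, repr(truncate_bytes(data, limit)))
```

Each of these letters takes two bytes in UTF-8, so every odd limit splits a letter in half.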

gmueckl|1 year ago

This is just the general behavior of truncating strings by code point when they contain decomposed characters (a base letter followed by combining marks). This can also affect accents, etc.
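
A small Python sketch of that case, assuming the text is in NFD (decomposed) form. The helper name `truncate_code_points` is hypothetical. Python string slicing works on code points, so the cut can separate a base letter from its combining mark.

```python
import unicodedata

composed = "\u00fc"  # ü as a single precomposed code point
# NFD splits it into 'u' followed by U+0308 COMBINING DIAERESIS.
decomposed = unicodedata.normalize("NFD", composed)


def truncate_code_points(s: str, limit: int) -> str:
    """Keep the first `limit` code points (not bytes, not graphemes)."""
    return s[:limit]


print(len(composed), len(decomposed))
cut = truncate_code_points(decomposed, 1)
print(repr(cut))  # the diaeresis is gone, leaving a plain 'u'
```

Unlike the byte-level case, nothing here is invalid after the cut. The result is a well-formed string that silently reads as a different letter.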

panzi|1 year ago

I don't remember the details, only that it was a bigger deal than with umlauts. I'll see if I can find the talk again.