mbessey | 11 years ago

Actually, Cyrillic is an interesting case. The Unicode standard does define completely separate codepoints for the Cyrillic letters, even for the ones that look "just like" letters of the Latin alphabet. Greek letters that look exactly like Latin letters get the same treatment.
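
As a quick illustration (a minimal Python sketch using only the standard `unicodedata` module; the three letters are my own choice of example, not anything special in the standard), the look-alike capitals really are distinct codepoints:

```python
import unicodedata

# Three capital letters that look identical in most fonts.
lookalikes = ["\u0041", "\u0410", "\u0391"]  # Latin A, Cyrillic A, Greek Alpha

# Format each code point in the usual U+XXXX notation and look up its name.
code_points = [f"U+{ord(ch):04X}" for ch in lookalikes]
names = [unicodedata.name(ch) for ch in lookalikes]

for cp, name in zip(code_points, names):
    print(cp, name)

# Despite the identical appearance, no two of them compare equal.
all_distinct = len(set(lookalikes)) == len(lookalikes)
print(all_distinct)
```

So a string comparison or search treats them as unrelated characters, which is exactly the separation that the unified Han characters don't get.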

It's difficult to come up with a logical explanation for why European scripts that have their own alphabet get their own codepoints, while the Han characters shared by Chinese, Japanese, and Korean had to be "unified", even though the characters as actually written in those languages look different.

The "Han unification" was fundamentally a bad idea, and it persists for historical reasons. Back when (some) people thought a fixed-width 16-bit character representation would be "enough", it made sense to try to reduce the number of "redundant" code points. Now that Unicode has expanded to a much larger code space, I would think they'd choose differently if they were starting over.
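
For a sense of scale, here is the arithmetic as a small Python sketch (the variable names are mine; the only facts assumed are that a 16-bit unit has 2^16 values and that the current code space runs from U+0000 to U+10FFFF):

```python
# Code points addressable by a fixed-width 16-bit representation.
bmp_size = 2 ** 16

# Current Unicode code space: U+0000 through U+10FFFF inclusive.
full_size = 0x10FFFF + 1

# How many 16-bit-sized "planes" fit in the current code space.
planes = full_size // bmp_size

print(bmp_size, full_size, planes)
```

That's 65,536 slots then versus 1,114,112 now, i.e. 17 times the room the original design was trying to squeeze everything into.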

Unfortunately, that kind of sweeping change is unlikely any time soon.
