top | item 22900957

(no title)

It's too bad Unicode wasn't designed around the concept of easily-recognizable grapheme clusters and "write-only" [non-round-trip] forms that are normalized in various ways. A text layout engine shouldn't have to have detailed knowledge of rules that are constantly subject to change, but if there were a standard representation for a Unicode string where all grapheme clusters are marked and everything is listed in left-to-right order, and an OS function was available to convert a Unicode string into such a form, a text-layout using that OS routine would be able to accommodate future additions to the character set and and glyph-joining rules without having to know anything about them.

discuss

a1369209993|5 years ago

You can't do that without commiting to not supporting pathological text, otherwise you're stuck adding new special cases to the layout engine every update anyway.

I do have some ideas for a better encoding (like, I assume, anyone competent with sufficient free time and interest in text encoding), but there's a lot of reluctance to put effort into something that's already completely eclipsed by a technically inferior but not completely unusable alternative, so I've had it mostly shelved.