:-) The UTF-8/Unix FAQ and existing terminal emulators don't agree with you here. As you say, there's no spec for this, but here's what Kuhn's FAQ says (https://www.cl.cam.ac.uk/~mgk25/unicode.html#term):

"UTF-8 still allows you to use C1 control characters such as CSI, even though UTF-8 also uses bytes in the range 0x80-0x9F. It is important to understand that a terminal emulator in UTF-8 mode must apply the UTF-8 decoder to the incoming byte stream before interpreting any control characters. C1 characters are UTF-8 decoded just like any other character above U+007F."

The existing ANSI terminal emulators that support both UTF-8 input and C1 controls (VTE, GNU screen, Mosh) seem to agree on this: they run the UTF-8 decoder first and only then interpret C1 controls. xterm, urxvt, tmux, PuTTY, and st don't seem to support C1 controls in UTF-8 mode at all. So I don't think poking holes in the UTF-8 decoder is necessary, especially since support for C1 in UTF-8 mode is rare anyway.
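
To make the decode-first rule from the FAQ concrete, here's a rough Python sketch. The function name classify and the "C0"/"C1"/"text" labels are made up for illustration, and this isn't how any of the emulators above are actually implemented. The point is only that in UTF-8 mode CSI (U+009B) arrives as the two bytes 0xC2 0x9B, while a 0x9B byte that is part of another character (or stray on its own) is never treated as a control:

```python
def classify(data: bytes):
    """Decode UTF-8 first, then classify each resulting code point.

    Hypothetical helper for illustration; invalid bytes become U+FFFD.
    """
    text = data.decode("utf-8", errors="replace")
    result = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x20 or cp == 0x7F:
            kind = "C0"    # ESC, BEL, CR, LF, DEL, ...
        elif 0x80 <= cp <= 0x9F:
            kind = "C1"    # CSI (U+009B), OSC (U+009D), ...
        else:
            kind = "text"
        result.append((kind, ch))
    return result


# CSI encoded as UTF-8 is 0xC2 0x9B and decodes to a C1 control.
print(classify(b"\xc2\x9b"))
# U+015B is encoded as 0xC5 0x9B: the 0x9B byte is a continuation byte, not CSI.
print(classify(b"\xc5\x9b"))
# A lone 0x9B is invalid UTF-8 and becomes U+FFFD, not a control.
print(classify(b"\x9b"))
```

If the control check ran on raw bytes before decoding, the second case would be misread as a CSI in the middle of a character, which is why the decoder doesn't need any holes poked in it.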
