item 13877266


digler999 | 9 years ago

What makes other encodings hard? The two things that come to mind are byte length and the comparison function. If the encoding used a fixed number of bytes per character, reversing should just mean swapping n bytes at a time instead of 1 byte. What else is difficult about non-ASCII encodings?
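The "swap n bytes at a time" idea does reverse code points correctly for a fixed-width encoding such as UTF-32, and it breaks for UTF-8. A minimal Python sketch; `reverse_units` is a hypothetical helper written for illustration, and only the built-in `utf-32-le` and `utf-8` codecs are real library features:

```python
def reverse_units(data: bytes, width: int) -> bytes:
    """Hypothetical helper: reverse `data` in fixed-size chunks of `width` bytes."""
    if len(data) % width != 0:
        raise ValueError("data length must be a multiple of width")
    chunks = [data[i:i + width] for i in range(0, len(data), width)]
    return b"".join(reversed(chunks))


text = "añb"  # 'ñ' is U+00F1: 4 bytes in UTF-32, 2 bytes in UTF-8

# UTF-32 (little-endian, no BOM): every code point is exactly 4 bytes,
# so swapping 4-byte units reverses the code points correctly.
print(reverse_units(text.encode("utf-32-le"), 4).decode("utf-32-le"))  # bña

# UTF-8: swapping single bytes puts the two bytes of 'ñ' in the wrong order,
# and the result is no longer valid UTF-8.
scrambled = reverse_units(text.encode("utf-8"), 1)
print(scrambled.decode("utf-8", errors="replace"))  # shows U+FFFD replacement characters
```

Even with a fixed-width encoding, this only reverses code points; a symbol made of several code points still gets split apart.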

discuss

order

detaro | 9 years ago

E.g. in UTF-8 a code point is encoded with a varying number of bytes (so you have to split the string into code points first and then reverse them), and, much more difficult, a sequence of multiple code points can combine to form a single symbol. The simplest case would be something like "ö" encoded as "o" (U+006F) followed by a combining diaeresis (U+0308).
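A small Python sketch of the combining-mark problem. Python's `str` is already a sequence of code points, so the UTF-8 splitting step is done by decoding; the helper names are hypothetical, and treating "nonzero `unicodedata.combining()`" as "is a combining mark" is a simplifying assumption (some marks have combining class 0):

```python
import unicodedata


def reverse_codepoints(s: str) -> str:
    # Slicing a Python str reverses code points, not bytes or symbols.
    return s[::-1]


def reverse_with_combining(s: str) -> str:
    """Keep each base character together with the combining marks that follow it."""
    clusters: list[str] = []
    for ch in s:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch  # attach the mark to the preceding base character
        else:
            clusters.append(ch)
    return "".join(reversed(clusters))


word = "o\u0308n"  # 'ö' written as o + U+0308 COMBINING DIAERESIS, then 'n'
print([f"U+{ord(c):04X}" for c in reverse_codepoints(word)])      # ['U+006E', 'U+0308', 'U+006F']
print([f"U+{ord(c):04X}" for c in reverse_with_combining(word)])  # ['U+006E', 'U+006F', 'U+0308']
```

The naive version moves the diaeresis onto the "n". For this particular case `unicodedata.normalize("NFC", ...)` would also help by composing o + U+0308 into U+00F6, but not every sequence has a precomposed form.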

Other fun special cases: 🇺🇸 is U+1F1FA REGIONAL INDICATOR SYMBOL LETTER U followed by U+1F1F8 REGIONAL INDICATOR SYMBOL LETTER S, and should, if possible, be displayed as a US flag (otherwise it falls back to the text "US"). Should reversing it create 🇸🇺 (replacing the flag with the characters "SU"), or should it still show the flag? (I'm not even sure there isn't a case where both orders are valid country codes and reversing would change it to a different flag.)
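If the answer is "still show the flag", one option is to pair up consecutive regional indicators from the left before reversing. A minimal Python sketch under that assumption; `reverse_keeping_flags` and `as_letters` are hypothetical helpers, and the range U+1F1E6 (LETTER A) to U+1F1FF (LETTER Z) is the regional indicator block:

```python
RI_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A
RI_Z = 0x1F1FF  # REGIONAL INDICATOR SYMBOL LETTER Z


def is_regional_indicator(ch: str) -> bool:
    return RI_A <= ord(ch) <= RI_Z


def reverse_keeping_flags(s: str) -> str:
    """Reverse code points, but keep each left-to-right pair of regional indicators together."""
    units: list[str] = []
    i = 0
    while i < len(s):
        if (is_regional_indicator(s[i]) and i + 1 < len(s)
                and is_regional_indicator(s[i + 1])):
            units.append(s[i:i + 2])  # the pair is one flag
            i += 2
        else:
            units.append(s[i])
            i += 1
    return "".join(reversed(units))


def as_letters(s: str) -> str:
    # Display helper: show regional indicators as the ASCII letters they stand for.
    return "".join(
        chr(ord(c) - RI_A + ord("A")) if is_regional_indicator(c) else c for c in s
    )


us_flag = "\U0001F1FA\U0001F1F8"
print(as_letters(us_flag[::-1]))                   # SU
print(as_letters(reverse_keeping_flags(us_flag)))  # US
```

With two flags in a row, 🇺🇸🇩🇪 reverses to 🇩🇪🇺🇸 with this approach, while naive code point reversal gives the letters "EDSU".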

Similarly, emoji can be formed from a sequence with combining or joining characters (such as U+200D ZERO WIDTH JOINER) in between, and these don't display correctly if reversed code point by code point.
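A rough Python sketch of grouping such sequences before reversing. These are simplified, assumed rules (combining marks, variation selectors, skin-tone modifiers, and anything joined by ZWJ stay with the preceding character), not the full Unicode grapheme cluster algorithm from UAX #29; real code would use an implementation of that, such as `\X` in the third-party `regex` module. The helper names are hypothetical:

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER


def is_extender(ch: str) -> bool:
    cp = ord(ch)
    return (
        unicodedata.combining(ch) != 0   # combining marks such as U+0308
        or 0xFE00 <= cp <= 0xFE0F        # variation selectors (e.g. U+FE0F)
        or 0x1F3FB <= cp <= 0x1F3FF      # emoji skin-tone modifiers
    )


def simple_clusters(s: str) -> list[str]:
    """Group code points into approximate user-perceived characters."""
    clusters: list[str] = []
    join_next = False
    for ch in s:
        if clusters and (join_next or ch == ZWJ or is_extender(ch)):
            clusters[-1] += ch
        else:
            clusters.append(ch)
        join_next = ch == ZWJ  # the character after a ZWJ joins the same cluster
    return clusters


def reverse_clusters(s: str) -> str:
    return "".join(reversed(simple_clusters(s)))


family = "\U0001F469\u200d\U0001F467"  # WOMAN, ZWJ, GIRL
text = "hi" + family
print([len(c) for c in simple_clusters(text)])  # [1, 1, 3]
print(reverse_clusters(text) == family + "ih")  # True
# Naive reversal yields GIRL, ZWJ, WOMAN: a different sequence from the original.
print(text[::-1] == "\U0001F467\u200d\U0001F469ih")  # True
```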

ryandrake | 9 years ago

Some examples: if you're dealing with UTF-8, which is very common, you need to handle variable-length characters. If you're working with UTF-16, you need to handle surrogate pairs. Neither is the end of the world, but the basic "array walking" string-reversal methods you'd expect from a whiteboarding session wouldn't work.
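To illustrate the UTF-16 case, here is a minimal Python sketch that works on raw 16-bit code units (Python strings are not UTF-16 internally, so the sketch encodes to `utf-16-le` first). The helper names are hypothetical; the surrogate ranges D800–DBFF (high) and DC00–DFFF (low) come from the UTF-16 definition:

```python
import struct


def to_utf16_units(s: str) -> list[int]:
    """Encode to UTF-16 (little-endian, no BOM) and return the 16-bit code units."""
    data = s.encode("utf-16-le")
    return list(struct.unpack(f"<{len(data) // 2}H", data))


def from_utf16_units(units: list[int]) -> str:
    """Decode 16-bit code units; raises UnicodeDecodeError on misplaced surrogates."""
    return struct.pack(f"<{len(units)}H", *units).decode("utf-16-le")


def is_high_surrogate(u: int) -> bool:
    return 0xD800 <= u <= 0xDBFF


def is_low_surrogate(u: int) -> bool:
    return 0xDC00 <= u <= 0xDFFF


def reverse_utf16(units: list[int]) -> list[int]:
    """Reverse code units, keeping each (high, low) surrogate pair in order."""
    chunks: list[list[int]] = []
    i = 0
    while i < len(units):
        if (is_high_surrogate(units[i]) and i + 1 < len(units)
                and is_low_surrogate(units[i + 1])):
            chunks.append(units[i:i + 2])
            i += 2
        else:
            chunks.append([units[i]])
            i += 1
    return [u for chunk in reversed(chunks) for u in chunk]


text = "a\U0001F600b"  # U+1F600 needs a surrogate pair in UTF-16
units = to_utf16_units(text)
print([hex(u) for u in units])  # ['0x61', '0xd83d', '0xde00', '0x62']
print(from_utf16_units(reverse_utf16(units)) == "b\U0001F600a")  # True
try:
    from_utf16_units(units[::-1])  # naive "array walking" reversal
except UnicodeDecodeError:
    print("naive reversal put the low surrogate before the high one")
```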