You will never define an unambiguous way to index into a Unicode string by "characters" in fewer than 10 pages of specification. When the two sides of an API interpret that contract differently, a security vulnerability is a very real possibility.
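To make the ambiguity concrete, here is a minimal Python sketch. The sample string is my own illustration, not something from the thread. It counts the same text four ways, and each count is a plausible reading of "index by characters":

```python
import unicodedata

# Illustrative sample: 'e' + COMBINING ACUTE ACCENT, then an emoji outside the BMP.
s = "e\u0301\U0001F600"

# Four defensible answers to "how long is this string?"
code_points = len(s)                                     # Python str counts code points
utf16_units = len(s.encode("utf-16-le")) // 2            # what UTF-16-based runtimes count
utf8_bytes = len(s.encode("utf-8"))                      # byte length on the wire
nfc_code_points = len(unicodedata.normalize("NFC", s))   # code points after NFC composition

print(code_points, utf16_units, utf8_bytes, nfc_code_points)  # prints: 3 4 7 2
```

An index of 2 points at a different place under each of these interpretations, and a user would still say the string holds only two characters.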
Use unambiguous delimiters, JSON, XML, or even byte offsets into normalized UTF-8 instead. But please don't do this.
If you design an API, please don't do this. If you have to include indexes into a Unicode string, make them byte offsets into the string as encoded in a known encoding. This works equally well for every encoding, and won't tempt people to do something as preposterous as using UCS-2.
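A rough sketch of what that could look like in Python, assuming both sides have agreed on UTF-8. The helper name `slice_by_byte_offsets` and the sample text are hypothetical, made up for illustration:

```python
def slice_by_byte_offsets(data: bytes, start: int, end: int, encoding: str = "utf-8") -> str:
    """Return the text between two byte offsets into an encoded string.

    Hypothetical helper: the offsets are defined against the encoded bytes,
    so both sides of the API agree on what they mean.
    """
    if not 0 <= start <= end <= len(data):
        raise ValueError("byte offsets out of range")
    # Strict decoding raises UnicodeDecodeError if an offset splits a code point.
    return data[start:end].decode(encoding)


text = "na\u00efve caf\u00e9"          # "naïve café", written with escapes to pin the code points
data = text.encode("utf-8")
needle = "caf\u00e9".encode("utf-8")

start = data.index(needle)             # byte offset where the word begins
end = start + len(needle)              # byte offset just past the word
word = slice_by_byte_offsets(data, start, end)
```

Here `start` is 7 and `end` is 12, even though the word begins at code point 6, which is exactly the kind of mismatch that character indexes invite. An offset that lands in the middle of a multi-byte sequence fails loudly at decode time instead of silently selecting the wrong text.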
marshray|13 years ago
pjscott|13 years ago
mikeash|13 years ago