top | item 4834982

(no title)

unknown | 13 years ago

marshray|13 years ago

You will never define an unambiguous way to index into a Unicode string by "characters" in fewer than 10 pages of specification. When the two sides of an API interpret that contract differently, a security vulnerability is a very real possibility.

Use unambiguous delimiters, JSON, XML, or even byte offsets into normalized UTF-8 instead. But please don't do this.
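
To make the ambiguity concrete, here is a minimal Python sketch (the sample string and variable names are illustrative assumptions, not taken from any particular API). It counts the "length" of one short string under four plausible readings of "character"; an index agreed on under one reading points somewhere else under another.

```python
import unicodedata

# Illustrative sample: 'e' + COMBINING ACUTE ACCENT (renders as one accented
# letter), followed by an emoji that lies outside the Basic Multilingual Plane.
s = "e\u0301\U0001F600"

# Python's str is indexed by Unicode code point.
code_points = len(s)

# UTF-16 code units: what UCS-2/UTF-16 based string types count.
utf16_units = len(s.encode("utf-16-le")) // 2

# Bytes in the UTF-8 encoding.
utf8_bytes = len(s.encode("utf-8"))

# Code points after NFC normalization (the 'e' and the accent compose).
nfc_code_points = len(unicodedata.normalize("NFC", s))

print(code_points, utf16_units, utf8_bytes, nfc_code_points)  # prints: 3 4 7 2
```

A reader sees two characters here, yet the four counts are 3, 4, 7 and 2, so "the character at index 2" means something different to each side unless the unit is pinned down.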

pjscott|13 years ago

If you make an API, please don't do this. If you have to include indexes into a Unicode string, then make them indexes into a binary string in a known encoding. This works equally well for all encodings, and won't tempt people to do something as preposterous as using UCS-2.

mikeash|13 years ago

Such an API could easily indicate bytes in a UTF-8 string instead.
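
A rough sketch of what that could look like in Python, assuming the API documents its offsets as byte positions into the UTF-8 encoding of the text. The helper name `slice_by_byte_offsets` and the sample text are hypothetical, made up for illustration.

```python
def slice_by_byte_offsets(text: str, start: int, end: int) -> str:
    """Return the substring covered by UTF-8 byte offsets [start, end).

    Hypothetical helper: offsets are assumed to be byte positions into the
    UTF-8 encoding of `text`, as agreed in the API contract.
    """
    data = text.encode("utf-8")
    if not 0 <= start <= end <= len(data):
        raise ValueError("byte offsets out of range")
    # Strict decoding raises UnicodeDecodeError if either offset lands in the
    # middle of a multi-byte sequence, so a bad offset fails loudly.
    return data[start:end].decode("utf-8")


# "naïve café": the 'ï' and the 'é' each take two bytes in UTF-8.
text = "na\u00efve caf\u00e9"

print(slice_by_byte_offsets(text, 7, 12))  # prints: café
```

The point of the design is that both sides only have to agree on the encoding, not on what a "character" is, and an offset that splits a code point is rejected rather than silently misread.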