dminuoso | 4 years ago
In database code or, more subtly, in code for various filesystems, the notion of bytes or code points might be more relevant.
By the way, what about ASCII control characters? Does a carriage return have some intrinsic or clearly well-defined notion of "length" to you?
What about digraphs like ij in Dutch? Are they a single grapheme cluster? Is this locale-dependent? Do you have all scripts and cultures in mind?
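To make the point concrete, here is a rough sketch (assuming Python 3 and only the standard library) of how the "length" of the same text changes with the unit you count in. The helper name `lengths` and the sample strings are made up for illustration. Grapheme clusters are left out on purpose, since as far as I know the standard library has no segmentation API for them.

```python
import unicodedata


def lengths(s: str) -> dict:
    """Return the size of s in three different units (hypothetical helper)."""
    return {
        "code_points": len(s),                           # Python str length counts code points
        "utf8_bytes": len(s.encode("utf-8")),            # what a byte-oriented store sees
        "utf16_units": len(s.encode("utf-16-le")) // 2,  # what UTF-16 based APIs count
    }


samples = {
    "Dutch ij, two letters": "ij",
    "Dutch ij, ligature U+0133": "\u0133",
    "carriage return": "\r",
    "e-acute, precomposed": "\u00e9",
    "e-acute, decomposed": "e\u0301",
}

for label, text in samples.items():
    print(f"{label}: {lengths(text)}")

# Normalization changes the answer again: the ligature folds to two letters
# under NFKC, and the decomposed e-acute composes to one code point under NFC.
print(len(unicodedata.normalize("NFKC", "\u0133")))
print(len(unicodedata.normalize("NFC", "e\u0301")))
```

None of these numbers tells you whether a Dutch reader counts "ij" as one letter or two. That part is a locale and convention question, not something the encoding can answer.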
frosted-flakes|4 years ago
dotancohen|4 years ago
And some clients expect that whitespace is not included in string length. "I asked to put 50 letters in this box, why can I only put 42?" is a complaint you should expect to hear. Even if you manage to convey that spaces are something funny called "characters", they might not understand that newlines are characters as well. Or emojis.
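For what it's worth, the "letters only" count such a client has in mind could be sketched like this in Python. The name `visible_length` and the sample text are hypothetical, and "not whitespace" is only one guess at what the client actually means.

```python
def visible_length(text: str) -> int:
    """Count characters that are not whitespace (spaces, tabs, newlines)."""
    return sum(1 for ch in text if not ch.isspace())


entry = "fifty letters\nplease"
print(len(entry), visible_length(entry))  # the two counts differ by the space and the newline
```

Emojis stay awkward either way: `len` counts code points, so an emoji built from several code points joined together still counts as more than one "letter" here.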