dminuoso | 4 years ago

The primary problem is that language and library designers, and their users, believe there must be one true canonical meaning of the word „length“, as you just did, and that „length“ is the best name for a given interface.

In database code or, more subtly, in various filesystem code, the number of bytes or code points might be the more relevant notion.
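
To make that concrete, here is a minimal Python sketch of how these counts diverge for one and the same string. The helper name `length_views` and the sample string are made up for illustration; only the standard `len` and `str.encode` calls are used.

```python
def length_views(s: str) -> dict[str, int]:
    """Return several competing notions of 'length' for one string."""
    return {
        # Number of Unicode code points (what Python's len() counts).
        "code_points": len(s),
        # Storage size when encoded as UTF-8.
        "utf8_bytes": len(s.encode("utf-8")),
        # Number of 16-bit code units when encoded as UTF-16.
        "utf16_units": len(s.encode("utf-16-le")) // 2,
    }


# 'e' + combining acute accent + grinning face emoji (illustrative sample)
sample = "e\u0301\U0001F600"
print(length_views(sample))
# {'code_points': 3, 'utf8_bytes': 7, 'utf16_units': 4}
```

A reader would probably say that string shows two "characters", which is yet another number that none of these three counts produce.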

By the way, what about ASCII control characters? Does carriage return have some intrinsic or well-defined notion of „length“ to you?

What about digraphs like ij in Dutch? Are they a single grapheme cluster? Is this locale dependent? Do you have all scripts and cultures in mind?
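
As a rough illustration of the ij case in Python: Unicode has both a single-code-point ligature (U+0133) and the plain two-letter spelling, and whether they compare equal depends on which normalization form you pick. This is only a sketch using the standard `unicodedata` module; it says nothing about what a Dutch reader would count as one letter.

```python
import unicodedata

ligature = "\u0133"  # LATIN SMALL LIGATURE IJ, a single code point
digraph = "ij"       # two ordinary letters, i followed by j

# The two spellings have different code point counts.
print(len(ligature), len(digraph))  # 1 2

# Compatibility normalization (NFKC) folds the ligature into two letters...
print(unicodedata.normalize("NFKC", ligature) == digraph)  # True

# ...while canonical normalization (NFC) leaves it as one code point.
print(unicodedata.normalize("NFC", ligature) == digraph)  # False
```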

frosted-flakes|4 years ago

A CR is a space-type character. A string containing it has a length of 1.

dotancohen|4 years ago

Whitespace is the term.

And some clients expect that whitespace is not included in string length. "I asked to put 50 letters in this box, why can I only put 42?" would not be an unexpected complaint when working with clients. Even if you manage to convey that spaces are something funny called "characters", they might not understand that newlines are characters as well. Or emojis.
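
A small Python sketch of that mismatch, assuming the client's idea of "letters" is roughly "anything that is not whitespace". The helper `visible_length` is hypothetical, and it still counts each emoji code point separately.

```python
def visible_length(s: str) -> int:
    """Count code points that are not whitespace (hypothetical helper)."""
    return sum(1 for ch in s if not ch.isspace())


text = "hello world\r\n"  # one space, one carriage return, one newline
print(len(text), visible_length(text))  # 13 10
```

A form limit enforced with `len` would reject input that the client considers well within the limit.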