ssokolow | 2 years ago

*nod*

Rust was given as one of the examples, and the behaviour of Rust's `.len()` was chosen based on three very reasonable concerns:

1. They want the `String` type to be available for embedded use cases, where it's not reasonable to require embedding the quite large Unicode tables needed to identify grapheme boundaries. (`String` is defined in the `alloc` crate, which you can use in addition to `core` if your target has a heap allocator. `std` just re-exports it.)

2. They have a policy of not baking things that are defined by politics/fiat (e.g. Unicode code point assignments) into anything that requires a compiler update to change. (Which is also why the standard library has no timezone handling.)

3. People need a convenient way to know how much memory/disk space to allocate to store a string verbatim. (Rust's `String` is just a newtype wrapper around `Vec<u8>` with restricted construction and added helper functions.)

That's why .len() counts bytes in Rust.
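
To make the distinction concrete, here's a minimal sketch using only the standard library. The helper name `byte_and_scalar_counts` is made up for illustration; `str::len` and `str::chars` are the real standard library methods. It doesn't count graphemes, since that needs the Unicode tables mentioned above.

```rust
/// Hypothetical helper (not part of any library): returns the byte
/// length and the number of Unicode scalar values in `s`.
fn byte_and_scalar_counts(s: &str) -> (usize, usize) {
    // `len()` is the number of UTF-8 bytes, i.e. the storage size.
    // `chars()` iterates over Unicode scalar values, not graphemes.
    (s.len(), s.chars().count())
}

fn main() {
    // "é" written as 'e' followed by U+0301 COMBINING ACUTE ACCENT.
    // A reader sees one character, but it is two scalar values
    // encoded as three UTF-8 bytes.
    let s = "e\u{301}";
    let (bytes, scalars) = byte_and_scalar_counts(s);
    println!("bytes = {bytes}, scalar values = {scalars}");
    // prints "bytes = 3, scalar values = 2"
}
```

Neither number is the "1" a user would expect to see, which is why anything grapheme-aware has to come from outside the standard library.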

Just like with timezone definitions, Rust has a de facto standard place to find a grapheme-wise iterator... the unicode-segmentation crate.
