That seems like a really unfortunate design decision. I used to think that Java's use of UTF-16 for strings was just a problematic legacy thing, but compared to this it seems quite good. Strings perform well, and there are no complex calculations needed for indexing or bounds checks. And since Java 9 the JVM can switch between UTF-16 and Latin-1 encodings on the fly, which both reduces memory use and speeds things up. There are no memory safety issues caused by character encodings.
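The Java 9 feature referred to here is compact strings (JEP 254): each string is stored as Latin-1 when every character fits in one byte, and as UTF-16 otherwise. Below is a minimal Rust sketch of that idea, assuming hypothetical names (`Compact`, `compact`, `unit_len`, `unit_at`) that are not part of any real Java or Rust API. Note that the O(1) `unit_at`, like Java's `charAt`, indexes UTF-16 code units, not characters.

```rust
/// Storage chosen once per string, in the spirit of Java 9 compact strings.
/// Hypothetical type for illustration; not a real Java or Rust API.
enum Compact {
    Latin1(Vec<u8>), // one byte per char; every char is <= U+00FF
    Utf16(Vec<u16>), // UTF-16 code units, like a classic Java String
}

/// True when every char fits in Latin-1 (U+0000..=U+00FF).
fn fits_latin1(s: &str) -> bool {
    s.chars().all(|c| (c as u32) <= 0xFF)
}

/// Pick the compact representation when possible, UTF-16 otherwise.
fn compact(s: &str) -> Compact {
    if fits_latin1(s) {
        // Lossless: every char was checked to be <= 0xFF above.
        Compact::Latin1(s.chars().map(|c| c as u8).collect())
    } else {
        Compact::Utf16(s.encode_utf16().collect())
    }
}

/// Length in UTF-16 code units, which is what Java's String.length() reports.
fn unit_len(c: &Compact) -> usize {
    match c {
        Compact::Latin1(b) => b.len(),
        Compact::Utf16(u) => u.len(),
    }
}

/// O(1) access to the i-th code unit, analogous to String.charAt(i).
/// Latin-1 byte values equal the corresponding UTF-16 code units.
fn unit_at(c: &Compact, i: usize) -> u16 {
    match c {
        Compact::Latin1(b) => b[i] as u16,
        Compact::Utf16(u) => u[i],
    }
}

fn main() {
    let latin = compact("café");
    let wide = compact("naïve \u{1F600}");
    println!("{} {:#x}", unit_len(&latin), unit_at(&latin, 3)); // 4 0xe9
    // The emoji is one character but two code units (a surrogate pair).
    println!("{} {:#x}", unit_len(&wide), unit_at(&wide, 6)); // 8 0xd83d
}
```

The representation is fixed when the string is built, so indexing never has to inspect the content; the cost is paid once, up front.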
burntsushi|8 years ago
I do have my own unrelated niche complaints about Rust's string story (and I have vague plans to resolve them), but Rust's string implementation is my favorite of any language I've used.
kibwen|8 years ago
zigzigzag|8 years ago
UTF-8 is a fine transport format, but for raw runtime performance it's obviously going to be an issue if you ever need to iterate over characters, do substring matches, or things like that, because you can't do constant-time "next char" or indexing.
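For a concrete picture of the indexing cost: Rust's `str` is UTF-8, so getting the i-th character means scanning from the start, which is O(n). A minimal sketch using only the standard library; `char_at` and `byte_offset_of_char` are hypothetical helper names, not std APIs.

```rust
/// Character (code point) indexing on UTF-8 has to scan from the start: O(n).
fn char_at(s: &str, i: usize) -> Option<char> {
    s.chars().nth(i)
}

/// Convert a char index to a byte offset, again by scanning: O(n).
fn byte_offset_of_char(s: &str, i: usize) -> Option<usize> {
    s.char_indices().nth(i).map(|(byte, _)| byte)
}

fn main() {
    let s = "naïve café";
    // 12 bytes but 10 chars: 'ï' and 'é' each take two bytes in UTF-8.
    println!("{} bytes, {} chars", s.len(), s.chars().count());
    println!("{:?}", char_at(s, 2)); // Some('ï')
    // Once you hold a byte offset, slicing is O(1) plus a boundary check.
    let off = byte_offset_of_char(s, 6).unwrap();
    println!("{}", &s[off..]); // café
    // Substring search reports a byte offset, not a char index.
    println!("{:?}", s.find("café")); // Some(7)
}
```

The usual workaround is to carry byte offsets around (as `find` and `char_indices` return) rather than character indices.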
UTF-16 doesn't let you do that either in the presence of surrogate pairs or combining characters, but they're pretty rare, and for many operations it doesn't really matter.
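To make the "one character is not one unit" point concrete in both encodings, here is a small standard-library Rust sketch (the `counts` helper is hypothetical) comparing UTF-8 bytes, UTF-16 code units, and code points for a combining sequence and for a character outside the Basic Multilingual Plane. Splitting text into user-perceived characters (grapheme clusters) is not in Rust's standard library; the third-party `unicode-segmentation` crate is commonly used for that.

```rust
/// Count a string three ways to show where "one character" breaks down.
fn counts(s: &str) -> (usize, usize, usize) {
    (
        s.len(),                  // UTF-8 bytes
        s.encode_utf16().count(), // UTF-16 code units (Java's String.length())
        s.chars().count(),        // Unicode scalar values (code points)
    )
}

fn main() {
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT renders as one "é".
    let combining = "e\u{301}";
    // U+1F600 is outside the BMP, so UTF-16 needs a surrogate pair.
    let emoji = "\u{1F600}";
    println!("{:?}", counts(combining)); // (3, 2, 2)
    println!("{:?}", counts(emoji)); // (4, 2, 1)
}
```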