This is so cool! I looked at the transcript for day 5 [1] and realized I had learned the same thing regarding Rust strings not being indexable with integers due to them being a series of grapheme clusters. I didn't use ChatGPT and had to dig through the crate documentation [2] and look at Stack Overflow [3], but Simon was able to get an equally great explanation by simply asking "Why is this so hard?", which I could relate to very much, coming from C++ land. Now, whether you can trust these explanations is another issue, but I think it's interesting to imagine a future where software documentation is aware of an individual's situation. The Rust docs are amazing, and you can see they bring up indexing in the "UTF-8" section, but that requires me to read a section of the docs that I may not have realized was the reason for my frustration with a compiler error about indexing. Even if ChatGPT is not "intelligent" (whatever that means), its ability to take a context and act almost like an expert who has read every page of documentation and can point you in a productive direction is very helpful.
[1]: https://github.com/simonw/advent-of-code-2022-in-rust/issues...
[2]: https://doc.rust-lang.org/std/string/struct.String.html#utf-...
[3]: https://stackoverflow.com/a/24542502
63|3 years ago
simonw|3 years ago
tialaramex|3 years ago
guitarbill|3 years ago
As the docs point out, Rust's string types are simply types that either borrow or own some memory (i.e. bytes), and the types and their operations guarantee that those bytes are valid UTF-8, i.e. a sequence of encoded Unicode code points (a.k.a. characters). A code point takes one to four bytes when encoded with UTF-8.
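To make the byte vs. code point distinction concrete, here is a minimal sketch using only the standard library. The helper names `byte_len` and `char_count` are made up for illustration; the sample string is my own.

```rust
/// Number of bytes in the UTF-8 encoding (this is what `len()` reports).
fn byte_len(s: &str) -> usize {
    s.len()
}

/// Number of code points (Rust `char`s); this walks the whole string.
fn char_count(s: &str) -> usize {
    s.chars().count()
}

fn main() {
    // "héllo", with é written as the single code point U+00E9 (2 bytes in UTF-8).
    let s = "h\u{e9}llo";

    println!("{} bytes, {} chars", byte_len(s), char_count(s));

    // let c = s[1]; // does not compile: `str` cannot be indexed by an integer

    // Getting the n-th code point means iterating from the start.
    println!("{:?}", s.chars().nth(1));

    // Byte ranges must start and end on code point boundaries.
    println!("{:?}", s.get(0..2)); // ends in the middle of é, so this is None
    println!("{:?}", s.get(0..3)); // covers "h" plus both bytes of é
}
```

Using `get` with a byte range returns an `Option` instead of panicking, which is handy when you are not sure the range lands on a boundary.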
Grapheme clusters are more complicated. Roughly speaking, they are a collection of code points that more closely matches what humans expect a character to be (and they depend on the language/script), e.g. `ü` can actually be two code points `u` + `¨`, and splitting after `u` could be nonsensical. AFAIK, Rust's standard library doesn't really provide a way to deal with grapheme clusters? EDIT: it used to, but it got deprecated and removed [0]
So TL;DR: 1-4 bytes => 1 character, 2+ characters => maybe 1 grapheme cluster. Hope that helps either you, or someone else reading this.
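A small sketch of the `ü` case, again with only the standard library. The helper `sizes` is a made-up name, and I'm assuming the second form uses the combining diaeresis U+0308. Since std has no grapheme segmentation, the code only counts bytes and code points; for actual grapheme clusters people usually reach for a third-party crate such as `unicode-segmentation`.

```rust
/// Returns (bytes, code points) for a string slice.
fn sizes(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    let precomposed = "\u{fc}"; // ü as one code point
    let decomposed = "u\u{308}"; // u followed by a combining diaeresis

    // Both normally render as one visible character, but they differ underneath.
    println!("precomposed: {:?}", sizes(precomposed));
    println!("decomposed:  {:?}", sizes(decomposed));

    // String equality compares the bytes, so the two spellings are not equal.
    println!("equal? {}", precomposed == decomposed);

    // Splitting the decomposed form by code point separates the base letter
    // from its combining mark, which is the nonsensical split described above.
    for c in decomposed.chars() {
        println!("U+{:04X}", c as u32);
    }
}
```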
[0] https://github.com/rust-lang/rust/pull/24428
estebank|3 years ago
I would say this is a bug in the compiler error: it should make it clear not only that you can't index into the string, but also why. If the explanation is too long, it should link to the right section of the book.