top | item 33873417


sorobahn | 3 years ago

This is so cool! I looked at the transcript for day 5 [1] and realized I had learned the same thing: Rust strings are not indexable with integers because they are a series of grapheme clusters. I didn't use ChatGPT and had to dig through the crate documentation [2] and Stack Overflow [3], but Simon got an equally great explanation by simply asking "Why is this so hard?", a question I could relate to very much coming from C++ land. Whether you can trust these explanations is another issue, but I think it's interesting to imagine a future where software documentation is aware of an individual's situation. The Rust docs are amazing, and they do bring up indexing in the "UTF-8" section, but that requires me to read a section of the docs that I may not have realized was the cause of my frustration with an indexing compiler error. Even if ChatGPT is not "intelligent" (whatever that means), its ability to take a context and act almost like an expert who has read every page of the documentation, and who can point you in a productive direction, is very helpful.

[1]: https://github.com/simonw/advent-of-code-2022-in-rust/issues...
[2]: https://doc.rust-lang.org/std/string/struct.String.html#utf-...
[3]: https://stackoverflow.com/a/24542502
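
A minimal std-only sketch of the situation described above: integer indexing into a `&str` is rejected at compile time, so the usual workarounds are to ask for the n-th `char` (a code point, not a grapheme cluster) or to drop down to the raw UTF-8 bytes. The helper names `nth_char` and `nth_byte` are hypothetical, made up for this example.

```rust
// Hypothetical helpers for illustration; neither is part of std.

/// Returns the n-th `char` by walking from the start of the string (O(n)).
fn nth_char(s: &str, n: usize) -> Option<char> {
    s.chars().nth(n)
}

/// Returns the n-th raw UTF-8 byte (O(1)), which may be in the middle of a char.
fn nth_byte(s: &str, n: usize) -> Option<u8> {
    s.as_bytes().get(n).copied()
}

fn main() {
    let s = "naïve";
    // let c = s[2]; // does not compile: `str` cannot be indexed by an integer
    println!("{:?}", nth_char(s, 2)); // Some('ï')
    println!("{:?}", nth_byte(s, 2)); // Some(195), the first of the two bytes of 'ï'
}
```

Slicing with a byte range (`&s[0..2]`) does compile, but it panics at runtime if either end of the range falls inside a multi-byte character.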

63|3 years ago

I don't know if this is helpful at all, but "strings are not indexable because of their underlying implementation and the complexity of UTF-8" is drilled into the reader very hard by the Rust book. Obviously to each their own, but I found I had a much easier time understanding the language by working through the book than by treating Rust like any other language and randomly guessing and googling my way through errors.
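
To make the book's point concrete: a `String`/`&str` stores UTF-8, where one character can take several bytes, so "the i-th element" is ambiguous (the i-th byte or the i-th character?), and some byte offsets are not even valid places to cut. A small std-only sketch; the function name `byte_and_char_len` is made up for illustration.

```rust
/// Returns (number of UTF-8 bytes, number of chars) for `s`.
fn byte_and_char_len(s: &str) -> (usize, usize) {
    // `len()` counts bytes; `chars().count()` walks the string counting chars.
    (s.len(), s.chars().count())
}

fn main() {
    let s = "héllo";
    let (bytes, chars) = byte_and_char_len(s);
    println!("{} bytes, {} chars", bytes, chars); // 6 bytes, 5 chars
    // Byte offset 2 is the second byte of 'é', so it is not a char boundary.
    println!("{}", s.is_char_boundary(2)); // false
}
```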

simonw|3 years ago

I've not looked at the book or any of the documentation at all yet, but I'm getting the distinct impression that it's a cut above documentation for many other languages. I'm going to start working through that too.

tialaramex|3 years ago

Note that for AoC it will often be a good idea to say you want bytes, not chars, and of course a slice of bytes is trivially indexable. You can make "byte string literals" and "byte literals" very easily in Rust, just with a b prefix, with the obvious restriction that only ASCII works, since multi-byte characters are not single bytes. The type of a "byte literal" is u8, a byte, and the type of a "byte string literal" is &'static [u8; N], a reference to an array of N bytes which lives forever.

  fn main() {
      let s1 = "[[..]]";
      // Rats, indexing into s1 doesn't work †

      let s2 = b"[[..]]";
      // s2 is just a reference to an array of bytes, so indexing works
      assert_eq!(s2[4], b']');
  }
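† Technically indexing with a byte range, e.g. &s1[4..5], works fine; it's just probably not what you wanted (you get a &str back, and it panics if the range cuts a multi-byte character)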
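
Building on that, a common AoC pattern is to keep the puzzle input as `&str` and view each line as bytes with `str::as_bytes`, which gives exactly one byte per grid cell as long as the input is ASCII (puzzle inputs usually are). A sketch under that assumption; `parse_grid` and the sample grid are hypothetical.

```rust
/// Hypothetical AoC-style helper: view each line of an ASCII grid as bytes,
/// so a cell can be read directly as grid[row][col].
fn parse_grid(input: &str) -> Vec<&[u8]> {
    input.lines().map(str::as_bytes).collect()
}

fn main() {
    // A byte string literal has type &'static [u8; N]; here N = 6.
    let lit: &'static [u8; 6] = b"[[..]]";
    assert_eq!(lit[4], b']');

    let input = "#..\n.#.\n..#";
    let grid = parse_grid(input);
    // Count '#' cells on the main diagonal by plain integer indexing.
    let hits = (0..grid.len()).filter(|&i| grid[i][i] == b'#').count();
    println!("{}", hits); // 3
}
```

Borrowing the lines as `&[u8]` avoids copying the input; collect into `Vec<Vec<u8>>` instead if the grid needs to be mutated.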

guitarbill|3 years ago

Unicode/text is complicated, and there's a lot of terminology. Describing Rust strings as "a series of grapheme clusters" is maybe confusing, and `chars()` doesn't allow iterating over grapheme clusters.

As the docs point out, they are simply types that either borrow or own some memory (i.e. bytes), and the types and their operations guarantee that those bytes are valid UTF-8, i.e. an encoding of Unicode code points (aka characters). A code point takes one to four bytes when encoded as UTF-8.
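
The "one to four bytes" point can be checked with `char::len_utf8`. In Rust a `char` is one code point (strictly, a Unicode scalar value, which excludes surrogates), never a byte. A std-only sketch; `utf8_lengths` is a made-up name.

```rust
/// Returns the UTF-8 length in bytes of each char in `s`.
fn utf8_lengths(s: &str) -> Vec<usize> {
    s.chars().map(char::len_utf8).collect()
}

fn main() {
    // One example each of a 1-, 2-, 3- and 4-byte code point.
    for c in ['a', 'é', '€', '🦀'] {
        println!("{:?} U+{:04X} -> {} byte(s)", c, c as u32, c.len_utf8());
    }
}
```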

Grapheme clusters are more complicated. Roughly speaking, a grapheme cluster is a sequence of code points that matches what a human would call one character (and what that is can depend on the language/script), e.g. `ü` can actually be two code points, `u` + combining `¨` (U+0308), and splitting after the `u` could be nonsensical. AFAIK, Rust's standard library doesn't really provide a way to deal with grapheme clusters? EDIT: it used to, but it got deprecated and removed [0]
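
The `ü` example, sketched with std only: the precomposed form (U+00FC) and the decomposed form (`u` + U+0308 COMBINING DIAERESIS) render the same but compare unequal, and `chars()` splits the decomposed one in two. For real grapheme-cluster iteration, the third-party `unicode-segmentation` crate (its `graphemes` method) is the usual choice; it is left out here so the example stays self-contained. `first_char` is a hypothetical helper.

```rust
/// Returns the first `char` of `s` as a String. Because `chars()` iterates
/// code points, for a decomposed "u\u{308}" this yields only "u".
fn first_char(s: &str) -> String {
    s.chars().take(1).collect()
}

fn main() {
    let precomposed = "\u{FC}"; // 'ü' as a single code point
    let decomposed = "u\u{308}"; // 'u' followed by a combining diaeresis

    // Both display as "ü", but they are different strings to Rust.
    assert_ne!(precomposed, decomposed);
    println!("{} chars, {} bytes", precomposed.chars().count(), precomposed.len()); // 1 chars, 2 bytes
    println!("{} chars, {} bytes", decomposed.chars().count(), decomposed.len()); // 2 chars, 3 bytes

    // Taking the first char cuts the cluster apart, leaving a bare 'u'.
    println!("{}", first_char(decomposed)); // u
}
```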

So TL;DR: 1-4 bytes => 1 character, 2+ characters => maybe 1 grapheme cluster. Hope that helps either you, or someone else reading this.

[0] https://github.com/rust-lang/rust/pull/24428

estebank|3 years ago

> The Rust docs are amazing and you can see they bring up indexing in the "UTF-8" section, but it requires me reading a section of the doc which I may not have realized was the reason for my frustration with a compiler error regarding indexing.

I would say this is a bug in the compiler error: it should make clear not only that you can't index into the string, but also why. If the explanation is too long, it should link to the right section of the book.