garethrowlands | 4 months ago

On strings in Ada vs Rust: Ada's design predates Unicode (early 1980s vs 1991), so Ada String is basically char array, whereas a Rust string is a Unicode text type. This explains why you can index into Ada Strings, which are arrays of bytes, but not into Rust strings, which are UTF-8 encoded buffers that should be treated as text. Likely the Rust implementation could have used a byte array here.

debugnik|4 months ago

> Ada String is basically char array

Worse, the built-in Unicode strings are arrays of Unicode scalars, effectively UTF-32 in the general case. There's no proper way to write UTF-8 string literals AFAIK; you need to convert them from arrays of 8-, 16- or 32-bit characters, depending on the literal.

gjadi|4 months ago

How is the internal representation an issue? Java strings are UTF-16 internally, and that doesn't affect how you write your code or what the target format is.

tialaramex|4 months ago

I mean you can index into Rust's strings, it's just that you probably don't want that:

    &"Clown"[2..5]  // is "own"

Notice that's a range. Rust's string slice type doesn't consider itself just an array (as the Ada type is), so we can't just provide an integer index; the index is a range of byte offsets specifying where our sub-string should begin and end. If either end of the range falls in the middle of a Unicode character then the code panics, so don't do that.
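To make the panic case concrete, here is a minimal sketch of the non-panicking alternatives in the standard library. The helper names `safe_slice` and `nth_char` are made up for illustration; `str::get` and `str::chars` are the real methods doing the work.

```rust
/// Hypothetical helper: returns the sub-string at byte offsets `start..end`,
/// or `None` if the range is out of bounds or cuts through a multi-byte
/// character. `str::get` is the checked counterpart of `&s[start..end]`.
fn safe_slice(s: &str, start: usize, end: usize) -> Option<&str> {
    s.get(start..end)
}

/// Hypothetical helper: returns the n-th Unicode scalar value (counting
/// characters, not bytes). This walks the string from the start, so it is
/// O(n) rather than a constant-time array lookup.
fn nth_char(s: &str, n: usize) -> Option<char> {
    s.chars().nth(n)
}
```

With these, `safe_slice("Clown", 2, 5)` gives `Some("own")`, while `safe_slice("Clöwn", 2, 3)` gives `None` instead of panicking, because `ö` occupies bytes 2 and 3 in UTF-8.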

Yes, since AoC always uses ASCII it will typically make sense to use &[u8] (the reference to a slice of bytes) and indeed the str::as_bytes method literally gives you that byte slice if you realise that's what you actually needed.
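A rough sketch of what that looks like for a typical grid puzzle, assuming the input really is ASCII. The names `parse_grid` and `cell` are hypothetical; only `str::lines`, `str::as_bytes` and slice `get` come from the standard library.

```rust
/// Hypothetical helper: splits the input into lines and views each line as
/// a byte slice. No copying happens; the slices borrow from `input`.
fn parse_grid(input: &str) -> Vec<&[u8]> {
    input.lines().map(str::as_bytes).collect()
}

/// Hypothetical helper: looks up the byte at (row, col), returning `None`
/// when either index is out of range. Plain integer indexing is fine here
/// because a byte slice really is just an array of bytes.
fn cell(grid: &[&[u8]], row: usize, col: usize) -> Option<u8> {
    grid.get(row)?.get(col).copied()
}
```

Comparisons then use byte literals, e.g. `cell(&grid, 0, 0) == Some(b'#')`. If the input ever contains non-ASCII text, the same indices would address individual UTF-8 bytes rather than characters.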