
You can't Rust that

462 points | stablemap | 8 years ago | lucumr.pocoo.org

117 comments

[+] pcwalton|8 years ago|reply
I really like the way you captured one of the fundamental differences between Rust and C++ as "Things Move". That's an interesting way to summarize it that I hadn't really considered before—and I designed a lot of that system :)
[+] jnordwick|8 years ago|reply
Is it even remotely true though? C++ probably moves things around more than Rust, and I thought Rust would want to reduce cache churn. It isn't like a GC, where things magically change locations.

I'm not really sure I understand what he's getting at with that description.

[+] phkahler|8 years ago|reply
If pointers are useless, how do you create complex data structures? In a C++ program I have a struct that is nothing but 5 pointers (4 now since 2 can be stored as their XOR). I'm starting to wonder about this Rust thing that's been sounding so awesome...
[+] jnordwick|8 years ago|reply
I don't like the Things Move example. I'm not sure how true the general statement is (I'd never thought of it that way, but it isn't like how a GC moves things around, and I'm not sure things even move more than in C++; I thought Rust reduced unnecessary moves because they would kill cache performance), but the example isn't entirely correct from my perspective.

Return values that fit into a register will be returned in a register, and his example is an 8 byte struct, so that returns in a register. Return values larger than a register will add an implicit first argument that is a pointer to memory where the return value should be written to. In that sense, it is very similar to C++ in that you are initializing into an allocated buffer.

As for "Refcounts are not Dirty", I would greatly disagree. Using refcounts to get around an overly aggressive borrow checker seems to be an ugly developing pattern in rust, and I feel they are giving away the performance many are fighting for by adding all these little inefficiencies to idiomatic rust. Add some refcounting here, add a Box or other indirection there, a chained monadic interface that can't short circuit and has to continually do error/null checks, etc... Soon it is death by a thousand papercuts. People fight hard for that extra 5% in performance only to have it taken away from them in interface and language issues.

Edit: Forgot about handles. Ugh. Completely unacceptable when you want to grow your tree data structure and you have to do a realloc and basically copy every node. If your tree is complete, then you are copying the whole tree every time you start a new level. The conclusions sound more like ugly hacks, than what you would properly design.

[+] pcwalton|8 years ago|reply
> Using refcounts to get around an overly aggressive borrow checker seems to be an ugly developing pattern in rust, and I feel they are giving away the performance many are fighting for by adding all these little inefficiencies to idiomatic rust. Add some refcounting here

The performance hit isn't as bad as it seems, because you only adjust the reference counts when you actually take an extra reference. This is different from shared_ptr in C++, because C++ automatically calls the copy constructor and so it's easy to end up with lots of reference count traffic.

Observe that, if we assume that the time spent in malloc() and free() dominates the time spent to adjust one reference count (which is a safe assumption), then the additional time overhead of Rc with a single owner is effectively zero.
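
To make the "only when you take an extra reference" point concrete, here is a minimal sketch. The helper `count_after_move` is a hypothetical name invented for illustration; the only library calls are `Rc::new`, `Rc::clone` and `Rc::strong_count` from the standard library.

```rust
use std::rc::Rc;

// Hypothetical helper: takes ownership of the Rc (a move, no count change)
// and reports the strong count seen inside the callee.
fn count_after_move(value: Rc<String>) -> usize {
    Rc::strong_count(&value)
}

fn main() {
    let owner = Rc::new(String::from("node"));
    assert_eq!(Rc::strong_count(&owner), 1);

    // Explicit clone: the only place the count is incremented.
    let extra = Rc::clone(&owner);
    assert_eq!(Rc::strong_count(&owner), 2);

    // Dropping the extra handle decrements it again.
    drop(extra);
    assert_eq!(Rc::strong_count(&owner), 1);

    // Passing the Rc by value moves it; the count stays at 1.
    assert_eq!(count_after_move(owner), 1);
}
```

With a single owner the count is written once at allocation and once at drop, which is where the malloc()/free() cost already sits.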

> add a Box or other indirection there

Why do you ever need to Box unnecessarily in Rust? This is more of an issue with C++, where shared_ptr has an extra indirection and interior pointers and "new" encourage heap allocation.

I actually think that Rust in practice has the opposite problem: people are afraid to Box when they shouldn't be, causing unnecessary memcpy traffic.

> a chained monadic interface that can't short circuit and has to continually do error/null checks

How is this more overhead than in C?

> Edit: Forgot about handles. Ugh. Completely unacceptable when you want to grow your tree data structure and you have to do a realloc and basically copy every node. If your tree is complete, then you are copying the whole tree every time you start a new level. The conclusions sound more like ugly hacks, than what you would properly design.

Yeah, I would like to make a crate with an interface similar to petgraph but that actually mallocs every node separately. This should be readily doable.

I think the reason why nobody has made this crate yet is that copying the nodes on growth isn't as big of a concern in practice as it might initially seem, because the time spent in the growth case is amortized and made up for by extremely fast allocation of new nodes.
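
For readers who haven't seen the handle approach, a rough sketch of a tree stored in a single `Vec` follows. `NodeId`, `Node` and `Tree` are hypothetical names made up for this example, not petgraph's API; the sketch only assumes that growth goes through `Vec::push`, which is amortized O(1).

```rust
// Hypothetical handle type: an index into the arena's Vec, not a pointer.
#[derive(Clone, Copy, Debug, PartialEq)]
struct NodeId(usize);

struct Node {
    value: i32,
    children: Vec<NodeId>,
}

#[derive(Default)]
struct Tree {
    nodes: Vec<Node>,
}

impl Tree {
    // Allocating a node is a Vec push: amortized O(1), occasionally a realloc
    // that moves every node. Handles stay valid because they are indices.
    fn add(&mut self, value: i32, parent: Option<NodeId>) -> NodeId {
        let id = NodeId(self.nodes.len());
        self.nodes.push(Node { value, children: Vec::new() });
        if let Some(NodeId(p)) = parent {
            self.nodes[p].children.push(id);
        }
        id
    }

    // Sum of a subtree, following handles instead of pointers.
    fn sum(&self, root: NodeId) -> i32 {
        let node = &self.nodes[root.0];
        node.value + node.children.iter().map(|&c| self.sum(c)).sum::<i32>()
    }
}

fn main() {
    let mut tree = Tree::default();
    let root = tree.add(1, None);
    let left = tree.add(2, Some(root));
    tree.add(3, Some(root));
    tree.add(4, Some(left));
    assert_eq!(tree.sum(root), 10);
    assert_eq!(tree.sum(left), 6);
}
```

The tradeoff is the one discussed above: an occasional bulk copy on growth in exchange for very cheap allocation of new nodes.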

[+] therein|8 years ago|reply
The more I code in Rust, the more I reluctantly agree with this statement of yours. I wish you weren't right but I think you have a point. Something needs to be done about this before too much legacy is accumulated with this pattern.

> Using refcounts to get around an overly aggressive borrow checker seems to be an ugly developing pattern in rust, and I feel they are giving away the performance many are fighting for by adding all these little inefficiencies to idiomatic rust. Add some refcounting here, add a Box or other indirection there, a chained monadic interface that can't short circuit and has to continually do error/null checks, etc... Soon it is death by a thousand papercuts. People fight hard for that extra 5% in performance only to have it taken away from them in interface and language issues.

[+] Manishearth|8 years ago|reply
> Using refcounts to get around an overly aggressive borrow checker seems to be an ugly developing pattern in rust

Every single large, "modern", C++ codebase I've worked with uses way more refcounts than is typical in Rust.

(and it's worse because C++ refcounting patterns involve more refcount churn due to how they're designed on the copy constructor, and shared_ptr uses atomics even when unnecessary. Though these large codebases tend to use a slightly different design that lets them decide atomicity per-type)

[+] the_mitsuhiko|8 years ago|reply
Whether a value returned from a function actually moves or not is currently up to Rust to optimize. It's not something you can depend on.

About the refcounts: since the counting is explicit (calls to clone()), they don't really show up, at least in my experience. Most of the refcounted objects I deal with bump the refcount once when some task spawns and decrement it when it ends. I have yet to see refcounts change in hot code paths.
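
A minimal sketch of that pattern, using threads as a stand-in for tasks. `Config` and `run_workers` are hypothetical names for illustration only; everything else is from the standard library.

```rust
use std::sync::Arc;
use std::thread;

// Hypothetical shared configuration, read by every worker.
struct Config {
    multiplier: u64,
}

fn run_workers(config: Arc<Config>, workers: u64) -> u64 {
    let handles: Vec<_> = (0..workers)
        .map(|id| {
            // One refcount increment per task, at spawn time.
            let config = Arc::clone(&config);
            thread::spawn(move || {
                // The hot loop only reads through the Arc; no refcount traffic.
                (0..1000u64).map(|n| n * config.multiplier + id).sum::<u64>()
                // One decrement when `config` drops at the end of the task.
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

fn main() {
    let config = Arc::new(Config { multiplier: 2 });
    let total = run_workers(Arc::clone(&config), 4);
    assert_eq!(total, 4_002_000);
    // All workers have finished, so only the original handle remains.
    assert_eq!(Arc::strong_count(&config), 1);
}
```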

//EDIT for your edit:

> Edit: Forgot about handles. Ugh. Completely unacceptable when you want to grow your tree data structure and you have to do a realloc and basically copy every node.

Sure, but that's not my point anyways. At any point you can fall down to writing unsafe code and building a safe abstraction on top of it. This is to help developers not run into walls. I don't think that handles are the best thing invented, but I don't think saying "well, you can't do that in Rust until some time in the future" without providing an alternative is a particularly good suggestion.

[+] z3t4|8 years ago|reply
I'm not a real programmer (as in, I do not write low-level code), but do real programmers actually rely on pointers, knowing that the data might move or change!? (I program in JavaScript where all values are immutable)
[+] a_humean|8 years ago|reply
edit: don't vote the guy/gal down just because they are admitting some ignorance and seeking clarity

You should quickly disabuse yourself of the notion that all values are immutable in JavaScript, as otherwise you will cause yourself and your colleagues a lot of pain in future. As someone who has to write or maintain a lot of JavaScript, saying that I don't have to think about data changing over the course of a program doesn't strike me as true at all (I wish it was).

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Data...

As linked, only some of the primitive values in javascript are immutable. So while you can be confident in this:

    const x = 'a perfectly well formed string';

    // lots of code in between

    console.log(x) // will always print 'a perfectly well formed string'

You cannot be confident in this at all:

    const x = {
      a: 'a perfectly well formed string within an object'
    };

    // lots of code in between

    // Absolutely no guarantee that x.a still points at the same string as
    // when x was initialized, or that the key even exists (reading a
    // missing key gives you undefined).
    console.log(x.a);

JavaScript has references and values like most languages, but you aren't dealing with memory as explicitly as Rust or C. The reason is in large part because memory is garbage collected in Javascript.

[+] tomsmeding|8 years ago|reply
If we have the same concept of "value" in Javascript, then values are certainly not immutable. E.g.:

    const a = {x: 1, y: 2};
    console.log(a.x);  // 1
    a.x = 2;
    console.log(a.x);  // 2
Note that 'a' remained constant indeed, but the object it points to can certainly take on different values.
[+] VMG|8 years ago|reply
You are working under dangerously incorrect assumptions. JavaScript data structures are all mutable, except some primitives.
[+] dbaupp|8 years ago|reply
Yes, they do. The value of Rust is that it makes it safe to do: if there's a reference (Rust's safe "pointer" type) to something, it can't change or move in surprising ways.
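
A small sketch of what that looks like in practice. `first_word` is a hypothetical helper; the commented-out lines are the ones the compiler rejects while the reference is still in use.

```rust
// Hypothetical helper: returns a reference into the caller's string.
fn first_word(text: &str) -> &str {
    text.split_whitespace().next().unwrap_or("")
}

fn main() {
    let mut buffer = String::from("hello world");
    let word = first_word(&buffer); // shared borrow of `buffer` starts here

    // Uncommenting either line is a compile error while `word` is still used:
    // buffer.push_str("!"); // cannot borrow `buffer` as mutable
    // let moved = buffer;   // cannot move out of `buffer` while borrowed

    assert_eq!(word, "hello"); // last use of the borrow
    buffer.push_str("!");      // fine now: no reference is live
    assert_eq!(buffer, "hello world!");
}
```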
[+] skybrian|8 years ago|reply
As a non-Rust programmer, I'm finding the memory-mapped data example to be very opaque. Does anyone care to explain it?
[+] grayrest|8 years ago|reply
It's a contrived example and doesn't make a whole lot of sense aside from demonstrating what he means by handle. I'm not an expert but I'll have a go at explaining it.

Start at the `Data` struct. It contains a copy-on-write (`Cow`) reference to a sequence of bytes (`u8`) with a lifetime labeled `'a`. This is the Handle for the data. You get one by calling `Data::new` and passing in something that can be converted to the `Cow`.

The example is hard coded to work with a vector of u32s (driven by the `Slice<u32>` in `Header`). To use it, you'd call `get_target` with an index and get a u32 back. The other methods on data are doing the pointer math (offset) and casting (`transmute`, `from_raw_parts`) the byte array into a slice of u32s in a safe way.

I don't see anything verifying that the byte array passed in is, in fact, a bunch of u32s so I assume that's a given.
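
If you did want that verification, one possible shape is sketched below. This is not the article's code: the byte layout (a little-endian `u32` offset followed by a little-endian `u32` length at the start of the buffer) is an assumption made for illustration, and it trades the `transmute` for bounds-checked reads that return `Option`.

```rust
use std::borrow::Cow;
use std::convert::TryInto;

// Assumed layout (an illustration, not the article's exact format):
// bytes 0..4 = offset of the u32 array, bytes 4..8 = element count,
// all values little-endian.
pub struct Data<'a> {
    bytes: Cow<'a, [u8]>,
}

impl<'a> Data<'a> {
    pub fn new<B: Into<Cow<'a, [u8]>>>(bytes: B) -> Data<'a> {
        Data { bytes: bytes.into() }
    }

    // Read one little-endian u32, or None if it would run past the buffer.
    fn read_u32(&self, at: usize) -> Option<u32> {
        let end = at.checked_add(4)?;
        let raw: [u8; 4] = self.bytes.get(at..end)?.try_into().ok()?;
        Some(u32::from_le_bytes(raw))
    }

    // Bounds-checked variant of `get_target`: None instead of reading garbage.
    pub fn get_target(&self, idx: usize) -> Option<u32> {
        let offset = self.read_u32(0)? as usize;
        let len = self.read_u32(4)? as usize;
        if idx >= len {
            return None;
        }
        self.read_u32(offset.checked_add(idx.checked_mul(4)?)?)
    }
}

fn main() {
    let mut buf = Vec::new();
    buf.extend_from_slice(&8u32.to_le_bytes()); // offset of the array
    buf.extend_from_slice(&2u32.to_le_bytes()); // two elements
    buf.extend_from_slice(&10u32.to_le_bytes());
    buf.extend_from_slice(&20u32.to_le_bytes());

    let data = Data::new(buf);
    assert_eq!(data.get_target(1), Some(20));
    assert_eq!(data.get_target(2), None); // out of range is reported, not read
}
```

Decoding per access costs a little compared to casting the whole slice once, but it needs no `unsafe` and no alignment assumptions.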

[+] joeconway|8 years ago|reply
Thank you Armin. Your rust work for sentry has been a great primer in the language for me.
[+] glenjamin|8 years ago|reply
The semantics of the final example sounds a lot like the concept of an Atom in clojure - https://clojure.org/reference/atoms

Is this swap/deref pattern something that can or should be wrapped up into a crate?
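
For the sake of discussion, a rough sketch of what such a wrapper might look like using only the standard library. `Atom`, `load` and `swap` are hypothetical names, and unlike a Clojure atom this version takes a write lock for the update rather than retrying a compare-and-swap.

```rust
use std::sync::{Arc, RwLock};

// Hypothetical `Atom<T>`: a shared slot holding an immutable snapshot.
pub struct Atom<T> {
    slot: RwLock<Arc<T>>,
}

impl<T> Atom<T> {
    pub fn new(value: T) -> Self {
        Atom { slot: RwLock::new(Arc::new(value)) }
    }

    // deref: grab the current snapshot; readers keep it alive via the Arc.
    pub fn load(&self) -> Arc<T> {
        let guard = self.slot.read().unwrap();
        Arc::clone(&*guard)
    }

    // swap: build a new value from the old one and publish it.
    pub fn swap<F: FnOnce(&T) -> T>(&self, update: F) -> Arc<T> {
        let mut slot = self.slot.write().unwrap();
        // `&**slot` goes guard -> Arc<T> -> T.
        let next = Arc::new(update(&**slot));
        *slot = Arc::clone(&next);
        next
    }
}

fn main() {
    let config = Atom::new(vec![1, 2]);
    let before = config.load();
    config.swap(|old| {
        let mut next = old.clone();
        next.push(3);
        next
    });
    assert_eq!(*before, vec![1, 2]); // the old snapshot is untouched
    assert_eq!(*config.load(), vec![1, 2, 3]);
}
```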

[+] mwcampbell|8 years ago|reply
Given the "things move" point, would it be feasible to use a compacting memory manager with Rust, e.g. for memory-constrained applications?
[+] kibwen|8 years ago|reply
Rust allows you to take interior pointers to things (important for performance), which precludes the ability to move objects within memory at random. But for "memory-constrained" applications like embedded devices/microcontrollers, fragmentation isn't a problem in the first place because you often don't have a heap. For long-running programs that do have heaps, picking a modern memory allocator (jemalloc, tcmalloc, et al) will go a long way towards reducing fragmentation. And if you really need compaction, you could probably design a Rust library to provide it for certain types (though the operations it could provide would likely be restricted).
[+] smaddox|8 years ago|reply
If you're memory constrained, why would you use a GC? Just use manual memory arenas. They're trivially simple once you've seen how to use them.

Unfortunately, Rust currently requires breaking some conventions and using unsafe quite a bit to do this without overflowing the stack, but it's just an extra keyword or two compared to C, and the safety guarantees outside of the unsafe code make up for it.