Rust lifetimes: Getting away with things that would be reckless in C++

[+] missblit|11 years ago|reply

In C++ if the string is a rvalue reference you could std::move it into part of the return value. Think a signature like

    template<typename T>
    std::pair<std::string, std::vector<std::string_view>>
    tokenize_string(T &&str);

This would be efficient when the user passes a temporary, and it would be safe.

Which isn't to say the Rust solution isn't totally cool. Being able to easily check this class of errors at compile time is probably a lot nicer than needing to learn all the relatively complicated parts that would go into a easy to use / safe /efficient / still slightly weird C++ solution.

[+] eridius|11 years ago|reply

Sure, and then somewhere along the way you throw away the first element of the pair, because you're not using it, but you're still using the views, and oops you just reintroduced the bug you tried to fix.

Which is to say, yes, you can obviously write C++ code that works. But you run the risk that one tiny mistake, or a change weeks, months, or years later, causes memory issues. Being able to completely rule out this class of error at compile-time is really amazingly useful.

[+] pfultz2|11 years ago|reply

Actually, it should be written more like this:

    template<typename T>
    std::pair<T, std::vector<std::string_view>>
    tokenize_string(T &&str);

So then it uses `std::forward` to conditionally move the string for rvalues, and `T` will be a reference to the original string for lvalues.

[+] svalorzen|11 years ago|reply

Or, you know, instead of returning two C pointers which in modern C++ makes no sense, return a vector of `std::pair<size_t,size_t>` with position and length of each substring, and if needed use `std::string::substr` to extract the parts you need.

[+] ekidd|11 years ago|reply

(Original poster here.)

Yes, returning indexes is also an excellent solution in this particular case, because it forces the programmer to keep the underlying string around. And you probably can save memory by using a pair<uint32,uint32>.

Still, as several other people have pointed out, the truly idiomatic C++ solutions here are things like sub-ranges and string views. And these all use pointers internally. I deliberately chose to use an pair of pointers in place of these idiomatic types because it was easier to explain what was happening, not because I think C++ programmers should mess around with bare pointers.

What I like about the Rust version is that I can work directly with low-level types in high-performance code, and the compiler still has my back.

[+] expr-|11 years ago|reply

I would say returning regular pointers is more C++-esque than your handmade range implementation. Pointers are, after all, a kind of iterator, which is an essential C++ concept. (std::string::iterator is a handful.)

(Furthermore, there is nothing "C" or wrong with regular pointers. Their only "flaw" is that they can't manage an object, but they're still the semantical way to refer to one.)

[+] dbaupp|11 years ago|reply

I would think that (in C++14 at least), std::string_view would be more appropriate than both?

http://en.cppreference.com/w/cpp/header/experimental/string_...

[+] robmccoll|11 years ago|reply

If you really wanted the pointers, you could also just make your own class that contains the pointer vector and a copy of the entire string which would be slightly less costly than new strings for each token.

[+] akavel|11 years ago|reply

This would have advantage in case you only needed a few results; but in case you needed most of them, then substr would practically result in duplicating the memory anyway. Plus, this is sure more cumbersome than the Rust version.

[+] bsaul|11 years ago|reply

Which makes me wonder :

1/ could you build the same unsafe behiavor in Rust if you wanted to by not specifying lifetime constraints ?

2/ If yes, shouldn't lifetime constraints be mandatory ?

[+] dbaupp|11 years ago|reply

The compiler will not compile code it cannot verify as safe (outside an `unsafe` block), meaning it will complain about lifetime constraints that are missing, e.g. it is a compile-time error to not propagate the lifetime information out of the enum.

http://play.rust-lang.org/?run=1&code=enum%20Token%20{%0A%20...

[+] nikic|11 years ago|reply

The lifetime constraint in the enum is mandatory, the code won't compile without it.

The tokenization function itself doesn't have to explicitly specify lifetimes like it does in the blog post. It would also work without them and Rust would still ensure that you don't end up with a use-after-free.

[+] asuffield|11 years ago|reply

There's an obvious extension here for lifetime inference - the example given doesn't need to be an error, it could compile correctly by increasing the object lifetime to the outer block. I don't know offhand whether there is a universally correct inference algorithm for that (if every other language feature was static then unification would solve it easily, but the other language features are not static and I don't know how it would interact with rust type inference).

[+] Drakim|11 years ago|reply

In general I prefer the compiler to yell at me rather than to secretly help me behind the scenes. Otherwise, I don't learn anything and I get burned later on when there is a situation where the safeguard can't help me.

[+] hamstergene|11 years ago|reply

If the object has destructor with side effects, I believe that would massively complicate reasoning about program's behavior for human who's reading it.

[+] fanf2|11 years ago|reply

That is essentially what region inference in the MLKit compiler does. http://www.elsman.com/mlkit/

It has some unexpected properties: it can sometimes produce dangling pointers because it knows the target data will never be used, even though a GC would treat it as live. And on the other hand it will often promote data to an excessively long-lived region (because region lifetimes have to be nested). So current versions of the MLKit use GC as well as region inference.

[+] riffraff|11 years ago|reply

wouldn't extending the lifetime in this situations lead to subtle hard to trace memory leaks?

[+] enjoy-your-stay|11 years ago|reply

In C++, the best way to hand out pointers to anything where the creator may not necessarily be the last one referencing that object or chunk of RAM is to use reference counting, which would have solved the posters' problem.

It would mean that you would have to wrap the incoming string in a class, and probably add the tokenize_string method to that class. Then you would also have to wrap the results vector in a class that then addrefs the original string wrapper class.

But after that, handing out pointers to the contents of the string would be no problem as the results class would addref the string class and then release it when done ensuring that the string wrapper class remains alive as long as the results object has not gone out of scope.

Of course Rust's approach of alerting you when your code path causes dangling pointers is also interesting, but I wonder how that would work if you were to link against a static library that handed out references to internal objects like that - could the compiler see the scoping problem?

[+] keeperofdakeys|11 years ago|reply

Just as an aside, the &str is not stored as two pointers, but a pointer and a length.

[+] shmerl|11 years ago|reply

> The function get_input_string returns a temporary string, and tokenize_string2 builds an array of pointers into that string. Unfortunately, the temporary string only lives until the end of the current expression, and then the underlying memory is released. And so all our pointers in v now point into oblivion

So what stops you from returning a shared pointer in case of get_input_string? Then take over that ownership and use it. It's still a potential problem that v is logically disconnected from lifetime of that pointer, but at least you could avoid the problem you described.

[+] unknown|11 years ago|reply

[deleted]

[+] unknown|11 years ago|reply

[deleted]

[+] overgard|11 years ago|reply

This seems like the kind of place where std::shared_ptr would really shine. The author's point on the danger of pointers is well taken, but some of the new pointer types get around a lot of these issues. You couldn't use it to point into the middle of the string, but if you paired it with some offsets you wouldn't have to worry about the ownership of the pointer anymore.

[+] dbaupp|11 years ago|reply

shared_ptr doesn't fix this, as soon as you access the string via any standard type (char* or string_view) you can copy that type around freely, disconnected from the shared_ptr. Hence, you can drop all the shared_ptrs and leave the string view dangling.

[+] pcwalton|11 years ago|reply

std::shared_ptr has (atomic!) reference counting overhead.

[+] GoGolli|11 years ago|reply

Rust is the best complicated language I have seen!!!!!!

[+] linguafranca|11 years ago|reply

I'm hearing an awful lot about Rust on HN, even though afaict it still does't have a basic http package yet, limiting the main types of apps I would build with it. Maybe I'm in the minority, but perhaps we can slow down on Rust news until it's a little closer to usable?

[+] Iftheshoefits|11 years ago|reply

I understand there are a lot of people on HN with "web goggles" and a severe case of "all development is web development" myopia, but seriously this is just over the top. I'm a "C++ guy" (that is, I like writing programs using C++ and probably always will, even if I use others from time to time), but I would never object to a language like Rust on the basis that it lacks an http package. HTTP is a high level communication protocol. It isn't the only such, certainly not the most efficient, and definitely not even the best. To bash on a language for lack of "native" support for http is just a bit ridiculous.

[+] Jweb_Guru|11 years ago|reply

While I'm certainly not going to say Rust is ready for use in production or anything like that, I'll have to dispute your points here:

(1) It has at least three HTTP packages that I'm aware of (not finished ones, admittedly).

(2) While perhaps for you lacking an HTTP package is a dealbreaker, many of the areas Rust is targetting (such as embedded) don't require it at all.

(3) Why should people stop talking about a language because it hasn't been released yet, anyway? I'm not sure I follow the logic here.

[+] wismer|11 years ago|reply

I like reading this stuff. Sure, I may not use Rust for any real application yet, but it's been teaching me a whole lot on different paradigms than I am used to. I think these articles are great because it pushes boundaries in understanding. And this is a science we are talking about - experimentation is a lot of fun!

[+] jey|11 years ago|reply

Maybe there's just enough people interested in types of software that don't involve making HTTP requests. Programs that make REST requests is only one tributary within the taxonomy of software.

[+] nicklaforge|11 years ago|reply

> slow down on Rust news until it's a little closer to usable?

"Usable" is an ironic way to spell "hackable".

You mentioned the words "news" and "use". Does the H in HN still stand for something? Maybe HN stands for "HTTP News"?

[+] adamnemecek|11 years ago|reply

One way a new FOSS platform can become more usable is by attracting new contributors. If a new platform is talked about, it will attract new contributors. Ergo by being talked about, a new platform becomes more usable.

[+] Yardlink|11 years ago|reply

Is there are reason this language exists? They're solving a problem that's been solved many times over for at least 2 decades in the form of managed languages.

[+] mattdw|11 years ago|reply

Because Rust's "managed" benefits are compiled away, they have zero runtime impact. Meaning that there's no reason it couldn't be as fast as C/C++ for all workloads. So you get a safe language with bare-metal performance. (Bare metal like, people have written basic OS kernels in it.)

[+] ludamad|11 years ago|reply

Actually, many people have been waiting long for such a language to come out, and are very excited about Rust for good reasons.

Managed languages solve a large set of problems, but introduce two big problems as I see it:

- Cause nondeterministic overhead which is particularly problematic for kernels, games, and software with heaps over, say, 10 gigabytes.

- Limit ability to safely pass objects outside their ecosystem. How many libraries written in managed code are worth using outside of the language they are written in?

[+] wofo|11 years ago|reply

> They're solving a problem that's been solved many times over for at least 2 decades in the form of managed languages.

With the overhead of GC...

[+] pjmlp|11 years ago|reply

Because strong type alternatives to C and C++, like Modula-2, Modula-3, Ada, Turbo Pascal, Delphi, ... faded away while UNIX conquered the enterprise.

Now to get back what was lost in terms of safety, new languages are required, as the old ones are considered legacy by the average Joe/Jane coder.

90 comments