This would be efficient when the user passes a temporary, and it would be safe.
Which isn't to say the Rust solution isn't totally cool. Being able to easily check this class of errors at compile time is probably a lot nicer than needing to learn all the relatively complicated parts that would go into a easy to use / safe /efficient / still slightly weird C++ solution.
Sure, and then somewhere along the way you throw away the first element of the pair, because you're not using it, but you're still using the views, and oops you just reintroduced the bug you tried to fix.
Which is to say, yes, you can obviously write C++ code that works. But you run the risk that one tiny mistake, or a change weeks, months, or years later, causes memory issues. Being able to completely rule out this class of error at compile-time is really amazingly useful.
Or, you know, instead of returning two C pointers which in modern C++ makes no sense, return a vector of `std::pair<size_t,size_t>` with position and length of each substring, and if needed use `std::string::substr` to extract the parts you need.
Yes, returning indexes is also an excellent solution in this particular case, because it forces the programmer to keep the underlying string around. And you probably can save memory by using a pair<uint32,uint32>.
Still, as several other people have pointed out, the truly idiomatic C++ solutions here are things like sub-ranges and string views. And these all use pointers internally. I deliberately chose to use an pair of pointers in place of these idiomatic types because it was easier to explain what was happening, not because I think C++ programmers should mess around with bare pointers.
What I like about the Rust version is that I can work directly with low-level types in high-performance code, and the compiler still has my back.
I would say returning regular pointers is more C++-esque than your handmade range implementation. Pointers are, after all, a kind of iterator, which is an essential C++ concept. (std::string::iterator is a handful.)
(Furthermore, there is nothing "C" or wrong with regular pointers. Their only "flaw" is that they can't manage an object, but they're still the semantical way to refer to one.)
If you really wanted the pointers, you could also just make your own class that contains the pointer vector and a copy of the entire string which would be slightly less costly than new strings for each token.
This would have advantage in case you only needed a few results; but in case you needed most of them, then substr would practically result in duplicating the memory anyway. Plus, this is sure more cumbersome than the Rust version.
The compiler will not compile code it cannot verify as safe (outside an `unsafe` block), meaning it will complain about lifetime constraints that are missing, e.g. it is a compile-time error to not propagate the lifetime information out of the enum.
The lifetime constraint in the enum is mandatory, the code won't compile without it.
The tokenization function itself doesn't have to explicitly specify lifetimes like it does in the blog post. It would also work without them and Rust would still ensure that you don't end up with a use-after-free.
There's an obvious extension here for lifetime inference - the example given doesn't need to be an error, it could compile correctly by increasing the object lifetime to the outer block. I don't know offhand whether there is a universally correct inference algorithm for that (if every other language feature was static then unification would solve it easily, but the other language features are not static and I don't know how it would interact with rust type inference).
In general I prefer the compiler to yell at me rather than to secretly help me behind the scenes. Otherwise, I don't learn anything and I get burned later on when there is a situation where the safeguard can't help me.
If the object has destructor with side effects, I believe that would massively complicate reasoning about program's behavior for human who's reading it.
It has some unexpected properties: it can sometimes produce dangling pointers because it knows the target data will never be used, even though a GC would treat it as live. And on the other hand it will often promote data to an excessively long-lived region (because region lifetimes have to be nested). So current versions of the MLKit use GC as well as region inference.
In C++, the best way to hand out pointers to anything where the creator may not necessarily be the last one referencing that object or chunk of RAM is to use reference counting, which would have solved the posters' problem.
It would mean that you would have to wrap the incoming string in a class, and probably add the tokenize_string method to that class. Then you would also have to wrap the results vector in a class that then addrefs the original string wrapper class.
But after that, handing out pointers to the contents of the string would be no problem as the results class would addref the string class and then release it when done ensuring that the string wrapper class remains alive as long as the results object has not gone out of scope.
Of course Rust's approach of alerting you when your code path causes dangling pointers is also interesting, but I wonder how that would work if you were to link against a static library that handed out references to internal objects like that - could the compiler see the scoping problem?
> The function get_input_string returns a temporary string, and tokenize_string2 builds an array of pointers into that string. Unfortunately, the temporary string only lives until the end of the current expression, and then the underlying memory is released. And so all our pointers in v now point into oblivion
So what stops you from returning a shared pointer in case of get_input_string? Then take over that ownership and use it. It's still a potential problem that v is logically disconnected from lifetime of that pointer, but at least you could avoid the problem you described.
This seems like the kind of place where std::shared_ptr would really shine. The author's point on the danger of pointers is well taken, but some of the new pointer types get around a lot of these issues. You couldn't use it to point into the middle of the string, but if you paired it with some offsets you wouldn't have to worry about the ownership of the pointer anymore.
shared_ptr doesn't fix this, as soon as you access the string via any standard type (char* or string_view) you can copy that type around freely, disconnected from the shared_ptr. Hence, you can drop all the shared_ptrs and leave the string view dangling.
I'm hearing an awful lot about Rust on HN, even though afaict it still does't have a basic http package yet, limiting the main types of apps I would build with it. Maybe I'm in the minority, but perhaps we can slow down on Rust news until it's a little closer to usable?
I understand there are a lot of people on HN with "web goggles" and a severe case of "all development is web development" myopia, but seriously this is just over the top. I'm a "C++ guy" (that is, I like writing programs using C++ and probably always will, even if I use others from time to time), but I would never object to a language like Rust on the basis that it lacks an http package. HTTP is a high level communication protocol. It isn't the only such, certainly not the most efficient, and definitely not even the best. To bash on a language for lack of "native" support for http is just a bit ridiculous.
I like reading this stuff. Sure, I may not use Rust for any real application yet, but it's been teaching me a whole lot on different paradigms than I am used to. I think these articles are great because it pushes boundaries in understanding. And this is a science we are talking about - experimentation is a lot of fun!
Maybe there's just enough people interested in types of software that don't involve making HTTP requests. Programs that make REST requests is only one tributary within the taxonomy of software.
One way a new FOSS platform can become more usable is by attracting new contributors. If a new platform is talked about, it will attract new contributors. Ergo by being talked about, a new platform becomes more usable.
Is there are reason this language exists? They're solving a problem that's been solved many times over for at least 2 decades in the form of managed languages.
Because Rust's "managed" benefits are compiled away, they have zero runtime impact. Meaning that there's no reason it couldn't be as fast as C/C++ for all workloads. So you get a safe language with bare-metal performance. (Bare metal like, people have written basic OS kernels in it.)
Actually, many people have been waiting long for such a language to come out, and are very excited about Rust for good reasons.
Managed languages solve a large set of problems, but introduce two big problems as I see it:
- Cause nondeterministic overhead which is particularly problematic for kernels, games, and software with heaps over, say, 10 gigabytes.
- Limit ability to safely pass objects outside their ecosystem. How many libraries written in managed code are worth using outside of the language they are written in?
[+] [-] missblit|11 years ago|reply
Which isn't to say the Rust solution isn't totally cool. Being able to easily check this class of errors at compile time is probably a lot nicer than needing to learn all the relatively complicated parts that would go into a easy to use / safe /efficient / still slightly weird C++ solution.
[+] [-] eridius|11 years ago|reply
Which is to say, yes, you can obviously write C++ code that works. But you run the risk that one tiny mistake, or a change weeks, months, or years later, causes memory issues. Being able to completely rule out this class of error at compile-time is really amazingly useful.
[+] [-] pfultz2|11 years ago|reply
[+] [-] svalorzen|11 years ago|reply
[+] [-] ekidd|11 years ago|reply
Yes, returning indexes is also an excellent solution in this particular case, because it forces the programmer to keep the underlying string around. And you probably can save memory by using a pair<uint32,uint32>.
Still, as several other people have pointed out, the truly idiomatic C++ solutions here are things like sub-ranges and string views. And these all use pointers internally. I deliberately chose to use an pair of pointers in place of these idiomatic types because it was easier to explain what was happening, not because I think C++ programmers should mess around with bare pointers.
What I like about the Rust version is that I can work directly with low-level types in high-performance code, and the compiler still has my back.
[+] [-] expr-|11 years ago|reply
(Furthermore, there is nothing "C" or wrong with regular pointers. Their only "flaw" is that they can't manage an object, but they're still the semantical way to refer to one.)
[+] [-] dbaupp|11 years ago|reply
http://en.cppreference.com/w/cpp/header/experimental/string_...
[+] [-] robmccoll|11 years ago|reply
[+] [-] akavel|11 years ago|reply
[+] [-] bsaul|11 years ago|reply
1/ could you build the same unsafe behiavor in Rust if you wanted to by not specifying lifetime constraints ?
2/ If yes, shouldn't lifetime constraints be mandatory ?
[+] [-] dbaupp|11 years ago|reply
http://play.rust-lang.org/?run=1&code=enum%20Token%20{%0A%20...
[+] [-] nikic|11 years ago|reply
The tokenization function itself doesn't have to explicitly specify lifetimes like it does in the blog post. It would also work without them and Rust would still ensure that you don't end up with a use-after-free.
[+] [-] asuffield|11 years ago|reply
[+] [-] Drakim|11 years ago|reply
[+] [-] hamstergene|11 years ago|reply
[+] [-] fanf2|11 years ago|reply
It has some unexpected properties: it can sometimes produce dangling pointers because it knows the target data will never be used, even though a GC would treat it as live. And on the other hand it will often promote data to an excessively long-lived region (because region lifetimes have to be nested). So current versions of the MLKit use GC as well as region inference.
[+] [-] riffraff|11 years ago|reply
[+] [-] enjoy-your-stay|11 years ago|reply
It would mean that you would have to wrap the incoming string in a class, and probably add the tokenize_string method to that class. Then you would also have to wrap the results vector in a class that then addrefs the original string wrapper class.
But after that, handing out pointers to the contents of the string would be no problem as the results class would addref the string class and then release it when done ensuring that the string wrapper class remains alive as long as the results object has not gone out of scope.
Of course Rust's approach of alerting you when your code path causes dangling pointers is also interesting, but I wonder how that would work if you were to link against a static library that handed out references to internal objects like that - could the compiler see the scoping problem?
[+] [-] keeperofdakeys|11 years ago|reply
[+] [-] shmerl|11 years ago|reply
So what stops you from returning a shared pointer in case of get_input_string? Then take over that ownership and use it. It's still a potential problem that v is logically disconnected from lifetime of that pointer, but at least you could avoid the problem you described.
[+] [-] unknown|11 years ago|reply
[deleted]
[+] [-] unknown|11 years ago|reply
[deleted]
[+] [-] overgard|11 years ago|reply
[+] [-] dbaupp|11 years ago|reply
[+] [-] pcwalton|11 years ago|reply
[+] [-] GoGolli|11 years ago|reply
[+] [-] linguafranca|11 years ago|reply
[+] [-] Iftheshoefits|11 years ago|reply
[+] [-] Jweb_Guru|11 years ago|reply
(1) It has at least three HTTP packages that I'm aware of (not finished ones, admittedly).
(2) While perhaps for you lacking an HTTP package is a dealbreaker, many of the areas Rust is targetting (such as embedded) don't require it at all.
(3) Why should people stop talking about a language because it hasn't been released yet, anyway? I'm not sure I follow the logic here.
[+] [-] wismer|11 years ago|reply
[+] [-] jey|11 years ago|reply
[+] [-] nicklaforge|11 years ago|reply
"Usable" is an ironic way to spell "hackable".
You mentioned the words "news" and "use". Does the H in HN still stand for something? Maybe HN stands for "HTTP News"?
[+] [-] adamnemecek|11 years ago|reply
[+] [-] Yardlink|11 years ago|reply
[+] [-] mattdw|11 years ago|reply
[+] [-] ludamad|11 years ago|reply
Managed languages solve a large set of problems, but introduce two big problems as I see it:
- Cause nondeterministic overhead which is particularly problematic for kernels, games, and software with heaps over, say, 10 gigabytes.
- Limit ability to safely pass objects outside their ecosystem. How many libraries written in managed code are worth using outside of the language they are written in?
[+] [-] wofo|11 years ago|reply
With the overhead of GC...
[+] [-] pjmlp|11 years ago|reply
Now to get back what was lost in terms of safety, new languages are required, as the old ones are considered legacy by the average Joe/Jane coder.