When const generics are fully implemented (they are partially usable on nightly Rust now), this will enable an even stronger version of this pattern, giving you the full power of enums to represent the state.
One can then give the individual states extra parameters:
HasSentNumber {
number: u32
}
(Note that in this case it doesn't make much sense: the number that is sent is associated data rather than a type parameter. There is no real difference, at the type level, between HasSentNumber { number: 3 } and HasSentNumber { number: 6 }, and having the compiler generate two types for this would be unnecessary. It is only an example of the syntax.)
You can do session types already in stable Rust without const generics. Each "enum variant" can be its own separate type, and such types can only be instantiated by a method on the type representing the previous state, which also consumes that previous state.
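A minimal sketch of that stable-Rust encoding (the state and method names are my own invention, chosen to match the number-sending example above):

```rust
// Each protocol state is its own type.
struct AwaitingNumber;
struct HasSentNumber {
    number: u32, // states can carry associated data
}

impl AwaitingNumber {
    // Taking `self` by value consumes the previous state,
    // so it cannot be used again after the transition.
    fn send_number(self, number: u32) -> HasSentNumber {
        HasSentNumber { number }
    }
}

fn main() {
    let state = AwaitingNumber;
    let state = state.send_number(3);
    assert_eq!(state.number, 3);
    // Using the original `AwaitingNumber` again is a compile
    // error: it was moved into `send_number`.
}
```

Because `send_number` takes `self` by value, the compiler rejects any later use of the consumed state; there is no runtime check to forget.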
We've used them in our Bulletproofs MPC (multi-party computation), where a cryptographic requirement is that the protocol must not be replayed from a mid-point. Since the user is supposed to perform these transitions within their application on top of network messages, such a strongly-typed API guarantees that, if their program has compiled successfully, then:
1) Steps are performed in correct order.
2) The protocol cannot be replayed from the intermediate state.
Re: associated data for a state, that's possible too!
trait StateData {
type Data;
}
struct Sender<const S: SenderState> {
... // sender implementation
state_data: <Self as StateData>::Data
}
impl<const S: SenderState> StateData for Sender<{S}> {
// default: no associated data
default type Data = ();
}
struct HasSentNumberData {
sent: u32
}
// using a specialized impl (also an unstable feature).
impl StateData for Sender<{SenderState::HasSentNumber}> {
// instead of a complete struct we could also have done Data = u32,
// but a separate struct would enable a nicer constructor, default values, helper methods and so on.
// (omitted in this example, though)
type Data = HasSentNumberData;
}
struct HasReceivedNumberData {
sent: u32,
received: u32
}
impl StateData for Sender<{SenderState::HasReceivedNumber}> {
type Data = HasReceivedNumberData;
}
A good sign in a type system for me is that the types are accurately modeling behavior that I find in day to day code. To the point where moving back to a language that lacks this feature feels weirdly unsafe or not expressive enough. Going back to C APIs, with their states that need to be read via bit masks or through special integer return values, is just painful after using discriminated unions. Likewise being able to have multiple owners to a value is a little weird after using Rust.
Also, what's the deal with the fi ligature in the code? It throws off the kerning and serves no purpose.
We do something like this to encode our Redux states at compile-time (using TypeScript, obviously). Previously, using regular JS, the Redux devtools made tracking down incorrectly implemented reducers/state transitions reasonably straightforward, but you still had to trigger a bug before you knew you had to track it down, implement tests, etc.
This kind of design pattern measurably saves us time; it reduces the volume of unit tests we need to write/update when we make changes (we still test, just... with less fear), and it prevents newer developers from making mistakes. I haven't played with Rust (beyond a couple of toy projects) yet, and articles like this remind me I'm looking forward to sinking my teeth in over the Christmas break.
Cool! Are you referring to the phantom types technique? I'm interested in how you implement this in TypeScript. Do you have a blog post about it you can refer me to? Or even just a gist?
This is one of my favorite patterns from Haskell, which I've known as "type-level programming". One of the most common examples is a "sized vector"[1] which allows compile-time bounds checking. This is achieved by annotating each vector type with a phantom type variable representing its size.
Nice to see something like this in Rust! One thing that's a bit of a bummer, and I'm sure there are very good reasons for this, is that we HAVE to use every type argument of a struct in its definition. If this restriction were relaxed, we wouldn't need the "state" field in the struct at all, and we could make the state type variable truly "phantom".
No. Singletons in Haskell are the single most over-engineered and ugliest thing I've seen. If you are going this route, you might as well use a real dependently typed language like Agda or Coq, and the latter supports extracting simple Haskell code from your Coq code while keeping your proofs in Coq.
Indexed types have their uses but just think about this: for your length-indexed vector example, what if you want to read a file and put its lines into such a vector? Your return type needs the length but it's not known until runtime. Okay you use existential quantification, but you now also need a runtime representation of your types. That's done through singletons. But that destroys Haskell's phase separation and means the inefficient ways you use to prove things and encode naturals now leak to the runtime! Have fun with your unary numbers then.
Although the language itself doesn't guarantee that the value is not used again after a move, good static analyzers will provide a warning in that case, so it can still be safely used.
> Although the language itself doesn't guarantee that the value is not used again after a move, good static analyzers will provide a warning in that case, so it can still be safely used.
Not soundly. Such static analysis can be trivially defeated using e.g. virtual methods.
Basically, they are using Rust to encode a state machine in types. This is brilliant! The nice thing is that the transitions are determined at compile time, so there can be a performance benefit as well as compile-time correctness checking.
I did not contribute to that library, but it's nice to see a practical implementation. They use the same approach, essentially, as I describe in my blog.
Maybe not everyone will know this, but typestate was one of the original 'headline features' of Rust when Graydon Hoare started it. It was based on the Strom/Yemini paper[0] for the NIL language. There's a mention of it in this SO answer from 2010[1] and in the LtU discussion[2].
I'm not sure why it was de-emphasised; I think it didn't work as well as anticipated in the 'real world'.
Scala and Haskell have the same thing, and it's an amazing feature. The proper type is that of
trait IndexedStateT[A, B, C]
which signifies a type-level state machine moving from state A to state B, emitting a value of type C.
I can only speak for Scala, but I'm assuming Haskell has singleton and literal types as well, meaning that code like this works great.
object DoorOpen
object DoorClosed

class Door {
  def open: IndexedState[DoorClosed.type, DoorOpen.type, Unit] = ???
  def close: IndexedState[DoorOpen.type, DoorClosed.type, Unit] = ???
}

val d = new Door
for {
  _ <- d.open  // works
  _ <- d.close // works
  // _ <- d.close // compile error
} yield ()
By my understanding of the article, it uses the borrow/move state to implement the state transition. Is this generalizable to arbitrary state machines, or only a simple two-state one?
It generalizes to arbitrary state machines; check out the reference near the end to session types.
Each state is a type, methods consume their receiver (the "move state") and return the new state. You can't accidentally keep around a copy of the old state.
This is a really useful pattern when you want to have rigidly defined/enforced state transitions to ensure that the data is never in an invalid state for a given operation.
It's pretty awful to deal with when you're unsure of what the state machine should look like, or if there needs to be a lot more flexibility in how the data is accessed. Maintainability nightmare.
An example of this I ran into is a data processing pipeline architecture where each vertex of the processing graph had a processing function called in a loop on its own dedicated thread. Using the type state pattern helped clearly define the "life cycle" of each vertex and enforce it, which provided for some powerful synchronization guarantees (e.g. we could provide some elements of memory safety even when loading things through shared libraries). If you dug into it you could break things, but that would be more work than just following the pattern.
Are there other languages that can do the trick done with "close()" on the file there? A type with a method that can make the value unusable, at compile time, after being called!
Any dependently typed language worth its salt will be able to model a state machine in its type system.
Haskell can do it with some effort, maybe for simple stuff phantom types and GADTs are enough. With linear types the explicit "consumption" can be modeled.
Most languages have some variant of this, where closing coincides with the variable becoming inaccessible: C#'s IDisposable, Python's with statement, Java's try-with-resources, C++'s RAII destructors, etc.
None are really as powerful but they come close and cover many use cases.
> 2. we have seeked in a file that was already closed.
> The second error, however, is much harder to catch. Most programming languages support the necessary features to make this error hard, typically by closing the file upon destruction or at the end of a higher-order function call, but the only non-academic language that I know of that can actually entirely prevent that error is Rust.
Why not simply add an `is_closed` flag and throw an error if it is?
You should probably read the rest of the blog post: in Rust the compiler forces you to handle the result before you can use the value, and the post specifically calls this out. Specifically, look for this section:
let mut my_file = MyFile::open(path)?;
// Note the `?` above. It's a simple operator that asks
// the compiler to check whether the operation succeeded.
// The *only* way to obtain a `MyFile` object is to
// have a successful `MyFile::open`.
// At this line, `my_file` is a `MyFile`, which means
// that we may use it.
> it would be nice if there was something like Python's with statement to correctly close a file.
In C++ or Rust, non-memory resource management is arguably even easier than in Python.
RAII: Resource Acquisition Is Initialization.
(Apologies for the C++; can do the same thing in Rust.)
ManagedFile myfile("path.txt");
myfile.read();
`myfile` gets released whenever the scope finishes.
That said, the C++ and Rust stdlibs don't look like that, because opening and closing files is not exception-free and, unlike Python, C++ and Rust don't have (or don't prefer) non-explicit error handling.
1) A typesafe variant of this example, caught by the compiler, is what he implements further down
2) There is. Rust has the Drop trait, which guarantees destruction when going out of scope. Even better than `with` statements since the API doesn't require anything special from the user.
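A tiny illustration of the Drop trait with a hypothetical guard type (the flag exists only to make the cleanup observable):

```rust
use std::cell::Cell;

struct Guard<'a> {
    released: &'a Cell<bool>,
}

impl Drop for Guard<'_> {
    // Runs automatically when the value goes out of scope;
    // the user cannot forget to call it.
    fn drop(&mut self) {
        self.released.set(true);
    }
}

fn main() {
    let released = Cell::new(false);
    {
        let _g = Guard { released: &released };
        assert!(!released.get()); // still alive inside the scope
    } // `_g` dropped here
    assert!(released.get()); // cleanup ran without any explicit call
}
```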
`open` is a static function on the `File` object and an instance of `File` does not have a `close` method at all. So, as in this article, an instance of `File` can only be created if `open` succeeded and the file will be closed when dropped (either implicitly or explicitly). For example:
fn main() -> std::io::Result<()> {
// a file object is only created if File::create is successful
// otherwise main returns an error
let mut file = File::create("foo.txt")?;
file.write_all(b"Hello, world!")?;
Ok(())
} // file closed here, no close method needed.
This contrasts with many other languages where a class instance can be in an invalid state. This is something the typestate pattern helps to avoid.
In Rust this is achieved using `if let` for anything that returns `Result`, plus `Drop` (it's kinda neat how a few orthogonal features fit nicely together like that):
use std::io::Read;

if let Ok(mut f) = File::open(filename) {
    let mut contents = String::new();
    let _ = f.read_to_string(&mut contents);
    // drop(f) automatic here
}
But Rust's move semantics give you extra flexibility, because you don't have to close it if you don't want to.
let mut keep = None;
if let Ok(mut f) = File::open(filename) {
    let mut contents = String::new();
    let _ = f.read_to_string(&mut contents);
    if random() {
        keep = Some(f);
    }
}
// the file *may* be usable beyond the first scope,
// and it's still dropped correctly.
Values aren't necessarily dropped at the end of their initial scope (as in `with` or on-stack RAII); ownership can move, and a value is dropped wherever its final owner goes out of scope, which the language semantics allow the compiler to track globally.
When you get a Result back in Rust, you have to check it to use the value it contains. The worst you can do is call unwrap(), which says "I'm assuming this will not fail; panic if I'm wrong". But you can't forget to check.
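A small illustration: the value inside a Result is unreachable until you handle both cases (the `parse_port` helper is invented for this example):

```rust
fn parse_port(s: &str) -> Result<u16, std::num::ParseIntError> {
    s.parse() // returns Ok(port) or Err(parse error)
}

fn main() {
    // The u16 is locked inside the Result; using it without
    // checking is a type error, not a runtime surprise.
    match parse_port("8080") {
        Ok(port) => assert_eq!(port, 8080),
        Err(_) => unreachable!(),
    }
    // unwrap() is the "I'm assuming this can't fail" escape hatch;
    // here we only assert that a bad input really is an Err.
    assert!(parse_port("not a port").is_err());
}
```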
RE 1: Several languages (including Rust!) have annotations to force you to test the result code. However, they generally leave my_file in scope even in the error path, which makes bugs easier to write, and which means the File type is forced to handle the case where the file wasn't successfully loaded across its entire API surface.
RE 2: Rust takes the C++ approach of allowing types to define their own cleanup which gets auto-invoked when the object goes out of scope, instead of forcing the calling code to worry about it. Specifically, Rust types can implement the auto-invoked "Drop" trait, basically equivalent to C++'s destructors. You only need to call close if you want to close a file early (e.g. before the end of the scope), in Rust, which is fairly rare.
I vastly prefer this approach, but it admittedly doesn't play nicely with the... less deterministic lifetimes of objects in a garbage collected system, so I can understand why Python, C#, etc. have more explicit scope syntax for cleanup.
They are similar to Kotlin's sealed classes, except there's no concept in Kotlin of an object "consuming" itself, which is necessary for maintaining safety in the API.
For example, in Kotlin, if a method on state A returns state B, there's no way to invalidate all existing references to state A. Normally this invariant would have to be enforced on the caller's side, or perhaps by throwing an IllegalStateException if a method on state A is called after it has produced state B.
(Re: the Bulletproofs MPC mentioned above) We wrote about it here: https://medium.com/interstellar/bulletproofs-pre-release-fcb... - scroll to "strongly-typed multiparty computation".
I'm not seeing a ligature there, or at least not an obvious one. What browser/platform are you using?
[1] https://www.schoolofhaskell.com/user/konn/prove-your-haskell...
EDIT: Typos.
https://doc.rust-lang.org/nomicon/phantom-data.html
https://doc.rust-lang.org/std/marker/struct.PhantomData.html
So the struct's `state: S` field, I believe, could become `state: PhantomData<S>`. I've never used this PhantomData personally, so this might be wrong. Cheers!
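A sketch of that suggestion with a toy door state machine (the names are mine): the zero-sized PhantomData field "uses" the parameter, so the struct definition is legal, yet nothing is stored at runtime.

```rust
use std::marker::PhantomData;

struct Open;
struct Closed;

// PhantomData<S> mentions the type parameter S, satisfying the
// rule that every parameter must appear in the definition,
// while occupying zero bytes.
struct Door<S> {
    _state: PhantomData<S>,
}

impl Door<Closed> {
    fn open(self) -> Door<Open> {
        Door { _state: PhantomData }
    }
}

impl Door<Open> {
    fn close(self) -> Door<Closed> {
        Door { _state: PhantomData }
    }
}

fn main() {
    let d = Door::<Closed> { _state: PhantomData };
    let d = d.open().close();
    // The state lives purely in the type; the value is zero-sized.
    assert_eq!(std::mem::size_of_val(&d), 0);
}
```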
Really neat techniques!
Here's another article, but with a more classical state machine implementation.
https://insanitybit.github.io/2016/05/30/beyond-memory-safet...
https://github.com/jonhoo/rust-imap
[0]: http://www.cs.cmu.edu/~aldrich/papers/classic/tse12-typestat...
[1]: https://stackoverflow.com/questions/3210025/what-is-typestat...
[2]: http://lambda-the-ultimate.org/node/4009
That is pretty neat.
That's a way of doing it at runtime. The article describes catching the error at compile time.
I do wish that we had reliable RVO so that this could come at zero cost.
my_file.open(); // Error: this may fail.
1) If this can fail, then it should be a compile error to not test the result code.
2) IMHO it would be nice if there was something like Python's with statement to correctly close a file. This prevents trying to close a file that is not opened.
The idea of encoding a state machine into the types seems interesting.
All resources in Rust are implicitly like your Python with example -- when they go out of scope and destructors run, the file will be closed.