top | item 21413174

Typestates in Rust (2018)

205 points| lwhsiao | 6 years ago |yoric.github.io | reply

73 comments

order
[+] cdirkx|6 years ago|reply
When const generics are fully implemented (they are partially usable on nightly rust now) this will enable an even stronger version of this pattern, allowing you the full power of enums to represent the state.

  enum SenderState {
    ReadyToSendHello,
    HasSentHello,
    HasSentNumber,
    HasReceivedNumber
  }

  Sender<const S: SenderState> {
    ...
  }

  impl Sender<{SenderState::ReadyToSendHello}>{
    ...
  }
One can then give the individual states extra parameters:

  HasSentNumber {
    number: u32
  }
(Note that in this case that doesn't make much sense, the number that is sent is more associated data then an actual type parameter. There is no real difference, at the type level, between HasSentNumber { number: 3 } and HasSentNumber { number: 6 }, and the compiler generating two types for this would be unnecessary. It is only an example of the syntax.)
[+] oleganza|6 years ago|reply
You can do session types already in stable Rust without const generics. Each "enum variant" can be its own separate type, and such types can only be instantiated by a method on the type representing the previous state, which also consumes that previous state.

We've used them in our Bulletproofs MPC (multi-party computation) where cryptographic requirement is not to replay the protocol from the mid-point. Since the user is supposed to perform these transitions within their application on top of network messages, such strongly-typed API guarantees that, if their program has compiled successfully, then:

1) Steps are performed in correct order. 2) The protocol cannot be replayed from the intermediate state.

We have wrote about it here: https://medium.com/interstellar/bulletproofs-pre-release-fcb... - scroll to "strongly-typed multiparty computation".

[+] cdirkx|6 years ago|reply
Re: associated data for a state, that's possible too!

  trait StateData {
    type Data;
  }

  struct Sender<const S: SenderState> {
    ... // sender implementation
    state_data: <Self as StateData>::Data
  }

  impl<const S: SenderState> StateData for Sender<{S}> {
    // default: no associated data
    default type Data = ();  
  }

  struct HasSentNumberData {
    sent: u32
  }

  // using a specialized impl (also an unstable feature).
  impl StateData for Sender<{SenderState::HasSentNumber}> {
    // instead of a complete struct we could also have done Data = u32, 
    // but a separate struct would enable a nicer constructor, default values, helper methods and so on.
    // (ommitted in this example though)
    type Data = HasSentNumberData;
  }

  struct HasReceivedNumberData {
    sent: u32,
    received: u32
  }

  impl StateData for Sender<{SenderState::HasReceivedNumber}> {
    type Data = HasReceivedNumberData;
  }
Example usage:

  impl Sender<{SenderState::HasReceivedNumber}> {
    ...

    fn validate_received_number(&self) -> bool {
      self.state_data.received == self.state_data.sent   
    }
  }  


  let sender = Sender::<{SenderState::HasReceivedNumber}> { state_data: HasReceivedNumberData { sent: 3, received: 28 } };
  assert_eq!(sender.validate_received_number() == false) // client sent wrong number!
Playground link to see this in action: https://play.rust-lang.org/?version=nightly&mode=release&edi...
[+] _hardwaregeek|6 years ago|reply
A good sign in a type system for me is that the types are accurately modeling behavior that I find in day to day code. To the point where moving back to a language that lacks this feature feels weirdly unsafe or not expressive enough. Going back to C APIs, with their states that need to be read via bit masks or through special integer return values, is just painful after using discriminated unions. Likewise being able to have multiple owners to a value is a little weird after using Rust.

Also what's the deal with the fi ligature in the code? It throws off the kerning and holds no purpose.

[+] reificator|6 years ago|reply
> Also what's the deal with the fi ligature in the code? It throws off the kerning and holds no purpose.

I'm not seeing a ligature there, or at least not an obvious one. What browser/platform are you using?

[+] frio|6 years ago|reply
We do something like this to encode our Redux states at compile-time (using TypeScript, obviously). Previously, using regular JS, the Redux devtools made tracking down incorrectly implemented reducers/state transition reasonably straightforward -- but you still had to trigger a bug before you knew you had to track it down, and implement tests etc.

This kind of design pattern measurably saves us time; it reduces the volume of unit tests we need to write/update when we make changes (we still test, just... with less fear), and it prevents newer developers from making mistakes. I haven't played with Rust (beyond a couple of toy projects) yet, and articles like this remind me I'm looking forward to sinking my teeth in over the Christmas break.

[+] amykyta|6 years ago|reply
Cool! Are you referring to the phantom types technique? I am interested in how you implement this in typescript. Do you have a blog post about it you can refer me to? Or even just a gist?
[+] dmkolobov|6 years ago|reply
This is one of my favorite patterns from Haskell, which I've known as "type-level programming". One of the most common examples is a "sized vector"[1] which allows compile-time bounds checking. This is achieved by annotating each vector type with a phantom type variable representing its size.

Nice to see something like this in Rust! One thing that's a bit of a bummer, and I'm sure there are very good reasons for this, is that we HAVE to use every type argument of struct in its definition. If this restriction were to be relaxed, we wouldn't need the "state" field in the struct at all, and we could make the state type variable truly "phantom".

[1] https://www.schoolofhaskell.com/user/konn/prove-your-haskell...

EDIT: Typos.

[+] kccqzy|6 years ago|reply
No. Singletons in Haskell are the single most over-engineered and ugliest thing I've seen. If you are going this route, you might as well use a real dependently typed language like Agda or Coq, and the latter supports extracting simple Haskell code from your Coq code while keeping your proofs in Coq.

Indexed types have their uses but just think about this: for your length-indexed vector example, what if you want to read a file and put its lines into such a vector? Your return type needs the length but it's not known until runtime. Okay you use existential quantification, but you now also need a runtime representation of your types. That's done through singletons. But that destroys Haskell's phase separation and means the inefficient ways you use to prove things and encode naturals now leak to the runtime! Have fun with your unary numbers then.

[+] choudanu4|6 years ago|reply
Does the PhantomData type in Rust do what you want? Or are you talking about something slightly different?

https://doc.rust-lang.org/nomicon/phantom-data.html

https://doc.rust-lang.org/std/marker/struct.PhantomData.html

So

  struct Sender<S> {
    /// Actual implementation of network I/O.
    inner: SenderImpl;  
    /// 0-sized field, doesn't exist at runtime.
    state: S;
  }
I believe could become

  struct Sender<S> {
    /// Actual implementation of network I/O.
    inner: SenderImpl;  
    /// 0-sized field, doesn't exist at runtime.
    _marker: PhantomData<S>;
  }
I've never used this PhantomData personally, so this might be wrong. Cheers!
[+] nitnelave|6 years ago|reply
I wrote something similar in C++, completely compile time that disappears at runtime: https://www.fluentcpp.com/2019/09/24/expressive-code-for-sta...

Although the language itself doesn't guarantee that the value is not used again after a move, good static analyzers will provide a warning in that case, so it can still be safely used.

[+] pcwalton|6 years ago|reply
> Although the language itself doesn't guarantee that the value is not used again after a move, good static analyzers will provide a warning in that case, so it can still be safely used.

Not soundly. Such static analysis can be trivially defeated using e.g. virtual methods.

[+] RcouF1uZ4gsC|6 years ago|reply
Basically, they are using Rust to encode a state machine using types. This is brilliant! The nice thing is that the transitions are determined at compile time so there can be performance benefit as well as compile time correctness testing.

Really neat techniques!

[+] pedrow|6 years ago|reply
Maybe not everyone will know this but Typestate was one of the original 'headline features' of Rust when Graydon Hoare started it. It was based on the Strom/Yemeni paper[0] for the NIL language. There's a mention of in this SO answer from 2010[1] and in the LtU discussion[2]. I'm not sure why it was de-emphasised, I think it didn't work as well as anticipated in the 'real world'

[0]: http://www.cs.cmu.edu/~aldrich/papers/classic/tse12-typestat...

[1]: https://stackoverflow.com/questions/3210025/what-is-typestat...

[2]: http://lambda-the-ultimate.org/node/4009

[+] dhash|6 years ago|reply
Scala/Haskell has the same thing, and it's an amazing feature. The proper type is that of

    trait IndexedStateT[A, B, C]
Which signifies a typelevel state machine moving from state A to state B emitting a value of type C.

I can only speak for scala, but i'm assuming haskell has singleton and literal types as well. Meaning that code like this works great.

    object DoorOpen
    object DoorClosed
   
    class Door {
      def open: IndexedState[DoorClosed.type, DoorOpen.type, Unit]
      def close: IndexedState[DoorOpen.type, DoorClosed.type, Unit]
    }

   val d = Door()
   for {
      _ <- d.open() //works
      _ <- d.close() // works
      // _ <- d.close() //compile error
   }

By my understanding of the article, it uses the borrow/move state to implement the state transistion. Is this generalizable to arbitrary state machines, or only a simple 2-state one?
[+] Rusky|6 years ago|reply
It generalizes to arbitrary state machines- check out the reference near the end to session types.

Each state is a type, methods consume their receiver (the "move state") and return the new state. You can't accidentally keep around a copy of the old state.

[+] unlinked_dll|6 years ago|reply
This is a really useful pattern when you want to have rigidly defined/enforced state transitions to ensure that the data is never in an invalid state for a given operation.

It's pretty awful to deal with when you're unsure of what the state machine should look like, or if there needs to be a lot more flexibility in how the data is accessed. Maintainability nightmare.

An example of this I ran into is a data processing pipeline architecture where each vertex of the processing graph had a processing function called in a loop on its own dedicated thread. Using the type state pattern helped clearly define the "life cycle" of each vertex and enforce it, which provided for some powerful synchronization guarantees (e.g. we could provide some elements of memory safety even when loading things through shared libraries). If you dug into it you could break things, but that would be more work than just following the pattern.

[+] gameswithgo|6 years ago|reply
Are there other languages that can do the trick done with "close()" on the file there? A type with a method that can make the type compile time unusable after being called!

that is pretty neat.

[+] nimish|6 years ago|reply
Any dependently typed language worth its salt will be able to model a state machine in its type system.

Haskell can do it with some effort, maybe for simple stuff phantom types and GADTs are enough. With linear types the explicit "consumption" can be modeled.

[+] Too|6 years ago|reply
Most languages have some variant of this where close is equal to variable going unaccessible. C# IDisposable, Python with-statement, Java try/resources, C++ RAII destructors, etc.

None are really as powerful but they come close and cover many use cases.

[+] Sharlin|6 years ago|reply
There are, but they’re mostly research languages or otherwise experimental.
[+] aliceryhl|6 years ago|reply
It's a big part of the whole idea behind Rust really.
[+] amedvednikov|6 years ago|reply
> 2. we have seeked in a file that was already closed.

> The second error, however, is much harder to catch. Most programming languages support the necessary features to make this error hard, typically by closing the file upon destruction or at the end of a higher-order function call, but the only non-academic language that I know of that can actually entirely prevent that error is Rust.

Why not simply add an `is_closed` flag and throw an error if it is?

[+] msl|6 years ago|reply
> Why not simply add an `is_closed` flag and throw an error if it is?

That's a way of doing it at runtime. The article describes catching the error at compile time.

[+] aliceryhl|6 years ago|reply
The compiler is not able to catch that, and that the compiler can catch it is the whole idea behind type states.
[+] justryry|6 years ago|reply
This is one my favorite features of the language and I believe to be fairly unique. It can make for some slick and safe state machine like code.

I do wish that we had reliable RVO so that this could come at zero cost.

[+] jbritton|6 years ago|reply
This line caught my eye.

my_file.open(); // Error: this may fail.

1) If this can fail, then it should be a compile error to not test the result code.

2) IMHO it would be nice if there was something like Python's with statement to correctly close a file.

    with open(filename, 'r') as f:
        f.read()
    # f.close() invoked automatically here
This prevents trying to close a file that is not opened.

The idea of encoding a state machine into the types seems interesting.

[+] andolanra|6 years ago|reply
You should probably have read the rest of the blog post, because in Rust it is a compiler error to not test the result, and the post specifically calls this out. Specifically, look for this section:

    let mut my_file = MyFile::open(path)?;
    // Note the `?` above. It's a simple operator that asks
    // the compiler to check whether the operation succeeded.
    // The *only* way to obtain a `MyFile` object is to
    // have a successful `MyFile::open`.

    // At this line, `my_file` is a `MyFile`, which means
    // that we may use it.
[+] james-mcelwain|6 years ago|reply
In actual Rust APIs, opening a file returns a result, which needs to be handled or unwrapped in order to operate on the file.

All resources in Rust are implicitly like your Python with example -- when they go out of scope and destructors run, the file will be closed.

[+] paulddraper|6 years ago|reply
> it would be nice if there was something like Python's with statement to correctly close a file.

In C++ or Rust, non-memory resource management is arguably even easier than Python.

RAII: Resource Acquisition Is Initialization.

(Apologies for the C++; can do the same thing in Rust.)

    ManagedFile myfile("path.txt");
    myfile.read();
`myfile` gets released whenever the scope finishes.

That said, the C++ and Rust stdlibs don't look like that because opening and closing files is not exception-free and unlike Python, C++ and Rust don't have or don't prefer non-explicit error handling.

[+] strictfp|6 years ago|reply
1) A typesafe variant of this example, caught by the compiler, is what he implements further down 2) There is. Rust has the Drop trait, which guarantees destruction when going out of scope. Even better than `with` statements since the API doesn't require anything special from the user.
[+] ChrisSD|6 years ago|reply
It's interesting to take note of the actual Rust `File` object: https://doc.rust-lang.org/std/fs/struct.File.html

`open` is a static function on the `File` object and an instance of `File` does not have a `close` method at all. So, as in this article, an instance of `File` can only be created if `open` succeeded and the file will be closed when dropped (either implicitly or explicitly). For example:

    fn main() -> std::io::Result<()> {
        // a file object is only created if File::create is successful
        // otherwise main returns an error
        let mut file = File::create("foo.txt")?;
        file.write_all(b"Hello, world!")?;
        Ok(())
    } // file closed here, no close method needed.
This contrasts with many other languages where a class instance can be in an invalid state. This is something the typestate pattern helps to avoid.
[+] pornel|6 years ago|reply
In Rust this is achieved using `if let` for anything that uses `Result` and `Drop` (it's kinda neat how a few orthogonal features fit nicely together like that):

    if let Ok(f) = File::open(filename) {
        f.read();
        // drop(f) automatic here
    }

But Rust's move semantics give you extra flexibility, because you don't have to close it if you don't want to.

    let keep = None;
    if let Ok(f) = File::open(filename) {
        f.read();
        if random() {
           keep = Some(f);
        }
    }
    // the file *may* be usable beyond the first scope,
    // and it's still dropped correctly.
Values aren't dropped simply at the end of their initial scope (as in `with` or on-stack RAII), but after their last use, and language semantics allow the compiler to track globally where the last use is.
[+] gameswithgo|6 years ago|reply
When you get a result type back in Rust you have to check it to use the value it contains. The worst you can do is call unwrap(), which says "Im assuming this will not fail, panic if I am wrong". But you can't forget to check.
[+] MaulingMonkey|6 years ago|reply
RE 1: Several languages (including Rust!) have annotations to force you to test the result code. However, they generally leave my_file in scope even in the error path, which makes bugs easier to write, and which means the File type is forced to handle all the complexity of handling the case where the file wasn't successfully loaded for it's entire API surface.

RE 2: Rust takes the C++ approach of allowing types to define their own cleanup which gets auto-invoked when the object goes out of scope, instead of forcing the calling code to worry about it. Specifically, Rust types can implement the auto-invoked "Drop" trait, basically equivalent to C++'s destructors. You only need to call close if you want to close a file early (e.g. before the end of the scope), in Rust, which is fairly rare.

I vastly prefer this approach, but it admittedly doesn't play nicely with the... less deterministic lifetimes of objects in a garbage collected system, so I can understand why Python, C#, etc. have more explicit scope syntax for cleanup.

[+] caleb-allen|6 years ago|reply
Feels similar to Kotlin's sealed classes
[+] james-mcelwain|6 years ago|reply
They are similar to Kotlin's sealed classes, except there's no concept in Kotlin of an object "consuming" itself, which is necessary for maintaining safety in the API.

For example, in Kotlin, if a method on state A returns state B, there's no way to invalidate all existing references to state A. Normally this invariant would be enforced on the caller's side, or perhaps by throwing an IllegalStateException if state A is called after producing state B.

[+] Congeec|6 years ago|reply
also Monad in Haskell