top | item 40959185

(no title)

wh33zle | 1 year ago

> The stream is a stream of Result<ParsedMessage, ErrorMessage>

Fair point! Yeah that would allow error handling the state machine.

> > Lastly, and this is where using async falls apart: If you try to model these state machines with async, you can't use `&mut` to update shared state between them AND run multiple of these concurrently.

> I'm not sure that I see your point here. I think it's just you never have to think about concurrent access if there's no concurrency, but I'm sure I'm misunderstanding this somehow. Maybe you could expand on this a little?

Yeah sure! If you are up for it, you can dig into the actual code here: https://github.com/firezone/firezone/tree/main/rust/connlib/...

The requirements are:

    1. Run N clients of the TURN protocol.
    2. Run M connections (ICE agent + WireGuard tunnel).
    3. Allow each connection to use each TURN client.
    4. Everything must run over a single UDP socket (otherwise hole-punching doesn't work).

In theory, (3) could be relaxed a bit to not be a NxM matrix. It doesn't really change much though because one TURN allocation (i.e. remote address) needs to be usable by multiple connections.

All of the above is managed as a single `Node` that is composed into a larger state machine that manages ACLs and uses this library to route IP packets to and from a TUN device. That is where the concurrent access comes in: We receive ACL updates via a WebSocket and change the routing logic as a result. Concurrently, we need to read from the TUN device and work out which WireGuard tunnel the packets needs to go through. A WireGuard tunnel may use one of the TURN clients to relay the packet in case a direct connection isn't possible. Lastly, we also need to read from the UDP socket, index into the correct connection based on the IP and decrypt incoming IP packets.

In summary, there are lots of state updates happening from multiple events and the state cannot be isolated into a single task where `&mut` would be an option. Instead, we need `&mut` access to pretty much everything upon each UDP packet from the network or IP packet from the TUN device.

> The Stun example is simple enough that it is achievable just with an async state machine.

I am fully aware of that. It needed to be simple to fit into the scope of a blog post that can be understood by people without a deep networking background. To explain sans-IO, I deliberately wanted a simple example so I could focus on sans-IO itself and not explaining the problem domain. With the basics in place, people can work out by themselves whether or not it is a good fit for their problem. Chances are, if they are struggling with lots of shared state between tasks, it could be!

discuss

joshka|1 year ago

Got it - thanks for the update. I'll take a run at it sometime - if only to understand where the pain points in doing this are more thoroughly.

wh33zle|1 year ago

Thinking about your example again made me think of another requirement: Multiplexing.

STUN ties requests and responses together with a transaction ID that is encoded as an attribute in the message.

Modelling this using `Sink`s and `Stream`s is quite difficult. Sending out the request via a `Sink` is easy but how do you correctly dispatch the incoming response again?

For example, let's say you are running 3 TURN clients concurrently in 3 async tasks that use the referenced `Sink` + `Stream` abstractions. How do you go from reading from a single UDP socket to 3 streams that each contain the correct responses based on the transaction ID that was sent out? To read the transaction ID, you'd have to parse the message. We've established that we can do that partially outside and only send the `Result` in. But in addition to the 3 tasks, we'd need an orchestrating task that keeps a temporary state of sent transaction IDs (received via one of the `Sink`s) and then picks the correct `Stream` for the response.

It is doable but creates the kind of spaghetti code of channels where concerns are implemented in many different parts of the code. It also makes these things harder to test because I now need to wire up this orchestrating task with the TURN clients themselves to ensure this response mapping works correctly.