Managing mutable data in Elixir with Rust

mgdev | 2 years ago

Rustler is great. Though this gets me thinking about how you can maintain as many Elixir invariants and conventions as possible, even while escaping them under the covers. Being able to call FeGraph.set/2 and have db actually be mutated violates Elixir's common call patterns, even if it's technically allowed.

For example: I wonder if it wouldn't be more "erlangy"/"elixiry" to model the mutable ops behind a GenServer that you send messages to. In the Elixir world it's perfectly normal to make GenServer.call/3 and expect the target PID to change its internal state in a non-deterministic way. It's one of the only APIs that explicitly blesses this. The ETS API is another.
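On the Rust side, a rough analogue of the GenServer idea is a single thread that owns the mutable state and receives requests over a channel, with a reply channel standing in for GenServer.call/3. The sketch below uses only `std::sync::mpsc`; `Request`, `start_owner` and `get` are hypothetical names for illustration, not part of FeGraph or Rustler.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Sender};
use std::thread;

// Hypothetical messages the owner understands (roughly like call/cast messages).
enum Request {
    Set { key: String, value: i64 },
    Get { key: String, reply: Sender<Option<i64>> },
    Stop,
}

// Spawn the owner: the only code that ever mutates `state`.
fn start_owner() -> (Sender<Request>, thread::JoinHandle<()>) {
    let (tx, rx) = mpsc::channel::<Request>();
    let handle = thread::spawn(move || {
        let mut state: HashMap<String, i64> = HashMap::new();
        // Messages are handled one at a time, in arrival order.
        for req in rx {
            match req {
                Request::Set { key, value } => {
                    state.insert(key, value);
                }
                Request::Get { key, reply } => {
                    // Ignore the error if the caller stopped waiting.
                    let _ = reply.send(state.get(&key).copied());
                }
                Request::Stop => break,
            }
        }
    });
    (tx, handle)
}

// Synchronous "call": send a request, then block until the owner replies.
fn get(owner: &Sender<Request>, key: &str) -> Option<i64> {
    let (reply_tx, reply_rx) = mpsc::channel();
    owner
        .send(Request::Get { key: key.to_string(), reply: reply_tx })
        .expect("owner thread is alive");
    reply_rx.recv().expect("owner replied")
}
```

Callers never touch the map directly; they send `Request::Set` and use `get`, so all mutation is serialized through one owner, much like state held inside a GenServer.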

Alternatively, you could have the ref store both a DB sequence and a ref ID (set to the last DB sequence), and compare them on every operation. If you call FeGraph.set/2 twice with the same db ref, the first call advances the sequence, so the second call finds that the ref ID no longer matches it and panics. Callers always need to operate on the latest ref. Then at least the local semantics are maintained.
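A minimal single-threaded sketch of that sequence check, assuming every write bumps a shared counter and returns a fresh handle. `Db`, `DbRef`, `set` and `get` are hypothetical names, not FeGraph's API, and a real Rustler resource would likely need thread-safe types (for example `Mutex` and `AtomicU64`) instead of `Rc`, `Cell` and `RefCell`.

```rust
use std::cell::{Cell, RefCell};
use std::collections::HashMap;
use std::rc::Rc;

// Shared backing store plus a sequence number bumped on every write.
struct Db {
    seq: Cell<u64>,
    data: RefCell<HashMap<u32, i64>>,
}

// A handle remembers the sequence number it was issued at (its "ref ID").
struct DbRef {
    db: Rc<Db>,
    ref_id: u64,
}

impl DbRef {
    fn new() -> DbRef {
        let db = Rc::new(Db { seq: Cell::new(0), data: RefCell::new(HashMap::new()) });
        DbRef { db, ref_id: 0 }
    }

    // Panic if this handle is not the latest one.
    fn check(&self) {
        let current = self.db.seq.get();
        if self.ref_id != current {
            panic!("stale db ref: ref_id {} != seq {}", self.ref_id, current);
        }
    }

    // Mutates in place, but returns a new handle; the old one becomes stale.
    fn set(&self, key: u32, value: i64) -> DbRef {
        self.check();
        self.db.data.borrow_mut().insert(key, value);
        let next = self.db.seq.get() + 1;
        self.db.seq.set(next);
        DbRef { db: Rc::clone(&self.db), ref_id: next }
    }

    // Reads are checked too, since a stale handle cannot see its "old" value.
    fn get(&self, key: u32) -> Option<i64> {
        self.check();
        self.db.data.borrow().get(&key).copied()
    }
}
```

The data is still mutated in place, but code that keeps threading the newest ref through (`let r1 = r0.set(...)`) reads like ordinary immutable Elixir code, and reusing an old ref fails loudly instead of silently observing the mutation.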

Maybe this is less relevant for the FeGraph example, since Elixir libs dealing with data are more willing to treat the DB as a mutable thing (ETS, Digraph). But it's not universal. Postgrex, for example, follows the DB-as-PID convention. Defaulting to an Elixiry pattern for Rustler implementations is probably good practice.

clarkema | 2 years ago

That's an interesting point that I should perhaps have covered in the original article.

The real code that this is based on is in fact hidden behind a GenServer for this exact reason -- to maintain the expectations of other Elixir code that has to interact with it. The advantage of the escape hatch, as another commenter mentions, is allowing efficient sparse mutations of a large chunk of data, without having to pay a copy penalty every time. I definitely wouldn't recommend sharing the db handle widely.

evnu | 2 years ago

> For example: I wonder if it wouldn't be more "erlangy"/"elixiry" to model the mutable ops behind a genserver that you send messages to.

It depends on the use case. For example, when creating a resource (basically a refcounted data structure), it might make sense to allow mutable access only through a process as the "owner" of the resource. But if you have only read-only data behind that resource, sharing the resource similar to ETS might be what you want.
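The read-only case maps naturally onto Rust's `Arc`: many holders share one reference-counted value, cloning the handle only bumps a count, and nobody can mutate the contents. A minimal sketch, assuming the shared data is a plain `Vec<u64>` summed in chunks by worker threads; `parallel_sums` is a hypothetical helper, not a Rustler API.

```rust
use std::sync::Arc;
use std::thread;

// Each worker gets its own Arc handle to the same read-only vector
// and sums one chunk of it; no copy of the vector is ever made.
fn parallel_sums(data: Arc<Vec<u64>>, workers: usize) -> Vec<u64> {
    assert!(workers > 0, "need at least one worker");
    let chunk = (data.len() + workers - 1) / workers; // ceiling division
    let handles: Vec<_> = (0..workers)
        .map(|i| {
            let data = Arc::clone(&data); // refcount bump only
            thread::spawn(move || {
                let start = (i * chunk).min(data.len());
                let end = ((i + 1) * chunk).min(data.len());
                data[start..end].iter().sum::<u64>()
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

Because nothing behind the `Arc` can change, no owner process or lock is needed for this case; mutation would go back through a single owner as described above.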

NiklasBegley | 2 years ago

I also want to give a shout out to the Rustler folks for creating a great library! We use Rustler quite extensively at Doctave, and have written about our experiences with Rustler before [0] (though our architecture has advanced quite a bit since the article was written).

Integrating Elixir and Rust has been delightfully straightforward and is a great choice for calling into libraries not available in Elixir, or offloading CPU intensive tasks.

[0]: https://www.doctave.com/blog/2021/08/19/using-rust-with-elix...

atonse | 2 years ago

Getting rustler up and running for us was very easy. Thank you to the team for making this excellent library.

We had some inconsistent build results (ours is an umbrella app), but apart from having to force a compilation and losing the ability to cache the Rust builds, everything else has worked so well that we’re happy to get access to the massive Rust ecosystem.

AlchemistCamp | 2 years ago

It’s exactly this use case that nudged me (primarily an Elixir dev) to start learning Rust a few years back.

Unfortunately, I haven’t had a project where I’ve needed to use Rustler yet, though.

doctor_phil | 2 years ago

Nice. I thought that Zig would be a nice language for writing NIFs - but of course Rust would be good too. Cool!

impulser_ | 2 years ago

Rust is perfect for this because Rust code can be very reliable, which is needed for NIFs in Erlang because a NIF can crash the whole VM.

So using C and Zig libraries without fully understanding them can be a death trap, while in Rust, as long as the code doesn't use unsafe, you can feel pretty good about using it.

rubin55 | 2 years ago

Zigler! https://github.com/E-xyza/zigler

elbasti | 2 years ago

Cool writeup. A little ironic, since Erlang's `digraphs` are also mutable!

Miner49er | 2 years ago

Erlang's digraphs are stored in an ETS table, so aren't they only mutable in the same way that ETS tables are mutable?

I don't normally see people consider (D)ETS tables as mutable, however.

hpeter | 2 years ago

This is super cool. I learn something new every day.

wredue | 2 years ago

Immutable data is not a “foundation of scalability and robustness”.

unoti | 2 years ago

> Immutable data is not a “foundation of scalability and robustness”.

It may not be the only way to get to scalability and robustness. But it certainly is the cornerstone of how Erlang gets there.

1. First, the way Erlang treats data ensures that every piece of data can be sent over the wire by default. This paves the way for another amazing characteristic of Erlang: when you refer to and use an object, it's essentially transparent to your code whether that object is on this machine or another machine in the cluster. This would not be possible without the fact that all data structures are remotable, which is enabled by the immutable data. (See also side note below.)

2. The immutable data also leads to clean rollback semantics, making it easy to always have a self-consistent state of the system ready to use even after some kind of fault.

3. The immutable data also leads to very clean and easy ways to handle multithreading because you never have to worry about making object copies. You can be assured that it's ok for two threads to use the same memory object because there's no way either of them can change it.
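Points 2 and 3 can be approximated in Rust with pure update functions and `Arc`-shared values. A minimal sketch, assuming a toy `Account` state; `Account`, `withdraw` and `read_concurrently` are hypothetical names used only for illustration, not how the BEAM implements any of this.

```rust
use std::sync::Arc;
use std::thread;

#[derive(Debug, Clone)]
struct Account {
    balance: i64,
}

// Pure update: returns a new state or an error and never touches the input,
// so the previous state is always a valid fallback after a failure.
fn withdraw(state: &Account, amount: i64) -> Result<Account, String> {
    if amount > state.balance {
        return Err(format!("insufficient funds: {} > {}", amount, state.balance));
    }
    Ok(Account { balance: state.balance - amount })
}

// Many threads read the same value without copying it or taking a lock,
// because nothing behind the Arc can be modified.
fn read_concurrently(state: Arc<Account>, readers: usize) -> Vec<i64> {
    let handles: Vec<_> = (0..readers)
        .map(|_| {
            let s = Arc::clone(&state);
            thread::spawn(move || s.balance)
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

A failed `withdraw` leaves the caller holding the last good state, which is the "self-consistent state ready to use" from point 2, and `read_concurrently` shows point 3: shared reads need no defensive copies.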

Side note: Alan Kay, the inventor of OO, has said that people get the entire idea of what he was talking about all wrong. He said that object orientation isn't about objects, but it's about communication. He was talking about the idea of an object being more like what we'd call a web endpoint today, where when you instantiate it you communicate with it by sending it messages. It's funny to me that a functional language like Erlang best embodies that OO idea today. Go code can, too.

"I'm sorry that I long ago coined the term 'objects' for this topic because it gets many people to focus on the lesser idea. The big idea is 'messaging'" - Alan Kay <https://en.wikipedia.org/wiki/Alan_Kay>

He goes on in the original underlying document to say "OOP to me means only messaging, local retention and protection and hiding of state-process, and extreme late-binding of all things." All of these ideas are front-and-center in Erlang (and by extension Elixir).

WolfeReader | 2 years ago

I'm not sure Joe Armstrong would agree with your comment.

58 comments