Crafting Interpreters with Rust: On Garbage Collection

[+] scott_s|1 year ago|reply

In case the author reads this: please explicitly cite all of Nystrom's figures. A link is not enough.

Even with a citation, I'm not quite comfortable just reusing someone else's figures so many times when they're doing so much heavy lifting. But an explicit citation is the minimum.

[+] ltungv|1 year ago|reply

Thanks for the suggestion. I updated it with more explicit citations. After seeing your comment, I just looked around and saw that no license was given for the images.

I should probably draw my own.

[+] imachine1980_|1 year ago|reply

idk if he/she change it, but i see the name big and center after each use.

[+] jlewallen|1 year ago|reply

Crafting Interpreters is such an amazing work.

There's at least one other Rust implementation of lox that I know of (https://github.com/tdp2110/crafting-interpreters-rs) (no unsafe)

It's always interesting to see how different people approach the problems in their own language or relative isolation. I agree with others here, the real value of the original work lies in avoiding copy and paste.

[+] munificent|1 year ago|reply

There are a whole pile of Lox implementations in Rust (as well as many other lanugages):

https://github.com/munificent/craftinginterpreters/wiki/Lox-...

[+] ceronman|1 year ago|reply

Nice article. A couple of years ago I also implemented Lox in Rust. And I faced the exact same issues that the author describes here, and I also ended up with a very similar implementation.

I also wrote a post about it: https://ceronman.com/2021/07/22/my-experience-crafting-an-in...

I ended up having two implementations: One in purely safe Rust and another one with unsafe.

Note that if you're interesting in the "object manager" approach mentioned, I did that in my safe implementation, you can take a look at https://github.com/ceronman/loxido

[+] ltungv|1 year ago|reply

I'd read your article, and it was lovely. It nudged me to just go unsafe and implement some of the data structures from scratch.

[+] jcdavis|1 year ago|reply

Fun writeup. When I went through and implement the book in rust (https://github.com/jcdavis/rulox), I just used Rc and never really solved the cycle issue.

I'll +1 and say I highly recommend going through Crafting Interpreters, particularly as a way of building non-trivial programs in languages you are less familiar with - If you just follow the java/C example you are tempted to lean into copy/pasting the samples, but figuring things out in other languages is a great experience.

I spent a longass time tracking down an issue with how the reference interpreter implementation defines tokens: https://github.com/munificent/craftinginterpreters/issues/11... which was frustrating but good debugging practice.

[+] lawn|1 year ago|reply

> If you just follow the java/C example you are tempted to lean into copy/pasting the samples, but figuring things out in other languages is a great experience.

I'll echo this recommendation.

I haven't gone through Crafting Interpreters yet (it's on the bookshelf) but I did go through a big chunk of Building Git and instead of Ruby I did it in Rust, to get back into the language and go a little deeper.

It was really great and it gave me a lot more than if I'd just copy paste the code as you say.

[+] ghosty141|1 year ago|reply

My only issue with Crafting Interpreters is that it relies too much on writing code and then explaining it instead of getting the concepts across first and then providing the code. Another thing I disliked was some code was structured in a way that it would go well with concepts that came further down the implementation which can lead to confusion since it seems overcomplicated/overabstracted at first.

[+] Dowwie|1 year ago|reply

did you not want anyone to ever find rulox on github? it has no descriptive information about it in the repo

[+] nestorD|1 year ago|reply

Note that there is quite a bit of thinking and available implementations of GC implementations available in Rust: https://manishearth.github.io/blog/2021/04/05/a-tour-of-safe...

[+] zozbot234|1 year ago|reply

There's also more recent implementations such as https://github.com/chc4/samsara (quite similar to https://github.com/pebal/sgcl , which is for C++. Both implement a concurrent tracing algorithm which allows for reduced latency compared to a naïve stop-the-world GC).

[+] harpiaharpyja|1 year ago|reply

I did something very similar to this myself. My GC implementation was pretty heavily inspired by manishearth's Rust GC.

My swing at the Crafting Intepreters in Rust can be found here: https://github.com/mwerezak/sphinx-lang

Although, at the time I was really interested in Lua's syntax so for extra credit I designed a completely different syntax for Lox closer to Lua's.

I'd love to revisit this project sometime and implement a register-based VM similar to Lua's.

[+] ltungv|1 year ago|reply

Thanks for sharing! That's a much more interesting project compared to mine xD. I had the intuition that rust-gc could be implemented in a simpler fashion by not being generic. Glad to see someone actually had it done.

[+] samsartor|1 year ago|reply

I have run into a lot of similar problems writing a state management framework for Rust (wip at https://gitlab.com/samsartor/hornpipe) and at one point despaired that I would have to abandon Rust and make a whole new reactive language. But the last couple years I've got it working really nicely with weak references and software transactional memory. Every reference is `Copy + Send + Sync + 'static`. And you can mutate objects by having a transaction make a local bitwise copy, which will get atomically merged and swapped on commit. The old copy gets kept around for undo/redo purposes. I've still got a boatload of algorithmic puzzles to solve to provide all the MVP features, which will take a while because it isn't my full-time job. But the details seem technically sound.

I did write a blog post about some of my design thoughts, although it doesn't dig into the technical guts: https://samsartor.com/guis-3/

[+] kaspermarstal|1 year ago|reply

> Don’t ask me about covariance since I honestly don’t know the answer. I recommend reading the section on subtyping and variance and The Rustonomicon. I went through them a few times but still don’t entirely get all the details. Then, we can define each type of object like so:

Haha this is gold. We ALL do this ALL the time.

[+] ltungv|1 year ago|reply

haha, glad to know that I'm not the only one

[+] ajeetdsouza|1 year ago|reply

This is really well-written!

Shameless plug: you may want to check out Loxcraft: https://github.com/ajeetdsouza/loxcraft

I too followed the path of "ignore safe Rust for maximum performance". It got pretty close to the C version, even beating it on some benchmarks.

[+] ltungv|1 year ago|reply

Nice, thanks for sharing. Definitely gonna look into how you do it, I couldn't get as close to clox like you did

[+] lowbloodsugar|1 year ago|reply

TL;DR: Don't use unsafe to break the rules. Use unsafe to enforce the rules in new ways.

Great journey. That's the thing about "unsafe" and "questionable parts"... if they really are questionable, then don't do 'em. In this case, if it is the case that holding a reference to a GC object guarantees that that object can't be freed, then it's not questionable. Proving that to be case can be fun tho.

The question is: does my unsafe code allow violating the borrow rules?

Cell, RefCell, RwLock etc give you interior mutability, but there is no way to violate the borrow rules with them. On the surface, it looks like you are violating them because "I have a non-mut thing that I can mutate".

If you build a thing that allows two distinct &mut T's to the same thing, then you are basically asking for a bug. Don't use unsafe to break the rules. Use unsafe to enforce the rules in new ways.

[+] ltungv|1 year ago|reply

If someone else depends on this project, I'd definitely not implement the questionable stuff. You're right that proving whether it's safe can be lots of fun, and I'm planning to try it out.

Based on what I've read, it's entirely possible. If my Rust knowledge is correct, there are 2 things that have to be proven - the object's lifetime and the borrow rules.

Proving the lifetime can be done at compile-time based on what was discussed in these articles https://without.boats/tags/shifgrethor/. But that still leaves us with proving that we don't violate the borrow rules. AFAIK, the only way to do it in my implementation is with Cell and RefCell enforcing the borrow rules at runtime.

Before removing all the safety checks, I actually used Gc<RefCell<T>> for mutable objects, so the borrow rules were enforced but not the lifetime. However, I decided to remove RefCell<T> to have some fun and see how I could shoot myself in the foot.

[+] chc4|1 year ago|reply

The other major issue with the Deref implementation is that `&mut` needs to be an exclusive reference, and if you're doing mark/sweep of GC objects via references you break that invariant if you hold any `&mut` across a GC call and are causing UB. In practice this probably doesn't affect your program, but I suspect Miri would yell at you and there's the off chance the compiler gets really tricky with inlining somewhere and you are inflicted with horrible pain.

[+] ltungv|1 year ago|reply

Yeah true, this breaks all sorts of contracts that we have with Rust xD. But if mark-and-sweep is implemented correctly, then no reference is ever held across the GC. Though, there's gonna be lots of pain debugging when you got it wrong, speaking from experience.

Do you have any resources about Rust inlining and the issues it might cause? I'd love to read more about that.

[+] Daffodils|1 year ago|reply

Just wanted to share that the Build your own Interpreters challenge on CodeCrafters is based on the Crafting Interpreters book and is free at the moment.

Link: https://app.codecrafters.io/courses/interpreter/overview

(Disclaimer: I work for CodeCrafters, and built this challenge).

[+] bombela|1 year ago|reply

> How much memory is used for the stack and when it is freed can always be determined at compile-time.

Technically, not always. You can allocate on the stack at runtime, see "alloca" in C.

But is alloca even that useful in practice? I struggle to find a good example.

[+] ozgrakkurt|1 year ago|reply

This was used in linux kernel and they were removing it later because it was a huge pain as far as I can remember. Seems like it causes problems because entire system is built assuming stack/heap work in a certain way.

[+] ltungv|1 year ago|reply

Oh right. I read about this once or twice and have never looked at it again. I've made the (wrong) assumption that this function needs some compile-time help. Is it all done at runtime? Or does that depend on the actual C implementation?

[+] tick_tock_tick|1 year ago|reply

I mean it's basically the fastest allocation possible. It's amazing under a very very unique situation.

- you need a lot of memory

- it needs to be as fast as possible

- it's a leaf function that can never call another function

99.9% of the time those aren't all true.

[+] jamesmunns|1 year ago|reply

Additionally, recursion and function pointers (including dynamic dispatch) are the other two "normal" language features that can defeat static stack analysis. There are techniques for working around this (bounded annotations, languages with strong types can narrow potential dispatch options), but it's a little harder.

91 comments