top | item 45092589

(no title)

ledauphin | 6 months ago

as an aside on Nimony aka Nim 3:

can somebody provide a reference explaining/demonstrating the ergonomics of ORC/ARC and in particular .cyclic? This is with a view toward imagining how developers who have never written anything in a non-garbage-collected language would adapt to Nimony.

discuss

order

alethic|6 months ago

ORC/ARC are a reference counting garbage collector. There's a bit of a terminological clash out there as to whether "garbage collection" includes reference counting (it's common for it to not, despite reference counting... being a runtime system that collects garbage). Regardless: what makes ORC/ARC interesting is that it optimizes away some/most counts statically, by looking for linear usage and eliding counts accordingly. This is the same approach taken by the Perseus system in use in some Microsoft languages like Koka and Lean, but came a little earlier, and doesn't do the whole "memory reuse" thing the Perseus system does.

So for ergonomics: reference counting is not a complete system. It's memory safe, but it can't handle reference cycles really very well -- since if two objects retain a reference to each other there'll always be a reference to the both of them and they'll never be freed, even if nothing else depends on them. The usual way to handle this is to ship a "cycle breaker" -- a mini-tracing collector -- alongside your reference counting system, which while is a little nondeterministic works very reasonably well.

But it's a little nondeterministic. Garbage collectors that trace references, and especially tracing systems with the fast heap ("nursery" or "minor heap") / slow heap ("major heap") generational distinction are really good. There's a reason tracing collectors are used among most languages -- ORC/ARC and similar systems have put reference counting back in close competition with tracing, but it's still somewhat slower. Reference counting offers one alternative, though -- the performance is deterministic. You have particular points in the code where destructors are injected, sometimes without a reference check (if the ORC/ARC optimization is good) and sometimes with a reference check, but you know your program will deallocate only at those points. This isn't the case for tracing GCs, where the garbage collector is more along the lines of a totally separate program that barges in and performs collections whenever it so desires. Reference counting offers an advantage here. (Also in interop.)

So, while you do need a cycle breaker to not potentially leak memory, Nim tries to get it to do as little as possible. One of these tools they provide to the user is the .acyclic pragma. If you have a data structure that looks like it could be cyclic but you know is not cyclic -- for example, a tree -- you can annotate it with the .acyclic pragma to tell the compiler not to worry about it. The compiler has its own (straightforward) heuristics, too, and so if you don't have any cyclic data in your program and let the compiler know that... it just won't include the cycle collector altogether, leaving you with a program with predictable memory patterns and behavior.

What these .cyclic annotations will do in Nim 3.0, reading the design documentation, is replace the .acyclic annotations. The compiler will assume all data is acyclic, and only include the cycle breaker if the user tells it to by annotating some cyclic data structure as such. This means if the user messes up they'll get memory leaks, but in the usual case they'll get access to this predictable performance. Seems like a good tradeoff for the target audience of Nim and seems like a reasonable worst-case -- memory leaks sure aren't the same thing as memory unsafety and I'm interested to see design decisions that strike a balance between burden on the programmer vs. burden on performance, w/o being terribly unsafe in the C or C++ fashion.

alethic|6 months ago

The short answer is you'd write your code the same, then add .cyclic annotations on cyclic data structures.

("The same" being a bit relative, here. Nim's sum types are quite a bit worse than those of an ML. Better than Go's, at least.)