top | item 37670740

Arena allocator tips and tricks

288 points| jandeboevrie | 2 years ago |nullprogram.com

91 comments

order
[+] kgeist|2 years ago|reply
In my hobby project, I started always passing an allocator argument to every function or object which requires allocation (inspired by Zig) and I love it so far. Often I can just pass a bump pointer allocator or a stack-based allocator and do not care about deallocation of individual objects. I also wrote a simple unit testing framework to test out-of-memory conditions because it's easy to do when you're in control of allocators. Basically I inject an allocator which calculates how many allocations are done when a unit test is run, and then unit tests are later rerun again by injecting OOM at every known allocation point. A lot of bugs and crashes happen when an OOM is encountered because such paths are rarely tested. The idea of my pet project is a very resilient HTTP server with request-scoped allocations and recovery from OOM without crashes.
[+] habibur|2 years ago|reply
Adding that when many linux distributions face OOM, a killer daemon steps in and might kill your service even if you were handling the situation properly.
[+] olodus|2 years ago|reply
Oh wow that is a really interesting test solution. That would be an interesting thing to add to all zig tests (I know they already have the testing allocator and good valgrind support but I don't think that tests/simulates oom).

I love things like these that use existing tests and expand the to just test further thing in already covered flows. We have done similar things at my work where we test expansion of data models against old models to check that we cover upgrade scenarios.

[+] corysama|2 years ago|reply
A very old trick for running Lua in your PlayStation 2 game (where the PS2 is a machine with 32MB of RAM and no memory paging) is to hook Lua’s realloc function into the venerable Doug Lea’s Malloc (https://gee.cs.oswego.edu/dl/html/malloc.html) set up to run in arena mode (ONLY_MSPACES? It’s been a decade or two…). That way Lua can fragment the arena all it wants without making a mess of the rest of the tiny address space.
[+] phire|2 years ago|reply
> (where the PS2 is a machine with 32MB of RAM and no memory paging)

The PS2 hardware does have full support for memory paging (at least on the main cpu core). PS2 Linux makes full use of it.

But the default TLB configuration from the BIOS is just a single static 31MB page (the other 1MB is reserved for the BIOS) and the SDK doesn't provide any tooling for dynamic pages.

And this is MIPS, so it's a software managed TLB with 48 entries. I wouldn't be surprised if some games did have dynamic paging, but they would need to provide their own TLB exception handler.

[+] varispeed|2 years ago|reply
I recently needed to write a memory allocator and being lazy asked ChatGPT for help. Interestingly it came up with implementation eerily similar to what is described in that document. Nonetheless everything worked like a charm from the start.
[+] hgs3|2 years ago|reply
There is also the double-sided arena allocator which uses one contiguous buffer but grows in both directions (front-to-back and back-to-front). When allocating memory from it you need to indicate whether you want memory from the front or back. The allocator is out-of-memory when both font and back meet.

The double-sided approach is useful for various purposes. For instance you can allocate short-lived data from the front and long-lived data from the back. It also makes better use of free space: with two separate arena allocators one could be out-of-memory but the other might have free space. With the double-sided approach all memory is fair game.

[+] sakras|2 years ago|reply
Fantastic article! I have a project with a similar arena allocator, so I'll definitely be taking some of these tricks. One thing my allocator does do is organize arenas into a linked list so that you can grow your size dynamically. However I really like the article's point that you're always going to be living within _some_ memory budget, so you might as well allocate everything up front into a giant arena, and then divide the giant arena up into smaller arenas.

Also I've heard that you can save an instruction when checking if your allocator is full by subtracting from the top, and checking the zero flag. It seems to complicate alignment logic. Does that ever end up mattering?

[+] saagarjha|2 years ago|reply
> However I really like the article's point that you're always going to be living within _some_ memory budget, so you might as well allocate everything up front into a giant arena, and then divide the giant arena up into smaller arenas.

That depends. If you’re running on e.g. a video game console where you’re the sole user of a block of pretty much all memory, go ahead. On a system with other things running, you generally don’t want to assume you can just take some amount of memory, even if it’s “just the free memory”, or even “I probably won’t use it so it will be overcommitted”. Changing system conditions and other system pressure are outside of your control and your reservation may prevent the system from doing its job effectively and prioritizing your application appropriately.

[+] wongarsu|2 years ago|reply
> so you might as well allocate everything up front into a giant arena, and then divide the giant arena up into smaller arenas

However if you do this note how the article hints at this strategy needing a bit more code on Windows: Windows doesn't do overcommit by default. If you do one big malloc Windows will grow the page file to ensure it can page that much memory in if you start writing to it. That's fine if you allocate a couple megabytes, but if your area is gigabytes in size you want to call VirtualAlloc with MEM_RESERVE to get a big contiguous memory area, then call VirtualAlloc with MEM_COMMIT as needed on chunks you actually want to use.

[+] matheusmoreira|2 years ago|reply
Excellent article.

> While you could make a big, global char[] array to back your arena, it’s technically not permitted (strict aliasing).

Aren't char pointers/arrays allowed to alias everything?

I used that technique in my programming language and its allocator. It's freestanding so I couldn't use malloc. I had to get memory from somewhere so I just statically allocated a big array of bytes. It worked perfectly but I do disable strict aliasing as a matter of course since in systems programming there's aliasing everywhere.

[+] gpderetta|2 years ago|reply
char can alias everything, i.e. you can deference a char pointer with impunity, even if the actual dynamic type[1] of an object is a different type. The reverse is not true: if the dynamic type of an object is char, just by using the alias rules, you can't deference it as an object of a different type.

In C++ you can just use placement new to change the dynamic type of (part of ) a char array (but beware of pointer provenance). In C is more complex: I don't claim to understand this fully, but my understanding is you can't change the type of a named object, but you should be able to change the type of anonymous memory (for example, what is allocated with malloc) by simply writing into it.

In practice at least GCC considers the full alias rules unimplementable and my understanding is that it uses a conservative model where every store can change the type of an object, and uses purely structural equivalence.

[1] of course most C implementations don't actually track dynamic types at runtime.

[+] dellorter|2 years ago|reply
If you overlay a struct in your (char) buffer and dereference a pointer to it you would be accessing something with a different type than its declared type(char* as struct something *), it’s strict aliasing violation

To do stay in the rules you could set up a void* to suitable region in a linkerscript

[+] bajsejohannes|2 years ago|reply
> Typically arena lifetime is the whole program

Some other good cases for arenas are rendering of a frame in a video game and handling of a http request. The memory is contained within that context and short lived.

[+] gpderetta|2 years ago|reply
That's the lifetime of the objects in the arena, but wouldn't you recycle the arena itself across frames?

For requests it might make sense to have low and high water marks so that additional arenas are created during request peaks and destroyed after if you want to limit long term memory usage of your application.

[+] bjourne|2 years ago|reply
I never heard the term "arena allocation" before. I always thought it was called "bump pointer allocation" since all you do is add to (bump) a pointer. One useful trick when designing a bump allocator is to allocate word size bytes (8 on 64-bit) extra to store object headers in. For example, if you store objects' sizes in the header you can iterate all allocated objects and you can often also reallocate objects in the middle of the arena without having to free every object.
[+] eigenspace|2 years ago|reply
I really like arena / bump allocators, they're really useful and powerful tools. I've been playing around lately with a Julia package that makes it relatively easy and safe to manage an arena https://github.com/MasonProtter/Bumper.jl

The thread-safety and dynamic extent is something I'm particularly pleased about.

[+] saagarjha|2 years ago|reply
> Unsigned sizes are another historically common source of defects, and offer no practical advantages in return. Case in point exercise for the reader: Change each ptrdiff_t to size_t in alloc, find the defect that results, then fix it.

I know that it’s a different “kind” of defect, but none of the code has overflow checks even with ptrdiff_t…

[+] patrec|2 years ago|reply
Why would it need overflow checks when subtracting two valid pointers?
[+] yelnatz|2 years ago|reply
Didn't know these were called Arenas, this technique is prevalent in game development.
[+] TickleSteve|2 years ago|reply
also called a "bump" allocator... because all it does is bump a pointer.

nice to use when you have a nicely ordered order of execution where you are guaranteed to always come back to a known position where you can free the entire heap/arena at once. (i.e. a typical main message handling loop).

[+] usrnm|2 years ago|reply
> A minority of programs inherently require general purpose allocation, at least in part, that linear allocation cannot fulfill. This includes, for example, most programming language runtimes

Interesting definition of "minority"

[+] vkazanov|2 years ago|reply
As a share of the total number of programs written this is a minority indeed. How many pls are out there vs the number of libraries/apps?

The number of deploys is a different thing.

[+] sylware|2 years ago|reply
I tend to avoid to work with data types which require to have a stable virtual address, namely a dynamic base with an offset. Because, mremap.

That said, it really depends on the data usage semantics, and one could write a much less costly allocator for such specific data. Virtual address stable generic allocators have a tendancy to be technical abominations based on statistical usage assumptions. Namely, their cost could be not worth for the improvment, even if there is a significant one.

[+] chatmasta|2 years ago|reply
Are there security issues with not zeroing out the previously used memory when "releasing" a buffer (moving the offset)? I'm not a systems programmer, so I guess I just assumed that most malloc implementations also zeroed out memory when freeing it, but a quick Google suggests that's not actually the case (with typical malloc implementations "getting their pages from /dev/zero" [0], effectively zeroing memory at allocation time rather than when freeing it).

[0] https://superuser.com/a/894508

[+] titzer|2 years ago|reply
> Are there security issues with not zeroing out the previously used memory

Yes, there can be. Security-critical software often does this explicitly, and it's been a bug when compilers have removed the zeroing by reasoning that unreachable memory is unreachable...leading to crypto secrets floating in memory unnecessarily.

For languages like Java and Go where objects are at least zero-initialized before the constructor(s) run, usually the allocator just zeroes the entire TLAB before allocation.

[+] mcguire|2 years ago|reply
"Without individual lifetimes, you don’t need to write destructors, nor do your programs need to walk data structures at run time to take them apart."

Destructors are a very bad idea if you are using any form of garbage collection other than reference counting. The destructors won't run until some arbitrary time after the last access to the object, and in the case of arenas, what would be a very fast deallocation becomes proportional to the number of objects. Further, if destructors can revive objects, everything gets very complicated.

[+] teo_zero|2 years ago|reply
If alloc() changes .beg, how do you free (release to the OS, not mark as unused) the whole arena when it's not needed anymore? Wouldn't changing .end be better, so that you can implement a destroyarena(a) which calls free(a.beg)?

Typical use: set up a specific arena just to read in the config file. Once it's done, release the whole arena.

[+] sixthDot|2 years ago|reply
This globally good, two remarks however:

> you don’t need to write destructors

I think this is not accurate. Destructors are not deallocators, they are supposed to set the object field in an invalid state. Now truth is that both are often called together, e.g `delete`.

> Typically arena lifetime is the whole program, so you don’t need to worry about freeing it

A technic I use is to increment a counter on `arena.alloc` and decrement it on `arena.dealloc`, and then free the memory (if it's on the heap) accordingly.

[+] MathMonkeyMan|2 years ago|reply
> > you don’t need to write destructors

>

> I think this is not accurate. Destructors are not

> deallocators, they are supposed to set the object field in

> an invalid state. Now truth is that both are often called

> together, e.g `delete`.

If the object manages some resource other than memory, and if the object's lifetime is intended to guard the resource, then a destructor is needed.

But if the object manages memory only, as is often the case, and all of that memory came from the arena, then you really don't need to call any destructors.

This is the approach taken in one C++ [library][1] I've worked with, where objects represented scalar values to be used en masse for spreadsheet-like applications. In those applications (especially in 32-bit mode), being able to omit an allocator pointer and neglect a destructor call made things smaller and faster.

[1]: https://bloomberg.github.io/bde-resources/doxygen/bde_api_pr...

[+] flohofwoe|2 years ago|reply
> Destructors are not deallocators, they are supposed to set the object field in an invalid state.

A typical arena allocator would just reset an offset to zero when the arena is 'freed' without calling any object destructors, and the allocator wouldn't actually have any type information about the objects allocated in the arena (of course you could also write an allocator which registers a destructor function with each allocation, and which would be called before the arena is reset):

    bla_t* bla = arena_alloc(arena, alignment, size, destructor_func);
For C++ style RAII it probably makes more sense to use placement-new, and call the destructor manually to 'invalidate' the object without recycling its memory (the memory would only be recycled once the arena allocator is reset).
[+] floor_|2 years ago|reply
Coding a MUD as a hobby project using memory arenas and I am straight up not having a good time with strings. Right now I'm giving the players fixed size 4k buffers to send commands too instead of using dynamically sized strings. Everything else is golden. Just slap it on to the frame/temp arena and reset the marker back to 0 after an update.
[+] ok123456|2 years ago|reply
If you're using c++ look at pmr::string.