
I was surprised by how simple an allocator is

96 points | gilgamesh3 | 8 months ago | tgmatos.github.io

46 comments


wavemode|8 months ago

On first seeing this I wasn't sure what analogy the author was trying to make. After reading the article, my best guess is that they are simply trying to say that writing an allocator is easier than it seems on the surface.

Though it's not clear to me that the article does a good job of establishing that this is actually true ("mimalloc is only a few thousand lines of code" doesn't pass the smell test).

majormajor|8 months ago

Yeah, re: the title, the URL/path string "allocators-are-for-monkeys-with-typewriters" (including on the page) seems more clear about that "less hard than you'd think" thing than the larger-font published headline "Allocators are Monkeys With Typewriters". And of course the quote in the article is even more specific "given enough time, even a monkey with a typewriter can write a memory allocator".

I generally agree with the "memory management doesn't have to be as complicated as you might think" vibe, especially if you've read about some optimizations in fancy modern GCs and aren't aware of what a basic simple non-GC world can look like. That said, of course, you can indeed get into a lot of complexity beyond the textbook 101 examples. Like the mentioned threading...

GuB-42|8 months ago

Writing a malloc/free replacement was a 1 day project we did at school in the first couple of weeks of learning C/UNIX. Many students didn't even know how to code before that.

All that to say that writing an allocator is not hard. Writing a good allocator however is another story. Not because the algorithms are complicated, but because it is a very critical part of most systems and requires extensive knowledge of real life situations.

WalterBright|8 months ago

Writing a basic allocator should be a foundational part of any programming curriculum. It removes the mystery from them.

Just like writing a lexer should be. It's amazing how many programming tasks are improved by writing a lexer for them, and how much awful code I see from people who have no idea how to write one.

(For example, I wrote a lexer for a function that read a date like "Mar 7, 1997" and turned it into a time_t. The lexer greatly simplified it, as strings like "Mar" and "March" became the same token. A number from 13 to 31 would lex as a Day token. The number 3 would lex as a DayOrMonth token. And so on.)

worthless-trash|8 months ago

Writing an allocator that is harder to exploit is harder again.

romesmoke|8 months ago

I spent most of my PhD trying to convince people that allocators are not simple. But the not-simple part is design, specifically the strategy/policy part: minimizing fragmentation by adapting to the behavior of the program the allocator is linked into. From what I know, no production-level allocator and very few research-level ones go beyond a "one size fits all" approach.

The reason why this is so is pragmatic: fancy strategies take extra compute time that is too expensive when one's sitting on the critical path. So what follows has a good chance of not bearing practical value but (i) there are setups where memory capacity is a bottleneck and (ii) hey, it's just a PhD anyway.

So what does a fancy strategy even look like? Let's look at the simplest non-trivial version of the problem. Say we know the sizes and lifetimes of all objects in advance, and we want to pack them into as tight a contiguous address space as possible. Mathematicians call this the Dynamic Storage Allocation (DSA) problem, and it's NP-complete, i.e., no known polynomial-time optimal algorithm exists for the general case. Deep learning compilers battling the AI memory wall have been returning to DSA lately (it's an old problem).

The implication of the above is that a real allocator, that is, one forced to work under uncertainty w.r.t. object sizes and lifetimes down the line, can always fail remarkably. So going for a general-purpose solution begs that you know what you're doing (one could argue that most programs out there are "well-behaved", and I would agree up until the point where this observation is turned into a universal law).

Nevertheless, it remains a fact that writing a toy allocator is both educational and soberingly not hard.

the__alchemist|8 months ago

I have an STM32 rust project I've been working on this week. It talks to an ESP using protobuf/RPC.

I'm doing it bare-metal/no allocator, as I do most embedded projects... and it's flirting with running me out of memory! What do most (and the most popular) protobuf libs do in rust? Use an allocator. What does the ESP itself do? Use an allocator (with FreeRTOS).

Meanwhile I'm using Heapless (Vec and String syntax with a statically-allocated array) on an MCU with 128K flash and 32K RAM... This won't end well.

charleslmunger|8 months ago

If you use upb, an allocator is optional - you can provide a presized block to upb_Arena and a NULL upb_alloc. Of course, you'll still fail to parse a message with an in memory representation larger than your region.

javier2|8 months ago

Think you need an allocator!

jasonthorsness|8 months ago

Applications should use more special-purpose memory allocators. Much of the complexity in memory management comes from designing for an unknown usage pattern; things can be quite simple when the allocator is specialized and the patterns are predictable.

This is difficult though in higher-level languages. Go tried and failed with arenas.

Gibbon1|8 months ago

defer and a stack allocator would probably go well together.

Mark the allocator's state at the start of a scope; defer a release so that everything allocated since is freed when the scope exits. A cheesy extension would be having the release function take a list of objects to keep.

charcircuit|8 months ago

>Seeing that mimalloc does not offer this feature

If you click on the issue you will see that mimalloc already does offer this feature via mi_manage_os_memory to create an arena, mi_heap_new_in_arena to create a heap that uses the arena, and mi_heap_malloc to allocate memory from the heap.

userbinator|8 months ago

Yes, but the article doesn't really show with code why it's simple.

A basic first-fit free-list type allocator can be less than 100 LoC - in x86 Asm. Using free space to store the overhead is a simple but effective optimisation too.

notepad0x90|8 months ago

allocators can be simple but how about memory management kernel subsystems?

One question I have for those who know the topic at greater depth: are there simpler memory managers that don't use a fixed page size? Or does that make things a lot more complex? Imagine requesting an allocation of 1 byte and the kernel allocating a 1-byte page, then immediately requesting 20GB and getting a 20GB page. Other than the memory-management metadata taking up more memory on its own, what are the downsides of dynamic page sizes?

o11c|8 months ago

Fragmentation is one major reason to use fixed size classes. After just a few allocate/free-a-bit cycles, you can very easily end up with a large amount of free space in unusably tiny chunks. (This is one reason why languages with a garbage collector tend to prefer being allowed to move and compact their objects - their whole thing is keeping the allocator as simple as possible to beat the benchmarks, so they can't use the usual solutions.)

There's also the matter of space overhead.

A bitmap allocator is simple and has an overhead of 1 bit per chunk (allocated or not). Normally the chunk is at least 16 bytes, but if you know you're doing smaller allocations there's nothing stopping you from doing one at byte granularity.

If there exists an invalid value (for example, UTF-8 strings cannot contain the byte 0xff), you can implement allocators with zero overhead, though efficiency requires changing the API (`malloc` requires writing to all the memory to erase the tombstones, so it's more efficient for the allocator to support `strdup` and `realloc_strcat` directly).

By bucketing you can vary the overhead based on allocation size. But also remember, a lot of the trickery you can do here comes at the cost of becoming more vulnerable to use-after-free bugs.

Besides all the missing APIs, one major annoyance related to allocators is the fact that userland lacks support for CPU-local variables that the kernel can use for more efficient operations. `rseq` helps a little but still falls short.

duped|8 months ago

That's essentially what an allocator is, it's an abstraction between a requested size and alignment from the user space program and the memory management API that interfaces with the MMU hardware (which maps virtual addresses into physical addresses where the virtual address is a "page" number + offset). The MMU is what's dealing in pages and offsets more than the kernel.

nitrix|8 months ago

> "The C standard library provides malloc/free/resize"

It does not provide a `resize`. You're thinking of `realloc` on hosted environments.

b0a04gl|8 months ago

but saying they're useless ignores a bunch of real systems that wouldn't run without them.

in unity, you literally can't do burst compiled jobs efficiently unless you choose the right allocator. they expose `Temp`, `Persistent`, etc because GC isn't even an option when you're fighting for milliseconds per frame. no allocator choice = frame skips = shipped bugs.

in embedded, FreeRTOS gives you multiple heap implementations for a reason. sometimes you need fixed size pools, sometimes you care about fragmentation. malloc's out of the question. same in any real time firmware, safety critical or not.

infra world has been using arena allocators for years. folly's `Arena`, grpc's arena, even jemalloc has thread caching and slab style region reuse built in. this isn't academic. large scale systems hit allocation pressure that general purpose allocators can't tune for globally.

and rust's whole alloc abstraction : it's more about expressing ownership across lifetimes in memory sensitive codebases. `Bumpalo` isn't a premature optimization. it's what you reach for when you know the object graph layout and want to free it in one call. also safe by design. it's not even painful to use.

imo allocator choice is an interface decision. it shapes how the rest of your code handles memory lifetimes. once you start passing memory lifetimes as part of your type system or job model, the whole architecture shifts. this is way deeper than faster malloc

reactordev|8 months ago

Allocator choice is a spice, not a staple. There are various reasons to use one design vs another for your use case, and no one-size-fits-all. Sometimes you want contiguous memory. Sometimes you want sorted memory. Sometimes you want fast memory, internal fragmentation be damned. Other times, you want terse memory because you're trying to fit a struct in 32 bytes. There's a use case for it all.

dang|8 months ago

[stub for offtopicness]

keeptrying|8 months ago

From the title I thought he meant VCs.

davydm|8 months ago

lol, I read this as "alligators are monkeys with typewriters" and thought it would be a well interesting article

but it's just more blah-blah about ai :/