I understand the decision to archive the upstream repo; as of when I left Meta, we (i.e. the Jemalloc team) weren’t really in a great place to respond to all the random GitHub issues people would file (my favorite was the time someone filed an issue because our test suite didn’t pass on Itanium lol). Still, it makes me sad to see. Jemalloc is still IMO the best-performing general-purpose malloc implementation that’s easily usable; TCMalloc is great, but is an absolute nightmare to use if you’re not using bazel (this has become slightly less true now that bazel 7.4.0 added cc_static_library so at least you can somewhat easily export a static library, but broadly speaking the point still stands).
I’ve been meaning to ask Qi if he’d be open to cutting a final 6.0 release on the repo before re-archiving.
At the same time it’d be nice to modernize the default settings for the final release. Disabling the (somewhat confusingly backwardly-named) “cache oblivious” setting by default so that the 16 KiB size-class isn’t bloated to 20 KiB would be a major improvement. This isn’t to disparage your (i.e. Jason’s) original choice here; IIRC when I last talked to Qi and David about this they made the point that at the time you chose this default, typical TLB associativity was much lower than it is now. On a similar note, increasing the default “page size” from 4 KiB to something larger (probably 16 KiB), which would correspondingly increase the large size-class cutoff (i.e. the point at which the allocator switches from placing multiple allocations onto a slab, to backing individual allocations with their own extent directly) from 16 KiB up to 64 KiB would be pretty impactful. One of the last things I looked at before leaving Meta was making this change internally for major services, as it was worth a several percent CPU improvement (at the cost of a minor increase in RAM usage due to increased fragmentation). There’s a few other things I’d tweak (e.g. switching the default setting of metadata_thp from “disabled” to “auto”, changing the extent-sizing for slabs from using the nearest exact multiple of the page size that fits the size-class to instead allowing ~1% guaranteed wasted space in exchange for reducing fragmentation), but the aforementioned settings are the biggest ones.
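For reference, here's roughly how those settings can be approximated with today's jemalloc. The configure flags and MALLOC_CONF option below are from memory, so double-check them against the version you build:

```sh
# Build time: turn off cache-oblivious sizing, and use 16 KiB "pages"
# (2^14 bytes), which also raises the slab-vs-extent cutoff accordingly.
./configure --disable-cache-oblivious --with-lg-page=14

# Run time: let allocator metadata be backed by transparent huge pages.
MALLOC_CONF="metadata_thp:auto" ./your_program
```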
I would love to see these changes - or even some sort of blog post or extended documentation explaining the rationale. As is, the docs are somewhat barren. I feel there's a lot of knowledge that folks like you have right now, from all of the work that was done internally at Meta, that would be best shared now before it is lost.
> we (i.e. the Jemalloc team) weren’t really in a great place to respond to all the random GitHub issues people would file
Why not? I mean, this is a complete drive-by comment, so please correct me, but there was a fully staffed team at Meta that maintained it, yet it was not in the best place to manage the issues?
Jason, here is a story about how much your work impacts us.
We run a decently sized company that processes hundreds of millions of images/videos per day. When we first started about 5 years ago, we spent countless hours debugging issues related to memory fragmentation.
One fine day, we discovered jemalloc and put it into the service that was causing a lot of memory fragmentation. We did not think that those 2 lines of changes in a Dockerfile were going to fix all of our woes, but we were pleasantly surprised. Every single issue went away.
Today, our multi-million dollar revenue company is using your memory allocator on every single service and on every single Dockerfile.
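For illustration, the two-line Dockerfile change described above usually looks something like this (assuming a Debian-based image; the package name and library path vary by distro and architecture):

```dockerfile
# Install jemalloc and preload it into every process in the container.
RUN apt-get update && apt-get install -y libjemalloc2
ENV LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2
```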
Interesting that one of the factors listed in there, the hardcoded page size on arm64, is still an unsolved issue upstream, and that it forces app developers to either ship multiple arm64 Linux binaries or drop support for some platforms.
I wonder if some kind of dynamic page-size (with dynamic ftrace-style binary patching for performance?) would have been that much slower.
I've used jemalloc in every game engine I've written for years. It's just the thing to do. WAY faster on win32 than the default allocator. It's also nice to have the same allocator across all platforms.
I learned of it from its integration in FreeBSD and never looked back.
As of when I left Meta nearly two years ago (although I would be absolutely shocked if this isn’t still the case) Jemalloc is the allocator, and is statically linked into every single binary running at the company.
> Or I wonder if they could simply use tcmalloc or another allocator these days?
Jemalloc is very deeply integrated there, so this is a lot harder than it sounds. From the telemetry being plumbed through in Strobelight, to applications using every highly Jemalloc-specific extension under the sun (e.g. manually created arenas with custom extent hooks), to the convergent evolution of applications being written in ways such that they perform optimally with respect to Jemalloc’s exact behavior.
The big recent change is that jemalloc no longer has any of its previous long-term maintainers. But it is receiving more attention from Facebook than it has in a long time, and I am somewhat optimistic that after some recent drama where some of that attention was aimed in a counterproductive direction that the company can aim the rest of it in directions that Qi and Jason would agree with, and that are well aligned with the needs of external users.
Suppose this is as good a place to pile-on as any.
Though this was not the post I was expecting to show up today, it was super awesome for me to get to have played my tiny part in this big journey. Thanks for everything @je (and qi + david -- and all the contributors before and after my time!).
Your leadership in continuing to invest in core technologies at Facebook was as fruitful as it could ever be. GraphQL, PyTorch, and React, to name a few, could not have happened without it.
I’ve wondered about this before but never been around people who might know. From my outsider view, jemalloc looked like a strict improvement over glibc’s malloc, according to all the benchmarks I’d seen when the subject came up. So, why isn’t it the default allocator?
It is on FreeBSD. :P Change your malloc, change your life? May as well change your libc while you're there and use FreeBSD libc too, and that'll be easier if you also adopt the FreeBSD kernel.
I will say, the Facebook people were very excited to share jemalloc with us when they acquired my employer, but we were using FreeBSD so we already had it and thought it was normal. :)
Disclaimer: I'm not an allocator engineer, this is just an anecdote.
A while back, I had a conversation with an engineer who maintained an OS allocator, and their claim was that custom allocators tend to make one process's memory allocation faster at the expense of the rest of the system. System allocators are less able to make allocation fair holistically, because one process isn't following the same patterns as the rest.
Which is why you see it recommended so frequently with services, where there is generally one process that you want to get preferential treatment over everything else.
These allocators often have higher startup cost. They are designed for high performance in the steady state, but they can be worse in workloads that start a million short-lived processes in the unix style.
For a long time, one of the major problems with alternate allocators is that they would never return free memory back to the OS, just keep the dirty pages in the process. This did eventually change, but it remains a strong indicator of different priorities.
There's also the fact that ... a lot of processes only ever have a single thread, or at most have a few background threads that do very little of interest. So all these "multi-threading-first allocators" aren't actually buying anything of value, and they do have a lot of overhead.
Semi-related: one thing that most people never think about: it is exactly the same amount of work for the kernel to zero a page of memory (in preparation for a future mmap) as for a userland process to zero it out (for its own internal reuse)
As far as I know there is no technical reason why jemalloc shouldn't be the default allocator. In fact, as pointed out in the article, it IS the default allocator on FreeBSD. My understanding is it is largely political.
I believe there’s no other allocator besides jemalloc that can seamlessly override macOS malloc/free like people do with LD_PRELOAD on Linux (at least as of ~2020). jemalloc has a very nice zone-based way of making itself the default, and manages to accommodate Apple’s odd requirements for an allocator that have tripped other third-party allocators up when trying to override malloc/free.
> And people find themselves in impossible situations where the main choices are 1) make poor decisions under extreme pressure, 2) comply under extreme pressure, or 3) get routed around.
Oh that's interesting. jemalloc is the memory allocator used by redis, among other projects. Wonder what the performance impact will be if they have to change allocators.
Back in 2008-2009 I remember the Varnish project struggled with what looked very much like a memory leak. Because of the somewhat complex way memory was used, replacing the Glibc malloc with jemalloc was an immediate improvement and removed the leak-like behavior.
The article mentioned the influence of large-scale profiling on both jemalloc and tcmalloc, but doesn't mention mimalloc. I consider mimalloc to be on par with these others, and now I am wondering whether Microsoft also used large scale profiling to develop theirs, or if they just did it by dead reckoning.
All the allocators have the same issue. They largely work against a shared set of allocation APIs. Many of their users mostly engage via malloc and free.
So the flow is like this: a user has an allocation-looking issue and picks up $allocator. If they have an $allocator-type problem then they keep using it; otherwise they use something else.
There are tons of users of these allocators, but many rarely engage with the developers. Many wouldn’t even notice improvements or regressions on upgrades, because after the initial choice they stop looking.
I’m not sure how to fix that, but this is not healthy for such projects.
That’s because sane allocators that aren’t glibc will return unused memory periodically to the OS while glibc prefers to permanently retain said memory.
Looking at all the comments and lightly browsing the source code, I'm amazed. Both at how much impact a memory allocator can make, but also how much code is involved.
I'm not really sure what I expected, but somehow I expect a memory allocator to be ... smaller, simpler perhaps?
You can write a naive mark-and-sweep in an afternoon. You can write a reference counter in even less time. And for some runtimes this is fine.
But writing a generational, concurrent, moving GC takes a lot of time. But if you can achieve it, you can get amazing performance gains. Just look at recent versions of Java.
You can write a simple size-class allocator (even lock-free) in just a couple dozen lines of code. (I've done it both for interviews and for a work presentation.) But an allocator that is fast, scalable, and performs well over diverse workloads--that is HARD.
mimalloc is cleaner but lacks the very useful profiling features. To be fair it also has not gone through decades of changes as described in the postmortem either.
I was using FreeBSD back when jemalloc came along, and it blew my mind to imagine swapping out just that one (major) part of its libc. Honestly, it hadn't occurred to me, and it made me wonder what else we could wholesale replace.
Thank you. Jemalloc was recently recommended to me on some presentation about Java optimization.
I wonder if you got everything you should have from the companies that use it. I mean, sometimes I feel that big tech firms only use free software without ever giving anything back, so I hope you were the exception here.
I think the author was happy to be employed by a megacorp, along with a team to push jemalloc forward.
He and the other previous contributors are free to find new employers to continue such an arrangement, if any are willing to make that investment. Alternatively they could cobble together funding from a variety of smaller vendors. I think the author is happy to move on to other projects, after spending a long time in this problem space.
I don’t think that “don’t let one megacorp hire a team of contributors for your FOSS project” is the lesson here. I’d say it’s a lesson in working upstream - the contributions made during their Facebook / Meta investment are available for the community to build upon. They could’ve just as easily been made in a closed source fork inside Facebook, without violating the terms of the license.
Also Mozilla were unable to switch from their fork to the upstream version, and didn’t easily benefit from the Facebook / Meta investment as a result.
He worked for like a decade at Facebook it looks like. I would guess at least at a Staff level. How many millions of dollars do you think he got from that? It doesnt sound like the worse trade in the world.
I very recently used jemalloc to resolve a memory fragmentation issue that caused a service to OOM every few days. While jemalloc as it is will continue to work, same as it does today, I wonder what allocator I should reach for in the future. Does anyone have any experiences to share regarding tcmalloc or other allocators that aim to perform better than stock glibc?
Try mimalloc. I prototyped a feature on top of mimalloc, and while the effort was a dead end, the code (this was around 2020) was nicely written and well maintained, and it was fun to hack on. When I swapped out jemalloc in our system for mimalloc, it was on par if not better in terms of fragmentation growth control and heap usage.
Kind of nuts that he worked on Jemalloc for over a decade while having personal preference for garbage collection. I'm surprised he doesn't have more regret.
Why are those two mutually exclusive? I'd think that a high performance allocator would be especially crucial in the implementation of a fast garbage collected language. For example, in Python you can't alloc(n * sizeof(obj)) to reserve that much contiguous space for n objects. Instead, you use the builtins which isolate you from that low-level bookkeeping. Those builtins have to be pretty fast or performance would be terrible.
The maintainers are probably all making personally reasonable choices that we should support.
But it’s still sad that there’s probably no world where someone will still focus on jemalloc with professional support from their employer. It means that an important piece of technology will not continue improving.
Forking is possible, but it doesn’t look like the kind of project that many people could fork and improve, it requires a lot of focus by people with specific domain knowledge.
I don't understand why you don't understand that you can be sad about this.
Parent stated that he's sad the project is no longer maintained. That's a perfectly reasonable and human response. Parent does not have to defend having an emotion, even less provide objective truth for why he feels sad.
If you don't agree, fine. But I don't see why one would write a paragraph-long statement invalidating someone's emotional response.
Well, that's not the only meaning of "postmortem". The fine article does open with,
"The jemalloc memory allocator was first conceived in early 2004, and has been in public use for about 20 years now. Thanks to the nature of open source software licensing, jemalloc will remain publicly available indefinitely. But active upstream development has come to an end. This post briefly describes jemalloc’s development phases, each with some success/failure highlights, followed by some retrospective commentary."
A postmortem is a look back after an event. That can be a security event or outage, but it can also be the completion of a project (see: game studios often do postmortems once their game is out, looking back on what went wrong and right across preproduction, production, and post-launch).
The last part is unfortunate. However, it is a perfectly fine choice of title, as it does not make the majority of us think there was an outage caused by jemalloc. You should update how you think of the word and align it with the majority usage.
kstrauser|8 months ago
What's hard about using TCMalloc if you're not using bazel? (Not asking to imply that it's not, but because I'm genuinely curious.)
michaelcampbell|8 months ago
For the non-low-level programmers among us who aren't down in the bowels of memory allocators, why is this a "lol"?
einpoklum|8 months ago
custom-malloc-newbie question: Why is the choice of build system (generator) significant when evaluating the usability of a library?
adityapatadia|8 months ago
Thank you! From the bottom of our hearts!
thewisenerd|8 months ago
the top 3 from https://github.com/topics/resize-images (as of 2025-06-13)
imaginary: https://github.com/h2non/imaginary/blob/1d4e251cfcd58ea66f83...
imgproxy: https://web.archive.org/web/20210412004544/https://docs.imgp... (linked from a discussion in the imaginary repo)
imagor: https://github.com/cshum/imagor/blob/f6673fa6656ee8ef17728f2...
masklinn|8 months ago
FWIW while it was a factor it was just one of a number: https://github.com/rust-lang/rust/issues/36963#issuecomment-...
And jemalloc was only removed two years after that issue was opened: https://github.com/rust-lang/rust/pull/55238
jemalloc has helped entertain a lot of people :)
Iwan-Zotow|8 months ago
The Windows default allocator is a POS. Jemalloc rules.
chubot|8 months ago
Or I wonder if they could simply use tcmalloc or another allocator these days?
Facebook infrastructure engineering reduced investment in core technology, instead emphasizing return on investment.
charcircuit|8 months ago
https://github.com/facebook/jemalloc
It doesn't sound like a work place :-(
didip|8 months ago
jemalloc is always the first thing I installed whenever I had to provision bare servers.
If jemalloc were somehow the default allocator on Linux, I don't think it would have a hard time retaining contributors.
ratorx|8 months ago
However, it is almost always more complex to make production-quality software, especially in a performance-sensitive domain.