8dcc | 1 year ago
- Added a link to this HN post.
- Renamed some variables and types in the code for readability.
- Mention (in a footnote) that we could allocate the 'Pool' structure and the chunk arrays with a single call, or even return the 'Pool' structure on the stack (by value). Suggested by 'liontwist' and 'unwind'.
- Mention (in a footnote) that we don't always need to store the full address when building the linked list of free chunks. Suggested by 'cb321'.
- Added valgrind support (also to my 'libpool' project). Suggested by 'kazinator'.
cb321 | 1 year ago
This can really add up. For example, like 20 years ago, I did a `print sizeof` in gdb for some g++ STL map node with a data payload of like 4 bytes, and it was some obscene size, roughly an order of magnitude bigger than necessary (like 80 bytes instead of, say, 8 with 2-byte ptrs). While main memory is cheap, CPU cache memory is not, and for "linky" structures it often pays to be more cache-friendly. It's pretty easy to imagine a 10X smaller thing fitting into an L2 CPU cache where the larger thing would not even fit in L3, netting you a big latency boost (though the more root-ward levels of the tree likely always stay in caches in a hot loop).
EDIT: Anyway, you don't have to believe me. Others did an x32 ABI for x86_64 mostly to have "merely 2X" smaller pointers: https://en.wikipedia.org/wiki/X32_ABI
[1] https://news.ycombinator.com/item?id=42643410
tandr | 1 year ago
8dcc | 1 year ago
This is my first time using valgrind, however, so I might just be confused about what you mean, and there is a simpler way of doing all this.
veltas | 1 year ago