Line marked KABOOM copies $DEST_BYTE_COUNT, rather than $SOURCE_BYTE_COUNT.
Say you want to realloc a 1 byte buffer to a 4 byte buffer - you just copied 4 bytes from a 1 byte buffer which means you're reading 3 bytes from 0xDEADBEEF/0xBADF000D/segfault land.
EDIT: Also, this is why the ENTIRE PREMISE of implementing your own reallocator speced to just the realloc prototype doesn't make much sense. You simply don't know the size of the original data with just a C heap pointer as this is not standardized AFAIK.
"Also, this is why the ENTIRE PREMISE of implementing your own reallocator speced to just the realloc prototype doesn't make much sense."
If you're reimplementing realloc() it's pretty easy to know the size of the allocated regions - you just need to store the size somewhere when you allocate a block. One common method is to allocate N extra bytes of memory whenever you do malloc() to hold the block header and return a pointer to (block_address + N) to the user. When you then want to realloc() a block, just look in the block header (N bytes before the user's pointer) for the size.
The block header can store other useful stuff, like debugging information. I once implemented a memory manager for debugging that could generate a list of all leaked blocks at the end of the program with the file names and line numbers where they were allocated.
In the context of the code presented all mallocs were returned by sbrk. Even if the previous malloc had been "allocated" with size zero, the sbrk called by this function's invocation of malloc will ensure that there are enough bytes of addressable memory to ensure that, although it will be copying data out of earlier allocations, it will not segfault. realloc makes no guarantees of zero-initializing a newly allocated extent.
I have also found people often unestimate realloc (but have never done the same level of investigation to find out just how clever it is!)
On several occasions I have wanted to use mmap, mremap, and friends more often to do fancy things like copy-on-write memory. However, I always find this whole area depressingly poorly documented, and hard to do (because if you mess up a copy-on-write call, it just turns into a copy it seems, with the same result but less performance).
While it's good realloc is clever, I find it increasingly embarassing how badly C (and even worse, C++ which doesn't even really have realloc (as most C++ types can't be bitwise moved) memory allocation maps to what operating systems efficiently support.
C++ really wants a realloc variant that extends an allocation if it can be extended without a copy, and leaves the allocation unchanged if it can't. The annoying thing is that there's no good reason why this can't exist beyond that the STL allocator interface happens not to have it.
Serious question from a guy made soft by garbage collection: how frequent is memory allocation failure nowadays, with large memories and virtual memory? Were I to guess from my state of ignorance I'd think that if allocs began to fail, there was no recovery anyhow... so leaking in this case would be one leak right before a forced quit.
Wrong? Are there lots of ways allocation can fail besides low memory conditions?
It depends on your recovery strategy. If your response to a `realloc()` failure is going to be to terminate the program, then you might as well assign the result back to the original pointer. If you're going to continue executing without allocating the extra memory, then yes, you need to assign the result to a separate pointer.
The realloc implementation in this blog is incorrect: the passed in pointer must not be freed if realloc is called with a non-zero length and returns NULL. This will cause a double free in correct callers.
As someone else pointed out, the example call of realloc is also incorrect.
edit: also, malloc is incorrect for three reasons: 1) sbrk doesn't return NULL on failure, 2) a large size_t length will cause a contraction in the heap segment rather than an allocation, and 3) sbrk doesn't return a pointer aligned in any particular way, whereas malloc must return a pointer suitably aligned for all types.
Yes and no. The real error here would be to realloc without any geometric progression IMHO - ie. reallocating one more byte each time, which would behave well on Linux (except the libc call cost of course) but not on other implementations (such as some Microsoft's MSVCRT versions). Assuming realloc has no catastrophic performance impact is not something too daring.
This is really neat. Somehow I always assumed realloc() copied stuff instead of using the page table.
But say you have 4K page table size. You malloc() in turn a 2K object, a 256K object, and another 2K object, ending up with 2K, 256K, 2K in memory. Then your 256K is not aligned on a page boundary. If you realloc() the 256K it has to move since it's surrounded by two objects. When you do that, you'll wind up with the two pages on the end being mapped to multiple addresses. Which is actually just fine...Interesting...
The libc memory allocator does not simply hand out memory contiguously. In your example, the 256K block will end up being 4K aligned.
In fact, that's what the article already explains: the large alloc will just end up being passed through to the kernel, which only deals at page granularity.
Agreed. Not sure why people are downvoting you. It blows my mind every time I think, "I wonder how <some program> works?" and I'm able to just "apt-get source some_program" and check it out.
Working with Linux and having the source (and the ability to change it) for entire stack all the way down to and including the kernel is liberating. As a programmer it feels like the entire world is open to me.
I guess that's GNU's dream brought to life, really.
[+] [-] optimiz3|12 years ago|reply
Say you want to realloc a 1 byte buffer to a 4 byte buffer - you just copied 4 bytes from a 1 byte buffer which means you're reading 3 bytes from 0xDEADBEEF/0xBADF000D/segfault land.
EDIT: Also, this is why the ENTIRE PREMISE of implementing your own reallocator speced to just the realloc prototype doesn't make much sense. You simply don't know the size of the original data with just a C heap pointer as this is not standardized AFAIK.
[+] [-] greenyoda|12 years ago|reply
If you're reimplementing realloc() it's pretty easy to know the size of the allocated regions - you just need to store the size somewhere when you allocate a block. One common method is to allocate N extra bytes of memory whenever you do malloc() to hold the block header and return a pointer to (block_address + N) to the user. When you then want to realloc() a block, just look in the block header (N bytes before the user's pointer) for the size.
The block header can store other useful stuff, like debugging information. I once implemented a memory manager for debugging that could generate a list of all leaked blocks at the end of the program with the file names and line numbers where they were allocated.
[+] [-] xroche|12 years ago|reply
[+] [-] mgraczyk|12 years ago|reply
[+] [-] prutschman|12 years ago|reply
[+] [-] CJefferson|12 years ago|reply
On several occasions I have wanted to use mmap, mremap, and friends more often to do fancy things like copy-on-write memory. However, I always find this whole area depressingly poorly documented, and hard to do (because if you mess up a copy-on-write call, it just turns into a copy it seems, with the same result but less performance).
While it's good realloc is clever, I find it increasingly embarassing how badly C (and even worse, C++ which doesn't even really have realloc (as most C++ types can't be bitwise moved) memory allocation maps to what operating systems efficiently support.
[+] [-] plorkyeran|12 years ago|reply
[+] [-] asveikau|12 years ago|reply
[+] [-] rfrey|12 years ago|reply
Wrong? Are there lots of ways allocation can fail besides low memory conditions?
[+] [-] _kst_|12 years ago|reply
[+] [-] ctz|12 years ago|reply
As someone else pointed out, the example call of realloc is also incorrect.
edit: also, malloc is incorrect for three reasons: 1) sbrk doesn't return NULL on failure, 2) a large size_t length will cause a contraction in the heap segment rather than an allocation, and 3) sbrk doesn't return a pointer aligned in any particular way, whereas malloc must return a pointer suitably aligned for all types.
[+] [-] xroche|12 years ago|reply
[+] [-] kabdib|12 years ago|reply
This is implementation coupling at its worst. Don't do it.
[+] [-] xroche|12 years ago|reply
[+] [-] picomancer|12 years ago|reply
But say you have 4K page table size. You malloc() in turn a 2K object, a 256K object, and another 2K object, ending up with 2K, 256K, 2K in memory. Then your 256K is not aligned on a page boundary. If you realloc() the 256K it has to move since it's surrounded by two objects. When you do that, you'll wind up with the two pages on the end being mapped to multiple addresses. Which is actually just fine...Interesting...
[+] [-] nhaehnle|12 years ago|reply
In fact, that's what the article already explains: the large alloc will just end up being passed through to the kernel, which only deals at page granularity.
[+] [-] crackerz|12 years ago|reply
[+] [-] __david__|12 years ago|reply
Working with Linux and having the source (and the ability to change it) for entire stack all the way down to and including the kernel is liberating. As a programmer it feels like the entire world is open to me.
I guess that's GNU's dream brought to life, really.
[+] [-] mjcohen|12 years ago|reply
[+] [-] greatsuccess|12 years ago|reply
[deleted]