The closest thing to "conditionally returned to the kernel" is if the page had been given to madvise(MADV_FREE), but that would still not have the behavior they're talking about. Reading and writing would still produce the same content, either the original page content because the kernel hasn't released the page yet, or zero because the kernel has already released the page. Even if the order of operations is read -> kernel frees -> write, then that still doesn't match their story, because the read will produce the original page content, not zero.
That said, the code they're talking about is different from yours in that their code is specifically doing an out-of-bounds read. (They said "If you happen to allocate a string that's 128 bytes, and malloc happens to return an address to you that's 128 bytes away from the end of the page, you'll write the 128 bytes and the null terminator will be the first byte on the next page. So they're very clearly talking about the \0 being outside the allocation.)
So it is absolutely possible to have this setup: the string's allocation happens to be followed by a different allocation that is currently 0 -> the `data[size()] != '\0'` check is performed and succeeds -> `data` is returned to the caller -> whoever owns that following allocation writes a non-zero value to the first byte -> whoever called `c_str()` will now run off the end of the 128B string. This doesn't have anything to do with pages; it can happen within the bounds of a single page. It is also such an obvious out-of-bounds bug that it boggles my mind that it passed any sort of code review and required some sort of graybeard to point out.
Arnavion|9 months ago
The closest thing to "conditionally returned to the kernel" is if the page had been given to madvise(MADV_FREE), but that would still not have the behavior they're talking about. Reading and writing would still produce the same content, either the original page content because the kernel hasn't released the page yet, or zero because the kernel has already released the page. Even if the order of operations is read -> kernel frees -> write, then that still doesn't match their story, because the read will produce the original page content, not zero.
That said, the code they're talking about is different from yours in that their code is specifically doing an out-of-bounds read. (They said "If you happen to allocate a string that's 128 bytes, and malloc happens to return an address to you that's 128 bytes away from the end of the page, you'll write the 128 bytes and the null terminator will be the first byte on the next page. So they're very clearly talking about the \0 being outside the allocation.)
So it is absolutely possible to have this setup: the string's allocation happens to be followed by a different allocation that is currently 0 -> the `data[size()] != '\0'` check is performed and succeeds -> `data` is returned to the caller -> whoever owns that following allocation writes a non-zero value to the first byte -> whoever called `c_str()` will now run off the end of the 128B string. This doesn't have anything to do with pages; it can happen within the bounds of a single page. It is also such an obvious out-of-bounds bug that it boggles my mind that it passed any sort of code review and required some sort of graybeard to point out.
sqrt_1|9 months ago
He explicitly states 128byte filename allocates 129 bytes. https://www.youtube.com/watch?v=kPR8h4-qZdk&t=1417s