It does load it: mmap doesn't copy the file content into a buffer; it merely lets you operate on a file as if it were in memory. Memory reads correspond to file read operations.
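To make this concrete, here's a minimal sketch using Python's standard `mmap` module (the helper name `read_via_mmap` is just for illustration). Creating the mapping copies nothing up front, and reading a slice of it is an ordinary memory access that the kernel backs with file data.

```python
import mmap
import os
import tempfile


def read_via_mmap(path: str, offset: int, length: int) -> bytes:
    """Return `length` bytes starting at `offset`, read through a memory mapping."""
    with open(path, "rb") as f:
        # Map the whole file read-only. Creating the mapping does not copy
        # the file's contents into a user-space buffer.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Slicing copies only the requested bytes out of the mapping.
            # Touching them is a plain memory access; if the page is not
            # resident, the kernel faults it in from the file.
            return mm[offset:offset + length]


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello, mmap world")
    print(read_via_mmap(tmp.name, 7, 4))  # b'mmap'
    os.remove(tmp.name)
```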
Sort of. mmap absolutely copies the file contents into the kernel file system cache which is a buffer; it just lets you map the filesystem cache into your address space so you can see it. And memory reads don't translate to file reads unless the data isn't already in the cache.
> mmap absolutely copies the file contents into the kernel file system cache which is a buffer
Isn't this a bit misleading? mmapping a file doesn't cause the kernel to start loading the whole thing into RAM; it just sets things up so the kernel can later load pages of it transparently on demand, possibly with some prefetching.
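A rough illustration of the on-demand behaviour, assuming Python's `mmap` module on a system with ordinary demand paging: setting up the mapping reads nothing, and it's the first touch of each page that faults it in. `touch_every_page` is a hypothetical helper; the number it returns is pages accessed, not page faults, since readahead may already have brought some pages in.

```python
import mmap
import os
import tempfile


def touch_every_page(path: str) -> int:
    """Read one byte from each page of a file mapping; return how many pages were touched."""
    with open(path, "rb") as f:
        # mmap() only sets up the mapping; no file data is read here.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            touched = 0
            for off in range(0, len(mm), mmap.PAGESIZE):
                # The first access to a non-resident page causes a page fault,
                # and the kernel loads that page (possibly plus neighbours it
                # decides to read ahead) into the page cache.
                _ = mm[off]
                touched += 1
            return touched


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * (3 * mmap.PAGESIZE + 1))
    print(touch_every_page(tmp.name))  # 4
    os.remove(tmp.name)
```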
Well, it copies into the kernel buffer as you access it, a sort of demand paging that isn’t actually all that bad depending on what you’re doing. It’s dramatically different from the typical “read everything into a buffer” approach that most programs take.
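A sketch of that contrast, with hypothetical helper names: the first function reads the whole file into a private buffer before doing any work, while the second scans the mapping and only pulls pages into the page cache as the scan reaches them.

```python
import mmap
import os
import tempfile


def count_lines_read(path: str) -> int:
    """Conventional approach: read the whole file into a private buffer first."""
    with open(path, "rb") as f:
        data = f.read()  # copies the entire file into user space before any work starts
    return data.count(b"\n")


def count_lines_mmap(path: str) -> int:
    """mmap approach: scan the mapping; pages are faulted in as the scan reaches them."""
    if os.path.getsize(path) == 0:
        return 0  # mmap cannot map an empty file
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            count = 0
            pos = mm.find(b"\n")
            while pos != -1:
                count += 1
                pos = mm.find(b"\n", pos + 1)
            return count


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"one\ntwo\nthree\n")
    print(count_lines_read(tmp.name), count_lines_mmap(tmp.name))  # 3 3
    os.remove(tmp.name)
```

Both return the same answer; the difference is that the mmap version never holds a private copy of the whole file.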
General question: if mmap pulls in data only when you ask for it, you'll get the CPU waiting on the disk, then the CPU processing with no disk activity, alternating back and forth. I'd assume the optimum is to have both working at once, i.e. some kind of readahead request to the disk. How is this done, if at all?
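One common way to get that overlap is to give the kernel access-pattern hints. On Linux, faults on a file mapping already trigger some readahead; hints just start it earlier or make it more aggressive. The sketch below is an illustration, not a definitive answer: it uses Python's `os.posix_fadvise` with `POSIX_FADV_WILLNEED` to ask for the file to be read into the page cache ahead of time, and `mmap.madvise` with `MADV_SEQUENTIAL` to ask for aggressive readahead on the mapping. Both are hints the kernel may ignore, both are platform-dependent (hence the `hasattr` guards), and `sum_bytes_sequential` is a made-up workload.

```python
import mmap
import os
import tempfile


def sum_bytes_sequential(path: str, chunk: int = 1 << 20) -> int:
    """Sum every byte of a file via mmap, hinting the kernel to read ahead."""
    if os.path.getsize(path) == 0:
        return 0  # mmap cannot map an empty file
    with open(path, "rb") as f:
        # Hint: ask the kernel to start reading the whole file (offset 0,
        # length 0 = to end of file) into the page cache now.
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(f.fileno(), 0, 0, os.POSIX_FADV_WILLNEED)
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Hint: the mapping will be read front to back, so pages can be
            # read ahead aggressively (Python 3.8+, platforms with madvise).
            if hasattr(mmap, "MADV_SEQUENTIAL"):
                mm.madvise(mmap.MADV_SEQUENTIAL)
            total = 0
            for off in range(0, len(mm), chunk):
                total += sum(mm[off:off + chunk])
            return total


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(bytes(range(256)) * 10)
    print(sum_bytes_sequential(tmp.name))  # 326400
    os.remove(tmp.name)
```

`MADV_WILLNEED` on a range of the mapping is the other usual option when access is not strictly sequential but you know which region you'll need next.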
monocasa|5 years ago
wtallis|5 years ago
ivalm|5 years ago
saagarjha|5 years ago
throwaway_pdp09|5 years ago
Edit: just saw this, which touches on the same question: https://news.ycombinator.com/item?id=24737186
saagarjha|5 years ago
bitcharmer|5 years ago
There is literally no I/O being done on your data access paths. Synchronising mapped pages with file contents happens in background writeback threads.
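A small sketch of that write path, assuming Python's `mmap` module on a POSIX system (`patch_via_mmap` is a hypothetical name): storing into a shared mapping just dirties page-cache pages, and the data reaches the file either through the kernel's background writeback or when you explicitly call `flush()`, which wraps `msync()`.

```python
import mmap
import os
import tempfile


def patch_via_mmap(path: str, offset: int, data: bytes, durable: bool = False) -> None:
    """Overwrite bytes in an existing file through a shared, writable mapping."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE) as mm:
            # A plain memory store: it dirties a page in the page cache
            # (faulting it in first if it isn't resident); no write()
            # system call is made on this path.
            mm[offset:offset + len(data)] = data
            if durable:
                # flush() calls msync() on POSIX and blocks until the dirty
                # pages are written to the file; otherwise the kernel's
                # background writeback writes them later.
                mm.flush()


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello world")
    patch_via_mmap(tmp.name, 6, b"WORLD")
    with open(tmp.name, "rb") as f:
        print(f.read())  # b'hello WORLD'
    os.remove(tmp.name)
```

Other readers see the change immediately either way, because `read()` is served from the same page cache; `flush()` only matters for durability.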