Gladdyu | 5 years ago

It does load it - mmap doesn't copy the file content into a buffer, it merely allows you to operate on a file as if it were in memory. Memory reads correspond to file read operations.
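For anyone who hasn't used it, here is a minimal sketch of what "operate on a file as if it were in memory" looks like, using Python's standard `mmap` module. The helper name `read_slice_via_mmap` and the temp file are made up for illustration; nothing here comes from the article.

```python
import mmap
import os
import tempfile

def read_slice_via_mmap(path, start, length):
    """Return `length` bytes at offset `start`, read through a read-only mapping."""
    with open(path, "rb") as f:
        # Length 0 means "map the whole file". Mapping an empty file raises ValueError.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # Slicing copies just this byte range out of the mapping; only the
            # pages backing the range need to be resident.
            return m[start:start + length]

# Hypothetical demo file, created only for this example.
fd, demo_path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as f:
        f.write(b"0123456789" * 1000)
    print(read_slice_via_mmap(demo_path, 5, 10))  # b'5678901234'
finally:
    os.remove(demo_path)
```

There is no explicit `read()` call on the data path: the slice is an ordinary memory access, and the kernel decides when the backing pages are fetched.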

monocasa|5 years ago

Sort of. mmap absolutely copies the file contents into the kernel file system cache which is a buffer; it just lets you map that cache into your address space so you can see it. And memory reads don't translate to file reads unless the page isn't already in the cache.

wtallis|5 years ago

> mmap absolutely copies the file contents into the kernel file system cache which is a buffer

Isn't this a bit misleading? mmapping a file doesn't cause the kernel to start loading the whole thing into RAM; it just sets things up for the kernel to later transparently load pages of it on demand, possibly with some prefetching.

ivalm|5 years ago

But I think the important part is that the file starts on disk and ends parsed. The rate of that was NVMe-limited (per the article).

saagarjha|5 years ago

Well, it copies into the kernel buffer as you access it as a sort of demand paging that isn’t actually all that bad depending on what you’re doing. It’s dramatically different from a typical “read everything into a buffer” that most programs do.
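A rough sketch of that difference, in Python for brevity. Both helpers (`find_with_read`, `find_with_mmap`) are hypothetical names; the first is the typical "read everything into a buffer" approach, the second searches through a mapping so pages are only faulted in as the scan reaches them.

```python
import mmap

def find_with_read(path, needle):
    """Copy the whole file into a bytes object, then search it."""
    with open(path, "rb") as f:
        return f.read().find(needle)

def find_with_mmap(path, needle):
    """Search through a read-only mapping without an up-front copy."""
    with open(path, "rb") as f:
        # Mapping an empty file raises ValueError, so this assumes a non-empty file.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # The scan stops at the first match, so pages past it are
            # never touched by this process.
            return m.find(needle)
```

Both return the same offset (or -1 if the needle is absent); what changes is how much of the file has to be copied into the process before the answer is known.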

throwaway_pdp09|5 years ago

General question: if mmap pulls in data only when you ask for it and not before, you're going to have the CPU waiting on the disk, followed by processing on the CPU with no disk activity, alternating back and forth. I'd assume the optimum is to have both working at once, which means issuing some kind of readahead request to the disk. How is this done, if at all?

Edit: just saw this, which kind of touches on the same thing: https://news.ycombinator.com/item?id=24737186

saagarjha|5 years ago

Generally the OS should see if you’re doing a long sequential access and prefetch this data before you access it.
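If you'd rather not rely on the kernel's heuristics, you can also hint the access pattern yourself with `madvise`. A minimal sketch, assuming Python 3.8+ on a platform that exposes `madvise` and `MADV_SEQUENTIAL` (the `hasattr` guards cover the case where it doesn't); `checksum_sequential` is a made-up name, and the hint is advisory only, so the kernel is free to ignore it.

```python
import mmap

def checksum_sequential(path, chunk=1 << 20):
    """Sum all bytes of a non-empty file, hinting a sequential scan to the kernel."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # Advisory hint: we intend to read front to back, so more
            # aggressive readahead is welcome. Not available everywhere.
            if hasattr(m, "madvise") and hasattr(mmap, "MADV_SEQUENTIAL"):
                m.madvise(mmap.MADV_SEQUENTIAL)
            total = 0
            for off in range(0, len(m), chunk):
                # While this chunk is being summed, readahead can already
                # be fetching the pages that follow it.
                total += sum(m[off:off + chunk])
            return total
```

Whether this actually keeps the disk and the CPU busy at the same time depends on the kernel, the filesystem, and the device, so measure before relying on it.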

bitcharmer|5 years ago

Not sure if you know how mmap works, but regardless you can't say that memory reads correspond to file reads.

There is literally no I/O being done on your data access paths. Synchronising mapped pages with file contents happens in background write-back threads.
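A small sketch of the write side, assuming a shared, writable mapping (what Python's `mmap` gives you by default on Unix). The store into the mapping is a plain memory write that dirties a cached page; `flush()` is the explicit request to write dirty pages back now instead of waiting for the kernel to get to it. `overwrite_prefix` is a hypothetical helper name.

```python
import mmap

def overwrite_prefix(path, data):
    """Overwrite the first len(data) bytes of an existing file through a mapping."""
    with open(path, "r+b") as f:
        # Assumes the file is non-empty and at least len(data) bytes long.
        with mmap.mmap(f.fileno(), 0) as m:
            # Ordinary memory store: no write() call is issued here.
            m[:len(data)] = data
            # Ask for the dirty pages to be written back now rather than
            # whenever background write-back would do it.
            m.flush()
```

Without the `flush()` call the change would still become visible to other readers of the file through the shared page cache; the call only affects when it is pushed to storage.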