Yes, the overhead is per-syscall. The number of read() syscalls shrinks as the per-call buffer grows.
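To make the per-syscall point concrete, here is a minimal sketch that counts how many os.read() calls it takes to drain a file at different buffer sizes. The helper names (count_read_calls, make_temp_file) and the 1 MiB test file are made up for illustration; it only counts calls and does not measure time.

```python
import os
import tempfile


def count_read_calls(path, buf_size):
    """Read the whole file with os.read(); return (calls, total_bytes).

    The call count includes the final read that returns b"" at EOF.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        calls = 0
        total = 0
        while True:
            chunk = os.read(fd, buf_size)  # one read() syscall per iteration
            calls += 1
            if not chunk:
                break
            total += len(chunk)
        return calls, total
    finally:
        os.close(fd)


def make_temp_file(size):
    """Hypothetical helper: create a zero-filled temp file of `size` bytes."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"\0" * size)
    return path


if __name__ == "__main__":
    path = make_temp_file(1 << 20)
    try:
        for buf_size in (4096, 65536, 1 << 20):
            calls, total = count_read_calls(path, buf_size)
            print(f"buf={buf_size:>8}  calls={calls:>4}  bytes={total}")
    finally:
        os.remove(path)
```

The call count drops roughly in proportion to the buffer size, which is the trend being discussed.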

With mmap() of course we only ever have one syscall to create the initial mapping. Everything else is a memory read.

In principle we can get read() down to just one syscall too, with a 4 GB buffer ;) I can't recall whether GB-sized buffers are possible, but I have certainly used MB-sized read buffers for exactly this reason.

loeg|5 years ago

       On Linux, read() (and similar system calls) will transfer at most 0x7ffff000
       (2,147,479,552) bytes, returning the number of bytes actually transferred.
       (This is true on both 32-bit and 64-bit systems.)
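A rough sketch of what that cap implies, assuming the 0x7ffff000 figure quoted above: even with an arbitrarily large buffer, a 4 GiB file needs more than one read(), and any caller has to loop on short reads anyway. MAX_READ, min_read_calls and read_fully are names invented for this example.

```python
import os

# Per-call transfer cap, taken from the man page text quoted above.
MAX_READ = 0x7FFFF000


def min_read_calls(file_size, buf_size):
    """Lower bound on data-returning read() calls for `file_size` bytes."""
    per_call = min(buf_size, MAX_READ)
    return -(-file_size // per_call)  # ceiling division


def read_fully(fd, n):
    """Read up to n bytes, looping because read() may return fewer."""
    parts = []
    remaining = n
    while remaining > 0:
        chunk = os.read(fd, min(remaining, MAX_READ))
        if not chunk:  # EOF before n bytes arrived
            break
        parts.append(chunk)
        remaining -= len(chunk)
    return b"".join(parts)


if __name__ == "__main__":
    four_gib = 4 << 30
    print(min_read_calls(four_gib, four_gib))
```

Two capped reads fall 8192 bytes short of 4 GiB, so the lower bound for that case is three calls rather than one.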

eloff|5 years ago

Yes, of course. What I mean to say is that it's weird the author did not see that trend and test it. There wouldn't have been an article if the result was "mmap is faster than read with buffer sizes below 64 KB". It seems kind of central to their whole thesis.

throwaway373438|5 years ago

Ah, I see your point.

Here's the catch: large buffer sizes only increase efficiency for sequential reads. mmap() is still much faster for random access within a file. Doubly so, because each random read needs an lseek() as well as a read() (unless you use pread(), which takes the offset directly).

The inefficiencies of read() can be minimized in the sequential case only.
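For the random-access case, a small sketch of the three access patterns side by side. The function names and the sample file are hypothetical, and pread() is an addition not mentioned in the thread; it folds the seek and the read into one call. This only shows that the three return the same bytes and how many syscalls each needs per access; it makes no timing claim.

```python
import mmap
import os
import tempfile


def read_at_seek(fd, offset, length):
    """Two syscalls per access: lseek() then read()."""
    os.lseek(fd, offset, os.SEEK_SET)
    return os.read(fd, length)


def read_at_pread(fd, offset, length):
    """One syscall per access: pread() takes the offset directly."""
    return os.pread(fd, length, offset)


def read_at_mmap(mapping, offset, length):
    """No syscall per access: just a slice of the mapped memory."""
    return mapping[offset:offset + length]


def make_sample_file():
    """Hypothetical helper: 16 KiB file whose byte at offset i is i % 256."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(bytes(range(256)) * 64)
    return path


if __name__ == "__main__":
    path = make_sample_file()
    fd = os.open(path, os.O_RDONLY)
    mapping = mmap.mmap(fd, 0, access=mmap.ACCESS_READ)  # one mmap() up front
    try:
        for offset in (12000, 40, 7777):  # arbitrary, non-sequential offsets
            a = read_at_seek(fd, offset, 8)
            b = read_at_pread(fd, offset, 8)
            c = read_at_mmap(mapping, offset, 8)
            print(offset, a == b == c)
    finally:
        mapping.close()
        os.close(fd)
        os.remove(path)
```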