item 46514774

pjdesno | 1 month ago

Since no one else seems to have pointed this out - the OP seems to have misunderstood the output of the 'time' command.

  $ time ./wc-avx2 < bible-100.txt
  82113300
  
  real    0m0.395s
  user    0m0.196s
  sys     0m0.117s
"System" time is the amount of CPU time spent in the kernel on behalf of your process, or at least a fairly good guess at that. (e.g. it can be hard to account for time spent in interrupt handlers) With an old hard drive you would probably still see about 117ms of system time for ext4, disk interrupts, etc. but real time would have been much longer.

  $ time ./optimized < bible-100.txt > /dev/null

  real    0m1.525s
  user    0m1.477s
  sys     0m0.048s
Here we're bottlenecked on CPU time - 1.477s + 0.048s = 1.525s. The CPU is busy for every millisecond of real time, either in user space or in the kernel.

In the fast wc-avx2 case:

  real    0m0.395s
  user    0m0.196s
  sys     0m0.117s
0.196 + 0.117 = 0.313, so we used 313ms of CPU time but the entire command took 395ms, with the CPU idle for 82ms.
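That arithmetic can be done mechanically from the three figures `time` prints (the numbers below are taken from the run above):

```shell
# Derive total CPU time and idle time from real/user/sys
awk -v r=0.395 -v u=0.196 -v s=0.117 'BEGIN {
  cpu = u + s
  printf "cpu=%.3fs idle=%.3fs (%.0f%% busy)\n", cpu, r - cpu, 100 * cpu / r
}'
# → cpu=0.313s idle=0.082s (79% busy)
```

Whenever user + sys falls well short of real, something other than the CPU (disk, network, sleeping) accounts for the gap.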

In other words: yes, the author managed to beat the speed of the disk subsystem, with two caveats:

1. not by much - comparable attention to tuning I/O parameters might improve I/O performance quite a bit.

2. the I/O path is CPU-bound. Those 117ms (about 37% of the CPU time used) are all spent in the disk I/O and file system kernel code; even if both the disk and your user code were infinitely fast, the command would still take 117ms. (Those I/O tweaks might reduce that number, though.)

Note that the slow code numbers are with a warm cache, showing 48ms of system time - in this case only the ext4 code has to run in the kernel, as the data is already cached in memory. In the cold-cache case it has to run the disk driver code as well, for a total of 117ms.
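You can see the warm-cache effect yourself with any large file (a sketch; the scratch file name is made up, and a genuinely cold-cache run needs root to drop the Linux page cache):

```shell
# Create a 64MB scratch file (name is arbitrary)
dd if=/dev/zero of=/tmp/cachetest.bin bs=1M count=64 2>/dev/null

time wc -c < /tmp/cachetest.bin   # first read may have to touch the disk
time wc -c < /tmp/cachetest.bin   # second read is served from the page cache

# For a genuinely cold cache on Linux (requires root):
#   sync; echo 3 > /proc/sys/vm/drop_caches
```

On the second run, sys time shrinks to just the filesystem read path and real time collapses toward user + sys.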
