top | item 39768022

(no title)

My hunch is that if you added the buffered reader and kept the original xxd in the pipe you’d see similar timings.

The amount of input data is just laughably small here to result in a huge timing discrepancy.

I wonder if there’s an added element where the constant syscalls are reading on a contended mutex and that contention disappears if you delay the start of the program.

discuss

vlovich123|1 year ago

Good hunch. On my machine (13900k) & zig 0.11, the latest version of the code:

> INFILE="$(mktemp)" && echo $INFILE && \ echo '60016000526001601ff3' | xxd -r -p > "${INFILE}" && \ zig build run -Doptimize=ReleaseFast < "${INFILE}" > execution time: 27.742µs

> echo '60016000526001601ff3' | xxd -r -p | zig build run -Doptimize=ReleaseFast > execution time: 27.999µs

The idea that the overlap of execution here by itself plays a role is nonsensical. The overlap of execution + reading a byte at a time causing kernel mutex contention seems like a more plausible explanation although I would expect someone better knowledgeable (& more motivated) about capturing kernel perf measurements to confirm. If this is the explanation, I'm kind of surprised that there isn't a lock-free path for pipes in the kernel.

mtlynch|1 year ago

Based on what you've shared, the second version can start reading instantly because "INFILE" was populated in the previous test. Did you clear it between tests?

Here are the benchmarks before and after fixing the benchmarking code:

Before: https://output.circle-artifacts.com/output/job/2f6666c1-1165...

After: https://output.circle-artifacts.com/output/job/457cd247-dd7c...

What would explain the drastic performance increase if the pipelining behavior is irrelevant?

rofrol|1 year ago

This @mtlynch