jared_hulbert | 5 months ago

YES! gcc and clang don't like to optimize this, but they do if you hardcode size_bytes to an aligned value. It kind of makes sense: what if a user passes size_bytes as 3? With enough effort the compilers could handle this, but it's a lot to ask.
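To make the size_bytes point concrete, here is a minimal sketch of doing the split by hand. This is not the benchmark's actual code: the name count_tens, the 32-bit element type, and the block width of 8 are all assumptions for illustration. The bulk loop only ever sees whole fixed-size blocks, and a scalar loop picks up the leftover words, so a size like 3 just falls through both loops. Whether gcc or clang then vectorizes the bulk loop still depends on the compiler version and flags.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the original benchmark): count 32-bit
 * words equal to 10 in a buffer whose byte length is only known at
 * run time. Assumes the data is an array of uint32_t values. */
size_t count_tens(const unsigned char *buf, size_t size_bytes)
{
    enum { BLOCK_WORDS = 8 };  /* assumed block width, chosen for illustration */

    /* Trailing bytes that do not form a whole word are ignored. */
    size_t total_words = size_bytes / sizeof(uint32_t);
    size_t bulk_words = total_words - (total_words % BLOCK_WORDS);
    size_t count = 0;

    /* Bulk part: the inner trip count is a compile-time constant. */
    for (size_t i = 0; i < bulk_words; i += BLOCK_WORDS) {
        uint32_t block[BLOCK_WORDS];
        memcpy(block, buf + i * sizeof(uint32_t), sizeof block);
        for (int j = 0; j < BLOCK_WORDS; j++)
            count += (block[j] == 10);
    }

    /* Scalar tail: at most BLOCK_WORDS - 1 leftover words. */
    for (size_t i = bulk_words; i < total_words; i++) {
        uint32_t w;
        memcpy(&w, buf + i * sizeof w, sizeof w);
        count += (w == 10);
    }
    return count;
}
```

The memcpy calls are there so the sketch does not depend on the buffer's alignment; with a page-aligned mmap'd buffer a plain uint32_t pointer would also work.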

I just ran it with MAP_POPULATE, and the results are interesting.

It speeds up the counting loop: same speed as, or faster than, my tests that read() into a malloc'd buffer.

HOWEVER... populating the buffer takes longer overall. The end result is that the full test runs about 2.6 seconds slower than the original. I did not guess that one correctly.

time ./count_10_unrolled ./mnt/datafile.bin 53687091200
unrolled loop found 167802249 10s processed at 5.39 GB/s
./count_10_unrolled ./mnt/datafile.bin 53687091200 5.58s user 6.39s system 99% cpu 11.972 total

time ./count_10_populate ./mnt/datafile.bin 53687091200
unrolled loop found 167802249 10s processed at 8.99 GB/s
./count_10_populate ./mnt/datafile.bin 53687091200 5.56s user 8.99s system 99% cpu 14.551 total
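For anyone who wants to try the same comparison, here is a rough sketch of how the two mappings might differ. It is Linux-specific, and the helper name map_file_readonly and its populate parameter are made up for this example, not taken from the test programs above. The only difference between the two variants is the MAP_POPULATE flag, which asks the kernel to pre-fault the pages during the mmap() call itself instead of on first touch inside the counting loop.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not from the original test programs): map a whole
 * file read-only, optionally asking the kernel to pre-fault it.
 * Returns NULL on failure; on success stores the mapped size in *size_out. */
void *map_file_readonly(const char *path, int populate, size_t *size_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    int flags = MAP_PRIVATE;
#ifdef MAP_POPULATE
    if (populate)
        flags |= MAP_POPULATE;  /* page-fault cost moves into mmap() itself */
#endif

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, flags, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;

    *size_out = (size_t)st.st_size;
    return p;
}
```

Timing the mmap() call separately from the counting loop would show where the extra system time goes; release the mapping with munmap(p, size) when done.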

titanomachy|5 months ago

Hmm, I expected some slowdown from POPULATE, but I thought it would still come out ahead. Interesting!

mischief6|5 months ago

it could be interesting to see what ispc does with similar code.