
dfawcus | 1 month ago

Well, his "Normal Functions" (benchmarks/closures/source/normal_functions.cpp in his repo) looks quite similar to what I had with my GNU nested functions using a stand-in "wide pointer", and hence no generated trampoline.

(https://news.ycombinator.com/item?id=46243298)

Which rather suggests to me that such a scheme, but generated by the compiler instead of by hand, should perform similarly to said "Normal Functions", and hence similarly to his preferred lambda form.

Since his benchmark environment is so unwieldy, I may have a go at extracting those two code sets into a standalone environment and measuring them to see...


uecker | 1 month ago

So here are my preliminary benchmarks of my own implementation on an AMD EPYC 9334 32-core processor. I still need to double-check things, so take this with a grain of salt for now. Times are in seconds for 100000 iterations of manorboy(10). So far, the only implementation that clearly sucks is std::function<>. Even trampolines are surprisingly good (though I can imagine they are much worse on other CPUs / architectures).

  xgcc (GCC) 16.0.0 20260103 (experimental)
  1.50 gcc -ftrampoline-impl=stack -Wl,-no-warn-execstack
  1.11 gcc -ftrampoline-impl=stack -Wl,-no-warn-execstack -DREFARG
  7.21 gcc -ftrampoline-impl=heap
  7.34 gcc -ftrampoline-impl=heap -DREFARG
  0.93 gcc -DWIDEPTR
  1.38 gcc -DWIDEPTR -DREFARG
  1.40 gcc -DDIRECT
  1.05 gcc -xc++ -std=c++26 -DFUNCREF -DDEDUCING
  19.68 gcc -xc++ -std=c++26 -DDEDUCING
  20.73 gcc -xc++ -std=c++26
  6.31 gcc -xc++ -std=c++26 -DDEDUCING -DREFARG
  6.31 gcc -xc++ -std=c++26 -DREFARG
  Debian clang version 16.0.6 (15~deb12u1)
  21.11 clang -xc++
  6.16 clang -xc++ -DREFARG
  1.66 clang -fblocks
  1.70 clang -fblocks -DREFARG