turogallus's comments

turogallus | 6 years ago | on: I Got a Knuth Check for 0x$3.00

Except that the optimization seem to be quite spectacular.

Compiling pantalaimon's code, there doesn't seem to be any significative difference between the two algorithms with the default compilation parameters of gcc.

However compiling with gcc -O3, Knuth's algorithm is almost twice as fast on an Intel i7-7700HQ, old tricks still work well:

naive search found: 0, took 597719 ns found: 0, took 601291 ns found: 0, took 593332 ns found: 0, took 592513 ns found: 0, took 592799 ns found: 0, took 592409 ns found: 0, took 592563 ns found: 0, took 592289 ns found: 0, took 631602 ns found: 0, took 592928 ns average: 597944.50 ns knuth search found: 0, took 300860 ns found: 0, took 318691 ns found: 0, took 317502 ns found: 0, took 317348 ns found: 0, took 351612 ns found: 0, took 317625 ns found: 0, took 318217 ns found: 0, took 312106 ns found: 0, took 397532 ns found: 0, took 318284 ns average: 326977.69 ns

Just for fun, here is the dump of assembler code for function search_knuth:

   0x0000000000001220 <+0>: add    %rsi,%rdx
   0x0000000000001223 <+3>: mov    %edi,%eax
   0x0000000000001225 <+5>: mov    %dil,(%rdx)
   0x0000000000001228 <+8>: cmp    (%rsi),%dil
   0x000000000000122b <+11>: je     0x1238 <search_knuth+24>
   0x000000000000122d <+13>: nopl   (%rax)
   0x0000000000001230 <+16>: add    $0x1,%rsi
   0x0000000000001234 <+20>: cmp    %al,(%rsi)
   0x0000000000001236 <+22>: jne    0x1230 <search_knuth+16>
   0x0000000000001238 <+24>: cmp    %rsi,%rdx
   0x000000000000123b <+27>: setne  %al
   0x000000000000123e <+30>: retq

And here is the dump of assembler code for function search_naive:

   0x00000000000011e0 <+0>: add    %rsi,%rdx
   0x00000000000011e3 <+3>: mov    %edi,%eax
   0x00000000000011e5 <+5>: cmp    %rdx,%rsi
   0x00000000000011e8 <+8>: je     0x1205 <search_naive+37>
   0x00000000000011ea <+10>: cmp    (%rsi),%dil
   0x00000000000011ed <+13>: jne    0x11fc <search_naive+28>
   0x00000000000011ef <+15>: jmp    0x1210 <search_naive+48>
   0x00000000000011f1 <+17>: nopl   0x0(%rax)
   0x00000000000011f8 <+24>: cmp    %al,(%rsi)
   0x00000000000011fa <+26>: je     0x1210   <search_naive+48>
   0x00000000000011fc <+28>: add    $0x1,%rsi
   0x0000000000001200 <+32>: cmp    %rsi,%rdx
   0x0000000000001203 <+35>: jne    0x11f8   <search_naive+24>
   0x0000000000001205 <+37>: xor    %eax,%eax
   0x0000000000001207 <+39>: retq   
   0x0000000000001208 <+40>: nopl   0x0(%rax,%rax,1)
   0x0000000000001210 <+48>: mov    $0x1,%eax
   0x0000000000001215 <+53>: retq