top | item 22855041


pso | 5 years ago

It seems that the authors are not measuring what they think they are, or have explained it poorly. Most transitions from interpreter to JIT show speedups of 10x to 100x, e.g. LuaJIT or V8. How is it possible that a VM like V8, according to their numbers, shows improvements of only a few percent, when it should be orders of magnitude faster after the transition? My conclusion: they are measuring variation after warmup.

All of the warmup, and the transitions from interpreter to JIT to optimised JIT, happen inside the first few micro- or milliseconds of EVERY one of their thousands of process iterations. Their measurements are ALL of the system variation of the VM after warmup has taken place. The VM is optimizing within the first 1-1000 inner loops occurring at the start of EACH process iteration. For most working programmers, a variation of a few percent on a running system AFTER warmup, in "steady-state peak performance", and before any I/O takes place (because language benchmarks avoid I/O), would not be an issue. If it is an issue, then the article perhaps demonstrates that a compiled language would offer less variation.

The benchmarks listed range from a shortest of around 0.4s for fannkuch/HotSpot/Linux, up to 1.8s for n-body/PyPy/Linux. This 'long-running' benchmark code (of 0.4 to 1.8s), by definition, has to include multiple inner loops / hot code, which is quickly optimized; otherwise the benchmark code would have to be millions of lines long in order to have a sufficient runtime. Tests need to run for at least tenths of a second for cross-language comparisons, since JITted languages take some iterations to warm up.
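The claim above — that any interpreter-to-JIT transition is confined to the first handful of iterations — can be checked directly by timing each in-process iteration separately. A minimal sketch (the benchmark body and iteration count are illustrative; on a JIT'd VM such as PyPy you would expect the first few timings to dominate, while later ones settle near steady state):

```python
import time

def inner_benchmark():
    # Trivial hot loop standing in for e.g. fannkuch or n-body.
    total = 0
    for i in range(100_000):
        total += i * i
    return total

# "In-process iterations": repeat the benchmark inside one live VM,
# recording a wall-clock time for each repetition.
timings = []
for _ in range(50):
    start = time.perf_counter()
    inner_benchmark()
    timings.append(time.perf_counter() - start)

# If warmup is real and brief, timings[0..k] for small k should be the
# outliers; everything after that is the "after warmup" variation pso
# says the paper is actually measuring.
```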



hedora | 5 years ago

Their first iteration is an entire run of the underlying benchmark. Subsequent iterations are reusing the same VM. They run each plot multiple times, and reboot between plots.
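The experimental structure described here can be sketched roughly as follows (names and counts are illustrative, not the paper's actual harness, which is considerably more controlled — fresh VM processes, reboots between runs, and so on; a single Python process stands in here):

```python
import time

def benchmark():
    # Stand-in for one complete run of the underlying benchmark.
    return sum(i * i for i in range(50_000))

def process_execution(n_iterations=30):
    # In the real setup this is a freshly started VM process; the
    # benchmark is repeated n_iterations times within it, and each
    # in-process iteration is timed individually.
    timings = []
    for _ in range(n_iterations):
        t0 = time.perf_counter()
        benchmark()
        timings.append(time.perf_counter() - t0)
    return timings

# Several process executions, each yielding a series of per-iteration
# times. The question the authors ask of each series: does it settle
# into a flat steady state at all, and if so, is that state faster
# than the start (i.e. did "warmup" actually happen)?
all_runs = [process_execution() for _ in range(3)]
```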

They’re trying to show that “warmed up steady state” isn’t something that reliably exists.

pso | 5 years ago

Yes, I know. But the tone of the whole article is as if they've found deep flaws across many VMs. They call something 'warmup' which I think has little or nothing to do with the JIT, but is unaccounted-for variation in the whole running system.

The final graph shows a binary trees program in C, with a 6% variation between "in-process executions" and no steady state; it seems logical that most VMs will show the same or worse variation.

The "warmed-up steady state" does exist, but not if they define it so narrowly. All of their iterations and timings are running at 30x to 100x interpreted speed; the only 'cold' interpreted code is in a few microseconds of the first loops of an execution.