item 38270613

Freaky | 2 years ago

> We were very generous in terms of warm-up time. Each benchmark was run for 1000 iterations, and the first half of all the iterations were discarded as warm-up time, giving each JIT a more than fair chance to reach peak performance.

1,000 iterations isn't remotely generous for JRuby, unfortunately - the JVM's tier-3 compilation only kicks in by default around 2,000 invocations, and full tier-4 is only considered beyond 15,000. I've observed this to have quite a substantial effect, for instance bringing manticore (a JRuby wrapper for Apache's Java HttpClient) from merely "okay" performance after 10,000 requests to pretty much matching the curb C extension under MRI after 20,000.
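For what it's worth, one way to see where a given VM actually settles is to record per-iteration timings instead of discarding a fixed warm-up window - a minimal sketch, where the trivial `work` method is just a stand-in for a real benchmark body:

```ruby
require "benchmark"

# Stand-in for the real benchmark body.
def work
  (1..100).reduce(:+)
end

ITERATIONS = 2_000

# Time every iteration individually so the warm-up curve is visible.
times = Array.new(ITERATIONS) { Benchmark.realtime { work } }

# Compare the first and last 10% of iterations to see whether the
# VM has settled, rather than assuming a fixed cutoff.
slice = ITERATIONS / 10
head = times.first(slice).sum / slice
tail = times.last(slice).sum / slice
puts format("first 10%%: %.2e s, last 10%%: %.2e s", head, tail)
```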

You can tweak it to be more aggressive, but I guess this puts more pressure on the compiler threads and their memory use, while reducing the run-time profiling data they use to optimize most effectively. It perhaps also risks more churn from deoptimization. I kind of felt like I'd be better off trying to formalise the warmup process.
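For reference, the HotSpot thresholds can be lowered by passing JVM options through JRuby's `-J` flag - a sketch using the standard tiered-compilation flags (the specific values here are arbitrary, not a recommendation):

```shell
# Promote hot methods to tier 3 (C1 with profiling) and tier 4 (C2)
# sooner than the defaults of roughly 2,000 and 15,000 invocations.
jruby -J-XX:Tier3CompileThreshold=500 \
      -J-XX:Tier4CompileThreshold=2000 \
      script.rb
```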

It's rather a shame that all this warmup work is one-shot. It would be far less obnoxious if it could be preserved across runs - I believe some alternative Java runtimes support something like that, though given JRuby's got its own JIT targeting Java bytecode I dare say it would require work there as well.

Twirrim | 2 years ago

Agreed, the JVM's thresholds for method compilation are higher than the test strategy seemed to account for.

Also, quoting from the site:

> TruffleRuby eventually outperforms the other JITs, but it takes about two minutes to do so. It is also initially quite a bit slower than the CRuby interpreter, taking over 110 seconds to catch up to the interpreter’s speed. This would be problematic in a production context such as Shopify’s, because it can lead to much slower response times for some customers, which could translate into lost business.

There are always ways around that, for example pushing artificial traffic at a node as part of a deployment process, prior to exposing it to customers. I've known places that have opted for that approach, because it made the best sense for them. The initial latency hit of JIT warm-up wasn't a good fit for their needs, while every other aspect of using a JIT'd language was.

As ever, whether it's worth the extra work depends on the trade-off. E.g. if I could see after 5-10 minutes that TruffleRuby was, say, 25% faster than YJIT, then that extra engineering effort may be the right choice.

edit: Some folks throw traffic at nodes before exposing them to customers to ensure that their caches are warm, too. It's not necessarily something limited to JIT'd languages/runtimes.
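As a concrete sketch of that deploy-time step (the host name and endpoint here are invented for illustration):

```shell
# Hypothetical deploy hook: replay synthetic traffic at the freshly
# started node before the load balancer sends it real customers.
for _ in $(seq 1 20000); do
  curl -s -o /dev/null "http://new-node.internal:8080/healthcheck"
done
# Only now register the node with the load balancer.
```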

MarkSweep | 2 years ago

If you are able to snapshot the state of the JIT, you can do the warming on a single node. The captured JIT state can then be deployed to other machines, saving them from spending time doing the warming. This increases the utilization of your machines.
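One concrete flavour of this on the JVM is the OpenJDK CRaC (Coordinated Restore at Checkpoint) project, which snapshots a warmed-up process image so it can be restored elsewhere - a sketch, assuming a CRaC-enabled JDK build (paths and jar name are invented):

```shell
# Start the app with a checkpoint directory, drive warm-up traffic
# at it, then checkpoint the running process:
java -XX:CRaCCheckpointTo=/var/crac/app -jar app.jar &
jcmd app.jar JDK.checkpoint

# On another machine, restore the already-warm process image:
java -XX:CRaCRestoreFrom=/var/crac/app
```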

While this approach sounds like a convoluted way to do ahead-of-time compilation, I’ve seen it done.

maxime_cb | 2 years ago

It is enough iterations for these VMs to warm up on the benchmarks we've looked at, but the warm-up time is still on the order of minutes on some benchmarks, which is impractical for many applications.

ericb | 2 years ago

I have a compiler that eventually produces 10x faster bytecode than the current JRuby. It takes a bit of warmup time.

The first command is "Let there be Light."