(no title)
kannanvijayan | 1 month ago
The initial order-of-magnitude jump in perf that JITs provided took us from the 5-2x overhead for managed runtimes down to some (1 + delta)x. That was driven by runtime type inference combined with a type-aware JIT compiler.
I expect that there's another significant, but smaller perf jump that we haven't really plumbed out - mostly to be gained from dynamic _value_ inference that's sensitive to _transient_ meta-stability in values flowing through the program.
Basically you can gather actual values flowing through code at runtime, look for patterns, and then inline / type-specialize those by deriving runtime types that are _tighter_ than the annotated types.
I think there's a reasonable amount of juice left in combining those techniques with partial specialization and JIT compilation, and that should get us over the hump from "slightly slower than native" to "slightly faster than native".
I get it's an outlier viewpoint though. Whenever I hear "managed jitcode will never be as fast as native", I interpret that as a friendly bet :)
rudedogg|1 month ago
The battlecry of Java developers riding their tortoises.
Don’t we have decades of real-world experience showing native code almost always performs better?
For most things it doesn’t matter, but it always rubs me the wrong way when people mention this about JIT since it almost never works that way in the real world (you can look at web framework benchmarks as an easy example)
kannanvijayan|1 month ago
The idea that an absurdly dynamic language like JS, where all objects are arbitrary property bags with prototypical dependency chains that are runtime mutable, would execute at a tech budget under 2x raw performance was just a matter of fact impossibility.
Until it wasn't. And the technology reason it ended up happening was research that was done in the 80s.
It's not surprising to me that it hasn't happened yet. This stuff is not easy to engineer and implement. Even the research isn't really there yet. Most of the modern dynamic language JIT ideas which came to the fore in the mid 200X's were directly adapting research work on Self from about two decades prior.
Dynamic runtime optimization isn't too hot in research right now, and it never was to be honest. Most of the language theory folks tend to lean more in the type theory direction.
The industry attention too has shifted away. Browsers were cutting edge a while back and there was a lot of investment in core research tech associated with that, but that's shifting more to the AI space now.
Overall the market value prop and the landscape for it just doesn't quite exist yet. Hard things are hard.
pjmlp|1 month ago
AOT compilers without PGO data usually tend to perform worse when those conditions aren't met.
Which is why the best of both worlds is using JIT caches that survive execution runs.
anotherhue|1 month ago
What are the real world chances that a) one's compiled code benefits strongly from runtime data flow analysis AND b) no one did that analysis at the compilation stage?
Some sort of crazy off label use is the only situation I think qualifies and that's not enough.
xylophile|1 month ago
It's then just a matter of how your team values runtime performance vs other considerations such as workflow, binary portability, etc. Virtually all projects have an acceptable range of these competing values, which is where JIT shines, in giving you almost all of the performance with much better dev economics.
kannanvijayan|1 month ago
Obviously JITting means you'll have a compiler executing sometimes along with the program which implies a runtime by construction, and some notion of warmup to get to a steady state.
Where I think there's probably untapped opportunity is in identifying these meta-stable situations in program execution. My expectation is that there are execution "modes" that cluster together more finely than static typing would allow you to infer. This would apply to runtimes like wasm too - where the modes of execution would be characterized by the actual clusters of numeric values flowing to different code locations and influencing different code-paths to pick different control flows.
You're right that on the balance of things, trying to say.. allocate registers at runtime will necessarily allow for less optimization scope than doing it prior.
But, if you can be clever enough to identify, at runtime, preferred code-paths with higher resolution than what (generic) PGO allows (because now you can respond to temporal changes in those code-path profiles), then you can actually eliminate entire codepaths from the compiler's consideration. That tends to greatly affect the register pressure (for the better).
It might be interesting just to profile some wasm executions of common programs. If there are transient clusterings of control flow paths that manifest during execution. It'd be a fun exercise...