GLGirty | 3 years ago

I watched Godard's lectures a few years ago, and he talked about different price points for the Mill: "tiers" of processors that differed mainly (IIRC) in the size of the belt. He made some claims about porting binaries from one tier to another without recompiling from source, and that struck me as very optimistic, for the reasons you hinted at.

My bread-and-butter work isn't very close to the metal, but you sound more experienced with this sort of thing. Are you familiar with the Mill? Do you think they have a chance of avoiding the weeds Transmeta got stuck in?

kllrnohj | 3 years ago

I'm not familiar with the Mill at all, but the problem with doing a JIT at this level is: where do you store the result, and how does the JIT actually run? The CPU can't exactly call mmap to ask the kernel for a read/write buffer, fill it with the JIT output, and then re-map it r+x. So you get things like carveouts, where the CPU just has a reserved area to stash JIT results, which is then globally readable and writable by the CPU. Better hope the JIT doesn't have an exploit that lets malicious app A clobber the JIT'd output of kernel routine B when app A gets run through the JIT!

The kernel isn't aware of the JIT happening, either, since the CPU can't launch a new thread to do the JIT work. So as far as every profiling tool is concerned, random functions just suddenly take way longer for no apparent reason. Good luck optimizing anything with that behavior. And the CPU obviously can't cache these results to disk, so you're re-JIT'ing these unchanging binaries every time they run? That's fun.

Maybe all of that is solvable in some way, sure. CPUs can communicate with the kernel through various mechanisms, of course, and you can build it so that the JIT is a separate task that's observable and preemptible, etc. But it's complicated and messy, and far more complex than what CPU microcode is typically tasked with, all for a benefit that seems quite questionable. There's no reason a CPU doing the JIT is going to do a better job than the kernel or userspace doing it; it's trivial (and common) to expose performance counters from the CPU, after all.
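For contrast, the userspace path mentioned above (ask the kernel for a read/write buffer, fill it, re-map it r+x) is only a few calls. This is a minimal sketch via ctypes, assuming Linux on x86-64 and a system that allows anonymous executable mappings (hardened setups, e.g. SELinux denying `execmem`, may refuse the `mprotect`). `jit_load` and the 6-byte "return 42" payload are hypothetical stand-ins for a real JIT's output, not anything Transmeta or the Mill actually do:

```python
import ctypes
import mmap
import platform

# Hypothetical payload: x86-64 machine code for `mov eax, 42; ret`,
# i.e. a tiny function returning 42. Other ISAs need different bytes.
X86_64_RETURN_42 = bytes([0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3])

# Symbols of the current process, which on Linux include libc's mprotect.
_libc = ctypes.CDLL(None, use_errno=True)
_libc.mprotect.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int]
_libc.mprotect.restype = ctypes.c_int


def jit_load(code: bytes):
    """Install `code` in a fresh page using the RW -> RX sequence.

    Returns (fn, keepalive). `keepalive` pins the mapping; if it is
    garbage-collected, the page is unmapped and calling `fn` will crash.
    """
    size = mmap.PAGESIZE
    if len(code) > size:
        raise ValueError("code does not fit in one page")
    # 1. Ask the kernel for a private, anonymous, read/write page.
    buf = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE,
                    prot=mmap.PROT_READ | mmap.PROT_WRITE)
    # 2. Fill it with the generated code while it is still writable.
    buf.write(code)
    # ctypes view used to get the page address; it also keeps `buf` alive.
    keepalive = ctypes.c_char.from_buffer(buf)
    addr = ctypes.addressof(keepalive)
    # 3. Re-map it read+execute; after this the page is no longer writable.
    if _libc.mprotect(addr, size, mmap.PROT_READ | mmap.PROT_EXEC) != 0:
        raise OSError(ctypes.get_errno(), "mprotect failed")
    # 4. Treat the start of the page as a C function `int f(void)`.
    fn = ctypes.CFUNCTYPE(ctypes.c_int)(addr)
    return fn, keepalive


if __name__ == "__main__":
    fn, keepalive = jit_load(X86_64_RETURN_42)
    if platform.machine() == "x86_64":
        print(fn())  # prints 42
```

Because the kernel handles both the allocation and the permission flip, it knows the memory belongs to this process, and the page is never writable and executable at the same time. A microcode-level JIT working out of a carveout has to find some other way to get both of those properties.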

That doesn't mean a CPU designed for a JIT is inherently bad; it just means doing the JIT at the microcode level, like Transmeta did, is a bad idea.