Teongot | 1 year ago
Because Jazelle converted Java bytecodes into ARM instructions in sequence, there was no opportunity for any instruction scheduling. So a bytecode sequence like:
// public static int get_x(int x, T a, T b) { return a.x+b.x; }
aload_1
getfield #N
aload_2
getfield #N
iadd
would go down the pipeline as something like:
  LDR r1, [r0, #4]  // aload_1
* LDR r1, [r1]      // getfield
  LDR r2, [r0, #8]  // aload_2
* LDR r2, [r2]      // getfield
* ADD r1, r1, r2    // iadd
There would be a pipeline stall before each instruction marked with a *.
On the first ARM9 CPUs with Jazelle, the pipeline was fairly similar to the standard 5-stage RISC pipeline (Fetch-Decode-Execute-MemoryAccess-Writeback), so this stall would be 1 cycle. That wasn't too bad - you could just accept that loads usually took 2 cycles, and it would still be pretty fast.
However, on later CPUs with a longer pipeline the load-use delay was increased. By ARM11, it was 2 cycles - so now the CPU is spending more time waiting on pipeline stalls than it spends actually executing instructions.
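To put rough numbers on that, here is a minimal Python sketch of a single-issue, in-order pipeline with full forwarding, where the only hazard modeled is the load-use delay. The `Insn` tuple, `count_cycles`, and the register assignments are my own illustration, not anything taken from ARM documentation, and real cores have other stall sources (cache misses, branches) that are ignored here.

```python
from typing import NamedTuple, Tuple


class Insn(NamedTuple):
    text: str               # assembly text, for display only
    dest: str               # register written
    srcs: Tuple[str, ...]   # registers read
    is_load: bool           # True for LDR


# The five instructions in the order Jazelle would emit them (assumed encoding).
JAZELLE_ORDER = [
    Insn("LDR r1, [r0, #4]", "r1", ("r0",), True),       # aload_1
    Insn("LDR r1, [r1]",     "r1", ("r1",), True),       # getfield
    Insn("LDR r2, [r0, #8]", "r2", ("r0",), True),       # aload_2
    Insn("LDR r2, [r2]",     "r2", ("r2",), True),       # getfield
    Insn("ADD r1, r1, r2",   "r1", ("r1", "r2"), False),  # iadd
]


def count_cycles(program, load_use_delay):
    """Return (total_cycles, stall_cycles) under the simplified model."""
    ready = {}      # register -> earliest cycle a consumer may issue
    next_issue = 0  # earliest cycle the next instruction may issue
    for insn in program:
        # Wait for the issue slot and for every source register to be ready.
        start = max([next_issue] + [ready.get(r, 0) for r in insn.srcs])
        # A load result arrives load_use_delay cycles later than an ALU result.
        ready[insn.dest] = start + 1 + (load_use_delay if insn.is_load else 0)
        next_issue = start + 1
    total = next_issue
    return total, total - len(program)


for delay in (1, 2):
    total, stalls = count_cycles(JAZELLE_ORDER, delay)
    print(f"load-use delay {delay}: {total} cycles, {stalls} stalls")
```

Under this model the sequence takes 8 cycles (3 stalls) with a 1-cycle load-use delay and 11 cycles (6 stalls) with a 2-cycle delay, so in the second case there are more stall cycles than instructions.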
In contrast, even a basic JIT can implement instruction scheduling and find some independent instructions to do between a load and the use of the result, which makes the JIT much more performant than Jazelle could be.
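As a rough illustration of what even trivial scheduling buys, the sketch below hoists the second aload above the first getfield, so each load has an independent instruction behind it. The reordering is hand-picked and the cycle model (single-issue, in-order, full forwarding, load-use delay as the only hazard) is an assumption for illustration, not a description of any particular JIT or ARM core.

```python
# Each instruction: (text, dest register, source registers, is_load).
JAZELLE_ORDER = [
    ("LDR r1, [r0, #4]", "r1", ("r0",), True),        # aload_1
    ("LDR r1, [r1]",     "r1", ("r1",), True),        # getfield
    ("LDR r2, [r0, #8]", "r2", ("r0",), True),        # aload_2
    ("LDR r2, [r2]",     "r2", ("r2",), True),        # getfield
    ("ADD r1, r1, r2",   "r1", ("r1", "r2"), False),  # iadd
]

# Same instructions, with the two independent local-variable loads issued first.
SCHEDULED_ORDER = [
    ("LDR r1, [r0, #4]", "r1", ("r0",), True),
    ("LDR r2, [r0, #8]", "r2", ("r0",), True),
    ("LDR r1, [r1]",     "r1", ("r1",), True),
    ("LDR r2, [r2]",     "r2", ("r2",), True),
    ("ADD r1, r1, r2",   "r1", ("r1", "r2"), False),
]


def stall_cycles(program, load_use_delay):
    """Count stall cycles for a program under the simplified model."""
    ready = {}      # register -> earliest cycle a consumer may issue
    next_issue = 0  # earliest cycle the next instruction may issue
    for _text, dest, srcs, is_load in program:
        start = max([next_issue] + [ready.get(r, 0) for r in srcs])
        ready[dest] = start + 1 + (load_use_delay if is_load else 0)
        next_issue = start + 1
    return next_issue - len(program)


for delay in (1, 2):
    print(delay,
          stall_cycles(JAZELLE_ORDER, delay),
          stall_cycles(SCHEDULED_ORDER, delay))
```

In this model the reordered sequence drops from 3 stall cycles to 1 with a 1-cycle load-use delay, and from 6 to 3 with a 2-cycle delay. A real JIT would have more surrounding instructions to pick from and could likely hide more of the remaining stalls.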
colejohnson66 | 1 year ago