neel_k | 4 years ago

Self-modifying code was useful for optimisation back in the 80s, but these days it's usually awful for performance (with JIT compilation as the main exception to this rule).

Your CPU has an instruction cache and a data cache, and on ARM (and x86, too, but I'm not sure) these caches are not coherent. So if you modify your instruction stream with a write, you have to invalidate the instruction cache to ensure that your modified instructions are actually executed by the processor. If you do this a lot, this will make things S-L-O-W, because the processor has to refetch the next instructions from further down the memory hierarchy, potentially all the way from main memory.

This means that if you do want to generate code at runtime, you want to batch the modifications into large groups, so that you have to invalidate the i-cache less frequently. This actually is useful -- it's what JIT compilation is! The reason that JIT can be helpful (even in statically typed languages like Java or Haskell) is that programs often get passed functions as arguments (e.g., the comparator passed to qsort in C). A static compiler can't optimise calls to these functions much, because it has to know what the function argument will be before it can do anything useful. But at runtime, you do know what the function is, and by inlining it your code can be made much faster.
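To make the qsort point concrete, here is a minimal C sketch of the two shapes of code involved. The names `cmp_int`, `sort_generic` and `sort_specialized` are made up for illustration, and the insertion sort is only a stand-in for what a specialised sort might look like after the comparison has been inlined; it is not what any particular JIT emits.

```c
#include <stddef.h>
#include <stdlib.h>

/* Comparator with the signature qsort expects. */
static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Generic path: qsort only sees a function pointer, so each
 * comparison is typically an indirect call it cannot inline. */
void sort_generic(int *v, size_t n) {
    qsort(v, n, sizeof v[0], cmp_int);
}

/* Hand-specialised path (hypothetical): once the comparison is
 * known, it can be written inline as a plain integer compare. */
void sort_specialized(int *v, size_t n) {
    for (size_t i = 1; i < n; i++) {
        int key = v[i];
        size_t j = i;
        while (j > 0 && v[j - 1] > key) {
            v[j] = v[j - 1];   /* shift larger elements right */
            j--;
        }
        v[j] = key;
    }
}
```

Both functions sort the array in ascending order; the difference is only in whether the comparison is visible to the optimiser at the point where it is used.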

nneonneo|4 years ago

On x86 you can use self-modifying code without explicitly flushing caches. However, if you execute the modified code soon after writing it, the penalty can be tens to hundreds of cycles (source: Agner Fog’s optimization manual).

ARM quite famously requires the explicit cache flush, and will usually fail to work without it. However, some emulators, e.g. QEMU, don’t require the cache flush, which can lead to confusion if you usually test on emulators.
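As a rough sketch of what the explicit flush looks like in practice, assuming GCC or Clang: both provide the `__builtin___clear_cache(begin, end)` builtin, which does whatever the target needs to make newly written instructions visible. The `code_buf` struct and the `emit`/`finish` helpers below are hypothetical, the buffer is ordinary (non-executable) memory, and the sketch never jumps into it; it only shows emitting several instructions and then invalidating once for the whole batch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical code buffer. A real JIT would back this with
 * memory mapped as executable; this sketch does not. */
struct code_buf {
    unsigned char bytes[256];
    size_t len;
    unsigned flushes;   /* invalidation count, for illustration only */
};

/* Append raw instruction bytes without touching the caches.
 * Returns 0 on success, -1 if the buffer is full. */
int emit(struct code_buf *cb, const void *insn, size_t n) {
    if (n > sizeof cb->bytes - cb->len) return -1;
    memcpy(cb->bytes + cb->len, insn, n);
    cb->len += n;
    return 0;
}

/* One invalidation covering everything emitted so far. */
void finish(struct code_buf *cb) {
    __builtin___clear_cache((char *)cb->bytes,
                            (char *)cb->bytes + cb->len);
    cb->flushes++;
}
```

The point of the split is that `emit` can be called many times while `finish` runs once, right before the generated code would be executed.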

zinxq|4 years ago

Came here hoping someone had written this. Wasted many hours of my young life trying to figure out why my self-modifying assembler program worked perfectly in the debugger but not without it.

bogomipz|4 years ago

Could you elaborate? I feel like the OP is saying that it works just painfully slowly but your comment/problem indicates it didn't work at all without the debugger. Can you say how these relate? Am I missing something obvious?

gpderetta|4 years ago

Inlining a call through a passed function pointer is not really a JIT-only optimization. As long as the pointer is a constant, it only requires interprocedural optimization and/or link-time optimization.

A JIT can help if the value of the pointer varies dynamically and in an unpredictable way (otherwise PGO would also help).