(no title)
rostayob | 1 year ago
uiCA is a very nice tool which tries to simulate how instructions will get scheduled, e.g. this is the trace it produces for sum3 on Haswell, showing the fusion: https://uica.uops.info/tmp/75182318511042c98d4d74bc026db179_... .
xiphias2|1 year ago
gergo_barany|1 year ago
The version in the Compiler Explorer wouldn't work on AArch64 without an -mcpu flag and I didn't know what to pass, so I copied -mcpu=cyclone from https://djolertrk.github.io/2021/11/05/optimize-AARCH64-back.... You'd have to look up the correct one for your Mac's CPU.