mhio | 1 year ago

Would the profiles and resulting binaries be highly CPU-specific? I couldn't find any notes on cross-hardware behavior in the original paper.

The examples I'm thinking of are CPUs with vastly different L1/L2/L3 cache profiles: Epyc vs. Xeon, or maybe Zen 3 vs. Zen 5.

Just wondering if it looks great on a benchmark machine (and at a hyperscaler with a homogeneous hardware fleet) but might not look as great when distributing common binaries to the world. Doing the profiling/optimising after release seems dicey.

pgaddict | 1 year ago

Interesting question. I think most optimizations described in the BOLT paper are fairly hardware-agnostic: branch prediction does not depend on the architecture, etc. But I'm not an expert on microarchitectures.

jeffbee | 1 year ago

A lot of the benefits of BOLT come from fixing the block layout so that taken branches go backward and untaken branches go forward. This is CPU-neutral.
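To make the layout point concrete, here is a toy sketch. It is not BOLT's actual algorithm, which is considerably more sophisticated. It takes hypothetical profiled edge counts for a loop with a rare error path, greedily chains each block to its hottest unplaced successor, and then counts how many dynamic branches are still taken. The block names, counts, and the helpers `greedy_layout` and `taken_branches` are all made up for illustration, and the metric simply treats every edge whose target isn't the next block in the layout as a taken jump. Note that nothing in the computation refers to cache sizes or a specific CPU.

```python
from typing import Dict, List, Tuple

# Hypothetical profile: (source block, destination block) -> times the edge was followed.
Profile = Dict[Tuple[str, str], int]


def greedy_layout(entry: str, profile: Profile) -> List[str]:
    """Chain each block to its hottest unplaced successor so the common path
    falls through. A simplified illustration, not BOLT's real algorithm."""
    blocks = {b for edge in profile for b in edge}
    succs: Dict[str, List[Tuple[int, str]]] = {b: [] for b in blocks}
    for (src, dst), count in profile.items():
        succs[src].append((count, dst))

    order: List[str] = []
    placed = set()
    current = entry
    while True:
        order.append(current)
        placed.add(current)
        candidates = [(c, d) for c, d in succs[current] if d not in placed]
        if candidates:
            current = max(candidates)[1]  # hottest successor becomes the fall-through
        elif len(placed) < len(blocks):
            current = sorted(blocks - placed)[0]  # start a new chain with a leftover block
        else:
            return order


def taken_branches(order: List[str], profile: Profile) -> Tuple[int, int]:
    """Return (forward, backward) dynamic counts of edges that do NOT fall through,
    i.e. whose destination is not laid out immediately after the source."""
    pos = {b: i for i, b in enumerate(order)}
    forward = backward = 0
    for (src, dst), count in profile.items():
        if pos[dst] == pos[src] + 1:
            continue  # fall-through: no taken branch
        if pos[dst] > pos[src]:
            forward += count
        else:
            backward += count
    return forward, backward


# Hypothetical loop running 1000 iterations, with an error path hit 5 times.
profile: Profile = {
    ("entry", "loop_head"): 1,
    ("loop_head", "body"): 1000,
    ("loop_head", "exit"): 1,
    ("body", "latch"): 995,
    ("body", "error"): 5,
    ("error", "latch"): 5,
    ("latch", "loop_head"): 1000,
}

naive = ["entry", "loop_head", "exit", "error", "body", "latch"]
tuned = greedy_layout("entry", profile)
print(tuned)                           # ['entry', 'loop_head', 'body', 'latch', 'error', 'exit']
print(taken_branches(naive, profile))  # (1005, 1005)
print(taken_branches(tuned, profile))  # (6, 1005)
```

After relayout, the only hot taken branch left is the loop back-edge, which jumps backward. The rare exit and error paths become the few forward taken branches, so the hot path is mostly straight-line fall-through.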