top | item 44443140

fossa1 | 8 months ago

This is a textbook case of micro-architectural reality beating theoretical elegance. It's fascinating that replacing 5 loads with 2 loads + 3 vextq_f32 intrinsics, which should reduce memory pressure, ends up being slower because of execution port contention and dependency chains.
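
For anyone who wants to see the two shapes side by side, here is a rough sketch in portable C. It assumes the kernel needs five overlapping 4-float windows at offsets 0..4, which is my guess at the access pattern and not something stated above. `vec4`, `load4` and `ext4` are hypothetical stand-ins that model `float32x4_t`, `vld1q_f32` and `vextq_f32` so it compiles off ARM; it only shows that both variants produce the same data and where the dependencies sit, and it says nothing about timing.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical 4-lane float vector standing in for NEON's float32x4_t. */
typedef struct { float lane[4]; } vec4;

/* Models vld1q_f32: one 4-float load from p. */
static vec4 load4(const float *p) {
    vec4 v;
    memcpy(v.lane, p, sizeof v.lane);
    return v;
}

/* Models vextq_f32(a, b, n): lanes n..3 of a, then lanes 0..n-1 of b.
   The real intrinsic takes n as a compile-time constant in 0..3. */
static vec4 ext4(vec4 a, vec4 b, int n) {
    vec4 r;
    assert(n >= 0 && n < 4);
    for (int i = 0; i < 4; ++i)
        r.lane[i] = (i + n < 4) ? a.lane[i + n] : b.lane[i + n - 4];
    return r;
}

/* Variant A: five overlapping loads, none depending on another. */
static void windows_five_loads(const float *p, vec4 w[5]) {
    for (int k = 0; k < 5; ++k)
        w[k] = load4(p + k);
}

/* Variant B: two loads, then three extracts that each need both loads. */
static void windows_two_loads_ext(const float *p, vec4 w[5]) {
    vec4 a = load4(p);
    vec4 b = load4(p + 4);
    w[0] = a;
    w[1] = ext4(a, b, 1);
    w[2] = ext4(a, b, 2);
    w[3] = ext4(a, b, 3);
    w[4] = b;
}

/* Returns 1 when both variants build identical windows from p[0..7]. */
static int variants_agree(const float *p) {
    vec4 x[5], y[5];
    windows_five_loads(p, x);
    windows_two_loads_ext(p, y);
    return memcmp(x, y, sizeof x) == 0;
}
```

Both variants read p[0..7], so the caller needs at least 8 valid floats. In variant A every window is ready as soon as its own load completes; in variant B the three middle windows wait on both loads and then on the extract itself, which is the dependency chain part of the story.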
