(no title)
ack_complete | 2 months ago
I'd expect more of a gain from enabling FMA, but that's assuming the program actually got built to use FMA -- it needs to either use it explicitly or have relaxations to allow the contraction. Oryon has 4 x 128-bit NEON pipes with 3c latency fadd and 4c latency fmul/fma, so it easily ends up latency bottlenecked unless there are plenty of independent calculations.
No comments yet.