top | item 30314226

(no title)

mhkool | 4 years ago

Since performance is similar for array sizes below the L1 size and below the L2 size, I would like to see an attempt to improve B. B = L2-size / 2 / sizeof(int) - 16 might produce better results.
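A minimal sketch of that formula in C, in case it helps with experimenting. The helper name `block_size_from_l2` and the zero fallback for very small caches are assumptions made for illustration, not part of the original code; the L2 size is passed in by the caller rather than detected.

```c
#include <stddef.h>

/* Hypothetical helper: block size B (counted in ints) derived from the
   L2 cache size in bytes, using B = L2-size / 2 / sizeof(int) - 16. */
static size_t block_size_from_l2(size_t l2_bytes)
{
    size_t half_in_ints = l2_bytes / 2 / sizeof(int);

    /* Assumption: return 0 instead of letting the unsigned subtraction
       wrap around when the cache size is too small for the formula. */
    return half_in_ints > 16 ? half_in_ints - 16 : 0;
}
```

For a 256 KiB L2 and a 4-byte int, this gives B = 262144 / 2 / 4 - 16 = 32752.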

Note also that _mm_broadcast_ss() is faster on newer processors.
