item 30314226 | mhkool | 4 years ago

Since performance is similar for array sizes below the L1 size and below the L2 size, I would like to see an attempt to improve B. B = L2-size / 2 / sizeof(int) - 16 might produce better results. Note also that _mm_broadcast_ss() is faster on newer processors.
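As a rough sketch of the suggested formula, the helper below computes B in units of int elements from an L2 cache size given in bytes. The function name `block_size_ints` and the `l2_bytes` parameter are hypothetical, introduced only for illustration. The guard against very small inputs is an added assumption, and the right L2 size depends on the target CPU.

```c
#include <stddef.h>

/* Hypothetical helper: block size B, in int elements, following the
 * comment's formula B = L2-size / 2 / sizeof(int) - 16.
 * l2_bytes is the L2 cache size in bytes (to be supplied by the caller). */
static size_t block_size_ints(size_t l2_bytes)
{
    size_t half_l2_ints = l2_bytes / 2 / sizeof(int);

    /* Assumption: return 0 instead of wrapping around when the cache
     * size is too small to leave 16 elements of slack. */
    if (half_l2_ints <= 16)
        return 0;

    return half_l2_ints - 16;
}
```

For example, assuming a 256 KiB L2 cache and a 4-byte int, `block_size_ints(256 * 1024)` evaluates to 32768 - 16 = 32752 elements.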