
Using SIMD for hardware acceleration

9 points | eerpini | 14 years ago | krishnakanthmallikc.blogspot.com

2 comments

skylan_q | 14 years ago

Great post. Vectorization is one of the easiest ways to increase per-thread performance.
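To make "vectorization" concrete, here is a minimal sketch of the kind of loop compilers can often auto-vectorize. The function name `saxpy` and its signature are hypothetical and chosen only for illustration; nothing here is taken from the linked post. Whether SIMD instructions are actually emitted depends on the compiler, target, and optimization flags.

```c
#include <stddef.h>

/* Hypothetical example: computes y[i] = a * x[i] + y[i] for i in [0, n).
 * Every iteration is independent of the others, and `restrict` promises
 * the compiler that x and y do not overlap. Those two properties are
 * what typically let an optimizer process several elements per
 * instruction instead of one. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

With GCC or Clang, building at `-O3` and reading the generated assembly is one way to check whether the loop was vectorized on a given machine.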

This gives me an excuse to post one of my favorite articles from Intel. It shows that, for the author's problem, the biggest performance gains came from optimizing memory accesses.

http://software.intel.com/en-us/articles/superscalar-program...

Seeing as memory read/write instructions are about 40-50% of the x86 code out there (from what I've heard), tuning memory accesses seems to be a great way to improve performance.

eerpini | 14 years ago

Yes, memory access patterns seem to be the most common bottleneck for most parallel code. I was implementing a parallel version of quicksort recently and I have similar stories to tell. Optimize the code to avoid frequent cache misses and you end up with a near-optimal speedup.
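As a rough illustration of what an access pattern means here (not the quicksort code mentioned above, which is not shown in the thread), the sketch below sums the same row-major matrix in two traversal orders. The function names are hypothetical. Both return the same total, but the first reads adjacent addresses while the second jumps `cols` elements between reads, which on large matrices tends to cause far more cache misses.

```c
#include <stddef.h>

/* Hypothetical helper: visits elements in the order they sit in memory
 * (row by row), so consecutive iterations touch adjacent addresses and
 * each fetched cache line is fully used. */
long sum_row_major(const int *m, size_t rows, size_t cols)
{
    long total = 0;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            total += m[r * cols + c];
    return total;
}

/* Hypothetical helper: same arithmetic, but the inner loop walks down a
 * column, so each access is `cols` elements away from the previous one.
 * For matrices much larger than the cache, this strided order usually
 * runs noticeably slower. */
long sum_col_major(const int *m, size_t rows, size_t cols)
{
    long total = 0;
    for (size_t c = 0; c < cols; c++)
        for (size_t r = 0; r < rows; r++)
            total += m[r * cols + c];
    return total;
}
```

Timing both functions on a matrix of a few thousand rows and columns is a simple way to see the effect; the size of the gap depends on the hardware.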