deletes | 12 years ago
I ran a similar test in C and got very similar results. Around N = 4000 the thrashing version starts to differ substantially, and a 3x difference is already visible at N = 1000.
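For anyone who wants to try this, here is a minimal sketch of the kind of test I mean. It is not my exact code: it assumes the "thrashing version" is a column-wise walk over a row-major N x N matrix of doubles, and the names `sum_row_major`, `sum_col_major`, `make_matrix` and `time_traversal` are made up for illustration.

```c
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Sum an n x n row-major matrix by walking along rows (sequential access). */
double sum_row_major(const double *m, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            s += m[i * n + j];
    return s;
}

/* Same sum, walking down columns: consecutive reads are n doubles apart,
 * which is the access pattern assumed here to be the "thrashing version". */
double sum_col_major(const double *m, size_t n) {
    double s = 0.0;
    for (size_t j = 0; j < n; j++)
        for (size_t i = 0; i < n; i++)
            s += m[i * n + j];
    return s;
}

/* Allocate an n x n matrix filled with 1.0; returns NULL if malloc fails. */
double *make_matrix(size_t n) {
    double *m = malloc(n * n * sizeof *m);
    if (m != NULL)
        for (size_t k = 0; k < n * n; k++)
            m[k] = 1.0;
    return m;
}

/* Run one traversal, store its sum in *out, and return the CPU seconds used. */
double time_traversal(double (*f)(const double *, size_t),
                      const double *m, size_t n, double *out) {
    clock_t t0 = clock();
    *out = f(m, n);
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```

Call `time_traversal` with each function on the same matrix for a few values of N (say 1000 and 4000) and compare the ratios. Print the sums so the compiler cannot drop the loops, and expect the numbers to vary with compiler flags and cache sizes.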
> This means if your program is running on two threads over different parts of the matrix, every single iteration requires a request to RAM.
I'm skeptical of this part. I tried to replicate this behavior but was unsuccessful. Even though the cores share L3, I doubt that a thread will overwrite the entire cache on every iteration.
lettergram | 12 years ago
Either way, you should see a noticeable difference as the size increases, which was the point.