top | item 3926083

seanmcq | 14 years ago

Please profile tricks like this, as they may actually be significantly slower than their naive counterparts on modern hardware. The prefetcher knows what a linked list looks like, and it knows how to get it somewhere closer than main memory before the nodes are needed.


jacquesm|14 years ago

> The prefetcher knows what a linked list looks like

That's some pretty advanced magic.

Even the compiler has very limited insight into what your code is actually doing without simulating it. The prefetcher might be able to look a bit ahead in the execution stream and do branch prediction, but there is absolutely no way that extends to knowing anything about your data structures.

Unless I have just been transported by a time warp I really think this is fiction.

If you're thinking of cache pre-fetching, that actually has a really hard time dealing with stuff like linked lists because it has absolutely no idea at all about the data structure it is looking at. The 'next' and 'previous' pointers in the linked list might simply be values without any significance at all. And if they are dereferenced as pointers, then that memory could be just about anywhere within the valid address range.

For arrays, on the other hand, such pre-fetching can be useful.

exDM69|14 years ago

> Unless I have just been transported by a time warp I really think this is fiction.

Nope, the Intel guys and gals do this kinda magic day in, day out. Or at least the chips they manufacture do. I'm not familiar with the internals of the prefetcher of any CPU at this level, but let me wave hands here. This is what the prefetcher could do:

All it takes is for the prefetcher to fetch a cache line when requested, then observe what is inside: look for pointer-sized, aligned values that look like pointers and are pretty close to the original cache line's (virtual) address. Now if these values happen to be sane virtual addresses in the current process, the prefetcher might as well fetch them one cache level closer to the CPU. If it hits, it might yield a big performance boost in real-world apps. If it misses, it's just a little wasted electricity.
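A toy software model of that hand-waved heuristic might look like the sketch below. This is purely illustrative: the function name, the fixed 64-byte line size, and the "nearness" window are my inventions, and real hardware prefetch logic is undocumented.

```c
#include <stddef.h>
#include <stdint.h>

#define LINE_BYTES 64

/* Toy model: scan a fetched cache line for aligned, pointer-sized
 * values that land within `window` bytes of the line itself, and
 * report those as plausible prefetch candidates. Returns how many
 * candidates were written to `out`. */
size_t guess_prefetch_targets(const void *line_addr, uintptr_t window,
                              uintptr_t out[LINE_BYTES / sizeof(uintptr_t)])
{
    const uintptr_t base = (uintptr_t)line_addr;
    const uintptr_t *words = line_addr;
    size_t n = 0;

    for (size_t i = 0; i < LINE_BYTES / sizeof(uintptr_t); i++) {
        uintptr_t v = words[i];
        uintptr_t dist = v > base ? v - base : base - v;

        /* Aligned like a pointer, not the line itself, and "near". */
        if (v % sizeof(uintptr_t) == 0 && dist != 0 && dist < window)
            out[n++] = v;
    }
    return n;
}
```

A real prefetcher would of course work on physical or virtual addresses in flight rather than anything a C function can see; the point is only that the filter itself (aligned, plausible, nearby) is cheap.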

All modern CPUs do dirty little tricks like this if it helps them outshine their competitors.

Btw. you can add prefetch instructions to your code manually if you do linked list traversals or similar. In GCC you can use the __builtin_prefetch() compiler intrinsic.
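For instance, a list traversal with a manual hint might look like this (a minimal sketch; the node layout is made up, and the second and third arguments of __builtin_prefetch — read/write and temporal locality — are just one reasonable choice):

```c
#include <stddef.h>

struct node {
    struct node *next;
    int payload;
};

/* Sum a list, hinting the hardware one node ahead.
 * __builtin_prefetch(addr, rw, locality): rw = 0 means we will
 * read, locality = 1 suggests keeping the line in a far cache
 * level since we touch each node only once. */
long sum_list(const struct node *head)
{
    long sum = 0;
    for (const struct node *n = head; n != NULL; n = n->next) {
        if (n->next)
            __builtin_prefetch(n->next, 0, 1);
        sum += n->payload;
    }
    return sum;
}
```

Whether this helps depends on how much work you do per node: the prefetch only hides latency if there are enough instructions between issuing it and dereferencing n->next.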

ssp|14 years ago

> The prefetcher knows what a linked list looks like

That seems unlikely to me. Which processors do this?

idunno246|14 years ago

Prefetching when a register contains what looks like a memory address at least seems possible.

forgotusername|14 years ago

As best I understand it, while instruction prefetch is a complex affair, data prefetch is limited to detecting incrementing sequential address access, unless directed through a PREFETCH op. I can't find a good reference, but I believe at best the CPU can only optimize by detecting stride.
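Stride detection means recognizing accesses at addresses X, X+s, X+2s, ... for some constant s. A column walk over a row-major matrix is a classic example of the pattern (a hypothetical helper, just to illustrate the access pattern a stride detector could lock onto):

```c
#include <stddef.h>

/* Sum one column of a row-major rows x cols matrix. Successive
 * loads are exactly cols * sizeof(int) bytes apart: a constant
 * stride a hardware stride detector can recognize and run ahead
 * of, even though consecutive accesses are not adjacent. */
long sum_column(const int *m, size_t rows, size_t cols, size_t col)
{
    long sum = 0;
    for (size_t r = 0; r < rows; r++)
        sum += m[r * cols + col];
    return sum;
}
```

A linked list gives the prefetcher no such constant: each node's address depends on a load that hasn't completed yet, which is exactly why stride detection doesn't apply.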

In no circumstance will the CPU magically understand the semantics of some data structure.

zaptheimpaler|14 years ago

Pre-fetchers and caches rely on two properties to determine what to keep close to the CPU: temporal locality, meaning things judged to be frequently accessed stay cached, and spatial locality, meaning that when you access something at memory address X, the cache fills a whole line with the contents of consecutive memory locations, i.e. X, X+1, X+2, ..., X+s.

Hence, they have no capability to understand a structure like a linked list and will not cache it intelligently. All you can rely on is that arrays traversed in order will use the cache efficiently.
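The case that spatial locality rewards is the plain sequential scan (a minimal sketch): every cache line fill brings in the next several elements "for free", and the sequential pattern is the easiest one for any hardware prefetcher to run ahead of.

```c
#include <stddef.h>

/* Sequential array sum: consecutive int accesses, so a 64-byte
 * line fill covers the next 16 elements, and the address stream
 * is the trivially predictable X, X+4, X+8, ... pattern. */
long sum_array(const int *a, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}
```

The same total over a linked list whose nodes are scattered across the heap can pay a cache miss per element, which is the performance gap the parent comments are arguing about.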

noselasd|14 years ago

Note though that speed isn't all that matters; memory usage does too, especially in constrained environments.