brooksbp | 3 years ago
If the peripheral issues a memory write, that location in the CPU cache must be invalidated so a CPU memory read of the location does not return an old/stale value.
In my own experimentation (not a real-world use case) on a very specific system, I was surprised by how few peripheral read requests resulted in snoop hits, where the value is returned from the CPU cache instead of from the DDR PHY controller. The baseline snoop hit rate was surprisingly low. Modifying the experiment so that the CPU continuously read the memory while the CPU-peripheral interaction was taking place produced much higher snoop hit rates, yet the overall performance difference between the two cases was not nearly as big as I had hoped. Perhaps a value returned from the DDR PHY controller was not as slow as I expected (some unknown caching/bypassing behavior in the DDR controller?)--again, this was not a use case with real-world memory access patterns...
A language keyword for "please don't take it from cache" is tricky: it would be an incredibly low-level specifier, intended purely for performance, in a system that is already very complex to reason about. Maybe having more knobs could help (it is much easier to use such a specifier than to write code that continuously touches the memory in the hope of keeping it in cache), but I think this gets into territory where people get distracted by performance and start doing things in its name without the proper controls and measurements in place to understand what is actually happening in the system.
Instructions related to the memory model exist for correctness. Memory prefetch instructions are just suggestions to an already sophisticated memory unit. Memory QoS can be thought of as having an impact on performance, but it is a much higher-level mechanism aimed at partitioning resources.