brooksbp | 3 years ago
If the peripheral issues a memory write, that location in the CPU cache must be invalidated so a CPU memory read of the location does not return an old/stale value.
In my own experimentation (not a real-world use case) on a very specific system, I was surprised by how few peripheral read requests resulted in snoop hits, where the value is returned from the CPU cache instead of from the DDR PHY controller. The baseline snoop hit rate was surprisingly low. Modifying the experiment so that the CPU continuously read the memory while the CPU-peripheral interaction was taking place produced much higher snoop hit rates, yet the overall performance difference between the two cases was not nearly as big as I had hoped. Perhaps a value returned from the DDR PHY controller was not as slow as I expected (some unknown caching/bypassing behavior in the DDR controller?)--again, this was not a use case with real-world memory access patterns...
A language keyword for "please don't take it from cache" is tricky: it would be an incredibly low-level specifier, intended purely for performance, in a system that is already very complex to reason about. Maybe having more knobs could help (it is much easier to use such a specifier than to write code that continuously touches the memory in the hope of keeping it in cache), but I think this gets into territory where people get distracted by performance and start doing things in its name without the proper controls and measurements in place to understand what is actually happening in the system.
Instructions related to the memory model exist for correctness. Memory prefetch instructions are just suggestions to an already sophisticated memory unit. Memory QoS can be thought of as having an impact on performance, but it is a much higher-level mechanism aimed at partitioning resources.