deadcanard | 3 years ago

CAT is indeed a good thing to look at, but there are some important caveats: 1) unless you have a very small number of cores, it's not possible to reserve a cache slice for every program running (some slices are shared for things like DDIO), 2) it's still not possible to lock specific data in the cache, because any collision will replace the data, 3) the slices are kinda big, so it's hard to be properly fine-grained. Basically, CAT just prevents other processes from stealing all the cache. It does that by reserving ways (in the cache-associativity sense).

bitcharmer | 3 years ago

1) Fully agreed, but most HFT apps will be the only thing running on a host. The exception is really simple ones like market data feed handlers, which can easily fit their working set into L2 anyway.

2) Mutual cache eviction by hash collisions is solvable with a number of tricks (although those methods are not easy and are often wasteful). The "DDIO slice" issue used to be a problem back when Intel used a ring topology for the LLC. These days the LLC is built as a mesh, which minimizes this effect.

3) CAT doesn't recognize threads or processes. Each CPU core is tagged with a COS (class of service), and cache ways are assigned per COS.

Recent micro-architectures like SKX or CLX have 11 ways of L3, and what often happens is that 1-2 ways get assigned to cpu0 for non-latency-critical workloads while the rest go to latency-sensitive, isolated cores, usually running a single user-space thread each.
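To make the split above concrete, here is a rough sketch of how the two way masks could be computed for an 11-way L3. It assumes the housekeeping class gets exactly 2 ways and that they sit at the low end of the mask; both choices are illustrative, not something the hardware dictates. The helper name `contiguous_mask` is made up for this example. Intel's CAT expects each capacity mask to be a contiguous run of bits, which is why the helper only builds contiguous masks.

```python
def contiguous_mask(first_way: int, n_ways: int) -> int:
    """Return a bitmask with n_ways consecutive bits set, starting at bit first_way."""
    if first_way < 0 or n_ways <= 0:
        raise ValueError("first_way must be >= 0 and n_ways must be > 0")
    return ((1 << n_ways) - 1) << first_way


TOTAL_WAYS = 11         # L3 ways on SKX/CLX, as stated in the comment above
HOUSEKEEPING_WAYS = 2   # assumption: the upper end of the "1-2 ways" for cpu0

# Assumption: housekeeping takes the lowest ways, latency-critical cores the rest.
housekeeping_mask = contiguous_mask(0, HOUSEKEEPING_WAYS)
latency_mask = contiguous_mask(HOUSEKEEPING_WAYS, TOTAL_WAYS - HOUSEKEEPING_WAYS)

print(f"housekeeping: {housekeeping_mask:03x}")  # housekeeping: 003
print(f"latency:      {latency_mask:03x}")       # latency:      7fc
```

The two masks do not overlap, so the latency-sensitive cores never compete with cpu0 for those 9 ways, but all the isolated cores still share them with each other unless the mask is subdivided further.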

deadcanard | 3 years ago

2) Agreed about the solvability and difficulty of avoiding cache collisions. DDIO must write its data somewhere in the L3 cache, and it ends up in the shareable slice. So either you're okay with sharing your cache, or you cannot use these slices if you want exclusive access for your processes. That was my point.

3) CAT does not recognize processes, but resctrl does. Feels like we're kinda nitpicking here...
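For anyone who hasn't used it, the resctrl side looks roughly like the sketch below: a group is a directory, its `schemata` file holds per-cache-id way masks, and PIDs written to its `tasks` file get attached to the group. The helper names (`schemata_line`, `assign`), the group name `latency`, the PID 1234, the two cache ids, and the mask 0x7fc (bits 2-10 of an 11-way L3) are all made up for illustration. The demo writes into a temporary directory so it runs without root; on a real box the group directory would live under the resctrl mount point, usually `/sys/fs/resctrl`.

```python
import tempfile
from pathlib import Path


def schemata_line(resource: str, masks: dict[int, int]) -> str:
    """Format {cache_id: way_mask} as a resctrl schemata line, e.g. 'L3:0=7fc;1=7fc'."""
    body = ";".join(f"{cache_id}={mask:x}" for cache_id, mask in sorted(masks.items()))
    return f"{resource}:{body}"


def assign(group_dir: Path, line: str, pids: list[int]) -> None:
    """Mimic `mkdir group; echo line > schemata; echo pid > tasks` for each PID."""
    group_dir.mkdir(parents=True, exist_ok=True)
    (group_dir / "schemata").write_text(line + "\n")
    for pid in pids:
        # One PID per write, like repeated `echo $pid > tasks`.
        (group_dir / "tasks").write_text(f"{pid}\n")


# Assumption: two L3 cache ids (e.g. two sockets), same mask on both.
line = schemata_line("L3", {0: 0x7FC, 1: 0x7FC})
print(line)  # L3:0=7fc;1=7fc

# Demo against a scratch directory instead of the real resctrl mount.
with tempfile.TemporaryDirectory() as tmp:
    demo_group = Path(tmp) / "latency"   # hypothetical group name
    assign(demo_group, line, [1234])     # hypothetical PID
    written = (demo_group / "schemata").read_text()
```

Under the hood the kernel still programs a COS per core on context switch, so this is a process-level view layered on top of the same per-core mechanism.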

On your last point: agreed, that gives you 9ish usable slices, which is not very many depending on the number of cores. That was the point I was trying to make.