The limit is the number of outstanding cache-line requests a core can have in flight to the memory controller. CPUs have a fixed number of slots for this, usually around 10-12 per core. Intel calls them LFBs (Line Fill Buffers) and AMD calls them MSHRs (Miss Status Holding Registers). When the slots are full, the core can't issue any more requests and has to wait for outstanding ones to complete. Apple M chips (probably) have more slots, and their memory is physically packaged together with the CPU, so they get better numbers.
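
To put rough numbers on the slot limit: by Little's Law, the bandwidth one core can sustain is about (requests in flight × line size) / latency. Here is a minimal Python sketch. The 64-byte line size, the 80 ns memory latency, and the 24-slot case are illustrative assumptions of mine, not measurements of any particular chip.

```python
def max_core_bandwidth_gbs(slots, latency_ns, line_bytes=64):
    """Little's Law bound on the memory bandwidth of a single core.

    slots       -- outstanding cache-line requests the core can track
    latency_ns  -- assumed time for one request to complete, in nanoseconds
    line_bytes  -- assumed cache-line size in bytes
    """
    # Bytes per nanosecond is numerically the same as GB/s.
    return slots * line_bytes / latency_ns


# 10 and 12 are the slot counts mentioned above; 24 is a hypothetical
# "more slots" design included only for comparison.
for slots in (10, 12, 24):
    bandwidth = max_core_bandwidth_gbs(slots, latency_ns=80.0)
    print(f"{slots} slots: about {bandwidth:.1f} GB/s per core")
```

Under these assumptions, doubling the slots or halving the latency doubles the bound. That fits the point about having more slots and memory packaged closer to the CPU.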

foota|1 month ago

I assume these slots must be really expensive? Otherwise adding more seems like a great way to improve throughput on low-concurrency tasks.

IgorPartola|1 month ago

CPU caches are generally SRAM (static RAM). An SRAM cell is more complicated (typically six transistors per bit) but needs no refreshing. A DRAM cell is basically just a capacitor and a transistor per bit, and capacitors leak, so the entire memory space has to be refreshed constantly. When the CPU sends a request to RAM, the memory controller might be busy refreshing the rows that are about to decay and can't respond right away. And if I recall correctly, reading DRAM destroys what was there, so each access has to read a row out, write it back, and send the answer to the CPU, which is a lot of steps. But the price and die-size difference is huge, so we use GB or TB of DRAM and only MB of SRAM.
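
For a sense of how often refresh actually gets in the way, you can divide the time one refresh command takes by the interval between refresh commands. A rough Python sketch follows. The timings (a refresh every 7.8 µs, each taking 350 ns) are DDR4-class values I am assuming for illustration, so check the datasheet for any real part.

```python
def refresh_busy_fraction(t_rfc_ns, t_refi_ns):
    """Fraction of time a DRAM rank is unavailable because it is refreshing.

    t_rfc_ns  -- assumed duration of one refresh command, in nanoseconds
    t_refi_ns -- assumed average interval between refresh commands, in nanoseconds
    """
    return t_rfc_ns / t_refi_ns


# Assumed, illustrative timings; not taken from the thread.
fraction = refresh_busy_fraction(t_rfc_ns=350.0, t_refi_ns=7800.0)
print(f"Refresh blocks the rank about {fraction:.1%} of the time")
```

With these assumed timings the rank is blocked a few percent of the time, so refresh adds occasional latency spikes rather than dominating every access.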

memoriuaysj|1 month ago

Bus wires. You can route only so many of them on a motherboard.

It's why GPUs have their memory chips in a circle around the GPU chip.
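
A back-of-the-envelope way to see why wire count matters: peak bandwidth is roughly bus width times transfer rate. Here is a small Python sketch. Both configurations (a 64-bit channel at 3200 MT/s and a 384-bit bus at 16000 MT/s) are illustrative assumptions, not figures from this thread.

```python
def peak_bandwidth_gbs(bus_bits, mega_transfers_per_s):
    """Theoretical peak bandwidth of a memory bus in GB/s.

    bus_bits             -- number of data wires (bits moved per transfer)
    mega_transfers_per_s -- transfers per second on each wire, in millions
    """
    bytes_per_transfer = bus_bits / 8
    # MB/s to GB/s: divide by 1000.
    return bytes_per_transfer * mega_transfers_per_s / 1000


# Hypothetical configurations chosen only to contrast a narrow and a wide bus.
print("64-bit bus at 3200 MT/s:", peak_bandwidth_gbs(64, 3200), "GB/s")
print("384-bit bus at 16000 MT/s:", peak_bandwidth_gbs(384, 16000), "GB/s")
```

Every extra data wire adds bandwidth, but each one also has to be routed to a memory chip. Short traces to chips placed right next to the processor make a wide, fast bus practical.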