
What Every Programmer Should Know About Memory (2007) [pdf]

82 points | zonovar | 4 years ago | people.freebsd.org

18 comments


corysama|4 years ago

This comes up over and over. It's great. But the 75% of useful content comes after the first 25%, which dives way too deep into the electrical-engineering details.

"Every programmer" should know the orders of magnitude of the cache hierarchy latencies, how RAM loads a whole cache line to service that single byte you requested, roughly how the automatic prefetcher thinks, that MESI and NUMA access are growing issues, that the TLB cache is a thing, and generally how the memory controller is the interface between the CPU and pretty much everything else -- like the NIC, HD, and GPU.

"Every programmer" does not need to know about DRAM discharge timing, row selection and refresh cycles.

Understanding quad-pumped bus frequencies and CAS latencies is great when you are building systems. But, it's not something you think about when designing data structures and algos.

not2b|4 years ago

Long ago when I worked on real-time digital signal processing I did have to worry about DRAM discharge timing because the board I was using had a non-maskable interrupt to do the RAM refresh, and this limited the rate at which I could access the A/D converter. I think it was an LSI-11 board if I remember right. Fun times.

But yes, unless you're doing bare-metal embedded systems you haven't needed to care in a long time ... at least until someone came up with Rowhammer.

mhh__|4 years ago

The ones who have the skills to actually use this information will end up reading the whole thing anyway; there's an element of self-selection here.

You don't need to know the DRAM timing stuff, technically, but it doesn't hurt to learn something new.

The actual problem with this doc is that it was written a very long time ago, so although most of it still holds true, a few things (like everything on the quality of prefetching techniques) should be taken with a large grain of salt.

throwawaylinux|4 years ago

It's also wrong about some basic details AFAICS. A "memory controller" typically has more than one memory channel, and the channels are not always ganged, so even a single controller can support independent concurrent accesses. DRAM also has multiple banks which can be accessed independently. So there is certainly not only one "bank" per north bridge or one bank per ODMC.

convolvatron|4 years ago

On modern systems the memory controller interfaces between the memory network and the DRAM. I wouldn't say the CPU/NIC or CPU/storage boundary touches the memory controller (except when it's writing to memory).

rramadass|4 years ago

Quite right!

Any resources where I can read up on these?

mjochim|4 years ago

Well thanks for this summary :-).

jcranmer|4 years ago

If you have a 114-page paper, you probably don't have a document of what every programmer should know about memory, especially since many programmers work in domains where some of the recommendations here aren't even possible to follow!

Here's a brief summary of what every programmer really needs to know about memory:

* There is a hierarchy of memory from fast-but-small to slow-but-large. The smallest and fastest is the L1 cache, which is typically on the order of 10-100 KiB per core and isn't growing substantially over time. The largest cache is the L3 cache, in the tens of MiB and shared between many or all cores, while your RAM is in the tens of GiB. An idea of the difference in access times can be found here: https://gist.github.com/jboner/2841832.

* All memory traffic happens in cacheline-sized blocks of memory, 64 bytes on x86. (It might be different on other processor architectures, but not by much.) If you read 1 byte from memory, you also read in the other 63 bytes of its line, whether or not you actually use that data.

* The slow speed of DRAM (compared to processor clock speed) means it's frequently worth trading a little CPU computation time for tighter memory utilization. For example, store a 16-bit integer instead of a pointer-sized integer if you know the maximum value will be < 65,536.

* "Pointer chasing" algorithms are generally less efficient than array-based algorithms. In particular, an array-based list is going to be faster than a linked list most of the time, especially on small (<1,000 element) lists.

That about covers what I think every programmer needs to know about memory. There are some topics that many programmers should perhaps know -- cross-thread contention for memory, false sharing, and NUMA, in descending order of importance, I think -- but by the time you're in the weeds of "let's talk about nontemporal stores"... yeah, that's definitely not in the ballpark of what everyone should know, especially if you're not going to mention when nontemporal stores will hurt instead of help [1]. Also, transactional memory is something I'd put on my list of perennial "sounds like a good idea in theory, but doesn't work in practice" topics.

[1] What Drepper omits is that nontemporal stores will evict the cache lines from cache if they're already in cache, so you really shouldn't use them unless you know there is going to be no reuse.

throwawaylinux|4 years ago

You have to know something about coherency. You don't need to know the details but you need to know that multiple processors can read a cache line, but if there are any writers there will be trouble.

You have to know something about consistency. You don't need to know the details but you need to know enough to know you don't know enough to write your own synchronization primitives or lock free algorithms, at least.

mhh__|4 years ago

The fields of microarchitecture and compiler design (specifically relating to microarchitecture also, i.e. we didn't go with Itanium or RISC) could really use a new batch of textbooks.

Hennessy and Patterson is still good, but there's only about one alternative, and the most up-to-date "complete" book on memory I'm aware of is from 2007 (Jacob et al.).

Also: I've read quite a bit about memory; I get it. What I don't get, and am thus asking for recommendations on, is "What every programmer should know about storage".

dang|4 years ago

The main past threads appear to be:

What every programmer should know about memory [pdf] - https://news.ycombinator.com/item?id=27343593 - May 2021 (7 comments)

What every programmer should know about memory, Part 1 (2007) - https://news.ycombinator.com/item?id=25908018 - Jan 2021 (26 comments)

What Every Programmer Should Know About Memory (2007) [pdf] - https://news.ycombinator.com/item?id=19302299 - March 2019 (97 comments)

What every programmer should know about memory, Part 1 (2007) - https://news.ycombinator.com/item?id=15300547 - Sept 2017 (12 comments)

What Every Programmer Should Know About Memory (2007) [pdf] - https://news.ycombinator.com/item?id=14622861 - June 2017 (26 comments)

What every programmer should know about memory (2007) - https://news.ycombinator.com/item?id=10601626 - Nov 2015 (53 comments)

What Every Programmer Should Know About Memory (2007) [pdf] - https://news.ycombinator.com/item?id=9360778 - April 2015 (15 comments)

What every programmer should know about memory, Part 1 - https://news.ycombinator.com/item?id=3919429 - May 2012 (79 comments)

What every programmer should know about memory [2007] - https://news.ycombinator.com/item?id=3360188 - Dec 2011 (2 comments)

What every programmer should know about memory - https://news.ycombinator.com/item?id=1511990 - July 2010 (37 comments)

What every programmer should know about memory, Part 1 - https://news.ycombinator.com/item?id=1394346 - June 2010 (4 comments)

What Every Programmer Should Know About Memory - https://news.ycombinator.com/item?id=659367 - June 2009 (1 comment)

What every programmer should know about memory, Part 1 - https://news.ycombinator.com/item?id=58627 - Sept 2007 (7 comments)

throwaway81523|4 years ago

This is from 2007! Still somewhat relevant though.

mhh__|4 years ago

It's almost all exactly the same just with different parameters.

btdmaster|4 years ago

`curl cht.sh/latency` illustrates potential bottlenecks quite well given how succinct it is.