abiloe|3 years ago

> If you really think about it, the only real difference between main memory blocking and disk blocking is the amount of time they may block.

This is a somewhat confusing analysis you have here. Direct read/write from memory for all intents and purposes doesn't block. Why do you say that reads and writes may also block?

The reason memory blocks is that it needs to page in or out from secondary storage, which makes the statement "the only real difference between main memory blocking and disk blocking is the amount of time they may block" not really true.

kentonv|3 years ago

> Direct read/write from memory for all intents and purposes doesn't block.

Sure it does! Main memory is much slower than cache so on a cache miss the CPU has to stop and wait for main memory to respond. The CPU may even switch to executing some other thread in the meantime (that's what hyperthreading is). But if there isn't another hyperthread ready, the CPU sits idle, wasting resources.

It's not a form of blocking implemented by the OS scheduler, but it's pretty similar conceptually.
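Here's a rough userspace sketch of that stall (illustrative only, not from anyone in the thread; the buffer size and loop count are arbitrary): pointer-chase a buffer much larger than the last-level cache, so nearly every load misses and the core waits on DRAM.

    /* Chases dependent pointers through a buffer far larger than the LLC,
       so most loads miss cache and the CPU stalls waiting on main memory. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N (32 * 1024 * 1024)   /* 32M entries (~256 MB), much bigger than any LLC */

    int main(void) {
        size_t *next = malloc(N * sizeof *next);
        if (!next) return 1;

        for (size_t i = 0; i < N; i++) next[i] = i;
        /* Sattolo shuffle: one big cycle, so every load depends on the last
           and the prefetcher can't hide the miss latency. */
        for (size_t i = N - 1; i > 0; i--) {
            size_t j = (size_t)rand() % i;
            size_t t = next[i]; next[i] = next[j]; next[j] = t;
        }

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        size_t p = 0;
        for (size_t i = 0; i < N; i++) p = next[p];   /* serial chain of loads */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
        printf("~%.1f ns per dependent load (p=%zu)\n", ns / N, p);
        return 0;
    }

Run it once with a buffer that fits in L1 and once at this size and the gap is obvious: a nanosecond or two per load versus the better part of 100 ns spent waiting.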

> The reason memory blocks is because it needs to page in or out from secondary storage

Nope, that's not what I was referring to (other than in the line mentioning swap).

abiloe|3 years ago

> Sure it does! Main memory is much slower than cache so on a cache miss the CPU has to stop and wait for main memory to respond. The CPU may even switch to executing some other thread in the meantime (that's what hyperthreading is).

Cache is a memory too. And which cache, by the way? Even L1 cache on modern processors doesn't have 0 latency. And this is a rather poor way of describing hyperthreading - the CPU doesn't really "switch" - the context for the alternate process is already available, and the resource stealing can occur for any kind of stall (including cache loads), not just memory. Calling this a "switch", as if it were like a context switch, is very misleading. It's not similar conceptually.

In any event, by this definition even a mispredicted branch or a divide becomes "blocking" - which sort of tortures any meaningful definition of blocking.

The essential difference is that memory accesses to paged-in memory (and branch mispredictions, cache misses) are not something you typically or reasonably trap outside of debugging. mmaps, swaps, disk I/O, and network accesses are all delegated to the OS, and at that point are orders of magnitude more expensive than even most NUMA memory accesses. I sort of see where you're coming from, but I don't think it's a useful point.

jandrewrogers|3 years ago

I think the important practical distinction is whether or not these stalls imply a context switch. Software that avoids blocking calls is largely trying to minimize context switches, whose measurable adverse effects have become increasingly common as hardware improves. Stalls that do not imply context switches, such as filling a cache line, are not "blocking" as a matter of practical semantics, because there is no context switch that has to be accounted for. Of course, this gets into a gray area with things like hyper-threading, which has some of the side effects of a context switch without an actual context switch.

bch|3 years ago

With the utmost respect, I’ve never heard “blocking” described as “takes some measurable amount of time” (which is how I’m reading your above statement); by that definition, async blocks to a degree too.

You’re throwing traditional blocking/non-blocking distinctions on their ear.

tremon|3 years ago

> Why do you say that reads and writes may also block?

Let's define "may block" first, perhaps? What do we mean when we say "network I/O may block"? Usually, this means that the kernel may see your network request and raise you a context switch while it waits for the network response on your behalf. In your last sentence you appear to argue that the reason why the kernel performs a context switch is relevant in determining if an operation "may block", and the GP is arguing that that's a distinction without a difference.

If the definition of "may block" is really just "the kernel may decide to context-switch away from your program", then yes, the GP's assertion that file I/O, memory I/O (mmap) and memory access (swap) are all operations that may block is correct -- the only difference is in degree: from microsecond delays for nvm-backed swap to multi-second delays for network transfers.
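To make that conventional sense concrete, here's a tiny sketch (mine, not the GP's): a read() on an empty pipe would normally park the calling thread in the kernel until data arrives; with O_NONBLOCK set, the kernel returns EAGAIN instead of context-switching away on your behalf.

    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        int fds[2];
        char buf[16];

        if (pipe(fds) != 0) return 1;

        /* Mark the read end nonblocking: the kernel refuses to wait for us. */
        int fl = fcntl(fds[0], F_GETFL);
        fcntl(fds[0], F_SETFL, fl | O_NONBLOCK);

        if (read(fds[0], buf, sizeof buf) < 0 && errno == EAGAIN)
            printf("nonblocking read: EAGAIN instead of a context switch\n");

        /* With O_NONBLOCK cleared, the same read() would suspend this process
           until the write end produces data -- the classic "may block". */
        return 0;
    }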

Or, of course, I may have misunderstood the GP's train of thought.

jesboat|3 years ago

>> If you really think about it, the only real difference between main memory blocking and disk blocking is the amount of time they may block.

> This is a somewhat confusing analysis you have here. Direct read/write from memory for all intents and purposes doesn't block. Why do you say that reads and writes may also block?

Reads and writes from actual, physical, hardware memory might block, depending on how you define "block", in the sense that some reads may miss CPU cache. But once you get to that point, you could argue that every branch might block if the branch misprediction causes a pipeline stall. This is not a useful definition of "block".

The thing is, most programs are almost never low-level enough to be dealing with memory in that sense: they read and write virtual memory. And virtual memory can block for any number of reasons, including some pretty non-obvious ones. For example (there's a small fault-counting sketch after this list):

- the system is under memory pressure and that page is no longer in RAM because it got written to a swap file

- the system is under memory pressure and that page is no longer in RAM because it was a read-only mapping from a file and could be purged

-- e.g. it's part of your executable's code

- this is your first access to a page of anonymous virtual memory and the kernel hadn't needed to allocate a physical page until now

- you're in a VM and the VMM can do whatever it wants

- the page is COW from another process
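To put numbers on a couple of those, here's a minimal Linux sketch (mine, not the parent's; the 64 MB size is arbitrary) that counts minor page faults while anonymous mmap'd memory is touched for the first time:

    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/resource.h>

    static long minor_faults(void) {
        struct rusage ru;
        getrusage(RUSAGE_SELF, &ru);
        return ru.ru_minflt;
    }

    int main(void) {
        size_t len = 64UL * 1024 * 1024;
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) return 1;

        long before = minor_faults();
        memset(p, 1, len);               /* first touch: kernel must supply pages */
        printf("faults during first touch: %ld\n", minor_faults() - before);

        before = minor_faults();
        memset(p, 2, len);               /* second pass: pages already mapped */
        printf("faults on second pass:    %ld\n", minor_faults() - before);

        munmap(p, len);
        return 0;
    }

The second pass costs roughly zero faults; the first pass traps into the kernel roughly once per page, even though the C code is just a memset.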

kentonv|3 years ago

> This is not a useful definition of "block".

I think what I'm saying is that calling file I/O "blocking" is also not a useful definition of "block". Because I don't really see the fundamental difference between "we have to wait for main memory to respond" and "we have to wait for disk to respond".

> this is your first access to a page of anonymous virtual memory and the kernel hadn't needed to allocate a physical page until now

And said allocation could block on all sorts of things you might not expect. Once upon a time I helped debug a problem where memory allocation would block waiting for the XFS filesystem driver to flush dirty inodes to disk. Our system generated lots of dirty inodes, and we were seeing programs randomly hang on allocation for minutes at a time.

cout|3 years ago

The reason memory access can block is that it can cause the page fault handler to be invoked (https://www.kernel.org/doc/gorman/html/understand/understand...). There are many reasons the page fault handler might cause the process to block. The kernel may need to swap the page in from disk, or it might be a copy-on-write page that was just written to.

To copy the page, the kernel first needs to find a free page frame. If there are no free page frames, the kernel will attempt to reclaim pages that are in use (https://www.kernel.org/doc/gorman/html/understand/understand...). This may result in process-mapped pages being swapped to disk, but it may also result in disk writeback activity (https://lwn.net/Articles/396561/). In either case, control cannot be returned to the process until there is a free page to map into the process.
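The copy-on-write case is easy to see from userspace. A small sketch (mine, not the parent's; sizes arbitrary): after fork(), parent and child share the same physical pages marked read-only, and the child's first write to each page traps into the page fault handler, which copies it.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/resource.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        size_t len = 16UL * 1024 * 1024;
        char *buf = malloc(len);
        if (!buf) return 1;
        memset(buf, 1, len);              /* make the pages real before forking */

        pid_t pid = fork();
        if (pid == 0) {                   /* child: writes below hit COW pages */
            struct rusage ru;
            getrusage(RUSAGE_SELF, &ru);
            long before = ru.ru_minflt;

            memset(buf, 2, len);          /* each page's first write faults and copies */

            getrusage(RUSAGE_SELF, &ru);
            printf("COW faults in child: %ld\n", ru.ru_minflt - before);
            _exit(0);
        }
        waitpid(pid, NULL, 0);
        free(buf);
        return 0;
    }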

p12tic|3 years ago

It's complicated: memory accesses really can block for relatively long periods of time.

Consider that regular memory access via cache takes around 1 nanosecond.

If the data is not in top-level cache, then we're looking at roughly 10 nanoseconds access latency.

If the data is not in cache at all, we are looking at 50-150 nanoseconds access latency.

If the data is in memory, but that memory is attached to another CPU socket, it's even more latency.

Finally, if the data access is via atomic instruction and there are many other CPUs accessing the same memory location, then the latency can be as high as 3000 nanoseconds.

It's not very hard to find NVMe attached storage that has latencies of tens of microseconds, which is not very far off memory access speeds.

slashdev|3 years ago

I just want to add to your explanation that even in the absence of hard paging from disk, you can have soft page faults, where the kernel modifies page table entries, assigns a memory page, copies a copy-on-write page, etc.

In addition to the cache misses you mention there's also TLB misses.

Memory is not actually random access; locality matters a lot. SSD reads, on the other hand, are much closer to random access, but much more expensive.