omellet | 9 years ago

The problem isn't so much locking as that you have to dispatch to a kernel thread whenever you request or send data, paying the cost of that context switch every time. In userspace you can spin a polling thread on its own core and DMA data to and from the hardware all day long without ever yielding to another thread.
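
A rough sketch of what that polling side can look like, assuming a single-producer/single-consumer ring in shared memory. The names (`struct ring`, `ring_push`, `ring_poll`, `ring_spin_recv`, `RING_SIZE`) are made up for illustration, and `ring_push` only stands in for the hardware, which on a real device would fill the slots by DMA and advance the head index itself.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 256u /* hypothetical ring depth; must be a power of two */

static_assert((RING_SIZE & (RING_SIZE - 1)) == 0,
              "RING_SIZE must be a power of two");

/* Hypothetical descriptor ring shared between a producer and a consumer. */
struct ring {
    _Atomic uint32_t head;      /* next slot the producer will write */
    _Atomic uint32_t tail;      /* next slot the consumer will read */
    uint64_t slots[RING_SIZE];
};

/* Producer side (stand-in for the device). Returns false if the ring is full. */
bool ring_push(struct ring *r, uint64_t value)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SIZE)
        return false;
    r->slots[head & (RING_SIZE - 1)] = value;
    /* Release: the slot contents become visible before the new head does. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Consumer side: copy out up to max entries without blocking and without
 * entering the kernel. Returns the number of entries copied. */
size_t ring_poll(struct ring *r, uint64_t *out, size_t max)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    size_t n = 0;
    while (tail != head && n < max) {
        out[n++] = r->slots[tail & (RING_SIZE - 1)];
        tail++;
    }
    /* Release: hand the drained slots back to the producer. */
    atomic_store_explicit(&r->tail, tail, memory_order_release);
    return n;
}

/* Busy-poll until want entries have arrived. The thread never sleeps or
 * yields: one core is burned in exchange for no wakeup latency. */
size_t ring_spin_recv(struct ring *r, uint64_t *out, size_t want)
{
    size_t got = 0;
    while (got < want)
        got += ring_poll(r, out + got, want - got);
    return got;
}
```

In practice the thread running `ring_spin_recv` would also be pinned to a dedicated core (on Linux, for example, with `sched_setaffinity`) so the scheduler never moves it or puts something else on that core.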

bogomipz | 9 years ago

The kernel is mapped into the top of the address space of every user-space process. That is generally pretty efficient, which is why it is done.

hendzen | 9 years ago

Sure, that saves you from dumping TLB state, but you still need to save register state and copy data from a user-supplied buffer into a kernel-owned, device-mapped buffer, wiping L1 data and instruction caches in the process.

For 99% of use cases this isn't a problem, but if you're trying to save every possible microsecond, then it definitely is.
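
If you want a ballpark number for the fixed per-call cost on your own machine, something like the sketch below works, assuming a POSIX system with `/dev/null`. The function name `syscall_ns_per_call` is made up for illustration. It times one-byte `write` calls, so it only captures the kernel entry/exit overhead; it does not show the cache pollution or the cost of copying a realistically sized buffer.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Average wall-clock nanoseconds per one-byte write() to /dev/null.
 * Returns -1.0 if the file cannot be opened, a write fails, or the
 * clock cannot be read. */
double syscall_ns_per_call(int iters)
{
    int fd = open("/dev/null", O_WRONLY);
    if (fd < 0 || iters <= 0) {
        if (fd >= 0)
            close(fd);
        return -1.0;
    }

    char byte = 0;
    struct timespec t0, t1;
    if (clock_gettime(CLOCK_MONOTONIC, &t0) != 0) {
        close(fd);
        return -1.0;
    }

    for (int i = 0; i < iters; i++) {
        /* Each iteration crosses into the kernel and back once. */
        if (write(fd, &byte, 1) != 1) {
            close(fd);
            return -1.0;
        }
    }

    if (clock_gettime(CLOCK_MONOTONIC, &t1) != 0) {
        close(fd);
        return -1.0;
    }
    close(fd);

    double elapsed_ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
                      + (double)(t1.tv_nsec - t0.tv_nsec);
    return elapsed_ns / (double)iters;
}
```

Calling `syscall_ns_per_call(1000000)` a few times and taking the lowest result gives a reasonable floor; the figure varies a lot with CPU, kernel version, and mitigation settings.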