Have you actually benchmarked a context switch on modern hardware? A full switch (including register spilling and a page table swap) can be done in <150 cycles even on cheap, older Arm A-series cores like the Cortex-A72. We no longer live in a world where a context switch forces you to flush the TLB, since TLB entries are tagged with an ASID; you literally just pay for the trap, spill, page table swap, unspill, and return. The cost is even lower on modern Arm processors that support speculative exceptions, where the entire context switch can be performed speculatively.
paulmd|3 years ago
https://www.phoronix.com/news/Netflix-NUMA-FreeBSD-Optimized
https://2019.eurobsdcon.org/slides/NUMA%20Optimizations%20in...
zorgmonkey|3 years ago
ilyt|3 years ago