We already have atomic bitwise operations (look at glibc's mutex implementation), and the unit atomic operations actually work on is the cache line, which is 64 bytes on typical x86-64 CPUs. Cache lines are useful because fetching 64 bytes costs barely more than fetching a single word, yet it speeds up sequential memory access a lot: by the time you read the next bytes, they are already in cache.
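For anyone who hasn't used them, here is a minimal sketch of atomic bitwise operations in portable C11, using `atomic_fetch_or` / `atomic_fetch_and` to build a one-bit spinlock. This is not glibc's code, and the helper names (`bit_test_and_set`, `bit_lock`, etc.) are hypothetical. The struct is padded to 64 bytes on the assumption of a 64-byte cache line, so that the lock word doesn't share a line with unrelated data (false sharing); some CPUs use larger lines.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumes a 64-byte cache line; padding the word to a full line keeps
   other hot data from sharing it. */
struct padded_flags {
    _Alignas(64) _Atomic uint64_t word;
};
static_assert(sizeof(struct padded_flags) == 64, "expected one cache line");

/* Hypothetical helper: atomically set bit `bit` and return whether it
   was already set (a single-bit test-and-set). */
static bool bit_test_and_set(_Atomic uint64_t *word, unsigned bit) {
    uint64_t mask = UINT64_C(1) << bit;
    uint64_t old = atomic_fetch_or_explicit(word, mask, memory_order_acquire);
    return (old & mask) != 0;
}

/* Hypothetical helper: atomically clear bit `bit`. */
static void bit_clear(_Atomic uint64_t *word, unsigned bit) {
    uint64_t mask = UINT64_C(1) << bit;
    atomic_fetch_and_explicit(word, ~mask, memory_order_release);
}

/* Spin until we are the one who flipped the bit from 0 to 1. */
static void bit_lock(_Atomic uint64_t *word, unsigned bit) {
    while (bit_test_and_set(word, bit)) {
        /* busy-wait; a real mutex would fall back to futex-style sleeping */
    }
}

/* Release the lock by clearing the bit with release ordering. */
static void bit_unlock(_Atomic uint64_t *word, unsigned bit) {
    bit_clear(word, bit);
}
```

The other 63 bits of the word remain usable as independent locks or flags, since each operation only touches its own mask.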