MCS Locks and qspinlocks
Jonathan Corbet (via Kyle Sluder):
Note that the use of an atomic swap operation on the main lock means that only CPU 2 can have a pointer to CPU 1’s mcs_spinlock structure. So there is no need for atomic operations when making changes to that structure, though some careful programming is still needed to make sure that changes are visible to CPU 1 at the right times.
Once this assignment is done, CPU 2 will spin on the locked value in its ownmcs_spinlock structure rather than the value in the main lock. Its spinning will thus be entirely CPU-local, not touching the main lock at all. This process can go on indefinitely as contention for the lock increases, with each CPU placing itself in line behind those that are already there, and each CPU spinning on its own copy of the lock. The pointer in the “main” lock, thus, always indicates the tail of the queue of waiting CPUs.
[…]
An MCS lock, thus, is somewhat more complicated than a regular spinlock. But that added complexity removes much of the cache-line bouncing from the contended case; it also is entirely fair, passing the lock to each CPU in the order that the CPUs arrived.