Monday, January 6, 2020

Beware Spinlocks in User Space

Malte Skarupke (via Shantonu Sen, Niels Broekhuijsen):

I overheard somebody at work complaining about mysterious stalls while porting Rage 2 to Stadia. […] The only thing those mysterious stalls had in common was that they were all using spinlocks. I was curious about that because I happened to be the person who wrote the spinlock we were using. The problem was that there was a thread that spent several milliseconds trying to acquire a spinlock at a time when no other thread was holding the spinlock. Let me repeat that: The spinlock was free to take yet a thread took multiple milliseconds to acquire it. […] In our case we were able to make the problem go away by replacing spinlocks with mutexes, but that leads to the question: How do you even measure whether a spinlock is better than a mutex, and what makes a good spinlock?
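
For context, the kind of lock under discussion is a user-space spinlock along these lines. This is only a sketch of the general shape, not the implementation Skarupke wrote for the port:

```cpp
#include <atomic>

// Minimal test-and-set spinlock; illustrative only, not the article's code.
class Spinlock {
    std::atomic_flag flag = ATOMIC_FLAG_INIT;
public:
    void lock() {
        // Busy-wait in user space until the flag is clear. The kernel has no
        // idea this thread is doing nothing useful with its time slice.
        while (flag.test_and_set(std::memory_order_acquire)) {
            // spin
        }
    }
    void unlock() {
        flag.clear(std::memory_order_release);
    }
};
```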

Linus Torvalds (Hacker News):

So now you still hold the lock, but you got scheduled away from the CPU, because you had used up your time slice. The “current time” you read is basically now stale, and has nothing to do with the (future) time when you are actually going to release the lock.

Somebody else comes in and wants that “spinlock”, and that somebody will now spin for a long while, since nobody is releasing it - it’s still held by that other thread entirely that was just scheduled out. At some point, the scheduler says “ok, now you’ve used your time slice”, and schedules the original thread, and now the lock is actually released. Then another thread comes in, gets the lock again, and then it looks at the time and says “oh, a long time passed without the lock being held at all”.
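
In other words, the measurement pattern being criticized looks roughly like this (a simplified sketch reusing the Spinlock above; not the article's exact benchmark, and the helper names are made up):

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>

// The owner stamps the clock just before unlocking; the next owner treats the
// gap since that stamp as "time the lock sat free".
std::atomic<std::int64_t> last_release_ns{0};

static std::int64_t now_ns() {
    using namespace std::chrono;
    return duration_cast<nanoseconds>(
        steady_clock::now().time_since_epoch()).count();
}

std::int64_t timed_critical_section(Spinlock& lock) {
    lock.lock();
    std::int64_t waited_ns = now_ns() - last_release_ns.load();
    // ... short critical section ...
    last_release_ns.store(now_ns()); // the clock is read here...
    lock.unlock();                   // ...but if the thread is preempted before
                                     // this line, the stamp goes stale and the
                                     // next owner's "wait time" is inflated by
                                     // a full time slice or more.
    return waited_ns;
}
```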

[…]

You’re just getting random values because different schedulers have different heuristics for “do I want to let CPU-bound processes use long time slices or not”.

[…]

Notice how, when the author uses an actual std::mutex, things just work fairly well, regardless of scheduler. Because now you’re doing what you’re supposed to do. Yeah, the timing values might still be off - bad luck is bad luck - but at least now the scheduler is aware that you’re “spinning” on a lock.
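
The fix in the article amounts to swapping the spinlock for a blocking lock. A contended std::mutex sleeps in the kernel instead of burning its time slice, so the scheduler knows the thread is waiting rather than having to guess why it keeps yielding. A minimal sketch:

```cpp
#include <mutex>

std::mutex m;

void critical_section() {
    // Under contention, lock() blocks in the kernel (a futex wait on Linux)
    // instead of spinning, and unlock() wakes a waiter directly, so the
    // scheduler knows exactly who is waiting on what.
    std::lock_guard<std::mutex> guard(m);
    // ... short critical section ...
}
```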

Malte Skarupke:

Once we break it down like that, we realize that these are actually all the same case. In all of these cases one thread can run, all others are calling yield(). The only difference between the case that I wanted to measure and the other two accidental cases is whether the scheduler is incorrectly not running thread C or incorrectly not running thread N. In either case all other fifteen threads are just calling yield().

So your claim is that it’s a problem that I try to measure how long it takes for thread N to run even though I might accidentally be measuring how long it takes for thread C to run. But I claim that that’s fine because these are all equally problematic. One thread could run, all other threads are yielding, yet that one thread is not running. And we don’t care whether the thread that could run is thread N or thread C.
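
The loop both sides are arguing about is a spinlock that yields whenever it fails to take the lock, roughly like this (again a sketch, not Skarupke’s implementation):

```cpp
#include <atomic>
#include <thread>

class YieldingSpinlock {
    std::atomic_flag flag = ATOMIC_FLAG_INIT;
public:
    void lock() {
        while (flag.test_and_set(std::memory_order_acquire)) {
            // Gives up the CPU, but tells the kernel nothing about *which*
            // thread it is waiting for; with sixteen threads, fifteen can sit
            // here yielding while the one runnable thread still is not picked.
            std::this_thread::yield();
        }
    }
    void unlock() {
        flag.clear(std::memory_order_release);
    }
};
```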

Linus Torvalds:

The problem with that is “yield” is pretty much undefined. The definition of it is literally about a single queue of real-time behavior with a real-time scheduler with priorities.

[…]

What you want to use it for is “schedule the right process”. But you don’t even know what the right process is, or if you do you don’t tell the system (because sched_yield() literally doesn’t have that interface), so the kernel has to guess.
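
For contrast, this is roughly what a blocking lock does on Linux: the waiter tells the kernel exactly which address it is waiting on, and the releaser wakes exactly one such waiter. A bare-bones sketch with a made-up FutexLock class and no error handling; a real mutex is considerably more careful (it tracks whether there are waiters so it can skip the wake syscall on the uncontended path):

```cpp
#include <atomic>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

// 0 = unlocked, 1 = locked. Linux-only; error handling omitted.
class FutexLock {
    std::atomic<int> state{0};

    static long futex(std::atomic<int>* addr, int op, int val) {
        return syscall(SYS_futex, reinterpret_cast<int*>(addr), op, val,
                       nullptr, nullptr, 0);
    }
public:
    void lock() {
        int expected = 0;
        while (!state.compare_exchange_strong(expected, 1,
                                              std::memory_order_acquire)) {
            // Sleep in the kernel until the word stops being 1. Unlike
            // sched_yield(), this names exactly what we are waiting on.
            futex(&state, FUTEX_WAIT, 1);
            expected = 0;
        }
    }
    void unlock() {
        state.store(0, std::memory_order_release);
        futex(&state, FUTEX_WAKE, 1); // wake one waiter, if any
    }
};
```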
