Wednesday, November 25, 2020

libdispatch’s Unmet Promise

Thomas Clement:

Apple demonstrated the libdispatch and the promise seemed great. They introduced the notion of serial queues and told us that we should stop thinking in terms of threads and start thinking in terms of queues. We would submit various program tasks to be executed serially or concurrently and the libdispatch would do the rest, automatically scaling based on the available hardware. Queues were cheap, we could have a lot of them. I actually remember very vividly a Q&A at the end of one of the WWDC sessions, a developer got to the mic and asked how many queues we could have in a program, how cheap were they really? The Apple engineer on stage answered that most of the queue size was basically the debug label that the developer would pass to it at creation time. We could have thousands of them without a problem.

[…]

Then the problems started. We ran into thread explosion which was really surprising because we were told that the libdispatch would automatically scale based on the available hardware so we expected the number of threads to more or less match the number of cores in the machine. A younger me in 2010 asked for help on the libdispatch mailing-list and the response from Apple at the time was to remove synchronization points and go async all the way.

As we went down that rabbit hole, things got progressively worse. Async functions have the bad habit of contaminating other functions: because a function can’t call another async function and return a result without being async itself, entire call chains had to be turned async.

[…]

Turns out Apple engineers are developers just like us and met the exact same problems that we did. […] An Apple engineer also revealed that a lot of the perf wins in iOS 12 were from daemons going single-threaded.

[…]

Now I’m a bit worried because I see all those shiny new things that Apple is planning to add into the Swift language and I wonder what might happen this time.

Via Peter Steinberger:

Please see past the clickbaity title. It failed to deliver on the promise. It’s still incredibly useful. It’s just dangerous that the documentation wasn’t updated to reflect this.

Greg Titus:

To call any technology a failure because it was initially over-promised would leave pretty much no successes ever.

Coding under Dispatch is a lot nicer than pthreads or NSThread/NSLock, which were the options on the platform before its debut. By my definition that’d be success.

Alexis Gallagher:

P1. Task queues will be easier than threads & locks.

P2. libdispatch can handle many queues and it is sensible to organize a program that way.

Could be we agree that P1 was true but P2 proved false for a mix of performance and programming model complexity reasons.

Jonathan Grynspan:

I say: dispatch long-running (like, seconds or more) tasks off the UI thread, including as much I/O as possible. Everything else can run on one thread. Other processes can use other cores. Enforce in API by making most stuff sync but long tasks async with a completion handler.
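A minimal Swift sketch of the API shape Grynspan describes, under stated assumptions: the `DocumentStore` type, its queue label, and the `completionQueue` parameter are hypothetical names made up for illustration, not anything from his post. Cheap calls stay synchronous, and only the potentially slow file read is asynchronous with a completion handler.

```swift
import Foundation

// Hypothetical type, for illustration only.
final class DocumentStore {
    // A single serial queue for all long-running work (an assumption of this sketch).
    private let ioQueue = DispatchQueue(label: "example.document-store.io")
    private let titles: [String: String] = ["a": "Alpha"]

    // Cheap lookup: plain synchronous API, intended for the calling (UI) thread.
    func title(for id: String) -> String? {
        return titles[id]
    }

    // Potentially slow I/O: asynchronous, with the result delivered on `completionQueue`.
    func load(contentsOf url: URL,
              completionQueue: DispatchQueue = .main,
              completion: @escaping (Result<Data, Error>) -> Void) {
        ioQueue.async {
            // The blocking read happens off the caller's thread.
            let result: Result<Data, Error> = Result(catching: { try Data(contentsOf: url) })
            // Hop back to the caller's queue before invoking the handler.
            completionQueue.async { completion(result) }
        }
    }
}
```

With this split, the async "contamination" stays confined to the few calls that can genuinely take a long time.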

David Smith:

Personally I currently prefer a small number of queues (or workloops!) for execution contexts and unfair locks for protecting state. For example cfprefsd uses* a two queue model (“request processing” and “async IO” queues), but fine grained locking.
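As a rough sketch of "few queues for execution, locks for state" (not cfprefsd's actual implementation, which the quote does not show): the `SettingsStore` type below is hypothetical, it uses one serial queue for asynchronous I/O, and `NSLock` stands in for an unfair lock so that the example stays portable.

```swift
import Foundation

// Hypothetical type, for illustration only; not how cfprefsd is written.
final class SettingsStore {
    // One execution context for slow work.
    private let ioQueue = DispatchQueue(label: "example.settings.io")
    // Stand-in for an unfair lock; it protects `values` and nothing else.
    private let lock = NSLock()
    private var values: [String: String] = [:]
    private let fileURL: URL

    init(fileURL: URL) {
        self.fileURL = fileURL
    }

    // Reads take the lock briefly instead of hopping to a queue.
    func value(forKey key: String) -> String? {
        lock.lock()
        defer { lock.unlock() }
        return values[key]
    }

    func set(_ value: String, forKey key: String) {
        lock.lock()
        defer { lock.unlock() }
        values[key] = value
        let snapshot = values
        // Enqueue while still holding the lock so snapshots are written in mutation order.
        // `async` returns immediately, so the lock is not held during the write itself.
        ioQueue.async { [fileURL] in
            // Errors are ignored for brevity in this sketch.
            if let data = try? JSONEncoder().encode(snapshot) {
                try? data.write(to: fileURL, options: .atomic)
            }
        }
    }

    // Waits until every write enqueued so far has run (the I/O queue is serial).
    func flush() {
        ioQueue.sync {}
    }
}
```

The state never lives "on" a queue here, so callers can read it synchronously from any thread without dispatching.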

Marcel Weiher (quoting his excellent iOS and macOS Performance Tuning):

Due to the pretty amazing single-core performance of today’s CPUs, it turns out that the vast majority of CPU performance problems are not, in fact, due to limits of the CPU, but rather due to sub-optimal program organization

In the end, I’ve rarely had to use multi-threading for speeding up a CPU-bound task in anger, and chances are good that I would have made my code slower rather than faster. The advice to never optimize without measuring as you go along goes double for multi-threading.

Update (2020-11-30): David Zarzycki:

As the designer of libdispatch, I just want to say: I get why people feel this way and I’m sorry that we/I oversold libdispatch to some degree at the time.

(And just to be clear, I left Apple many years ago and I still deeply respect them.)

I also feel bad because I knew that blocking was a pain point and I had plans/ideas for how to minimize that pain but I burned out after GCD 1.0 and took a leave of absence. I don’t think those ideas ever got recorded or implemented. So ya, I’m sorry about that too.

That being said, what we had before libdispatch was awful. POSIX and derived threading APIs are both more difficult to use and more inefficient. I do feel proud that we made life easier for people in this regard and helped people clean up their existing threading code.

Chris Nebel:

Maybe you can clear something up for me: these days, we’re advised to not use concurrent queues, because the system will start a thread for every block in the queue because it has no idea which blocks depend on which others to make progress. Fair enough, but as I recall the initial presentations, concurrent queues only promised some amount of concurrency, where “some” might be “none”, meaning that if you deadlocked a concurrent queue, it would be your fault, not the system’s. Did something change, or am I misremembering?

Pierre Habouzit:

that’s how one was told they worked, but never did. if a work item blocks on a concurrent queue, you get more threads eventually.

We now recognize some blocking situations as being due to contention and excluded such blocking points from the policy, but it only goes so far.

the other problem is that the concurrent queue pool (non overcommit threads in the implementation parlance) are a shared singleton which makes using them correctly fraught with peril if you allow blocking on future work.

This is why Swift Actors executors have to disallow it.

David Zarzycki:

The overcommit queues were never supposed to exist. They were added as an attempt to fix a bug on the single core Mac mini but later we found the root cause: workqueue threads defaulted to SCHED_FIFO instead of SCHED_OTHER and it was too late to remove overcommit queues before GM.

Update (2020-12-24): Brent Simmons (tweet):

We’ve been getting some reports that NetNewsWire for Mac will hang sometimes. A sample will report something like this[…] And there will be hundreds of threads labelled com.apple.network.boringssl.metrics_queue.

8 Comments

> In the end, I’ve rarely had to use multi-threading for speeding up a CPU-bound task in anger, and chances are good that I would have made my code slower rather than faster.

The main use of multithreading in desktop and phone apps is not performance, but to avoid blocking the UI thread while performing tasks with nondeterministic execution time, like disk and network I/O, IPC, etc.

In this regard, libdispatch is a great success. There are some pitfalls (thread explosion if you don't set up queues carefully), but it is still a lot easier than any alternative.

In my app Find Any File, when users want to trash items, I simply pass them to `[NSFileManager trashItemAtURL:...]`.

But when they trash several thousand files at once, the program eventually crashes because the system creates a separate thread for each trash task and sets no limit. So it eventually reaches the 10xx thread limit and bails out.

I have noticed similar behavior with other types of queues, where I'd feed the tasks into the queue and it gets insanely big, degrading performance. So I have to add my own code that blocks feeding more tasks when it's above a high-water mark.

I shouldn't have to do that, IMO, because it only happens when things get out of hand. And this is usually a rare case, meaning that at initial implementation you're likely not aware of this trap, and then deliver an unstable app. Only later, when you get enough error reports, you may realize what goes wrong and add your own measures to prevent this. The trashItemAtURL: issue is a good example of where even an Apple engineer messed this up, apparently.

A good API design would take care of this automatically, because the API designer should know better about the pitfalls and design the API so that this special case is somehow taken care of.
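One common way to build the kind of high-water mark described in the comment above is a counting semaphore in front of the queue. The sketch below is an assumption about how that might look, not the commenter's code: `processBounded` and its parameters are hypothetical names, and it blocks the calling thread, so it should not be called from the UI thread.

```swift
import Foundation

// Hypothetical helper: runs `work` for every item on `queue`,
// but never lets more than `limit` items be in flight at once.
func processBounded<T>(_ items: [T],
                       limit: Int,
                       on queue: DispatchQueue = .global(qos: .utility),
                       work: @escaping (T) -> Void) {
    precondition(limit > 0, "limit must be at least 1")
    let slots = DispatchSemaphore(value: limit)
    let group = DispatchGroup()
    for item in items {
        // Blocks the feeding thread once `limit` items are outstanding.
        slots.wait()
        queue.async(group: group) {
            // Free the slot when this item finishes, whatever path `work` takes.
            defer { slots.signal() }
            work(item)
        }
    }
    // Wait for the tail of in-flight items before returning.
    group.wait()
}
```

Because the feeder waits before enqueueing, the queue never holds more than `limit` pending blocks, which in turn caps how many threads the system has a reason to create for this work.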

For every libdispatch problem ever the answer is always: you are doing it wrong, go watch this video.

So what *is* the recommended way to do networking without blocking the UI thread? Making it async has always been callback hell, so I'm not reluctant to switch!

Thread pools and job systems, while not completely solved problems, are at least well-known and practiced ones. Why is libdispatch still falling down at the basics?

> In the end, I’ve rarely had to use multi-threading for speeding up a CPU-bound task in anger, and chances are good that I would have made my code slower rather than faster.

Yup.

People vastly overestimate the usefulness — you’ll rarely have bottlenecks that are both CPU-bound and easy to parallelize.

(Turns out having a high single-threaded performance is useful.)

> The main use of multithreading in desktop and phone apps is not performance, but to avoid blocking the UI thread while performing tasks with nondeterministic execution time, like disk and network I/O, IPC, etc.

Sure, but that’s really just… duo-threading, if you will. A UI thread, and a secondary thread that contains a queue for background tasks. Once I/O finishes, synchronize back to the UI thread.

(In the .NET world, the latter will technically be a pool of threads, but regardless, you typically won’t find yourself spawning a lot of parallelized work.)

Corollary to Nick's Answer: the exact WWDC video you want has probably already been taken down by Apple.

vintner: GCD *is* implemented as a thread pool. The issue is that Apple claimed it was so efficient we should use it for everything, even mutexes. Unfortunately, "threading is hard" still. GCD is a nice simple interface to a thread pool, and you just can't put all of multithreading behind a super simple interface and expect great results in every situation. Nobody else has managed to do that, either.

libdispatch is good at some "basics". It's not so good at other "basics". It depends on how you define "the basics".
