Monday, January 7, 2019

When to Use dispatch_async()

Pierre Habouzit:

Re the last discussions, dispatch_async() can be used for 3 different things:

(1) asynchronous state machines (onto the same queue hierarchy), which is a way to address C10k and is fast


(2) getting concurrency (a better pthread_create())

(3) parallelism (dispatch_apply())


(1) provided you use dispatch_async_f() for the shortest work items to avoid allocating blocks, dispatch is fast, and it performs well at almost any work item size (assuming the work item does something meaningful).

(2-3) is way trickier than it looks:

your work item needs to represent enough work (100µs at the very least; 1ms is best)

your work items, if running concurrently, must not contend, or your performance sinks dramatically.

Contention takes many forms.


As we presented in WWDC’17: go serial first, and as you find performance bottlenecks, measure why, and if concurrency helps, apply it with care, always validating under system pressure (such as iOS Low Power Mode, to name one).

We have repeatedly measured that inefficient concurrency is commonly a 2x cost in time to completion.

It is not a 2x cost in instruction count, though; it’s that concurrency kills your IPC rate and you spend a lot of time just waiting.

Lastly, be very careful with micro benchmarks: calling your code 1M times in a loop makes the CPU ridiculously better at running it, and while improving micro benchmarks is good, you should always have a macro benchmark to validate that the change is a good idea.

David Smith:

What I’m getting at is that we’ve been discovering that the inherent costs of multithreading are a lot higher than they look in microbenchmarks (because microbenches hide cache effects and keep thread pools hot). A lot of iOS 12 perf wins were from daemons going single-threaded.

See also: Modernizing Grand Central Dispatch Usage.

