{"id":30813,"date":"2020-11-25T16:58:08","date_gmt":"2020-11-25T21:58:08","guid":{"rendered":"https:\/\/mjtsai.com\/blog\/?p=30813"},"modified":"2020-12-24T13:20:08","modified_gmt":"2020-12-24T18:20:08","slug":"libdispatchs-unmet-promise","status":"publish","type":"post","link":"https:\/\/mjtsai.com\/blog\/2020\/11\/25\/libdispatchs-unmet-promise\/","title":{"rendered":"libdispatch&rsquo;s Unmet Promise"},"content":{"rendered":"<p><a href=\"https:\/\/tclementdev.com\/posts\/what_went_wrong_with_the_libdispatch.html\">Thomas Clement<\/a>:<\/p>\n<blockquote cite=\"https:\/\/tclementdev.com\/posts\/what_went_wrong_with_the_libdispatch.html\"><p>Apple demonstrated the libdispatch and the promise seemed great, they introduced the notion of serial queues and told us that we should stop thinking in term of threads and start thinking in term of queues. We would submit various program tasks to be executed serially or concurrently and the libdispatch would do the rest, automatically scaling based on the available hardware. Queues were cheap, we could have a lot of them. I actually remember very vividly a Q&amp;A at the end of one of the WWDC sessions, a developer got to the mic and asked how many queues we could have in a program, how cheap were they really? The Apple engineer on stage answered that most of the queue size was basically the debug label that the developer would pass to it at creation time. We could have thousands of them without a problem.<\/p>\n<p>[&#8230;]<\/p>\n<p>Then the problems started. We ran into thread explosion which was really surprising because we were told that the libdispatch would automatically scale based on the available hardware so we expected the number of threads to more or less match the number of cores in the machine. 
A younger me in 2010 <a href=\"https:\/\/lists.macosforge.org\/pipermail\/libdispatch-dev\/2010-October\/000424.html\">asked<\/a> for help on the libdispatch mailing list and the response from Apple at the time was to <a href=\"https:\/\/lists.macosforge.org\/pipermail\/libdispatch-dev\/2010-October\/000425.html\">remove synchronization points<\/a> and <a href=\"https:\/\/lists.macosforge.org\/pipermail\/libdispatch-dev\/2010-October\/000428.html\">go async all the way<\/a>.<\/p>\n<p>As we went down that rabbit hole, things got progressively worse. Async functions have the bad habit of contaminating other functions: because a function can&rsquo;t call another async function and return a result without being async itself, entire call chains had to be turned async.<\/p>\n<p>[&#8230;]<\/p>\n<p>Turns out Apple engineers are developers just like us and met the exact same problems that we did. [&#8230;] An Apple engineer also <a href=\"https:\/\/twitter.com\/Catfish_Man\/status\/1081673457182490624\">revealed<\/a> that a lot of the perf wins in iOS 12 were from daemons going single-threaded.<\/p>\n<p>[&#8230;]<\/p>\n<p>Now I&rsquo;m a bit worried because I see all those shiny new things that Apple is planning to add into the Swift language and I wonder what might happen this time.<\/p><\/blockquote>\n\n<p>Via <a href=\"https:\/\/twitter.com\/steipete\/status\/1330941683626815496\">Peter Steinberger<\/a>:<\/p>\n<blockquote cite=\"https:\/\/twitter.com\/steipete\/status\/1330941683626815496\">\n<p>Please see past the clickbaity title. It failed to deliver on the promise. It&rsquo;s still incredibly useful. 
It&rsquo;s just dangerous that the documentation wasn&rsquo;t updated to reflect this.<\/p>\n<\/blockquote>\n\n<p><a href=\"https:\/\/twitter.com\/gregtitus\/status\/1330940554897625088\">Greg Titus<\/a>:<\/p>\n<blockquote cite=\"https:\/\/twitter.com\/gregtitus\/status\/1330940554897625088\">\n<p>To call any technology a failure because it was initially over-promised would leave pretty much no successes ever.<\/p>\n<p>Coding under Dispatch is a lot nicer than pthreads or <code>NSThread<\/code>\/<code>NSLock<\/code>, which were the options on the platform before its debut. By my definition that&rsquo;d be success.<\/p>\n<\/blockquote>\n\n<p><a href=\"https:\/\/twitter.com\/alexisgallagher\/status\/1330945607544946693\">Alexis Gallagher<\/a>:<\/p>\n<blockquote cite=\"https:\/\/twitter.com\/alexisgallagher\/status\/1330945607544946693\">\n<p>P1. Task queues will be easier than threads &amp; locks.<\/p>\n<p>P2. libdispatch can handle many queues <em>and<\/em> it is sensible to organize a program that way.<\/p>\n<p>Could be we agree that P1 was true but P2 proved false for a mix of performance and programming model complexity reasons.<\/p>\n<\/blockquote>\n\n<p><a href=\"https:\/\/twitter.com\/grynspan\/status\/1330915541293412358\">Jonathan Grynspan<\/a>:<\/p>\n<blockquote cite=\"https:\/\/twitter.com\/grynspan\/status\/1330915541293412358\">\n<p>I say: dispatch long-running (like, seconds or more) tasks off the UI thread, including as much I\/O as possible. Everything else can run on one thread. Other processes can use other cores. Enforce in API by making most stuff sync but long tasks async with a completion handler.<\/p>\n<\/blockquote>\n\n<p><a href=\"https:\/\/twitter.com\/Catfish_Man\/status\/1330945552540790784\">David Smith<\/a>:<\/p>\n<blockquote cite=\"https:\/\/twitter.com\/Catfish_Man\/status\/1330945552540790784\">\n<p>Personally I currently prefer a small number of queues (or workloops!) for execution contexts and unfair locks for protecting state. 
For example cfprefsd uses<a href=\"https:\/\/twitter.com\/Catfish_Man\/status\/1330945702776557568\">*<\/a> a two-queue model (&ldquo;request processing&rdquo; and &ldquo;async IO&rdquo; queues), but fine-grained locking.<\/p>\n<\/blockquote>\n\n<p><a href=\"https:\/\/twitter.com\/mpweiher\/status\/1331207988774903808\">Marcel<\/a> <a href=\"https:\/\/twitter.com\/mpweiher\/status\/1331208104600621056\">Weiher<\/a> (quoting his excellent <a href=\"http:\/\/www.amazon.com\/dp\/0321842847\/?tag=michaeltsai-20\">iOS and macOS Performance Tuning<\/a>):<\/p>\n\n<blockquote cite=\"https:\/\/twitter.com\/mpweiher\/status\/1331207988774903808\">\n<p>Due to the pretty amazing single-core performance of today&rsquo;s CPUs, it turns out that the vast majority of CPU performance problems are not, in fact, due to limits of the CPU, but rather due to sub-optimal program organization.<\/p>\n<\/blockquote>\n\n<blockquote cite=\"https:\/\/twitter.com\/mpweiher\/status\/1331208104600621056\">\n<p>In the end, I&rsquo;ve rarely had to use multi-threading for speeding up a CPU-bound task in anger, and chances are good that I would have made my code slower rather than faster. 
The advice to never optimize without measuring as you go along goes double for multi-threading.<\/p>\n<\/blockquote>\n\n<p>Previously:<\/p>\n<ul>\n<li><a href=\"https:\/\/mjtsai.com\/blog\/2020\/11\/03\/swift-concurrency-roadmap\/\">Swift Concurrency Roadmap<\/a><\/li>\n<li><a href=\"https:\/\/mjtsai.com\/blog\/2019\/08\/06\/practical-concurrency-some-rules\/\">Practical Concurrency: Some Rules<\/a><\/li>\n<li><a href=\"https:\/\/mjtsai.com\/blog\/2017\/10\/30\/locks-thread-safety-and-swift-2017-edition\/\">Locks, Thread Safety, and Swift: 2017 Edition<\/a><\/li>\n<li><a href=\"https:\/\/mjtsai.com\/blog\/2016\/10\/07\/os_unfair_lock\/\">os_unfair_lock<\/a><\/li>\n<li><a href=\"https:\/\/mjtsai.com\/blog\/2014\/05\/12\/how-to-efficiently-read-thousands-of-small-files-with-gcd\/\">How to Efficiently Read Thousands of Small Files With GCD<\/a><\/li>\n<\/ul>\n\n<p id=\"libdispatchs-unmet-promise-update-2020-11-30\">Update (2020-11-30): <a href=\"https:\/\/twitter.com\/davezarzycki\/status\/1332398323328819201\">David Zarzycki<\/a>:<\/p>\n<blockquote cite=\"https:\/\/twitter.com\/davezarzycki\/status\/1332398323328819201\"><p>As the designer of libdispatch, I just want to say: I get why people feel this way and I&rsquo;m sorry that we\/I oversold libdispatch to some degree at the time.<\/p><p>(And just to be clear, I left Apple many years ago and I still deeply respect them.)<\/p><p>I also feel bad because I knew that blocking was a pain point and I had plans\/ideas for how to minimize that pain but I burned out after GCD 1.0 and took a leave of absence. I don&rsquo;t think those ideas ever got recorded or implemented. So ya, I&rsquo;m sorry about that too.<\/p><p>That being said, what we had before libdispatch was awful. POSIX and derived threading APIs are both more difficult to use and more inefficient. 
I do feel proud that we made life easier for people in this regard and helped people clean up their existing threading code.<\/p><\/blockquote>\n\n<p><a href=\"https:\/\/twitter.com\/nebelch\/status\/1332793664372817920\">Chris Nebel<\/a>:<\/p>\n<blockquote cite=\"https:\/\/twitter.com\/nebelch\/status\/1332793664372817920\"><p>Maybe you can clear something up for me: these days, we&rsquo;re advised to not use concurrent queues, because the system will start a thread for every block in the queue because it has no idea which blocks depend on which others to make progress. Fair enough, but as I recall the initial presentations, concurrent queues only promised <em>some<\/em> amount of concurrency, where &ldquo;some&rdquo; might be &ldquo;none&rdquo;, meaning that if you deadlocked a concurrent queue, it would be your fault, not the system&rsquo;s. Did something change, or am I misremembering?<\/p><\/blockquote>\n\n<p><a href=\"https:\/\/twitter.com\/pedantcoder\/status\/1332796587836276736\">Pierre Habouzit<\/a>:<\/p>\n<blockquote cite=\"https:\/\/twitter.com\/pedantcoder\/status\/1332796587836276736\">\n<p>that&rsquo;s how one was told they worked, but never did. if a work item blocks on a concurrent queue, you get more threads eventually.<\/p>\n<p>We now recognize some blocking situations as being due to contention and excluded such blocking points from the policy, but it only goes so far.<\/p>\n<p>the other problem is that the concurrent queue pool (non-overcommit threads in the implementation parlance) is a shared singleton, which makes using them correctly fraught with peril if you allow blocking on future work.<\/p>\n<p>This is why Swift Actors executors have to disallow it.<\/p>\n<\/blockquote>\n\n<p><a href=\"https:\/\/twitter.com\/davezarzycki\/status\/1332844008528359424\">David Zarzycki<\/a>:<\/p>\n<blockquote cite=\"https:\/\/twitter.com\/davezarzycki\/status\/1332844008528359424\">\n<p>The overcommit queues were never supposed to exist. 
They were added as an attempt to fix a bug on the single-core Mac mini, but later we found the root cause: workqueue threads defaulted to <code>SCHED_FIFO<\/code> instead of <code>SCHED_OTHER<\/code>, and it was too late to remove overcommit queues before GM.<\/p>\n<\/blockquote>\n\n<p id=\"libdispatchs-unmet-promise-update-2020-12-24\">Update (2020-12-24): <a href=\"https:\/\/inessential.com\/2020\/12\/23\/com_apple_network_boringssl_metrics_queu\">Brent Simmons<\/a> (<a href=\"https:\/\/twitter.com\/brentsimmons\/status\/1341871034962747393\">tweet<\/a>):<\/p>\n<blockquote cite=\"https:\/\/inessential.com\/2020\/12\/23\/com_apple_network_boringssl_metrics_queu\"><p>We&rsquo;ve been getting some reports that NetNewsWire for Mac will hang sometimes. A sample will report something like this[&#8230;] And there will be hundreds of threads labelled <code>com.apple.network.boringssl.metrics_queue<\/code>.<\/p><\/blockquote>","protected":false},"excerpt":{"rendered":"<p>Thomas Clement: Apple demonstrated the libdispatch and the promise seemed great, they introduced the notion of serial queues and told us that we should stop thinking in terms of threads and start thinking in terms of queues. 
We would submit various program tasks to be executed serially or concurrently and the libdispatch would do the [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"apple_news_api_created_at":"2020-11-25T21:58:10Z","apple_news_api_id":"76a9e98c-be90-453e-bfc6-c295ffd11dfe","apple_news_api_modified_at":"2020-12-24T18:20:12Z","apple_news_api_revision":"AAAAAAAAAAAAAAAAAAAAAg==","apple_news_api_share_url":"https:\/\/apple.news\/AdqnpjL6QRT6_xsKV_9Ed_g","apple_news_coverimage":0,"apple_news_coverimage_caption":"","apple_news_is_hidden":false,"apple_news_is_paid":false,"apple_news_is_preview":false,"apple_news_is_sponsored":false,"apple_news_maturity_rating":"","apple_news_metadata":"\"\"","apple_news_pullquote":"","apple_news_pullquote_position":"","apple_news_slug":"","apple_news_sections":"\"\"","apple_news_suppress_video_url":false,"apple_news_use_image_component":false,"footnotes":""},"categories":[4],"tags":[800,164,880,31,1837,30,1891,138,71],"class_list":["post-30813","post","type-post","status-publish","format-standard","hentry","category-programming-category","tag-concurrency","tag-documentation","tag-grand-central-dispatch-gcd","tag-ios","tag-ios-14","tag-mac","tag-macos-11-0","tag-optimization","tag-programming"],"apple_news_notices":[],"_links":{"self":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/30813","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/comments?post=30813"}],"version-history":[{"count":4,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/30813\/revisions"}],"predecessor-version":[{"id":31139
,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/30813\/revisions\/31139"}],"wp:attachment":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/media?parent=30813"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/categories?post=30813"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/tags?post=30813"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}