{"id":8842,"date":"2014-05-21T14:41:54","date_gmt":"2014-05-21T18:41:54","guid":{"rendered":"http:\/\/mjtsai.com\/blog\/?p=8842"},"modified":"2014-06-06T16:55:26","modified_gmt":"2014-06-06T20:55:26","slug":"making-dispatch_once-fast","status":"publish","type":"post","link":"https:\/\/mjtsai.com\/blog\/2014\/05\/21\/making-dispatch_once-fast\/","title":{"rendered":"Making dispatch_once() Fast"},"content":{"rendered":"<p>I had assumed that <code>dispatch_once()<\/code> was implemented as a basic atomic compare-and-swap, but the source for <a href=\"http:\/\/opensource.apple.com\/source\/libdispatch\/libdispatch-339.90.1\/src\/once.c\">dispatch_once_f<\/a> contains an interesting comment:<\/p>\n<blockquote cite=\"http:\/\/opensource.apple.com\/source\/libdispatch\/libdispatch-339.90.1\/src\/once.c\"><p> Normally, a barrier on the read side is used to workaround\n the weakly ordered memory model. But barriers are expensive\n and we only need to synchronize once! After func(ctxt)\n completes, the predicate will be marked as &ldquo;done&rdquo; and the\n branch predictor will correctly skip the call to\n dispatch_once*().<\/p>\n<p>A far faster alternative solution: Defeat the speculative\n read-ahead of peer CPUs.<\/p>\n<p>Modern architectures will throw away speculative results\n once a branch mis-prediction occurs. 
Therefore, if we can\n ensure that the predicate is not marked as being complete\n until long after the last store by func(ctxt), then we have\n defeated the read-ahead of peer CPUs.<\/p>\n<p>In other words, the last &ldquo;store&rdquo; by func(ctxt) must complete\n and then N cycles must elapse before ~0l is stored to *val.\n The value of N is whatever is sufficient to defeat the\n read-ahead mechanism of peer CPUs.<\/p>\n<p>On some CPUs, the most fully synchronizing instruction might\n need to be issued.<\/p><\/blockquote>\n<p>The delay of <em>N<\/em> cycles is provided by <code>dispatch_atomic_maximally_synchronizing_barrier()<\/code>, which has different assembly-language <a href=\"http:\/\/www.opensource.apple.com\/source\/libdispatch\/libdispatch-339.90.1\/src\/shims\/atomic.h\">implementations<\/a> for different architectures.<\/p>\n<p>Update (2014-05-28): <a href=\"http:\/\/stackoverflow.com\/questions\/13856037\/can-i-declare-dispatch-once-t-predicate-as-a-member-variable-instead-of-static\/19845164#19845164\">Greg Parker<\/a> explains a consequence of this optimization:<\/p>\n<blockquote cite=\"http:\/\/stackoverflow.com\/questions\/13856037\/can-i-declare-dispatch-once-t-predicate-as-a-member-variable-instead-of-static\/19845164#19845164\"><p><code>dispatch_once_t<\/code> must not be an instance variable. <\/p>\n<p>The implementation of <code>dispatch_once()<\/code> requires that the <code>dispatch_once_t<\/code> is zero, and <strong>has never been non-zero<\/strong>. The previously-not-zero case would need additional memory barriers to work correctly, but <code>dispatch_once()<\/code> omits those barriers for performance reasons.<\/p>\n<p>Instance variables are initialized to zero, but their memory may have previously stored another value. 
This makes them unsafe for <code>dispatch_once()<\/code> use.<\/p><\/blockquote>\n<p>Update (2014-06-06): <a href=\"https:\/\/www.mikeash.com\/pyblog\/friday-qa-2014-06-06-secrets-of-dispatch_once.html\">Mike Ash<\/a>:<\/p>\n<blockquote cite=\"https:\/\/www.mikeash.com\/pyblog\/friday-qa-2014-06-06-secrets-of-dispatch_once.html\">While the comment in the <code>dispatch_once<\/code> source code is fascinating and informative, it doesn&rsquo;t quite delve into the detail that some would like to see. Since this is one of my favorite hacks, for today&rsquo;s article I&rsquo;m going to discuss exactly what&rsquo;s going on there and how it all works.<\/blockquote>","protected":false},"excerpt":{"rendered":"<p>I had assumed that dispatch_once() was implemented as a basic atomic compare-and-swap, but the source for dispatch_once_f contains an interesting comment: Normally, a barrier on the read side is used to workaround the weakly ordered memory model. But barriers are expensive and we only need to synchronize once! 
After func(ctxt) completes, the predicate will be [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"apple_news_api_created_at":"","apple_news_api_id":"","apple_news_api_modified_at":"","apple_news_api_revision":"","apple_news_api_share_url":"","apple_news_coverimage":0,"apple_news_coverimage_caption":"","apple_news_is_hidden":false,"apple_news_is_paid":false,"apple_news_is_preview":false,"apple_news_is_sponsored":false,"apple_news_maturity_rating":"","apple_news_metadata":"\"\"","apple_news_pullquote":"","apple_news_pullquote_position":"","apple_news_slug":"","apple_news_sections":"\"\"","apple_news_suppress_video_url":false,"apple_news_use_image_component":false,"footnotes":""},"categories":[4],"tags":[770,45,800,880,31,30,74,138,71],"class_list":["post-8842","post","type-post","status-publish","format-standard","hentry","category-programming-category","tag-assembly-language","tag-c","tag-concurrency","tag-grand-central-dispatch-gcd","tag-ios","tag-mac","tag-opensource","tag-optimization","tag-programming"],"apple_news_notices":[],"_links":{"self":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/8842","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/comments?post=8842"}],"version-history":[{"count":5,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/8842\/revisions"}],"predecessor-version":[{"id":8929,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/8842\/revisions\/8929"}],"wp:attachment":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/media?parent=8842"}],"wp:term":[{"taxonomy":"category","emb
eddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/categories?post=8842"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/tags?post=8842"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}