{"id":27926,"date":"2020-01-22T16:19:40","date_gmt":"2020-01-22T21:19:40","guid":{"rendered":"https:\/\/mjtsai.com\/blog\/?p=27926"},"modified":"2020-01-22T16:19:40","modified_gmt":"2020-01-22T21:19:40","slug":"the-hunt-for-the-fastest-zero","status":"publish","type":"post","link":"https:\/\/mjtsai.com\/blog\/2020\/01\/22\/the-hunt-for-the-fastest-zero\/","title":{"rendered":"The Hunt for the Fastest Zero"},"content":{"rendered":"<p><a href=\"https:\/\/lemire.me\/blog\/2020\/01\/20\/filling-large-arrays-with-zeroes-quickly-in-c\/\">Daniel Lemire<\/a> (via <a href=\"https:\/\/twitter.com\/ThE_JacO\/status\/1219373485497077761\">Raffaele Fragapane<\/a>, <a href=\"https:\/\/news.ycombinator.com\/item?id=22104491\">Hacker News<\/a>):<\/p>\n<blockquote cite=\"https:\/\/lemire.me\/blog\/2020\/01\/20\/filling-large-arrays-with-zeroes-quickly-in-c\/\"><p>Typically, to fill an array with some value, C++ programmers invoke the <code>std::fill<\/code>. We might assume that for a task as simple as filling an array with zeroes, the C++ standard library would provide the absolute best performance. However, with GNU GCC compilers, that is not the case.<\/p><p>The following line of C++ code gets compiled as a loop that fills each individual byte with zeroes when applying the conventional -O2 optimization flag.<\/p><pre>std::fill(p, p + n, 0);<\/pre><p>When the array is large, it can become inefficient compared to a highly efficient implementation like the C function like <code>memset<\/code>.<\/p><pre>memset(p, 0, n);<\/pre><\/blockquote>\n<p>Zeroing memory can actually become a bottleneck.<\/p>\n\n<p><a href=\"https:\/\/travisdowns.github.io\/blog\/2020\/01\/20\/zero.html\">Travis Downs<\/a> (<a href=\"https:\/\/news.ycombinator.com\/item?id=22104576\">Hacker News<\/a>):<\/p>\n<blockquote cite=\"https:\/\/travisdowns.github.io\/blog\/2020\/01\/20\/zero.html\"><p>Now we see how the <code>memset<\/code> appears. 
It is called explicitly by the second implementation shown above, selected by <code>enable_if<\/code> when the SFINAE condition <code>__is_byte&lt;_Tp&gt;<\/code> is true. Note, however, that unlike the general function, this variant has a single template argument: <code>template&lt;typename _Tp&gt;<\/code>, and the function signature is:<\/p><pre>__fill_a(_Tp* __first, _Tp* __last, const _Tp&amp; __c)<\/pre><p>Hence, it will only be considered when the <code>__first<\/code> and <code>__last<\/code> pointers which delimit the range have the <em>exact same type as the value being filled<\/em>. When you write <code>std::fill(p, p + n, 0)<\/code> where <code>p<\/code> is <code>char *<\/code>, you rely on template type deduction for the parameters, which ends up deducing <code>char *<\/code> and <code>int<\/code> for the iterator type and value-to-fill type, <em>because <code>0<\/code> is an integer constant<\/em>.<\/p><\/blockquote>","protected":false},"excerpt":{"rendered":"<p>Daniel Lemire (via Raffaele Fragapane, Hacker News): Typically, to fill an array with some value, C++ programmers invoke the std::fill. We might assume that for a task as simple as filling an array with zeroes, the C++ standard library would provide the absolute best performance. 
However, with GNU GCC compilers, that is not the case.The [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"apple_news_api_created_at":"2020-01-22T21:19:43Z","apple_news_api_id":"5b9f7b7c-f110-4a9b-ab92-27af84461cc9","apple_news_api_modified_at":"2020-01-22T21:19:44Z","apple_news_api_revision":"AAAAAAAAAAD\/\/\/\/\/\/\/\/\/\/w==","apple_news_api_share_url":"https:\/\/apple.news\/AW597fPEQSpurkievhEYcyQ","apple_news_coverimage":0,"apple_news_coverimage_caption":"","apple_news_is_hidden":false,"apple_news_is_paid":false,"apple_news_is_preview":false,"apple_news_is_sponsored":false,"apple_news_maturity_rating":"","apple_news_metadata":"\"\"","apple_news_pullquote":"","apple_news_pullquote_position":"","apple_news_slug":"","apple_news_sections":"\"\"","apple_news_suppress_video_url":false,"apple_news_use_image_component":false,"footnotes":""},"categories":[4],"tags":[326,230,255,285,138,71],"class_list":["post-27926","post","type-post","status-publish","format-standard","hentry","category-programming-category","tag-c-plus-plus","tag-clang","tag-compiler","tag-gcc","tag-optimization","tag-programming"],"apple_news_notices":[],"_links":{"self":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/27926","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/comments?post=27926"}],"version-history":[{"count":1,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/27926\/revisions"}],"predecessor-version":[{"id":27927,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/27926\/revisions\/27927"}],"wp:attachment":[{"href":"https:\
/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/media?parent=27926"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/categories?post=27926"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/tags?post=27926"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}