The Hunt for the Fastest Zero
Daniel Lemire (via Raffaele Fragapane, Hacker News):
Typically, to fill an array with some value, C++ programmers invoke std::fill. We might assume that for a task as simple as filling an array with zeroes, the C++ standard library would provide the absolute best performance. However, with GNU GCC compilers, that is not the case. The following line of C++ code gets compiled as a loop that fills each individual byte with zeroes when applying the conventional -O2 optimization flag.
std::fill(p, p + n, 0);
When the array is large, this can become inefficient compared to a highly optimized implementation such as the C function memset:
memset(p, 0, n);
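For reference, here are the two calls side by side. This is a minimal sketch that assumes p points to n writable chars; the wrapper names zero_with_fill and zero_with_memset are hypothetical. Both leave the buffer zeroed. The difference described above lies in the code GCC generates at -O2, which this snippet does not measure.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical wrapper: std::fill with the literal 0. The value argument
// is deduced as int even though the elements are char.
void zero_with_fill(char* p, std::size_t n) {
    std::fill(p, p + n, 0);
}

// Hypothetical wrapper: a direct call to the C library's memset.
void zero_with_memset(char* p, std::size_t n) {
    // memset expects a valid pointer, even when n == 0.
    assert(p != nullptr);
    std::memset(p, 0, n);
}
```

Inspecting the assembly for each wrapper, for example with g++ -O2 -S, is a reasonable way to check how a particular GCC version treats the std::fill form.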
Zeroing memory can actually become a bottleneck.
Now we see how the memset appears. It is called explicitly by the second implementation shown above, which enable_if selects when the SFINAE condition __is_byte<_Tp> is true. Note, however, that unlike the general function, this variant has a single template argument, template<typename _Tp>, and its signature is __fill_a(_Tp* __first, _Tp* __last, const _Tp& __c). Hence, it will only be considered when the __first and __last pointers that delimit the range point to exactly the same type as the value being filled. When you write std::fill(p, p + n, 0) where p is char *, you rely on template type deduction for the parameters. Deduction yields char * for the iterator type and int for the value-to-fill type, because 0 is an integer constant. Since _Tp cannot be both char and int, deduction fails for the memset variant, and the general element-by-element loop is used instead.
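The deduction failure can be reproduced with a pair of overloads modeled on the structure described above. This is a sketch only: fill_like and FillPath are made-up names, the code is not the actual libstdc++ source, and the sizeof/is_integral test is a rough stand-in for __is_byte<_Tp>. Each overload reports which path it took so the selection is visible.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical tag reporting which overload ran (not a libstdc++ type).
enum class FillPath { GenericLoop, ByteMemset };

// General overload: the iterator type and the value type are deduced
// independently, so (char*, char*, int) is accepted.
template <typename It, typename T>
FillPath fill_like(It first, It last, const T& value) {
    for (; first != last; ++first) {
        *first = value;  // one store per element
    }
    return FillPath::GenericLoop;
}

// Byte overload: the single parameter T must be deduced consistently from
// all three arguments, and T must be a one-byte integral type (a rough
// stand-in for the __is_byte<_Tp> condition).
template <typename T>
typename std::enable_if<std::is_integral<T>::value && sizeof(T) == 1,
                        FillPath>::type
fill_like(T* first, T* last, const T& value) {
    assert(first <= last);
    std::memset(first, static_cast<unsigned char>(value),
                static_cast<std::size_t>(last - first));
    return FillPath::ByteMemset;
}
```

With char* p, the call fill_like(p, p + n, 0) deduces T as char from the pointers and as int from the literal, so the byte overload drops out and the loop runs. Passing '\0' or static_cast<char>(0) lets all three arguments agree on char. Both overloads are then viable, and partial ordering prefers the more specialized pointer version. The same change of argument is the usual way to steer std::fill toward its byte path, though the resulting code for a given GCC version is worth confirming in the assembly.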