Wednesday, January 22, 2020

The Hunt for the Fastest Zero

Daniel Lemire (via Raffaele Fragapane, Hacker News):

Typically, to fill an array with some value, C++ programmers invoke std::fill. We might assume that for a task as simple as filling an array with zeroes, the C++ standard library would provide the absolute best performance. However, with GNU GCC compilers, that is not the case.
The following line of C++ code gets compiled as a loop that fills each individual byte with zeroes when applying the conventional -O2 optimization flag.
std::fill(p, p + n, 0);
When the array is large, this loop can be inefficient compared to a highly optimized implementation such as the C function memset.
memset(p, 0, n);
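To make the comparison concrete, here is a minimal sketch that wraps the two quoted calls in functions so they can be compiled with -O2 and inspected or timed side by side. The function names and the all_zero helper are hypothetical, introduced only for illustration; the sketch shows that both calls produce the same bytes and makes no claim about which is faster on a given compiler.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Zero a byte buffer with std::fill, passing the integer literal 0
// exactly as in the quoted line.
void zero_with_fill(char* p, std::size_t n) {
    assert(p != nullptr || n == 0);
    std::fill(p, p + n, 0);
}

// Zero the same kind of buffer with the C function memset.
void zero_with_memset(char* p, std::size_t n) {
    assert(p != nullptr || n == 0);
    if (n != 0) {
        std::memset(p, 0, n);
    }
}

// Hypothetical helper: returns true when every byte in buf is zero.
bool all_zero(const std::vector<char>& buf) {
    return std::all_of(buf.begin(), buf.end(),
                       [](char c) { return c == 0; });
}
```

Calling each function on a std::vector<char> via data() and size() should leave both buffers identical; any difference between the two lies in the generated code, which is worth checking in the compiler's assembly output.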

Zeroing memory can actually become a bottleneck.

Travis Downs (Hacker News):

Now we see how the memset appears. It is called explicitly by the second implementation shown above, selected by enable_if when the SFINAE condition __is_byte<_Tp> is true. Note, however, that unlike the general function, this variant has a single template argument: template<typename _Tp>, and the function signature is:
__fill_a(_Tp* __first, _Tp* __last, const _Tp& __c)
Hence, it will only be considered when the __first and __last pointers which delimit the range have the exact same type as the value being filled. When you write std::fill(p, p + n, 0) where p is char *, you rely on template type deduction for the parameters, which ends up deducing char * and int for the iterator type and value-to-fill type, because 0 is an integer constant.
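The deduction described above can be sketched as follows. The template value_matches_element is a hypothetical stand-in, not part of libstdc++: it deduces its parameters the same way a call to std::fill(first, last, value) does and reports whether the value type equals the pointed-to element type. It assumes the iterators are raw pointers. The zero_bytes function shows one possible workaround, passing a char-typed zero so that both types agree; whether that actually selects the memset path depends on the standard library version, so treat it as something to verify in the generated assembly.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical helper (not a libstdc++ function): deduces Iter and T
// like std::fill does, then checks whether T is the element type.
// Assumes Iter is a raw pointer type.
template <typename Iter, typename T>
constexpr bool value_matches_element(Iter, Iter, const T&) {
    using Elem = typename std::remove_pointer<Iter>::type;
    return std::is_same<Elem, T>::value;
}

// Passing '\0' instead of 0 makes the value type char, matching the
// element type of a char* range.
void zero_bytes(char* p, std::size_t n) {
    assert(p != nullptr || n == 0);
    std::fill(p, p + n, '\0');
}
```

With a char buffer buf of length 16, value_matches_element(buf, buf + 16, 0) is false because the literal 0 deduces as int, while value_matches_element(buf, buf + 16, '\0') is true. Writing static_cast<char>(0) has the same effect on deduction as '\0'.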

C++ Programming Language Clang Compiler GNU Compiler Collection (GCC) Optimization Programming
