JVM Compares Strings Using the pcmpestri x86 Instruction
Jackson Davis (2016, tweet, Hacker News):
String.compareTo
is one of a few methods that is important enough to also get a special hand-rolled assembly version.[…]
Introduced in SSE4.2,
pcmpestri
is a member of thepcmpxstrx
family of vectorized string comparison instructions. With a control byte to specify options for their complex functionality, they are complicated enough to get their own subsection in the x86 ISR. […] Now that’s really putting the C in CISC![…]
If this wasn’t complicated enough for you, have a quick gander at the indexOfimplementations (there are 2, depending on the size of the matching string), which use control byte
0x0d
, which does “equal ordered” (aka substring) matching.
It sounds like it only compares the Unicode code points, so that equivalent precomposed and decomposed strings are not considered equal.
One thing I learned about pcmpxstrx is that it’s surprisingly slow: latency of 10-11 cycles and reciprocal throughput of 3-5 cycles on Haswell according to Agner’s tables, depending on the precise instruction variant. The instructions are also limited in the ALU ports they can use. Since AVX2 has made SIMD on x86 fairly flexible, it can sometimes not be worth using the string comparison instructions if simpler instructions suffice: even a slightly longer sequence of simpler SIMD instructions sometimes beats a single string compare.
Previously: