Monday, August 14, 2023

JVM Compares Strings Using the pcmpestri x86 Instruction

Jackson Davis (2016, tweet, Hacker News):

String.compareTo is one of a few methods that is important enough to also get a special hand-rolled assembly version.


Introduced in SSE4.2, pcmpestri is a member of the pcmpxstrx family of vectorized string comparison instructions. With a control byte to specify options for their complex functionality, they are complicated enough to get their own subsection in the x86 ISR. […] Now that’s really putting the C in CISC!


If this wasn’t complicated enough for you, have a quick gander at the indexOfimplementations (there are 2, depending on the size of the matching string), which use control byte 0x0d, which does “equal ordered” (aka substring) matching.

It sounds like it only compares the Unicode code points, so that equivalent precomposed and decomposed strings are not considered equal.


One thing I learned about pcmpxstrx is that it’s surprisingly slow: latency of 10-11 cycles and reciprocal throughput of 3-5 cycles on Haswell according to Agner’s tables, depending on the precise instruction variant. The instructions are also limited in the ALU ports they can use. Since AVX2 has made SIMD on x86 fairly flexible, it can sometimes not be worth using the string comparison instructions if simpler instructions suffice: even a slightly longer sequence of simpler SIMD instructions sometimes beats a single string compare.


1 Comment RSS · Twitter · Mastodon

Fun quirk of Java, pretty much all String operations work lexicographically. The API docs for compareTo call this out specifically.

To work with “text,” you need to use the Collator class.

This is little known, hard-won knowledge that is unfortunately not often shared.

Leave a Comment