{"id":40327,"date":"2023-08-14T14:39:25","date_gmt":"2023-08-14T18:39:25","guid":{"rendered":"https:\/\/mjtsai.com\/blog\/?p=40327"},"modified":"2023-08-14T14:39:25","modified_gmt":"2023-08-14T18:39:25","slug":"jvm-compares-strings-using-the-pcmpestri-x86-instruction","status":"publish","type":"post","link":"https:\/\/mjtsai.com\/blog\/2023\/08\/14\/jvm-compares-strings-using-the-pcmpestri-x86-instruction\/","title":{"rendered":"JVM Compares Strings Using the pcmpestri x86 Instruction"},"content":{"rendered":"<p><a href=\"http:\/\/jcdav.is\/2016\/09\/01\/How-the-JVM-compares-your-strings\/\">Jackson Davis<\/a> (2016, <a href=\"https:\/\/twitter.com\/jcdavis\/status\/771399974231740416\">tweet<\/a>, <a href=\"https:\/\/news.ycombinator.com\/item?id=16089736\">Hacker News<\/a>):<\/p>\n<blockquote cite=\"http:\/\/jcdav.is\/2016\/09\/01\/How-the-JVM-compares-your-strings\/\"><p><code>String.compareTo<\/code> is one of a few methods that is important enough to also get a special hand-rolled assembly version.<\/p><p>[&#8230;]<\/p><p>Introduced in SSE4.2, <code>pcmpestri<\/code> is a member of the <code>pcmpxstrx<\/code> family of vectorized string comparison instructions. With a control byte to specify options for their complex functionality, they are complicated enough to get their own subsection in the x86 ISR. 
[&#8230;] Now that&rsquo;s <em>really<\/em> putting the C in CISC!<\/p><p>[&#8230;]<\/p><p>If this wasn&rsquo;t complicated enough for you, have a quick gander at the <a href=\"http:\/\/hg.openjdk.java.net\/jdk8\/jdk8\/hotspot\/file\/87ee5ee27509\/src\/cpu\/x86\/vm\/macroAssembler_x86.cpp#l5456\">indexOf<\/a> <a href=\"http:\/\/hg.openjdk.java.net\/jdk8\/jdk8\/hotspot\/file\/87ee5ee27509\/src\/cpu\/x86\/vm\/macroAssembler_x86.cpp#l5305\">implementations<\/a> (there are 2, depending on the size of the matching string), which use control byte <code>0x0d<\/code>, which does &ldquo;equal ordered&rdquo; (aka substring) matching.<\/p><\/blockquote>\n\n<p>It sounds like it only compares the raw UTF-16 code units, without any normalization, so that equivalent precomposed and decomposed strings are not considered equal.<\/p>\n\n<p><a href=\"https:\/\/news.ycombinator.com\/item?id=16091712\">pcwalton<\/a>:<\/p>\n<blockquote cite=\"https:\/\/news.ycombinator.com\/item?id=16091712\"><p>One thing I learned about pcmpxstrx is that it&rsquo;s surprisingly slow: latency of 10-11 cycles and reciprocal throughput of 3-5 cycles on Haswell according to Agner&rsquo;s tables, depending on the precise instruction variant. The instructions are also limited in the ALU ports they can use. 
Since AVX2 has made SIMD on x86 fairly flexible, it can sometimes not be worth using the string comparison instructions if simpler instructions suffice: even a slightly longer sequence of simpler SIMD instructions sometimes beats a single string compare.<\/p><\/blockquote>\n\n<p>Previously:<\/p>\n<ul>\n<li><a href=\"https:\/\/mjtsai.com\/blog\/2023\/08\/08\/unicode-is-harder-than-you-think\/\">Unicode Is Harder Than You Think<\/a><\/li>\n<li><a href=\"https:\/\/mjtsai.com\/blog\/2014\/07\/06\/strings-in-swift\/\">Strings in Swift<\/a><\/li>\n<\/ul>","protected":false},"excerpt":{"rendered":"<p>Jackson Davis (2016, tweet, Hacker News): String.compareTo is one of a few methods that is important enough to also get a special hand-rolled assembly version.[&#8230;]Introduced in SSE4.2, pcmpestri is a member of the pcmpxstrx family of vectorized string comparison instructions. With a control byte to specify options for their complex functionality, they are complicated enough [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"apple_news_api_created_at":"2023-08-14T18:39:28Z","apple_news_api_id":"b89ee6f3-85cc-49e1-8e6c-37cfe030f296","apple_news_api_modified_at":"2023-08-14T18:39:28Z","apple_news_api_revision":"AAAAAAAAAAD\/\/\/\/\/\/\/\/\/\/w==","apple_news_api_share_url":"https:\/\/apple.news\/AuJ7m84XMSeGObDfP4DDylg","apple_news_coverimage":0,"apple_news_coverimage_caption":"","apple_news_is_hidden":false,"apple_news_is_paid":false,"apple_news_is_preview":false,"apple_news_is_sponsored":false,"apple_news_maturity_rating":"","apple_news_metadata":"\"\"","apple_news_pullquote":"","apple_news_pullquote_position":"","apple_news_slug":"","apple_news_sections":"\"\"","apple_news_suppress_video_url":false,"apple_news_use_image_component":false,"footnotes":""},"categories":[4],"tags":[770,261,84,138,260,258],"class_list":["post-40327","post","type-post","statu
s-publish","format-standard","hentry","category-programming-category","tag-assembly-language","tag-intel","tag-java","tag-optimization","tag-processors","tag-unicode"],"apple_news_notices":[],"_links":{"self":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/40327","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/comments?post=40327"}],"version-history":[{"count":1,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/40327\/revisions"}],"predecessor-version":[{"id":40328,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/40327\/revisions\/40328"}],"wp:attachment":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/media?parent=40327"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/categories?post=40327"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/tags?post=40327"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}