{"id":31785,"date":"2021-03-04T20:40:59","date_gmt":"2021-03-05T01:40:59","guid":{"rendered":"https:\/\/mjtsai.com\/blog\/?p=31785"},"modified":"2021-03-04T20:40:59","modified_gmt":"2021-03-05T01:40:59","slug":"accidentally-quadratic-parsing-with-sscanf","status":"publish","type":"post","link":"https:\/\/mjtsai.com\/blog\/2021\/03\/04\/accidentally-quadratic-parsing-with-sscanf\/","title":{"rendered":"Accidentally Quadratic Parsing With sscanf"},"content":{"rendered":"<p><a href=\"https:\/\/nee.lv\/2021\/02\/28\/How-I-cut-GTA-Online-loading-times-by-70\/\">T0ST<\/a> (via <a href=\"https:\/\/news.ycombinator.com\/item?id=26296339\">Hacker News<\/a>):<\/p>\n<blockquote cite=\"https:\/\/nee.lv\/2021\/02\/28\/How-I-cut-GTA-Online-loading-times-by-70\/\"><p>GTA Online. <a href=\"https:\/\/www.reddit.com\/r\/gtaonline\/comments\/9vgo0g\/how_the_fuck_are_20_minute_load_times_acceptable\/\">Infamous<\/a> for its slow loading times. Having picked up the game again to finish some of the newer heists I was <em>shocked<\/em> (\/s) to discover that it still loads just as slow as the day it was released 7 years ago.<\/p><p>[&#8230;]<\/p><p>Enter stack sampling: for closed source applications there&rsquo;s only one option. Dump the running process&rsquo; stack and current instruction pointer&rsquo;s location to build a calling tree in set intervals. Then add them up to get statistics on what&rsquo;s going on.<\/p><p>[&#8230;]<\/p><p>Disassembling the now-less-obfuscated dump reveals that one of the addresses has a label pulled out of somewhere! It&rsquo;s <code>strlen<\/code>? Going down the call stack the next one is labeled <code>vscan_fn<\/code> and after that the labels end, tho I&rsquo;m fairly confident it&rsquo;s <a href=\"https:\/\/github.com\/chakra-core\/ChakraCore\/blob\/master\/pal\/src\/safecrt\/sscanf.c#L47\"><code>sscanf<\/code><\/a>.<\/p><p>It&rsquo;s parsing something. Parsing what? Untangling the disassembly would take forever so I decided to dump some samples from the running process using <a href=\"https:\/\/x64dbg.com\/\">x64dbg<\/a>. Some debug-stepping later it turns out it&rsquo;s&#8230; JSON!<\/p>\n<p>[&#8230;]<\/p>\n<p>To be fair I had no idea most <code>sscanf<\/code> implementations called <code>strlen<\/code> so I can&rsquo;t blame the developer who wrote this. I would assume it just scanned byte by byte and could stop on a <code>NULL<\/code>.<\/p><\/blockquote>\n\n<p>And then there&rsquo;s another quadratic array membership test.<\/p>\n\n<p><a href=\"https:\/\/twitter.com\/realmrmichaelrb\/status\/1366245113266118658\">Michael Brown<\/a>:<\/p>\n<blockquote cite=\"https:\/\/twitter.com\/realmrmichaelrb\/status\/1366245113266118658\"><p>The performance problem with <code>sscanf<\/code> O(N<sup>2<\/sup>) in glibc has been known since at least 2014 (see <a href=\"https:\/\/sourceware.org\/bugzilla\/show_bug.cgi?id=17577\">bug 17577<\/a>). Ironically, if they&rsquo;d used <code>fscanf<\/code> (reading from a file instead of loading it into memory first) the problem wouldn&rsquo;t exist.<\/p><\/blockquote>\n\n<p><a href=\"https:\/\/www.mattkeeter.com\/blog\/2021-03-01-happen\/\">Matt Keeter<\/a>:<\/p>\n<blockquote cite=\"https:\/\/www.mattkeeter.com\/blog\/2021-03-01-happen\/\"><p>This sparked a <a href=\"https:\/\/news.ycombinator.com\/item?id=26296339\">great deal<\/a> of discussion:\nWas this C&rsquo;s fault?\nPerhaps <a href=\"https:\/\/twitter.com\/Jonathan_Blow\/status\/1366452792563359744\">&ldquo;web shit&rdquo;<\/a>?\n<a href=\"https:\/\/twitter.com\/fasterthanlime\/status\/1366187333507293184\">Capitalism and incentives<\/a>?<\/p><p>Still, folks in the comments section generally agreed:\n<em>they<\/em> wouldn&rsquo;t write anything that silly.<\/p><p>[&#8230;]<\/p><p>Yes, I had made the exact same mistake as the programmers working on GTA Online: I had an accidentally quadratic parser!<\/p>\n<p>[&#8230;]<\/p>\n<p>As someone that has been programming for many years,\nthis was a perfectly-timed reminder that there are <em>always<\/em> pitfalls out there.\nThe <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/io\/c\/fscanf\">documentation for <code>sscanf<\/code><\/a>\ndoes not include a time complexity,\nso this is particularly tricky <a href=\"https:\/\/en.wiktionary.org\/wiki\/footgun\">footgun<\/a>,\nand I&rsquo;m sure it&rsquo;s not the only one lurking in the darkness.<\/p><\/blockquote>\n\n<p><a href=\"https:\/\/github.com\/git\/git\/blob\/master\/banned.h\">Git<\/a> (via <a href=\"https:\/\/news.ycombinator.com\/item?id=26347867\">Hacker News<\/a>):<\/p>\n<blockquote cite=\"https:\/\/github.com\/git\/git\/blob\/master\/banned.h\"><p>This header lists functions that have been banned from our code base, because they&rsquo;re too easy to misuse (and even if used correctly, complicate audits).<\/p><\/blockquote>\n\n<p>Previously:<\/p>\n<ul>\n<li><a href=\"https:\/\/mjtsai.com\/blog\/2019\/08\/30\/accidentally-quadratic-constant-folding\/\">Accidentally Quadratic Constant Folding<\/a><\/li>\n<li><a href=\"https:\/\/mjtsai.com\/blog\/2017\/06\/30\/writing-a-really-really-fast-json-parser\/\">Writing a Really, Really Fast JSON Parser<\/a><\/li>\n<li><a href=\"https:\/\/mjtsai.com\/blog\/2017\/02\/20\/rubys-reject\/\">Ruby&rsquo;s reject!<\/a><\/li>\n<li><a href=\"https:\/\/mjtsai.com\/blog\/2016\/12\/09\/accidentally-quadratic-rust-hash-tables\/\">Accidentally Quadratic Rust Hash Tables<\/a><\/li>\n<\/ul>","protected":false},"excerpt":{"rendered":"<p>T0ST (via Hacker News): GTA Online. Infamous for its slow loading times. Having picked up the game again to finish some of the newer heists I was shocked (\/s) to discover that it still loads just as slow as the day it was released 7 years ago.[&#8230;]Enter stack sampling: for closed source applications there&rsquo;s only [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"apple_news_api_created_at":"2021-03-05T01:41:03Z","apple_news_api_id":"51fbd9d0-7822-4446-a58f-2b5d6ab168dd","apple_news_api_modified_at":"2021-03-05T01:41:03Z","apple_news_api_revision":"AAAAAAAAAAD\/\/\/\/\/\/\/\/\/\/w==","apple_news_api_share_url":"https:\/\/apple.news\/AUfvZ0HgiREaljytdarFo3Q","apple_news_coverimage":0,"apple_news_coverimage_caption":"","apple_news_is_hidden":false,"apple_news_is_paid":false,"apple_news_is_preview":false,"apple_news_is_sponsored":false,"apple_news_maturity_rating":"","apple_news_metadata":"\"\"","apple_news_pullquote":"","apple_news_pullquote_position":"","apple_news_slug":"","apple_news_sections":"\"\"","apple_news_suppress_video_url":false,"apple_news_use_image_component":false,"footnotes":""},"categories":[4],"tags":[45,418,507,138,270,71],"class_list":["post-31785","post","type-post","status-publish","format-standard","hentry","category-programming-category","tag-c","tag-game","tag-json","tag-optimization","tag-parser","tag-programming"],"apple_news_notices":[],"_links":{"self":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/31785","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/comments?post=31785"}],"version-history":[{"count":1,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/31785\/revisions"}],"predecessor-version":[{"id":31786,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/31785\/revisions\/31786"}],"wp:attachment":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/media?parent=31785"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/categories?post=31785"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/tags?post=31785"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}