Accidentally Quadratic Parsing With sscanf
T0ST (via Hacker News):
GTA Online. Infamous for its slow loading times. Having picked up the game again to finish some of the newer heists I was shocked (/s) to discover that it still loads just as slow as the day it was released 7 years ago.
[…]
Enter stack sampling: for closed source applications there’s only one option. Dump the running process’ stack and current instruction pointer’s location to build a calling tree in set intervals. Then add them up to get statistics on what’s going on.
[…]
Disassembling the now-less-obfuscated dump reveals that one of the addresses has a label pulled out of somewhere! It’s
strlen
? Going down the call stack the next one is labeledvscan_fn
and after that the labels end, tho I’m fairly confident it’ssscanf
.It’s parsing something. Parsing what? Untangling the disassembly would take forever so I decided to dump some samples from the running process using x64dbg. Some debug-stepping later it turns out it’s… JSON!
[…]
To be fair I had no idea most
sscanf
implementations calledstrlen
so I can’t blame the developer who wrote this. I would assume it just scanned byte by byte and could stop on aNULL
.
And then there’s another quadratic array membership test.
The performance problem with
sscanf
O(N2) in glibc has been known since at least 2014 (see bug 17577). Ironically, if they’d usedfscanf
(reading from a file instead of loading it into memory first) the problem wouldn’t exist.
This sparked a great deal of discussion: Was this C’s fault? Perhaps “web shit”? Capitalism and incentives?
Still, folks in the comments section generally agreed: they wouldn’t write anything that silly.
[…]
Yes, I had made the exact same mistake as the programmers working on GTA Online: I had an accidentally quadratic parser!
[…]
As someone that has been programming for many years, this was a perfectly-timed reminder that there are always pitfalls out there. The documentation for
sscanf
does not include a time complexity, so this is particularly tricky footgun, and I’m sure it’s not the only one lurking in the darkness.
Git (via Hacker News):
This header lists functions that have been banned from our code base, because they’re too easy to misuse (and even if used correctly, complicate audits).
Previously:
- Accidentally Quadratic Constant Folding
- Writing a Really, Really Fast JSON Parser
- Ruby’s reject!
- Accidentally Quadratic Rust Hash Tables