Thursday, March 4, 2021

Accidentally Quadratic Parsing With sscanf

T0ST (via Hacker News):

GTA Online. Infamous for its slow loading times. Having picked up the game again to finish some of the newer heists I was shocked (/s) to discover that it still loads just as slow as the day it was released 7 years ago.

[…]

Enter stack sampling: for closed source applications there’s only one option. Dump the running process’ stack and current instruction pointer’s location to build a calling tree in set intervals. Then add them up to get statistics on what’s going on.

[…]

Disassembling the now-less-obfuscated dump reveals that one of the addresses has a label pulled out of somewhere! It’s strlen? Going down the call stack the next one is labeled vscan_fn and after that the labels end, tho I’m fairly confident it’s sscanf.

It’s parsing something. Parsing what? Untangling the disassembly would take forever so I decided to dump some samples from the running process using x64dbg. Some debug-stepping later it turns out it’s… JSON!

[…]

To be fair I had no idea most sscanf implementations called strlen so I can’t blame the developer who wrote this. I would assume it just scanned byte by byte and could stop on a NULL.

And then there’s another quadratic array membership test.

Michael Brown:

The performance problem with sscanf O(N2) in glibc has been known since at least 2014 (see bug 17577). Ironically, if they’d used fscanf (reading from a file instead of loading it into memory first) the problem wouldn’t exist.

Matt Keeter:

This sparked a great deal of discussion: Was this C’s fault? Perhaps “web shit”? Capitalism and incentives?

Still, folks in the comments section generally agreed: they wouldn’t write anything that silly.

[…]

Yes, I had made the exact same mistake as the programmers working on GTA Online: I had an accidentally quadratic parser!

[…]

As someone that has been programming for many years, this was a perfectly-timed reminder that there are always pitfalls out there. The documentation for sscanf does not include a time complexity, so this is particularly tricky footgun, and I’m sure it’s not the only one lurking in the darkness.

Git (via Hacker News):

This header lists functions that have been banned from our code base, because they’re too easy to misuse (and even if used correctly, complicate audits).

Previously:

Comments RSS · Twitter

Leave a Comment