Solving a Mysterious Heap Corruption Crash
Agnes Vasarhelyi (tweet, via Alexis Gallagher):
I removed every third-party dependency, to exclude the possibility that the problem is not in our code.
[…]
Move suspicious pieces to an empty project
[…]
The code was fairly slim at this point - a few thousand lines of parsing 3D models into all kinds of data structures. Nothing concurrent, everything running synchronously. I wanted to try and look at the crash site again. Even though I knew the cause of the heap corruption could be elsewhere, seeing the stack trace in the same piece of code every time made me want to look closer there.
The pattern I started to see was that there was always a Dictionary involved, and there was always a simd type such as double3 in the dictionary. […]
But what if.. what if it’s really a Swift bug? 🙀
[…]
When their elements had unusually wide alignments, storage for the standard library’s collection types was not guaranteed to be always allocated with correct alignment. If the start of the storage did not fall on a suitable address, Dictionary rounded it up to the closest alignment boundary. This offset ensured correct alignment, but it also meant that the last Dictionary element may have ended up partially outside of the allocated buffer — leading to a form of buffer overflow. Some innocuous combination of OS/language/device parameters probably caused this issue to trigger more frequently — which is probably why it became noticeable on particular devices running iOS 11.
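The arithmetic behind that overflow can be sketched in a few lines. This is only an illustration under assumed numbers: the helper roundUp, the buffer address, and the element size and count are all hypothetical, not taken from the standard library or from the original report.

```swift
// Hypothetical helper: rounds `address` up to the next multiple of `alignment`.
// `alignment` must be a power of two.
func roundUp(_ address: Int, toAlignment alignment: Int) -> Int {
    precondition(alignment > 0 && (alignment & (alignment - 1)) == 0)
    return (address + alignment - 1) & ~(alignment - 1)
}

// Illustrative numbers only (assumptions, not measured values).
let elementStride = 32        // a 32-byte element, e.g. something like double3
let elementAlignment = 32     // that wants 32-byte alignment
let count = 4
let bufferStart = 0x1010      // 16-byte aligned, but not 32-byte aligned
let bufferEnd = bufferStart + count * elementStride

// The storage is sized for `count` elements, but the first element is
// shifted forward to the next alignment boundary.
let firstElement = roundUp(bufferStart, toAlignment: elementAlignment)
let padding = firstElement - bufferStart
let lastElementEnd = firstElement + count * elementStride

// Whatever was skipped at the front now sticks out past the end.
let overflow = lastElementEnd - bufferEnd
print("padding: \(padding) bytes, overflow: \(overflow) bytes")
```

With these numbers the start is rounded up by 16 bytes, so the last element extends 16 bytes beyond the end of the allocation. A start address that happens to be 32-byte aligned gives zero padding and no overflow, which fits the description of a bug that only triggers under some conditions.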
Update (2018-03-23): Greg Heo (via Agnes Vasarhelyi):
The tail-allocated size is sufficient, but the system didn’t take alignment into account. The alignment boundary we need is not at the start of the tail allocation.
The result? A buffer overflow. Corrupted heap. 💥
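One generic way to make such a rounded-up layout safe is to reserve extra slack for the worst-case padding. The sketch below assumes that approach purely for illustration; it is not necessarily how the standard library was fixed, and alignUp, the element size, and the count are hypothetical.

```swift
// Hypothetical helper: rounds `address` up to the next multiple of `alignment`.
// `alignment` must be a power of two.
func alignUp(_ address: Int, to alignment: Int) -> Int {
    precondition(alignment > 0 && (alignment & (alignment - 1)) == 0)
    return (address + alignment - 1) & ~(alignment - 1)
}

// Illustrative numbers only (assumptions).
let strideBytes = 32
let alignmentBytes = 32
let elementCount = 4

// Worst case, rounding up skips (alignment - 1) bytes, so reserve that much extra.
let slack = alignmentBytes - 1
let totalBytes = elementCount * strideBytes + slack

// Deliberately request only byte alignment, so the start may land anywhere.
let raw = UnsafeMutableRawPointer.allocate(byteCount: totalBytes, alignment: 1)
let rawStart = Int(bitPattern: raw)
let rawEnd = rawStart + totalBytes

let firstAligned = alignUp(rawStart, to: alignmentBytes)
let elementsEnd = firstAligned + elementCount * strideBytes

// The aligned elements now always fit inside the allocation.
let fits = elementsEnd <= rawEnd
print("elements fit inside the buffer: \(fits)")

raw.deallocate()
```

The alternative is to ask the allocator for correctly aligned memory up front, for example by passing the element alignment to UnsafeMutableRawPointer.allocate(byteCount:alignment:), so that no rounding is needed at all.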