Thursday, May 10, 2018

“Black Dot” Unicode Bug

Benjamin Mayo:

A new Unicode text bug is being spread around today, popularised by a video by EverythingApplePro. It’s being called the ‘black dot’ bug because of its origins on Android as a bug relating to WhatsApp: it was being spread with the following emoji: <⚫>👈🏻. The iOS version of this bug is a bit different in its mechanics, but neither variant actually relies on the visible black dot character to cause the freezes and crashes.

The secret is that the strings contain thousands of invisible Unicode characters, which churn through CPU cycles as the system attempts to process them. If this specially crafted text is sent through Messages, it will result in repeated crashes when the recipient tries to read it.
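To make the "thousands of invisible characters" point concrete, here is a minimal Python sketch that counts how much of a string is invisible. It assumes "invisible" can be approximated by the Unicode general category Cf (format characters such as direction marks and joiners); the function names, the sample string, and the threshold of 1000 are illustrative assumptions, not the actual payload or anything Apple does.

```python
import unicodedata

# Assumption: "invisible" is approximated by general category Cf
# (format characters such as U+200E LEFT-TO-RIGHT MARK or U+200D ZWJ).
INVISIBLE_CATEGORIES = {"Cf"}


def invisible_profile(text):
    """Return (number of invisible code points, total code points)."""
    hidden = sum(
        1 for ch in text if unicodedata.category(ch) in INVISIBLE_CATEGORIES
    )
    return hidden, len(text)


def looks_suspicious(text, max_hidden=1000):
    """Flag strings that carry more invisible code points than max_hidden.

    max_hidden is an arbitrary illustrative threshold.
    """
    hidden, _total = invisible_profile(text)
    return hidden > max_hidden


# A made-up sample: two visible emoji around 4000 direction marks.
sample = "\u26ab" + "\u200e\u200f" * 2000 + "\U0001F448"
print(invisible_profile(sample))   # (4000, 4002)
print(looks_suspicious(sample))    # True
print(looks_suspicious("hello"))   # False
```

A string like the sample displays as two emoji, yet almost all of its length is characters the reader never sees.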

ctxppc:

I’ve already said it but Apple should revisit the text rendering architecture. Unicode is so complex it cannot be trusted in the same process as the app (or SpringBoard). Rendering should be done off-process just like how the window server on macOS deals with windows, Mission Control, and the mouse cursor (which continue to work even when an app freezes).

Once a particular string hangs or crashes the rendering process, it should be blacklisted and dealt with appropriately until an update comes around which fixes the issue. The process could even report the blacklisted string to Apple (with permission from the user) so that it could be fixed early on.
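The idea in these two paragraphs can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the "renderer" is a stand-in child process that deliberately fails on long input, and `SandboxedRenderer`, `WORKER`, and `PLACEHOLDER` are hypothetical names. It is not how iOS or the macOS window server is actually built.

```python
import hashlib
import subprocess
import sys

PLACEHOLDER = "[message could not be displayed]"

# Stand-in for a real text renderer, run in a separate process.
# Assumption for the demo: it "crashes" on input over 1000 code points.
WORKER = r"""
import sys
text = sys.stdin.buffer.read().decode("utf-8")
if len(text) > 1000:
    sys.exit(1)  # simulate a renderer crash
sys.stdout.write("rendered %d code points" % len(text))
"""


class SandboxedRenderer:
    """Render text out of process and remember strings that broke it."""

    def __init__(self, timeout=5.0):
        self.timeout = timeout
        self.blocked = set()  # hashes of strings that hung or crashed

    @staticmethod
    def _key(text):
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def render(self, text):
        key = self._key(text)
        if key in self.blocked:
            return PLACEHOLDER  # never hand a known-bad string back
        try:
            proc = subprocess.run(
                [sys.executable, "-c", WORKER],
                input=text.encode("utf-8"),
                capture_output=True,
                timeout=self.timeout,  # a hang counts as a failure too
            )
            ok = proc.returncode == 0
        except subprocess.TimeoutExpired:
            ok = False
        if not ok:
            self.blocked.add(key)
            return PLACEHOLDER
        return proc.stdout.decode("utf-8")


renderer = SandboxedRenderer()
print(renderer.render("hello"))      # rendered 5 code points
print(renderer.render("x" * 2000))   # placeholder; string is now blocked
```

The calling process survives either failure mode, and a blocked string costs nothing on later attempts because it is rejected before a child process is started.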

Various Web rendering and font tasks are also handled out-of-process already.

Previously: Another iOS Crash Caused By Sending Unicode Character.

2 Comments

Apple needs to make some critical changes to their Unicode rendering engine. This bug is just one in a long line of crashing bugs that result from maliciously attaching thousands (if not millions) of cascading accents, non-printing characters, and other annotations to single characters.

Apple could easily block all this by modifying their code so that, for example, it wouldn't attempt to render more than 100 "annotation" characters per printable character. If it encounters more than that, it can either skip over the rest until reaching the next printable character or abort rendering of the entire string. This might violate the Unicode spec, but in actual practice, the only text that will be impacted will be text that is deliberately intended to crash applications.
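The "skip over the rest" option might look something like this Python sketch. It assumes that "annotation" characters means combining marks (general categories Mn, Mc, Me) plus format characters (Cf); that mapping and the names `is_annotation` and `cap_annotations` are my own guesses at what the comment intends, and the limit of 100 is the comment's example figure.

```python
import unicodedata

MAX_ANNOTATIONS = 100  # the example limit from the comment above

# Assumption: "annotation" characters are combining marks and format
# characters, identified by their Unicode general category.
ANNOTATION_CATEGORIES = {"Mn", "Mc", "Me", "Cf"}


def is_annotation(ch):
    return unicodedata.category(ch) in ANNOTATION_CATEGORIES


def cap_annotations(text, limit=MAX_ANNOTATIONS):
    """Keep at most `limit` consecutive annotation characters.

    Anything past the limit is dropped until the next printable
    character, which resets the counter.
    """
    kept = []
    run = 0
    for ch in text:
        if is_annotation(ch):
            run += 1
            if run > limit:
                continue  # skip the excess annotations
        else:
            run = 0
        kept.append(ch)
    return "".join(kept)


hostile = "e" + "\u0301" * 5000 + "x"   # one letter, 5000 acute accents
print(len(hostile))                      # 5002
print(len(cap_annotations(hostile)))     # 102
print(cap_annotations("café") == "café") # True
```

Ordinary text, including emoji joined with zero-width joiners, never comes close to 100 consecutive annotation characters, so it passes through unchanged.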

TL;DR: Unicode text is inherently untrusted input. Sanitize accordingly.

(Also, much disappoint at lack of pirate references.)
