Tuesday, July 25, 2017

Dissecting objc_msgSend on ARM64

Mike Ash (Hacker News):

objc_msgSend has a few different paths it can take depending on circumstances. It has special code for handling things like messages to nil, tagged pointers, and hash table collisions. I’ll start by looking at the most common, straight-line case where a message is sent to a non-nil, non-tagged pointer and the method is found in the cache without any need to scan. I’ll note the various branching-off points as we go through them, and then once we’re done with the common path I’ll circle back and look at all of the others.

Greg Parker:

Incrementing to the end of the cache requires an extra instruction or two to calculate where the end of the cache is. The start of the cache is already known - it’s the pointer we loaded from the class - so we decrement towards that.


The extra scanned-twice check prevents power-draining infinite loops in some cases of memory corruption or invalid objects. For example, heap corruption could fill the cache with non-zero data, or set the cache mask to zero. Corruption like this would otherwise cause the cache scan loop to run forever without a cache hit or a cache miss. The extra check stops the loop so we can turn the problem into a crash log instead.

There are also cases where another thread simultaneously modifying the cache can cause this thread to neither hit nor miss on the first scan. The C code does extra work to resolve that race. A previous version of objc_msgSend handled this incorrectly - it immediately aborted instead of falling back to the C code - which caused rare crashes when the threads were unlucky.

Mike Ash:

However, Objective-C does not require objc_msgSend.


Instead of objc_msgSend, the runtime can provide a function which looks up the method implementation and returns it to the caller. The caller can then invoke that implementation itself. This is how the GNU runtime does it, since it needs to be more portable. Their lookup function is called objc_msg_lookup.


However, each call now suffers the overhead of two function calls, so it’s a bit slower. Apple prefers to put in the extra effort of writing assembly code to avoid this, since it’s so critical to their platform.

Louis Gerbarg:

It actually is not the extra function call that is the big hit, since if you think about it objc_msgSend also does two calls (the call to msgSend, which at the end then tail calls the imp). The dynamic instruction count is also roughly the same.

In fact objc_msgLookup actually ends up being faster in a some micro benches since it plays a lot better with modern CPU branch predictors: objc_msgSend defeats them by making every call site jump to the same dispatch function, which then makes a completely unpredictable jump to the imp. By using msgLookup you essentially decouple the branch source from the lookup which greatly improves predictably. Also, with a “sufficiently smart” compiler it can be win because it allows you to do things like hoist the lookup out of loops, etc (essentially really clever automated IMP caching tricks).

Comments RSS · Twitter

Leave a Comment