Dissecting objc_msgSend on ARM64
objc_msgSend has a few different paths it can take depending on circumstances. It has special code for handling things like messages to nil, tagged pointers, and hash table collisions. I’ll start by looking at the most common, straight-line case where a message is sent to a non-nil, non-tagged pointer and the method is found in the cache without any need to scan. I’ll note the various branching-off points as we go through them, and then once we’re done with the common path I’ll circle back and look at all of the others.
Incrementing to the end of the cache requires an extra instruction or two to calculate where the end of the cache is. The start of the cache is already known - it’s the pointer we loaded from the class - so we decrement towards that.
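To make the direction of the scan concrete, here is a minimal C sketch of a downward probe. The bucket_t and cache_t layouts and the name cache_probe are hypothetical simplifications, not the runtime's actual structures, and the sketch assumes the table always contains at least one empty bucket.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified cache layout; the real runtime's structures differ. */
typedef void (*imp_t)(void);

typedef struct {
    uintptr_t sel;  /* selector key; 0 marks an empty bucket */
    imp_t imp;      /* cached implementation */
} bucket_t;

typedef struct {
    bucket_t *buckets;  /* start of the table, as loaded from the class */
    uintptr_t mask;     /* bucket count minus one (count is a power of two) */
} cache_t;

/* Probe downward from the hashed bucket. Reaching the start costs only a
 * compare against the pointer we already hold; the end of the table is
 * computed only on the rare wrap. Assumes at least one bucket is empty,
 * otherwise a missing selector would loop forever. */
static imp_t cache_probe(const cache_t *cache, uintptr_t sel)
{
    assert(sel != 0);  /* 0 is reserved for empty buckets */
    bucket_t *first = cache->buckets;
    bucket_t *b = first + (sel & cache->mask);

    for (;;) {
        if (b->sel == sel) return b->imp;  /* hit */
        if (b->sel == 0) return NULL;      /* empty bucket: miss */
        if (b == first)
            b = first + cache->mask;       /* wrap to the last bucket */
        else
            b--;                           /* step toward the start */
    }
}
```

Scanning upward instead would need the end address on every step of the loop, which is the extra instruction or two mentioned above.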
[…]
The extra scanned-twice check prevents power-draining infinite loops in some cases of memory corruption or invalid objects. For example, heap corruption could fill the cache with non-zero data, or set the cache mask to zero. Corruption like this would otherwise cause the cache scan loop to run forever without a cache hit or a cache miss. The extra check stops the loop so we can turn the problem into a crash log instead.
There are also cases where another thread simultaneously modifying the cache can cause this thread to neither hit nor miss on the first scan. The C code does extra work to resolve that race. A previous version of objc_msgSend handled this incorrectly - it immediately aborted instead of falling back to the C code - which caused rare crashes when the threads were unlucky.
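The scanned-twice check can be sketched in C roughly as follows. This is an illustration under assumptions, not the runtime's code: the types, the probe_result_t enum, and cache_probe_guarded are hypothetical names, and the wrap counter stands in for whatever bookkeeping the assembly actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified cache layout; the real runtime's structures differ. */
typedef void (*imp_t)(void);
typedef struct { uintptr_t sel; imp_t imp; } bucket_t;         /* sel 0 = empty */
typedef struct { bucket_t *buckets; uintptr_t mask; } cache_t; /* mask = count - 1 */

typedef enum {
    PROBE_HIT,     /* selector found, *out holds the implementation */
    PROBE_MISS,    /* reached an empty bucket */
    PROBE_GIVE_UP  /* wrapped twice with neither hit nor miss */
} probe_result_t;

/* Downward probe that stops at the second wrap, so a full or corrupted
 * table cannot keep the loop spinning. On PROBE_GIVE_UP the caller would
 * hand the lookup to the slower C path rather than abort. */
static probe_result_t cache_probe_guarded(const cache_t *cache, uintptr_t sel,
                                          imp_t *out)
{
    assert(out != NULL);
    bucket_t *first = cache->buckets;
    bucket_t *b = first + (sel & cache->mask);
    int wraps = 0;

    for (;;) {
        if (b->sel == sel) { *out = b->imp; return PROBE_HIT; }
        if (b->sel == 0) return PROBE_MISS;
        if (b == first) {
            if (++wraps == 2) return PROBE_GIVE_UP;  /* scanned twice */
            b = first + cache->mask;                 /* wrap to the last bucket */
        } else {
            b--;
        }
    }
}
```

With a mask of zero and a non-zero bucket, the probe reaches the start of the table on every iteration, so the guard fires on the second pass instead of looping forever.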
However, Objective-C does not require objc_msgSend. […]
Instead of objc_msgSend, the runtime can provide a function which looks up the method implementation and returns it to the caller. The caller can then invoke that implementation itself. This is how the GNU runtime does it, since it needs to be more portable. Their lookup function is called objc_msg_lookup. […]
However, each call now suffers the overhead of two function calls, so it’s a bit slower. Apple prefers to put in the extra effort of writing assembly code to avoid this, since it’s so critical to their platform.
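The difference between the two styles can be shown with a toy dispatcher in C. Everything here is hypothetical (object_t, method_t, toy_msg_lookup, toy_msg_send, and the string selectors); it only illustrates the shape of the two approaches, not either runtime's real API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy object model, invented for this sketch. */
typedef struct object object_t;
typedef int (*imp_t)(object_t *self, const char *sel);

typedef struct {
    const char *sel;  /* selector, a plain string in this toy */
    imp_t imp;        /* implementation for that selector */
} method_t;

struct object {
    const method_t *methods;
    size_t method_count;
    int value;
};

/* Lookup style: find the implementation and return it; the caller then
 * makes the call itself. Returns NULL for nil or an unknown selector. */
static imp_t toy_msg_lookup(const object_t *receiver, const char *sel)
{
    assert(sel != NULL);
    if (receiver == NULL) return NULL;
    for (size_t i = 0; i < receiver->method_count; i++) {
        if (strcmp(receiver->methods[i].sel, sel) == 0)
            return receiver->methods[i].imp;
    }
    return NULL;
}

/* Send style: look up and invoke in one step. In C this needs a fixed
 * signature; assembly lets the real function jump to the implementation
 * while leaving whatever arguments were passed untouched. Toy behavior:
 * nil receivers and unknown selectors yield 0. */
static int toy_msg_send(object_t *receiver, const char *sel)
{
    imp_t imp = toy_msg_lookup(receiver, sel);
    return imp ? imp(receiver, sel) : 0;
}

/* A sample method for the toy object. */
static int get_value(object_t *self, const char *sel)
{
    (void)sel;
    return self->value;
}
```

A caller using the lookup style would write imp = toy_msg_lookup(obj, "value") followed by imp(obj, "value"), while the send style collapses both steps into toy_msg_send(obj, "value").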
It actually is not the extra function call that is the big hit, since if you think about it objc_msgSend also does two calls (the call to msgSend, which at the end then tail calls the imp). The dynamic instruction count is also roughly the same. In fact objc_msgLookup actually ends up being faster in some microbenchmarks since it plays a lot better with modern CPU branch predictors: objc_msgSend defeats them by making every call site jump to the same dispatch function, which then makes a completely unpredictable jump to the imp. By using msgLookup you essentially decouple the branch source from the lookup, which greatly improves predictability. Also, with a “sufficiently smart” compiler it can be a win because it allows you to do things like hoist the lookup out of loops, etc. (essentially really clever automated IMP caching tricks).
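As a rough illustration of the hoisting idea, here is a toy C version. The counter_t type, toy_lookup_add, and lookup_count are invented for the sketch, and hoisting is only valid under the assumption that the method implementation cannot change while the loop runs.

```c
#include <assert.h>
#include <stddef.h>

/* Toy single-method "class", invented for this sketch. */
typedef struct counter counter_t;
typedef void (*imp_t)(counter_t *self, int amount);

struct counter {
    imp_t add_imp;  /* the one method this toy object responds to */
    long total;
};

static unsigned long lookup_count;  /* counts lookups, for illustration only */

/* Stand-in for a method lookup function. */
static imp_t toy_lookup_add(counter_t *receiver)
{
    assert(receiver != NULL);
    lookup_count++;
    return receiver->add_imp;
}

/* The method implementation. */
static void counter_add(counter_t *self, int amount)
{
    self->total += amount;
}

/* Lookup on every iteration, as a send-style dispatch would do. */
static void add_each_time(counter_t *c, int n)
{
    for (int i = 0; i < n; i++)
        toy_lookup_add(c)(c, i);
}

/* Lookup hoisted out of the loop: one lookup, n direct calls. */
static void add_hoisted(counter_t *c, int n)
{
    imp_t add = toy_lookup_add(c);
    for (int i = 0; i < n; i++)
        add(c, i);
}
```

Both loops produce the same total, but the hoisted version performs a single lookup and the indirect call inside the loop always goes to the same target.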