Monday, July 6, 2020

Optimizing the Objective-C Runtime in Big Sur

WWDC 2020:

Dive into the microscopic world of low-level bits and bytes that underlie every Objective-C and Swift class. Find out how recent changes to internal data structures, method lists, and tagged pointers provide better performance and lower memory usage. We’ll demonstrate how to recognize and fix crashes in code that depends on internal details, and show you how to keep your code unaffected by changes to the runtime.

Pierre Habouzit:

Also the tagged pointer change allows the piece of assembly I’m the most insanely proud of: the tagged pointer decoding is now much faster in msgSend.

Pierre Habouzit:

This structure holds writable runtime metadata that classes need in order to work at runtime. But only half of that 8-word structure was commonly used.

So we split it and only allocate the extended part when needed (which is rare). As Ben mentions, we saved dozens of MBs (given that we save 32B apiece, yes, that means several hundred thousand classes are initialized system-wide).
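
Not from the thread: a minimal sketch in C of the split described above, assuming a 64-bit target where a word is 8 bytes. The type names (`rw_core`, `rw_ext`), the field names, and `rw_get_ext` are hypothetical and do not reproduce the runtime's real definitions. The idea shown is only that the always-present part shrinks to four words and the other four are allocated on first use.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical "extended" half: four words that most classes never touch. */
typedef struct rw_ext {
    uintptr_t methods;
    uintptr_t properties;
    uintptr_t protocols;
    uintptr_t demangled_name;
} rw_ext;

/* Hypothetical always-allocated half: four words, one of which points
   to the extended part and stays NULL until something needs it. */
typedef struct rw_core {
    uintptr_t flags;
    uintptr_t first_subclass;
    uintptr_t next_sibling;
    rw_ext   *ext;
} rw_core;

/* Allocate the extended part lazily, on first request. */
rw_ext *rw_get_ext(rw_core *c) {
    if (c->ext == NULL) {
        c->ext = calloc(1, sizeof *c->ext);
        assert(c->ext != NULL); /* sketch only: no real error handling */
    }
    return c->ext;
}
```

With 8-byte words, a class that never asks for the extended part costs 32 bytes instead of 64, which matches the 32B-per-class saving quoted above.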

[…]

We found that it’s quite common in certain UI code (but not only) to repeatedly autorelease the same object over and over again. We have implemented a small LRU that is consulted each time an object is autoreleased.
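
Not from the thread: the quote does not say what the LRU stores or what happens on a hit. The sketch below is one plausible reading, assuming a hit lets the pool bump a per-entry count instead of appending a duplicate entry. The names `pool`, `pool_entry`, and `pool_autorelease`, and the sizes `POOL_CAP` and `LRU_SIZE`, are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_CAP 64   /* hypothetical pool capacity */
#define LRU_SIZE 4    /* hypothetical LRU size ("small") */

typedef struct { const void *obj; unsigned count; } pool_entry;

typedef struct {
    pool_entry entries[POOL_CAP];
    size_t used;
    size_t lru[LRU_SIZE]; /* indices into entries, most recent first */
    size_t lru_len;
} pool;

/* Move the LRU slot at position i to the front. */
static void lru_touch(pool *p, size_t i) {
    size_t idx = p->lru[i];
    for (; i > 0; i--) p->lru[i] = p->lru[i - 1];
    p->lru[0] = idx;
}

/* Returns 1 if the object was merged into a recent entry,
   0 if a new entry was appended, -1 if the pool is full. */
int pool_autorelease(pool *p, const void *obj) {
    for (size_t i = 0; i < p->lru_len; i++) {
        pool_entry *e = &p->entries[p->lru[i]];
        if (e->obj == obj) {       /* recently autoreleased: just count it */
            e->count++;
            lru_touch(p, i);
            return 1;
        }
    }
    if (p->used == POOL_CAP) return -1;
    size_t idx = p->used++;
    p->entries[idx].obj = obj;
    p->entries[idx].count = 1;
    if (p->lru_len < LRU_SIZE) p->lru_len++;
    /* Shift everything down one slot; the least recent falls off when full. */
    for (size_t i = p->lru_len - 1; i > 0; i--) p->lru[i] = p->lru[i - 1];
    p->lru[0] = idx;
    return 0;
}
```

Under this reading, autoreleasing the same object a thousand times in a row occupies one entry with a count of 1000 rather than a thousand entries.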

[…]

Also, because the runtime caches negative [IMP cache] entries, the speed of a lookup miss is not very relevant, so we can tolerate denser tables.

We added 2-entry hash tables. Tables of up to 8 entries are filled up to 100%, and larger ones up to ~90% (7/8).
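
Not from the thread: the fill rule quoted above can be sketched as a single function. The name `max_occupied` is hypothetical, and the sketch assumes table capacities are powers of two, which fits a hash that is just a mask.

```c
#include <assert.h>

/* Maximum number of occupied slots before a table of the given capacity
   would have to grow: 100% for tables of up to 8 entries, 7/8 above that. */
unsigned max_occupied(unsigned capacity) {
    return capacity <= 8 ? capacity : capacity * 7 / 8;
}
```

For example, a 16-entry table would hold up to 14 entries and a 64-entry table up to 56. Note that 7/8 is 87.5%, which the quote rounds to ~90%.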

Pierre Habouzit:

[The] motivation for us is that a single method made direct typically saves you 30 bytes (that’s what the average cost of an IMP entry used to be).

A monomorphic IMP cached in 100 processes costs you 3 KB; eliminate 1,000 such IMPs and you save 3 MB system-wide.

It also saves a lot of binary size.
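
Not from the thread: the arithmetic behind those figures, written out as a small C function. The name `bytes_saved` is hypothetical; the 30-byte, 100-process, and 1,000-method inputs are the ones quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Total bytes no longer spent on IMP cache entries, system-wide. */
uint64_t bytes_saved(uint64_t bytes_per_entry, uint64_t processes,
                     uint64_t methods) {
    return bytes_per_entry * processes * methods;
}
```

`bytes_saved(30, 100, 1)` is 3,000 bytes (the "3k"), and `bytes_saved(30, 100, 1000)` is 3,000,000 bytes (the "3M").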

David Smith:

The idea that saving 30 bytes per process per method is worth doing significant work is not intuitive until you internalize just how many processes are on a typical iOS device and how valuable memory freed up for the frontmost app is[…]

Pierre Habouzit:

[We] have pre-optimized some IMP Caches at build time. How do you think we did that...

Pierre Habouzit:

To beat a hash table with linear probing, there’s only one thing you can do: a perfect hash table. The problem is, the perfect hash tables that exist today in the literature are large and use complex hash functions (the one Obj-C uses is just a mask).

So that was quite the conundrum.

Now there are two ways to get a perfect hash table: either you have a perfect hash function… or you cheat and make sure that all your keys hash perfectly. Keys, for us, are selectors. They live in the shared cache.

Do you see it coming?
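
Not from the thread: a toy illustration in C of "choosing the keys" so that a mask-only hash never collides. The function names, the `shift` parameter, and the addresses are hypothetical. The sketch handles a single table, whereas a real builder would have to place each selector once and satisfy every pre-optimized cache that uses it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Slot for a key when the hash is only a mask over the key's bits.
   mask must be (power of two) - 1; shift skips low alignment bits. */
static size_t slot_for(uintptr_t key, unsigned shift, size_t mask) {
    return (size_t)(key >> shift) & mask;
}

/* Returns 1 if every key lands in its own slot, i.e. the mask is a
   perfect hash for this particular key set; 0 on any collision. */
int is_perfect(const uintptr_t *keys, size_t n, unsigned shift, size_t mask) {
    unsigned char seen[64] = {0};
    assert(mask < 64); /* sketch only: tiny tables */
    for (size_t i = 0; i < n; i++) {
        size_t s = slot_for(keys[i], shift, mask);
        if (seen[s]) return 0;
        seen[s] = 1;
    }
    return 1;
}

/* The "cheat": instead of searching for a hash function, pick addresses
   for the keys so that their masked bits are 0, 1, ..., n-1. */
void place_keys(uintptr_t base, uintptr_t *keys, size_t n, unsigned shift) {
    for (size_t i = 0; i < n; i++)
        keys[i] = base + ((uintptr_t)i << shift);
}
```

With keys placed this way, a lookup is one shift, one mask, and one compare, with no probing at all.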

[…]

Memory savings are … substantial. There’s also a huge speed win during startup because… you don’t have to build those caches anymore and the contention on the runtime locks is reduced.


1 Comment

Do you enjoy @mikeash’s blog posts digging into the ObjC runtime? How about one in video form

How about one in non-video, non-Tweet form that I can read, skim, and reference.
