Tuesday, September 10, 2013

ARC vs. MRC Performance

Tom Davie:

However, I have to say, I have had the complete opposite experience with regards to performance. Having measured various projects before and after converting to ARC, I have seen numbers between 30% and 100% slowdown with ARC. The average is probably around 50%. I have never seen performance improve when using ARC.

Marcel Weiher:

It shouldn’t really be surprising. ARC adds an astounding number of additional reference counting ops to all code involving object pointers. If that were compiled, ObjC would be completely unusable and slower than all the so-called scripting languages out there. So for things to be usable, ARC then has the optimizer try to undo most of the damage and finally adds some clever runtime hacks to mitigate the rest.

Since the hacks and the remaining damage are somewhat orthogonal, you sometimes end up ahead and sometimes you end up behind.

The other thing that should be considered when seeing heroic hacks like the autorelease-undoer is that such techniques rarely arise spontaneously from an idle moment of relaxed performance optimization. More usually, they happen because there is some sort of “ho lee f*k” moment, where performance regression is so bad/project-threatening that something drastic/heroic needs to be done.

Corporations and people being what they are, official communication tends to focus more on the heroics than the “ho lee f*k”, so documentation on the performance of new technology tends to be, er, “optimistic”.

Jean-Daniel Dupas:

Actually, after doing a naive ARC migration I find a 30%slow-down on this sample.

It look like the main slowdown is induced by an increase of retain/release calls introduced by extra-safety.

Indeed, the conventional wisdom is that adopting ARC will make your app faster. It would be interesting to see whether the slowness is concentrated enough that it could be addressed by moving a few key methods to a separate file that’s compiled without ARC.

Update (2013-09-11): John McCall:

For what it’s worth, the autorelease optimization was planned; the performance problem it solves was extremely predictable, since it’s actually a noticeable performance problem in MRC code as well.

[…]

Overall, while we’re happy to see that some people see performance improvements, our expectation going in was always that ARC would cause some regressions, and that while in most code those would be lost in the noise, in some cases people would need to help ARC out with things like __unsafe_unretained. Ultimately, ARC is just a tool for improving your productivity as a programmer, not a magic button with no downsides.

Marcel Weiher:

More likely performance reasons/targets for opting out are things like inline reference counts and, especially, object caches. For me they generally bring factors of improvement, if not orders of magnitude, when applicable (wasn’t it CoreGraphics that had problems with their object cache no longer working on GC, thus killing performance?) Being able to mix-n-match and opt out is definitely one of the awesome features of ARC.

On the inline reference counts: when I was doing my recent tests on archiving performance, I suddenly found that object creation (1M objects) was taking longer than expected. Adding an inline retain count *halved* the running time for creating the object graph (155ms vs 300ms)! I have to admit I was bit surprised that the difference would be this big, considering the intrinsic cost of object creation. (Out of curiosity I tested it with ARC and it took 400ms)

John McCall:

ARC doesn’t replace the benefit of having an inline reference count. I think if we could magically bless all objects with an inline reference count without worrying about disrupting existing hard-coded object layouts, we probably would.

[…]

We’ve found that it usually doesn’t take very many __unsafe_unretained annotations to eliminate most regressions. There are a ton of places where any human reading the code would immediately realize that an object won’t get released, but ARC can’t quite prove that, usually because there’s an intervening message send. Most of those places don’t detectably affect performance; it’s the one or two that happen in a loop and therefore trigger 40,000 times that you notice. But by the same token, those sites tend to show up in Instruments and so are easy to track down and fix.

Update (2013-09-14): Marcel Weiher suggests some ways to add an inline reference count:

3. 3 bits in the class pointer

Since we aren’t allowed to get the isa pointer directly these days anyhow, that means we can mask out the low-order bits in object_getClass(), objc_msgSend() and the non-fragile ivar access code..hhmmm. Number of bits depends on whether you just rely on alignment or also grab what’s there from malloc() bucketing (probably shouldn’t). The “Getting Reference Counting Back into the Ring” paper claims that with 3 bits of recount, you avoid overflow for > 95% of objects, so that would be pretty good.

4. Do it yourself assistance

How about a function that takes a pointer to wherever I stashed my reference count and did all the right things, for example wrt. weak references? Or a macro.

John McCall:

We’ve certainly looked into things like this.

3 Comments RSS · Twitter

The consolation, given it's compiler managed, is that hopefully as the LLVM / Clang team improves the generated code, the performance regressions will be reduced by a simple recompile…

I've logged a couple of rdars for issues I've found

[...] is along the lines of what Marcel Weiher recently suggested except that the class objects are grouped so that the class pointer only needs 30 bits. This leaves [...]

Leave a Comment