Thursday, May 28, 2026

ARC Overhead in Swift Sorting

Sean Heber:

I made a function in Tapestry 23x faster today by sorting an array using its indices instead of using its data directly. Like this:

let unsorted = items
var indices = Array(unsorted.indices)
indices.sort { a, b in unsorted[a].thing > unsorted[b].thing }
items = indices.map { unsorted[$0] }

Versus presumably something like:

items.sort { a, b in a.thing > b.thing }

With Objective-C, at least pre-ARC, this was simple. The array owns the objects, and they do not get retained and released when passed to your comparator, or within it. Unless you do something silly, like removing objects from the array during sorting, it will be safe and have no memory management overhead.

With Swift, I find the situation rather confusing, and I was not able to find any documentation that lays out exactly what happens. Various searches and AI prompts turned up all sorts of conflicting information. Gemini, in particular, had some explanations that sounded reasonable but that I began to doubt because the sources that it linked to did not actually say what it claimed. I’ve tried to figure this out, but it may well be that some of my conclusions are wrong.

There are several things going on:

  1. Swift will insert retains and releases for the closure parameters. I can see this in the disassembly and by setting breakpoints. I think this is only because they’re accessed within the closure. They don’t need to be retained by the caller because the closure is non-escaping.

  2. If the optimizer is enabled, Swift can sometimes remove that retain counting. I’m not sure exactly when this happens. In my tests, where everything is defined in the same module, it worked for both classes and structs, but I get the impression that sometimes the optimizer is not able to remove the ARC traffic. This is a longstanding frustration with Swift, that the behavior under optimization can be wildly different and that it’s not obvious when you’re changing something in the code that defeats a crucial optimization.

  3. There can also be ARC traffic from the non-comparator aspects of the sorting. Values need to be moved and sometimes copied into temporary buffers, since Swift uses a complex sorting algorithm that does merging as well as swapping. Some of this seems to be done at a low level to avoid ARC, but at least for structs I don’t think it can be totally avoided.

  4. Swift also has exclusivity checks, which can detect (at runtime) if you’re modifying the array from inside the closure. So safety is somewhat decoupled from memory management.

Heber was sorting large structs and saw overhead related to properties that were not accessed in the closure. I’m not sure whether this was due to #1 or #3 or both. In any case, switching to indices adds (linear) overhead from creating two additional array buffers, but it reduces other types of ARC overhead because only integers are passed as parameters and moved around during sorting.

Is there a way—other than using indices—to guarantee the minimum ARC traffic? Swift now supports explicitly marking parameters as borrowing. That seems like conceptually the right idea. It directly expresses that the elements are owned by the array and so don’t need to be retained at other levels. But I don’t think it helps here. For #1, I think the parameters are already essentially borrowed, and it doesn’t address the overhead from #3. Also, the caller (sort) is not typed as taking a closure with borrowed parameters. Changing the closure wouldn’t change its behavior.

I also wondered whether it would help to use one of the other built-in sorting methods. With Comparable, you can avoid passing in a closure. But if the closure isn’t capturing anything, the signature seems identical to that of < in other respects. Still, maybe there are cases where the compiler would be able to see through more for certain Comparable types?

KeyPathComparator also seems promising in that more of the work happens internally. Maybe this could be optimized as the language evolves to do things that are not possible with normal functions/closures. But, so far, I think it’s mostly just syntactic sugar. In my tests, it still retained and released a lot with optimizations disabled.

Lastly, there’s the option of the Schwartzian transform. This can reduce the computational overhead during each comparison, e.g. if thing is a computed property. It needs extra space for a temporary array, but if you’re already making one for the indices trick, you can use the same one, just wider to store pairs.

Unfortunately, I don’t have a very satisfying conclusion except that I would be skeptical about trying to reason from first principles how the complete system will behave. I think you need to profile to see what’s actually happening. Even that’s easier said than done because the sorting algorithm switches between different modes based upon the size and character of the data.

Previously:

2 Comments RSS · Twitter · Mastodon


Pierre Lebeaupin

He clarifies later on that "items" has as elements "kind of large structs" https://mastodon.social/@bigzaphod/116163476388639954 , which implies to me some of the struct members are references. Which means ARC activity when any of these individual structs is copied, and I think these copies have no choice but to occur for passing the parameters to the comparison closure: even if parameters to non-constructor functions are borrowing by default, that keyword (even implicitly) only applies to uncopyable or reference types (the compiled code probably receives an address ultimately, but pointing to the copy). I seem to remember single-members structs (newtypes, basically) being treated specially so that they are considered equivalent to the inner type (which is made easy by the fact modifying such a struct is equivalent to recreating it with new contents), but for more complex structs I do not think there is a protocol for "treat its reference members as being borrowed", at least not without involving a CoW mechanism, which has to be done explicitly. Also, if the compiler processes simultaneously both sides of the call barrier it can apply a custom calling convention and/or whole program optimization, but I don't believe this can be applied to Array.sort, an ABI stable system function.

But yeah, not only is this kind of emerging behavior difficult to predict, which I could work with, it is just as hard to narrow down to a culprit when it inevitably occurs, even just to a culprit interaction: I once had to trace the propagation of a parameter in the disassembly (to figure out that a non-escaping closure was in fact allocating memory, but like ARC activity that is often the outcome of emerging behavior in the optimizer). As for taking precautions against such issues, well, I have once passed a parameter as inout even though I wasn't modifying it just to be sure it wouldn't be copied, but that is only an option if you control both sides of the call barrier.


@Pierre I did some more testing, and it seems like there is a special case for single-member structs. Also, a multi-member struct did not cause ARC activity when the reference properties were not accessed within the closure. That part was surprising to me because, as you say, the ABI is fixed. The caller doesn’t know which properties are being accessed. If it can delay retaining until already inside the closure just before access, why can it not just read the value for the comparison without retaining?

Leave a Comment