Archive for May 28, 2026

Thursday, May 28, 2026

ARC Overhead in Swift Sorting

Sean Heber:

I made a function in Tapestry 23x faster today by sorting an array using its indices instead of using its data directly. Like this:

let unsorted = items
var indices = Array(unsorted.indices)
indices.sort { a, b in unsorted[a].thing > unsorted[b].thing }
items = indices.map { unsorted[$0] }

Versus presumably something like:

items.sort { a, b in a.thing > b.thing }

With Objective-C, at least pre-ARC, this was simple. The array owns the objects, and they do not get retained and released when passed to your comparator, or within it. Unless you do something silly, like removing objects from the array during sorting, it will be safe and have no memory management overhead.

With Swift, I find the situation rather confusing, and I was not able to find any documentation that lays out exactly what happens. Various searches and AI prompts turned up all sorts of conflicting information. Gemini, in particular, had some explanations that sounded reasonable but that I began to doubt because the sources that it linked to did not actually say what it claimed. I’ve tried to figure this out, but it may well be that some of my conclusions are wrong.

There are several things going on:

  1. Swift will insert retains and releases for the closure parameters. I can see this in the disassembly and by setting breakpoints. I think this is only because they’re accessed within the closure. They don’t need to be retained by the caller because the closure is non-escaping.

  2. If the optimizer is enabled, Swift can sometimes remove that retain counting. I’m not sure exactly when this happens. In my tests, where everything is defined in the same module, it worked for both classes and structs, but I get the impression that sometimes the optimizer is not able to remove the ARC traffic. This is a longstanding frustration with Swift, that the behavior under optimization can be wildly different and that it’s not obvious when you’re changing something in the code that defeats a crucial optimization.

  3. There can also be ARC traffic from the non-comparator aspects of the sorting. Values need to be moved and sometimes copied into temporary buffers, since Swift uses a complex sorting algorithm that does merging as well as swapping. Some of this seems to be done at a low level to avoid ARC, but at least for structs I don’t think it can be totally avoided.

  4. Swift also has exclusivity checks, which can detect (at runtime) if you’re modifying the array from inside the closure. So safety is somewhat decoupled from memory management.

Heber was sorting large structs and saw overhead related to properties that were not accessed in the closure. I’m not sure whether this was due to #1 or #3 or both. In any case, switching to indices adds (linear) overhead from creating two additional array buffers, but it reduces other types of ARC overhead because only integers are passed as parameters and moved around during sorting.

Is there a way—other than using indices—to guarantee the minimum ARC traffic? Swift now supports explicitly marking parameters as borrowing. That seems like conceptually the right idea. It directly expresses that the elements are owned by the array and so don’t need to be retained at other levels. But I don’t think it helps here. For #1, I think the parameters are already essentially borrowed, and it doesn’t address the overhead from #3. Also, the caller (sort) is not typed as taking a closure with borrowed parameters. Changing the closure wouldn’t change its behavior.

I also wondered whether it would help to use one of the other built-in sorting methods. With Comparable, you can avoid passing in a closure. But if the closure isn’t capturing anything, the signature seems identical to that of < in other respects. Still, maybe there are cases where the compiler would be able to see through more for certain Comparable types?

KeyPathComparator also seems promising in that more of the work happens internally. Maybe this could be optimized as the language evolves to do things that are not possible with normal functions/closures. But, so far, I think it’s mostly just syntactic sugar. In my tests, it still retained and released a lot with optimizations disabled.

Lastly, there’s the option of the Schwartzian transform. This can reduce the computational overhead during each comparison, e.g. if thing is a computed property. It needs extra space for a temporary array, but if you’re already making one for the indices trick, you can use the same one, just wider to store pairs.

Unfortunately, I don’t have a very satisfying conclusion except that I would be skeptical about trying to reason from first principles how the complete system will behave. I think you need to profile to see what’s actually happening. Even that’s easier said than done because the sorting algorithm switches between different modes based upon the size and character of the data.

Previously: