Saturday, August 1, 2015

Swift Array Performance

Airspeed Velocity:

Your reminder that building arrays with reduce, while fun, is accidentally quadratic.

I would have thought that extending an array would be optimized to not make a copy.

Update (2015-08-01): Joe Groff:

I don’t think + is implemented to check for unique referencing at all.

Update (2015-08-03): Airspeed Velocity (tweet):

I was surprised at how surprising some found this. Quite a few people suggested the reduce version could be changed to not do the array copy (I don’t think it can). Some suggested maybe + could be optimized so it doesn’t perform a copy (I don’t think that it could easily, as we’ll see shortly).

[…]

Assuming combine here is { $0 + [transform($1)] }, you can see that + similarly has no knowledge of the fact that we’re actually going to assign the outcome directly to the result variable. We know, on inspecting the code, that it’ll just be fine to add the right-hand side onto the left-hand side, if that were even possible (in theory it is, since even though the array is passed immutably by value, the buffer is a class and so could be mutated since it has reference semantics). But + can’t know that from where it sits. It definitely knows it’s copy of the left-hand side isn’t the only owner of the buffer. There is another owner too: reduce holds a copy of result – and is about to throw it away and replace it with a new result, but that’s coming after + has run.

Joe Groff:

Swift’s parameter convention is callee-release, so isUniquelyRefd can work inside +. It’d only succeed if + is the last use.

That optimization could lead to optimizer-dependent algorithmic complexity though, sorta like TCO.

“TCO” refers to tail call optimization.

8 Comments RSS · Twitter

The example using "map" is appends to the array and is fast. The example using reduce uses x+y, on an ever increasing x. x+y is O(N) in the length of X and Y since it has to copy them.

@ChrisL Why does + have to copy x? Is there another reference to that array somewhere?

+= and + are different, in that += intentionally and obviously mutates the input array in place. + is an example of the same operation in a functional form. Because the functional form isn't allowed to change the inputs to it, it is required to copy the inputs to produce a new and unique destination. You would be very unhappy if in:

let x = ...
let y = ...
print(x); print(y)
let z = x + y

print(x); print(y); print(z)

The addition changed either x or y. Not being able to mutate one of them obviously requires them to be copied, and in the case of arrays, that is an O(n) operation.

If you're interested in why mutation can be a *good* thing (at least when carefully combined with value semantics), check out the "Building Better Apps with Value Types in Swift" talk at WWDC'15. The "Protocol-Oriented Programming in Swift" talk is a great follow on to it if you missed it.

-Chris

@ChrisL I understand the difference between the two operators. My point is that there is already an optimization so that append can mutate the array’s storage instead of copying it if the array is uniquely referenced. A bunch of us thought that this optimization could also apply in Airspeed Velocity’s example (but not in your example where there is an obvious second reference). It doesn’t seem like there would be any other references to the $0 array, so its storage could be mutated without this being observable to any other code.

So my question is, does this optimization not work because:

  1. There is another reference (e.g. reduce is holding onto it)?
  2. There’s no other reference, but it’s impossible for Swift to prove that?
  3. Swift could prove it but the optimization is not implemented yet?

I haven't looked into this specific case, so I don't know. It certainly seems that a sufficiently smart optimizer should be able to handle this case. That said, do you want the algorithmic complexity of your code to depend on whether a specific compiler optimization is working? :-)

@ChrisL No, but I was hoping this could be handled in the standard library rather than in the compiler’s optimizer. I assume that’s what’s already happening with append—otherwise we’re already down that road.

Yep, append handles this, because it knows that one input is obviously destroyed. Optimizing x+y into append when x or y is unused is definitely possible, but relies on data flow information as well as knowledge of the high level semantics of array, which we don't have in the optimizer yet.

[…] is currently too much of a pain to use. Likewise, the standard library is still in flux and not fully understood, whereas Foundation is stable and a known quantity. Foundation is great. Why not keep using […]

Leave a Comment