Wednesday, February 14, 2007

C Is the New Assembly

Daniel Jalkut (paraphrasing John Gruber):

He suggests that a typical developer will write everything in Ruby or Python, and then do performance testing. Anything that needs a speed-up can be redone in Objective-C. You know, that “slow” dynamic variant of C :)

This analysis is foreboding, because it’s exactly what programmers have always done when they switched to a higher level language. 10 years ago a programmer would be more likely to “switch to assembly” for a much-needed performance boost. Has it come to this?

Yes, it has. However, I’d like to add a few points. First, while in general it’s true that “scripting” languages are slow(er) and Objective-C is fast(er), this is not always the case. I suspect that common computations using lots of arrays, dictionaries, numbers, strings, etc. are faster in Python than in Objective-C/Cocoa due to less memory management and dispatch overhead. Likewise for text processing, since in the scripting languages the regex engine works directly with the native string type. Of course, Objective-C has the potential for very high performance if you drop below the object level or use Objective-C++.

Second, achieving better performance by recoding in Objective-C doesn’t work quite the same way as recoding parts of a C program in assembly. The difference is the bridge. A hybrid application must be developed with careful attention to bridging overhead, which can be a much bigger performance drain than the fact that the scripting language isn’t compiled down to native code. You must decide early on which areas of your application will use Objective-C objects and which will use scripting-language objects.

If you do everything with Objective-C objects, you can recode select objects in Objective-C to improve performance, but the ones coded in your scripting language (the majority) will be doing a lot of bridging, and you won’t be able to take full advantage of your scripting language because you’re limited to the Objective-C object model.

On the other hand, if you do everything with scripting-language objects you can avoid a lot of bridge overhead (within your own code—you increase the overhead when Cocoa needs to talk to your code), but it becomes more difficult to improve performance by recoding an object in Objective-C. Doing so will make that object locally faster but will add overhead due to the fact that the rest of your code now has to talk to this object via a proxy.
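To make the proxy trade-off a little more concrete, here is a toy model in plain Python. It is not PyObjC and does not measure a real bridge: `NativeCounter` and `BridgeProxy` are hypothetical names, and the "bridge" is just a forwarding wrapper that counts how often it is crossed. The point it sketches is that a recoded object with a chatty interface is crossed once per call, while a coarser interface is crossed once per batch.

```python
class NativeCounter:
    """Stands in for an object recoded in the faster language (hypothetical)."""

    def __init__(self):
        self.total = 0

    def add(self, n):
        # Fine-grained call: one value per invocation.
        self.total += n
        return self.total

    def add_many(self, values):
        # Coarse-grained call: a whole batch per invocation.
        for n in values:
            self.total += n
        return self.total


class BridgeProxy:
    """Toy proxy: forwards every method call and counts the crossings."""

    def __init__(self, target):
        self._target = target
        self.crossings = 0

    def __getattr__(self, name):
        # Only reached for names not found on the proxy itself.
        method = getattr(self._target, name)

        def forward(*args):
            self.crossings += 1  # one simulated bridge crossing per call
            return method(*args)

        return forward


chatty = BridgeProxy(NativeCounter())
for n in range(1000):
    chatty.add(n)

batched = BridgeProxy(NativeCounter())
batched.add_many(range(1000))

print(chatty.crossings, batched.crossings)  # prints "1000 1"
```

Both proxies end up with the same total, but the first pays the per-call cost a thousand times. How large that cost is on a real bridge is something to measure, not something this sketch can tell you.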

8 Comments

Hey, does blockquote work here?

I suspect that common computations using lots of arrays, dictionaries, numbers, strings, etc. are faster in Python than in Objective-C/Cocoa due to less memory management and dispatch overhead.

By this, I assume you mean for code locality reasons, smaller working set, preloading of literal content, etc?

I'm wondering if you're overestimating the bridging overhead. This depends on the approach used, of course, but cross-bridge method dispatch doesn't have to be inefficient.

Further, I don't see object model impedance mismatch being a problem, but maybe that reflects my Ruby bias. Since Ruby and Objective-C both derive from the Smalltalk object model, it's not really an issue. Dunno about Python.

Lastly, you left out the issue of dealing with cross-bridge exception handling.


By this, I assume you mean for code locality reasons, smaller working set, preloading of literal content, etc?

No, I mean that with Objective-C you have dispatch and function overhead for retain/release and autorelease pools, whereas in Python the reference counts are manipulated directly. Due to auto-releasing, objects can live longer than necessary if you aren’t careful. I suspect that the allocation of lots of small string/number/array/dictionary objects in Python is more optimized due to recycling, etc. In Cocoa, a mutable string that's used as a dictionary key has to be copied, even if no other object has a reference to it. Anyway, I don't have hard data, but based on my experience developing applications that use large numbers of basic Foundation and Python objects, and reading the Core Foundation and Python sources, I would be skeptical of anyone who assumes that Cocoa would definitely be faster when dealing with large numbers of basic objects and mostly built-in operations.


I'm wondering if you're overestimating the bridging overhead. This depends on the approach used, of course, but cross-bridge method dispatch doesn't have to be inefficient.

Well, PyObjC allocates a proxy object for every Objective-C object that's accessed from Python and every Python object that's accessed from Objective-C. I saw a blog post a few weeks ago showing that NSProxy adds about an order of magnitude overhead for each method call. Plus, some objects like strings are copied when they cross the bridge.


Further, I don't see object model impedance mismatch being a problem, but maybe that reflects my Ruby bias. Since Ruby and Objective-C both derive from the Smalltalk object model, it's not really an issue. Dunno about Python.

Python supports multiple inheritance. I'm not sure how Ruby's mixins would be affected. Ruby and Python both have variadic methods with keyword arguments and optional parameters, which don't mix well with Objective-C. Python decorators and generators don't work in Objective-C.
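For readers who don't write Python, here is a small sketch of the features mentioned above: a decorator, a method signature with variadic, optional, and keyword arguments, and a generator. The names `logged`, `describe`, and `countdown` are made up for illustration, and the example says nothing about how any particular bridge handles them. It only shows the Python side that would need to be mapped.

```python
import functools


def logged(func):
    """Decorator: replaces func with a wrapper that counts its calls."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)

    wrapper.calls = 0
    return wrapper


@logged
def describe(title, *tags, sep=", ", **extra):
    """Variadic positional args, an optional keyword, and free-form keywords."""
    parts = [title] + list(tags) + [f"{k}={v}" for k, v in sorted(extra.items())]
    return sep.join(parts)


def countdown(n):
    """Generator: produces values lazily, suspending at each yield."""
    while n > 0:
        yield n
        n -= 1


print(describe("post", "python", "objc", sep=" | ", author="anon"))
# prints "post | python | objc | author=anon"
print(list(countdown(3)))  # prints "[3, 2, 1]"
```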

I suspect that...

That immediately made me ignore the rest of your posting as you should MEASURE, not SPECULATE.

Show me a bottleneck, and I'll show you a programmer's assumption.

I think the usual case will be to have scripts run inside a sandbox, talking to your non-scripted code using some kind of internal API. In that case, the "bridging" (if you can still call it bridging; probably not) is actually well-defined and independent of any object models. Mapping object models between languages with different assumptions (multiple inheritance, interfaces, prototypical inheritance, no "real" objects, and so on) is very messy and not needed.

Hence, if something runs too slowly in your scripting language, make it part of the API your scripting language is using and just call non-scripted code.

I assume that PyObjC isn't what Daniel had in mind. What he's talking about (or maybe that's just my own opinion of how this should be done) is more like using Lua in an app like Lightroom.

Steve: the point of my comments was not that scripting languages should be considered faster, but that one shouldn't assume that they are slower. I'm for more measurement, not less.
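In that spirit, here is a minimal measurement sketch using Python's standard `timeit` module. The two word-counting functions and the `best_time` helper are hypothetical examples, not code from any of the applications discussed. The idea is simply to time two candidate implementations of the same dictionary-heavy task instead of guessing which is faster. No timings are shown because they depend on the machine and the Python version.

```python
import timeit
from collections import Counter


def count_words_loop(words):
    """Count occurrences with a plain dict and an explicit loop."""
    counts = {}
    for w in words:
        counts[w] = counts.get(w, 0) + 1
    return counts


def count_words_counter(words):
    """Count occurrences with collections.Counter."""
    return dict(Counter(words))


def best_time(func, data, number=20, repeat=5):
    """Return the best per-call time in seconds over several repeats."""
    timer = timeit.Timer(lambda: func(data))
    return min(timer.repeat(repeat=repeat, number=number)) / number


WORDS = ("array dictionary number string " * 2500).split()

# Check that both candidates agree before comparing their speed.
assert count_words_loop(WORDS) == count_words_counter(WORDS)

for func in (count_words_loop, count_words_counter):
    print(f"{func.__name__}: {best_time(func, WORDS):.6f} s per call")
```

Taking the minimum over several repeats is the usual way to reduce noise from other processes. The same harness could wrap a call that crosses a bridge, which is where the numbers would get interesting.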

LKM: I think Daniel was referring to PyObjC and RubyCocoa, since those are the bridges included with Leopard. Although it's certainly possible to use those to develop an app with an internal API, like Lightroom, I think the common pattern is to write everything in the scripting language, except for performance-critical sections, library code that you already have, and code that needs to call C.

Yeah, I just read it again and I see that you're right. My own work was clouding my memory of his blog entry, obviously :-)

Personally, I think mixing scripting languages and compiled languages based simply on whether something is performance-critical is not an ideal way to program.

Bill Bumgarner has an interesting post on this subject.

Since John Gruber just linked to this post, I want to clarify my comments about bridging overhead. It’s not that the bridge is the biggest or most common performance hit, but rather that bridge overhead seems to be largely unappreciated and undiscussed. People are already aware that the heavy processing should be done by compiled code, either through Mac OS X frameworks or Python built-ins or extension modules. Crossing the bridge is a source of overhead that they might not have been expecting, and which cannot easily be optimized away by rewriting select portions of the code after the fact.

Secondly, some people are suggesting that Objective-C 2.0’s support for garbage collection and properties (foreach is already available via a macro) makes languages like Python and Ruby less attractive. This is true as far as it goes—if you were attracted to Ruby because of its garbage collection, then you won’t be so drawn to it once Objective-C gains (Leopard-only) garbage collection. However, if you are taking full advantage of Python or Ruby, Objective-C 2.0 isn’t really that enticing. It’s much closer to Java than to a “scripting” language, both in terms of powerful features and verbosity.
