Sunday, March 30, 2003

Iterators

Erik Barzeski posted about iterators in Cocoa, so I thought I’d give some perspectives from other languages. I think there are really two concepts here: iteration and generation.

I don’t know that everyone agrees with these definitions, but to my mind iteration is about obtaining a sequence of values, without caring about how those values are produced. This is embodied by the NSEnumerator interface, java.util.Iterator, and the Python iteration protocols. Iteration is about how you get the values that you want to consume.

Here’s what iteration looks like in Cocoa:

NSEnumerator *enumerator = [objectStructure objectEnumerator];
id o;
while ( o = [enumerator nextObject] )
    // do something with o

I use a C macro so that I can write it like this:

foreach ( o, [objectStructure objectEnumerator] )
    // do something with o

Some languages, like Python, have a syntax for iteration:

for o in objectStructure:
    # do something with o

In the above example, Python will call objectStructure’s __iter__() method to get an iterator object.

Generation is about how to produce the values. The generator returned by -[NSArray objectEnumerator] probably keeps a reference to the array and remembers an index into that array. When you send it -nextObject, it increments its counter and returns another object. java.util.Iterator is similar. This is straightforward to implement for list-like things. For instance, my SQLite wrapper includes an NSEnumerator subclass for rows in a database table. It took seconds to write. But how do you write this kind of generator for the nodes in a tree? Or the interesting tokens in a mail message? You need to remember more state than just an index, and it can be a real pain. In practice, people often perform a full traversal of the structure, stuffing the objects into an array, and then return the array’s generator. That makes the code much simpler, but inefficient.

I’ve seen two other ways of writing generators, which work better in the more complex cases. What I call the closure approach inverts the problem. Instead of writing code that gets values from a generator object and then does something with them, you put your code in a closure (a.k.a. lambda or block) and give the closure to the generator. This is the approach taken by Smalltalk and Ruby, and it could also work in other languages with closures, such as Lisp and Perl. From the caller’s perspective, we get something like this (in Ruby):

objectStructure.each { |o|
    # do something with o
}

Here, the block (in curly braces) becomes an argument to the method call. The each method traverses objectStructure and evaluates the block many times, passing in different o’s.

Other languages use what I call the yield approach, which was popularized by CLU in the mid-70s. Under this approach, generation is handled by a function that can, in effect, return multiple times. Instead of using a return statement to return, it uses a yield statement. yield is like return, except that the next time the function is called, it resumes after the last-executed yield statement.

Here’s an example in Python:

# A recursive generator that generates Tree leaves in in-order.
def inorder(t):
    if t:
        for x in inorder(t.left):
            yield x
        yield t.label
        for x in inorder(t.right):
            yield x 

Is that beautiful or what?

Interestingly, the keyword that Ruby uses to evaluate a block parameter is also called yield. It’s used for roughly the same reasons, but what’s going on under the hood is quite different. Ruby’s yield is like a function call, whereas CLU/Python’s is like a function return.

I like return-style yield better. It can even be used to create infinite generators (a.k.a. streams):

def odds():
    i = 1
    while True:
        yield i
        i = i + 2

Unfortunately, there is no yield, of either kind, in the C family. You can kind of fake it using macros.

Comments RSS · Twitter

Leave a Comment