Archive for November 25, 2002

Monday, November 25, 2002

Perl vs. Python vs. Ruby

I’m evaluating Python and Ruby as replacements for Perl. I’ve been using Perl for several years and am very comfortable with it, although I’m definitely not an expert. Perl is a powerful language, but I think it’s ugly and encourages writing bad code, so I want to get rid of it. Python and Ruby both come with Mac OS X 10.2, both have BBEdit language modules, and both promise a cleaner approach to scripting. Over the past few weeks I read the Python Tutorial and the non-reference parts of Programming Ruby, however as of this afternoon I’d not written any Python or Ruby code yet.

Here’s a toy problem I wanted to solve. eSellerate gives me a tab-delimited file containing information about the people who bought my shareware. I wanted a script to extract from this file the e-mail addresses of people who asked to be contacted when I release the new versions of the products.

I decided to solve this problem in each language and then compare the resulting programs. The algorithm I chose was just the first one that came to mind. I coded it first in Ruby, and then ported the code to Python and Perl, changing it as little as possible. Thus, the style is perhaps not canonical Python or Perl, although since I’m new to Ruby it’s probably not canonical Ruby either. If I were just writing this in Perl, I might have tried to avoid Perl’s messy syntax for nested arrays and instead used an array of strings.

Here’s the basic algorithm:

  1. Read each line of standard input and break it into fields at each tab.
  2. Each field is wrapped in quotation marks, so remove them. Assume that there are no quotation marks in the interior of the field.
  3. Store the fields in an array called record.
  4. Create another array, records and fill it with all the records.
  5. Make a new array, contactRecords, that contains arrays of just the fields we care about: SKUTITLE, CONTACTME, EMAIL.
  6. Sort contactRecords by SKUTITLE.
  7. Remove the elements of contactRecords where CONTACTME is not 1.
  8. Print contactRecords to standard output, with the fields separated by tabs and the records separated by newlines.

And here’s the code:

Perl

#!/usr/bin/perl -w

use strict;

my @records = ();

foreach my $line ( <> )
{
    my @record = map {s/"//g; $_} split("\t", $line);
    push(@records, \@record);
}

my $EMAIL = 17;
my $CONTACTME = 27;
my $SKUTITLE = 34;

my @contactRecords = ();
foreach my $r ( @records )
{
    push(@contactRecords, [$$r[$SKUTITLE], $$r[$CONTACTME], $$r[$EMAIL]]);
}

@contactRecords = sort {$$a[0] cmp $$b[0]} @contactRecords;
@contactRecords = grep($$_[1] eq "1", @contactRecords);

foreach my $r ( @contactRecords )
{
    print join("\t", @$r), "\n";
}

The punctuation and my’s make this harder to read than it should be.

Python

#!/usr/bin/python

import fileinput

records = []

for line in fileinput.input():
    record = [field.replace('"', '') for field in line.split("\t")]
    records.append(record)

EMAIL = 17
CONTACTME = 27
SKUTITLE = 34

contactRecords = [[r[SKUTITLE], r[CONTACTME], r[EMAIL]] for r in records]
contactRecords.sort() # default sort will group by sku title
contactRecords = filter(lambda r: r[1] == "1", contactRecords)

for r in contactRecords:
    print "\t".join(r)

I think the Python version is generally the cleanest to read—that is, it’s the most English-like. I had to look up how join and filter worked, because they weren’t methods of list as I had guessed.

Ruby

#!/usr/bin/ruby

records = []

while gets
    record = $_.split('\t').collect! {|field| field.gsub('"', '') }
    records << record
end

EMAIL = 17
CONTACTME = 27
SKUTITLE = 34

contactRecords = records.collect {|r| [r[SKUTITLE], r[CONTACTME], r[EMAIL]] }
contactRecords.sort! # default sort will group by sku title
contactRecords.reject! {|a| a[1] != "1"}

contactRecords.each {|r|
    print r.join("\t"), "\n"
}

This is actually the shortest version, and I think it’s the easiest to read if you aren’t put off by the block syntax. I like how the sequence of operations in the first line of the while isn’t “backwards” as it is in the Perl and Python versions. Also, I correctly guessed which classes “owned” the methods and whether they were mutators.

Related Blogs

Mark Pilgrim has a page that uses Phil Pearson’s Blogging Ecosystem to find interesting sites that you might not be reading.