Wednesday, October 24, 2007

A Localization Horror Story

The documentation for Perl’s Maketext module mentions an interesting issue:

First off, your code for “I scanned %g directory.” or “I scanned %g directories.” assumes there’s only singular or plural. But, to use linguistic jargon again, Arabic has grammatical number, like English (but unlike Chinese), but it’s a three-term category: singular, dual, and plural. In other words, the way you say “directory” depends on whether there’s one directory, or two of them, or more than two of them. Your test of ($directory == 1) no longer does the job. And it means that where English’s grammatical category of number necessitates only the two permutations of the first sentence based on “directory [singular]” and “directories [plural],” Arabic has three—and, worse, in the second sentence (“Your query matched %g file in %g directory.”), where English has four, Arabic has nine. You sense an unwelcome, exponential trend taking shape.

I’ve run into simpler versions of this problem several times, in different programming languages, and never found a satisfying solution.
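To make the arithmetic in the quoted passage concrete, here is a minimal Python sketch that counts how many message variants a translator would need as the number of count-dependent placeholders grows. The category tuples and the `message_variants` helper are hypothetical names made up for illustration, and the three-term set is a simplification of the singular/dual/plural split described above, not a full set of plural rules for any real language.

```python
from itertools import product

# Simplified number categories (assumptions for illustration only).
TWO_TERM = ("singular", "plural")             # English-style
THREE_TERM = ("singular", "dual", "plural")   # the three-term system from the quote


def message_variants(categories, placeholders):
    """Return every combination of number categories across the placeholders."""
    return list(product(categories, repeat=placeholders))


# One placeholder: "I scanned %g directory."
# Two placeholders: "Your query matched %g file in %g directory."
for label, categories in (("two-term", TWO_TERM), ("three-term", THREE_TERM)):
    for placeholders in (1, 2, 3):
        count = len(message_variants(categories, placeholders))
        print(label, placeholders, count)
```

With two placeholders this gives the four and nine variants mentioned in the quote; each additional placeholder multiplies the total by the number of categories.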

2 Comments

Dude, it’s simple. Universal translator. You pin it to your uniform, and it’ll deal with it: Arabic, Romulan, whatever.

Wait. Does it only work for audio, not text?


I don't know any good solutions for the problem of multiple pluralization-related parameters in a single message, but good old `ngettext` works rather nicely if there's only one such parameter:
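As an illustration of the single-parameter case, here is a minimal sketch using Python's standard `gettext` module (not necessarily the snippet the commenter had in mind). The `scanned_message` helper is a hypothetical name. No translation catalog is loaded here, so `ngettext` falls back to the English rule; with a real catalog installed, the catalog's plural rule would choose among however many forms the target language defines.

```python
import gettext


def scanned_message(count):
    """Build the 'scanned directories' message for a given count."""
    # ngettext takes the singular form, the plural form, and the number.
    # Without a catalog it returns the singular when count == 1 and the
    # plural otherwise.
    template = gettext.ngettext(
        "I scanned %d directory.",
        "I scanned %d directories.",
        count,
    )
    return template % count


for n in (0, 1, 2, 5):
    print(scanned_message(n))
```

The code passes the number to `ngettext` rather than testing `count == 1` itself, so the decision about which form to use stays with the translation catalog.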
