Monday, August 23, 2004

Java Regex Speed

Tim Bray:

There are all sorts of variations around I/O and so on, but my finding is that for this problem, the Java 1.4.2 regex processing is somewhere around twice as fast as Perl 5.8.1. Frankly, I’m astounded.

3 Comments

Behold, the power of UTF-8!

Tim's likely running over Unicode data. Perl 5 stores Unicode in UTF-8, a variable-width encoding. It takes up very little space, but it's really, really inefficient to access. Java uses UTF-16, which is a fixed-width format for most practical purposes. (And yes, I know about combining characters and alternate planes and such.) I fully expect the place Perl buys it big time is in the code that has to do character boundary checking. (This is one of the reasons Parrot's going with a fixed-width encoding scheme. Variable-width schemes suck.)
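To make the width difference concrete, here is a minimal Java sketch that counts how many UTF-8 bytes and how many UTF-16 code units a few sample characters need. The class name `EncodingWidths` and the sample characters are illustrative assumptions, and `StandardCharsets` requires a newer Java than the 1.4.2 discussed in the post. The last sample lies outside the Basic Multilingual Plane, which is the "alternate planes" caveat: there UTF-16 needs two code units as well.

```java
import java.nio.charset.StandardCharsets;

public class EncodingWidths {
    // Number of bytes needed to encode the text as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of UTF-16 code units, which is what String.length() reports.
    static int utf16Units(String s) {
        return s.length();
    }

    public static void main(String[] args) {
        // Illustrative samples: U+0061, U+00E9, U+20AC, and U+1D11E (outside the BMP).
        String[] samples = { "a", "\u00e9", "\u20ac", "\uD834\uDD1E" };
        for (String s : samples) {
            System.out.println(
                utf8Bytes(s) + " UTF-8 byte(s), "
                + utf16Units(s) + " UTF-16 unit(s), "
                + s.codePointCount(0, s.length()) + " code point(s)");
        }
    }
}
```

In UTF-8 the first three samples take one, two, and three bytes, so finding the n-th character means scanning from the start; in UTF-16 each of them is a single code unit, and only the last sample needs a surrogate pair.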

Dan,
If I call a validation method, say, 1,000,000 times, I find a tenfold difference in speed between a normal validation method and a regex. Is there any way I can speed up the regex? I need it for validations.
Thanks,

I have coded and tested this in Java.
Thanks
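One common way to narrow that gap in Java is to compile the pattern once and reuse it, since `String.matches` compiles its regex on every call. The sketch below is only an illustration: the commenter does not say what is being validated, so the digits-only rule, the class name `RegexValidation`, and the method names are all hypothetical. The timing loop is naive and is not a rigorous benchmark.

```java
import java.util.regex.Pattern;

public class RegexValidation {
    // Hypothetical rule (an assumption): one or more ASCII digits.
    // Compiled once and reused across calls.
    private static final Pattern DIGITS = Pattern.compile("[0-9]+");

    // String.matches compiles the regex again on each invocation.
    static boolean isValidRecompiled(String input) {
        return input.matches("[0-9]+");
    }

    // Reuses the precompiled pattern; only a Matcher is created per call.
    static boolean isValidPrecompiled(String input) {
        return DIGITS.matcher(input).matches();
    }

    // Hand-written equivalent of the same rule, for comparison.
    static boolean isValidByHand(String input) {
        if (input.isEmpty()) {
            return false;
        }
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int calls = 1_000_000;
        String input = "1234567890";
        int hits = 0; // accumulated so the work is not trivially discarded

        long t0 = System.nanoTime();
        for (int i = 0; i < calls; i++) if (isValidRecompiled(input)) hits++;
        long t1 = System.nanoTime();
        for (int i = 0; i < calls; i++) if (isValidPrecompiled(input)) hits++;
        long t2 = System.nanoTime();
        for (int i = 0; i < calls; i++) if (isValidByHand(input)) hits++;
        long t3 = System.nanoTime();

        System.out.println("recompiled:  " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("precompiled: " + (t2 - t1) / 1_000_000 + " ms");
        System.out.println("by hand:     " + (t3 - t2) / 1_000_000 + " ms");
        System.out.println("matches counted: " + hits);
    }
}
```

A hand-written check will often still be faster for a rule this simple; precompiling mainly removes the repeated compilation cost when a regex is the right tool.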
