Tokenizing in Cocoa
I don’t have numbers, but I was somewhat surprised to discover that, for my use, it’s faster to convert a string to a UTF-8 buffer and feed it to PCRE than it is to use NSScanner and NSCharacterSet. If you do end up using NSScanner, be sure to use -scanUpToCharactersFromSet:intoString: rather than -scanCharactersFromSet:intoString:. The latter spends a lot of time inverting the character set.
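
For illustration, here is a minimal sketch of a tokenizer built only on -scanUpToCharactersFromSet:intoString:. The helper name TokenizeString and the choice of delimiters are my own assumptions, not code from any particular project. Delimiters are handed to the scanner as its charactersToBeSkipped, so the loop never has to call -scanCharactersFromSet:intoString: itself. I haven't measured how the skipping is implemented internally, so treat this as a starting point for profiling rather than a guaranteed win.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (name is an assumption): splits `input` into tokens
// separated by runs of any character in `delimiters`.
static NSArray *TokenizeString(NSString *input, NSCharacterSet *delimiters)
{
    NSMutableArray *tokens = [NSMutableArray array];
    NSScanner *scanner = [NSScanner scannerWithString:input];

    // Let the scanner skip delimiter runs before each scan, instead of
    // consuming them explicitly with -scanCharactersFromSet:intoString:.
    scanner.charactersToBeSkipped = delimiters;

    while (![scanner isAtEnd]) {
        NSString *token = nil;
        // Read everything up to (but not including) the next delimiter.
        if (![scanner scanUpToCharactersFromSet:delimiters intoString:&token]) {
            break; // Nothing scanned; bail out rather than risk looping forever.
        }
        [tokens addObject:token];
    }
    return tokens;
}
```

Called as TokenizeString(@"foo, bar;baz", [NSCharacterSet characterSetWithCharactersInString:@", ;"]), this should give back the three tokens foo, bar and baz.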