{"id":38572,"date":"2023-02-23T14:33:38","date_gmt":"2023-02-23T19:33:38","guid":{"rendered":"https:\/\/mjtsai.com\/blog\/?p=38572"},"modified":"2023-03-10T15:15:36","modified_gmt":"2023-03-10T20:15:36","slug":"speeding-up-scanner-in-swift","status":"publish","type":"post","link":"https:\/\/mjtsai.com\/blog\/2023\/02\/23\/speeding-up-scanner-in-swift\/","title":{"rendered":"Speeding Up Scanner in Swift"},"content":{"rendered":"<p>My first tip goes back to when I started using <code>NSScanner<\/code> in the Puma days. In short, you should never call <code>scanCharacters(from:into:)<\/code> in a loop because every time it&rsquo;s called it creates an inverted copy of the character set. It then delegates to <code>NSString.rangeOfCharacter(from:options:range:)<\/code>, passing that copy. The <a href=\"https:\/\/developer.apple.com\/documentation\/foundation\/nscharacterset\/1414025-invertedset\">documentation<\/a> contains the cryptic comment:<\/p>\n<blockquote cite=\"https:\/\/developer.apple.com\/documentation\/foundation\/nscharacterset\/1414025-invertedset\">\n<p>Using the inverse of an immutable character set is much more efficient than inverting a mutable character set.<\/p><\/blockquote>\n<p>But my experience is that it&rsquo;s not fast with immutable character sets, either. It seems like there should be an <code>NSCharacterSet<\/code> subclass that flips the membership of another object. Then each character set could store its own inverse with minimal overhead and just return the same one each time. But there&rsquo;s apparently no such optimization, so I recommend calling <code>inverted<\/code> yourself, storing the result, and then using <code>scanUpToCharacters(from:into:)<\/code>, which will then use the character set unchanged.<\/p>\n<p>Even this is very slow when calling from Swift, though. 
Whenever you call <code>scanUpToCharacters(from:into:)<\/code> with a <code>CharacterSet<\/code>, it calls <code>CharacterSet._bridgeToObjectiveC()<\/code>, which calls <code>__CFCharacterSetCreateCopy()<\/code>, which again makes an expensive copy. (I have been doing a lot of profiling but somehow didn&rsquo;t notice this until Ventura, so I wonder whether something changed there.) In any case, currently <code>CharacterSet<\/code> does not bridge efficiently like <code>Data<\/code> and <code>String<\/code> do.<\/p>\n<p>My first try at working around this was to do the bridging up front:<\/p>\n<pre>let fast = characterSet as NSCharacterSet<\/pre>\n<p>and then pass the same <code>NSCharacterSet<\/code>, which should bridge cheaply, each time. But this didn&rsquo;t help.<\/p>\n<p>What did work was to create an <code>NSCharacterSet<\/code> directly:<\/p>\n<pre>let fast = NSCharacterSet(bitmapRepresentation: characterSet.bitmapRepresentation)<\/pre>\n<p>With that change, the bridging overhead goes away. <code>Scanner<\/code> is still not particularly fast, though. 
Maybe this will improve with the forthcoming <a href=\"https:\/\/mjtsai.com\/blog\/2022\/12\/12\/the-swifty-future-of-foundation\/\">Swifty Foundation<\/a>, or I may end up writing a replacement for just the few cases that I need that works directly on Swift strings.<\/p>\n\n<p>Previously:<\/p>\n<ul>\n<li><a href=\"https:\/\/mjtsai.com\/blog\/2022\/12\/12\/the-swifty-future-of-foundation\/\">The Swifty Future of Foundation<\/a><\/li>\n<li><a href=\"https:\/\/mjtsai.com\/blog\/2019\/02\/22\/swift-subclass-of-nstextstorage-is-slow-because-of-swift-bridging\/\">Swift Subclass of NSTextStorage Is Slow Because of Swift Bridging<\/a><\/li>\n<li><a href=\"https:\/\/mjtsai.com\/blog\/2017\/12\/05\/key-difference-between-dictionary-and-nsdictionary\/\">Key Difference Between Dictionary and NSDictionary<\/a><\/li>\n<li><a href=\"https:\/\/mjtsai.com\/blog\/2017\/08\/29\/swift-4-bridging-peephole-for-as-casts\/\">Swift 4: Bridging Peephole for &ldquo;as&rdquo; Casts<\/a><\/li>\n<li><a href=\"https:\/\/mjtsai.com\/blog\/2016\/04\/22\/swift-proposal-mutability-and-foundation-value-types\/\">Swift Proposal: Mutability and Foundation Value Types<\/a><\/li>\n<li><a href=\"https:\/\/mjtsai.com\/blog\/2015\/10\/15\/swift-casting-with-_objectivecbridgeable\/\">Swift Casting With _ObjectiveCBridgeable<\/a><\/li>\n<\/ul>\n\n<p id=\"speeding-up-scanner-in-swift-update-2023-02-24\">Update (2023-02-24): Another point to be aware of is that the documentation implies that the <a href=\"https:\/\/developer.apple.com\/documentation\/foundation\/nsscanner\/1409488-casesensitive\">caseSensitive<\/a> option applies to <code>scanCharacters(from:into:)<\/code>, and <code>scanCharacters(from:into:)<\/code> does actually pass the option into <code>NSString.rangeOfCharacter(from:options:range:)<\/code>, but <code>NSString.rangeOfCharacter(from:options:range:)<\/code> is <a href=\"https:\/\/developer.apple.com\/documentation\/foundation\/nsstring\/1416898-rangeofcharacterfromset\">documented to ignore 
that flag<\/a>, and in fact it does. So <a href=\"https:\/\/developer.apple.com\/documentation\/foundation\/nsscanner\/1409488-casesensitive\">caseSensitive<\/a> only actually applies to the <code>Scanner<\/code> methods that take strings.<\/p>\n\n<p><a href=\"https:\/\/mastodon.social\/@rhysmorgan\/109915855602495560\">Rhys Morgan<\/a>:<\/p>\n<blockquote cite=\"https:\/\/mastodon.social\/@rhysmorgan\/109915855602495560\">\n<p><a href=\"https:\/\/github.com\/pointfreeco\/swift-parsing\">swift-parsing<\/a> from @pointfreeco is a really good library that&rsquo;s usually faster than Foundation&rsquo;s Scanner!<\/p>\n<\/blockquote>\n\n<p id=\"speeding-up-scanner-in-swift-update-2023-03-10\">Update (2023-03-10): <a href=\"https:\/\/mastodon.social\/@schwa\/109997417512745627\">Jonathan Wight<\/a>:<\/p>\n<blockquote cite=\"https:\/\/mastodon.social\/@schwa\/109997417512745627\"><p>(NS)Scanner is truly one of the most under appreciated features of Foundation. I use it whenever I need to do structured parsing of text when a simple regex isn&rsquo;t appropriate (or even possible).<\/p><p>But why limit your Scanning to just Strings?<\/p><p>Here&rsquo;s my <code><a href=\"https:\/\/github.com\/schwa\/Everything\/blob\/main\/Sources\/Everything\/Parsing\/CollectionScanner.swift\">CollectionScanner<\/a><\/code> that can scan any collection of arbitrary elements. Useful if you need to process arrays of data that aren&rsquo;t necessarily Strings.<\/p><\/blockquote>\n<p>Indeed, I&rsquo;ve found it really useful to have a <code>Data<\/code> scanner.<\/p>","protected":false},"excerpt":{"rendered":"<p>My first tip goes back to when I started using NSScanner in the Puma days. In short, you should never call scanCharacters(from:into:) in a loop because every time it&rsquo;s called it creates an inverted copy of the character set. It then delegates to NSString.rangeOfCharacter(from:options:range:), passing that copy. 
The documentation contains the cryptic comment: Using the [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"apple_news_api_created_at":"2023-02-23T19:33:41Z","apple_news_api_id":"0b2285ba-4b9c-4b87-ba25-ae629370c235","apple_news_api_modified_at":"2023-03-10T20:15:39Z","apple_news_api_revision":"AAAAAAAAAAAAAAAAAAAAAg==","apple_news_api_share_url":"https:\/\/apple.news\/ACyKFukucS4e6Ja5ik3DCNQ","apple_news_coverimage":0,"apple_news_coverimage_caption":"","apple_news_is_hidden":false,"apple_news_is_paid":false,"apple_news_is_preview":false,"apple_news_is_sponsored":false,"apple_news_maturity_rating":"","apple_news_metadata":"\"\"","apple_news_pullquote":"","apple_news_pullquote_position":"","apple_news_slug":"","apple_news_sections":"\"\"","apple_news_suppress_video_url":false,"apple_news_use_image_component":false,"footnotes":""},"categories":[4],"tags":[69,30,1387,2223,138,71,901],"class_list":["post-38572","post","type-post","status-publish","format-standard","hentry","category-programming-category","tag-cocoa","tag-mac","tag-mac-os-x-10-1-puma","tag-macos-13-ventura","tag-optimization","tag-programming","tag-swift-programming-language"],"apple_news_notices":[],"_links":{"self":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/38572","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/comments?post=38572"}],"version-history":[{"count":5,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/38572\/revisions"}],"predecessor-version":[{"id":38734,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/38572\/revisions\/
38734"}],"wp:attachment":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/media?parent=38572"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/categories?post=38572"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/tags?post=38572"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}