{"id":24691,"date":"2019-03-21T15:53:23","date_gmt":"2019-03-21T19:53:23","guid":{"rendered":"https:\/\/mjtsai.com\/blog\/?p=24691"},"modified":"2019-03-21T15:53:50","modified_gmt":"2019-03-21T19:53:50","slug":"utf-8-string-in-swift-5","status":"publish","type":"post","link":"https:\/\/mjtsai.com\/blog\/2019\/03\/21\/utf-8-string-in-swift-5\/","title":{"rendered":"UTF-8 String in Swift 5"},"content":{"rendered":"<p><a href=\"https:\/\/swift.org\/blog\/utf8-string\/\">Michael Ilseman<\/a>:<\/p>\n<blockquote cite=\"https:\/\/swift.org\/blog\/utf8-string\/\">\n<p>Switching to UTF-8 fulfills one of String&rsquo;s long-term goals to enable <a href=\"https:\/\/github.com\/apple\/swift\/blob\/master\/docs\/StringManifesto.md#high-performance-string-processing\">high-performance processing<\/a>, which is the <a href=\"https:\/\/bugs.swift.org\/browse\/SR-7602\">most passionate request<\/a> from performance-sensitive developers. It also lays the groundwork for providing even more performant APIs in the future. String&rsquo;s preferred encoding is baked into Swift&rsquo;s ABI for performance, so it was imperative that this switch happen in time for ABI stability in Swift 5.<\/p>\n<p>[&#8230;]<\/p>\n<p>Swift 5, like Rust, performs encoding validation once on creation, when it is far more efficient to do so. NSStrings, which are lazily bridged (zero-copy) into Swift and use UTF-16, may contain invalid content (i.e. isolated surrogates). As in Swift 4.2, these are lazily validated when read from.<\/p>\n<\/blockquote>\n<p>This sounds great, as I&rsquo;ve run into problems in Objective-C where strings that are not valid Unicode would cause strange failures a layer or two below my code. 
I don&rsquo;t see it documented <a href=\"https:\/\/bugs.swift.org\/browse\/SR-7602?focusedCommentId=35396&amp;page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-35396\">what happens<\/a> when validation fails, but my guess <a href=\"https:\/\/github.com\/apple\/swift\/blob\/81e87ac83817280418b9c7dee0a1703546304c61\/stdlib\/public\/core\/StringUTF8Validation.swift\">from the code<\/a> is that it repairs the string using replacement characters. That makes sense given the cases I&rsquo;ve seen. Set one bad attribute on a managed object, and the entire context fails to save. If validation were eager, <em>maybe<\/em> I could do better at the point of creation than replacement characters (assuming I&rsquo;m even creating the strings myself). But, this much later, I don&rsquo;t think there&rsquo;s much to be done. It&rsquo;s not worth risking data loss for the common case where the developer hasn&rsquo;t anticipated this happening and written code to fix the strings.<\/p>\n<blockquote cite=\"https:\/\/swift.org\/blog\/utf8-string\/\"><p>As mentioned above, Swift 5 switches from two native storage representations to one. This allows for better analyses and more aggressive optimizations with fewer potential code-size or compilation time costs.<\/p>\n<p>For example, inlining is a compiler optimization that can improve run-time performance at a potential cost to code size. In Swift 4.2, most string methods contained a pair of implementations, one for each storage representation. No matter what form a 4.2 string was in, an entire portion of potentially-inlined code wouldn&rsquo;t even be run; this increases the cost and diminishes the benefits of inlining. Furthermore, the greatest benefits of inlining come from follow-on analyses and optimizations specific to one call-site, which are exponentially more difficult to perform on a dual representation. 
Swift 5&rsquo;s unified storage representation is far more amenable to inlining and follow-on optimizations.<\/p><\/blockquote>\n\n<p><a href=\"https:\/\/forums.swift.org\/t\/piercing-the-string-veil\/21700\">Michael Ilseman<\/a>:<\/p>\n<blockquote cite=\"https:\/\/forums.swift.org\/t\/piercing-the-string-veil\/21700\"><p>String remembers performance-relevant information about its contents through the use of <a href=\"https:\/\/github.com\/apple\/swift\/blob\/de75829d7d5a181453f331e39e3bdbf8d0cc6765\/stdlib\/public\/core\/StringObject.swift#L604\">performance flags<\/a>.<\/p>\n<p>For example, a String that is known to be all-ASCII has a trivial UTF8View, UTF16View, and UnicodeScalarView. Also, mapping offsets between the two code unit views is trivial, so there is no need for any bookkeeping as part of Cocoa interop.<\/p><\/blockquote>\n\n<p>Previously: <a href=\"https:\/\/mjtsai.com\/blog\/2018\/11\/06\/strings-abi-and-utf-8\/\">String&rsquo;s ABI and UTF-8<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>Michael Ilseman: Switching to UTF-8 fulfills one of String&rsquo;s long-term goals to enable high-performance processing, which is the most passionate request from performance-sensitive developers. It also lays the groundwork for providing even more performant APIs in the future. 
String&rsquo;s preferred encoding is baked into Swift&rsquo;s ABI for performance, so it was imperative that this switch [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"apple_news_api_created_at":"2019-03-21T19:53:26Z","apple_news_api_id":"97bc2f9a-2f3f-4598-978d-24d6017f0394","apple_news_api_modified_at":"2019-03-21T19:53:54Z","apple_news_api_revision":"AAAAAAAAAAAAAAAAAAAAAA==","apple_news_api_share_url":"https:\/\/apple.news\/Al7wvmi8_RZiXjSTWAX8DlA","apple_news_coverimage":0,"apple_news_coverimage_caption":"","apple_news_is_hidden":false,"apple_news_is_paid":false,"apple_news_is_preview":false,"apple_news_is_sponsored":false,"apple_news_maturity_rating":"","apple_news_metadata":"\"\"","apple_news_pullquote":"","apple_news_pullquote_position":"","apple_news_slug":"","apple_news_sections":"\"\"","apple_news_suppress_video_url":false,"apple_news_use_image_component":false,"footnotes":""},"categories":[4],"tags":[69,109,31,1610,46,30,1609,138,71,901,258],"class_list":["post-24691","post","type-post","status-publish","format-standard","hentry","category-programming-category","tag-cocoa","tag-coredata","tag-ios","tag-ios-12","tag-languagedesign","tag-mac","tag-macos-10-14","tag-optimization","tag-programming","tag-swift-programming-language","tag-unicode"],"apple_news_notices":[],"_links":{"self":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/24691","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/comments?post=24691"}],"version-history":[{"count":2,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/24691\/revisi
ons"}],"predecessor-version":[{"id":24693,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/24691\/revisions\/24693"}],"wp:attachment":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/media?parent=24691"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/categories?post=24691"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/tags?post=24691"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}