Friday, November 6, 2015

Why Is Swift’s String API So Hard?

Mike Ash:

Incidentally, I think that representing all these different concepts as a single string type is a mistake. Human-readable text, file paths, SQL statements, and others are all conceptually different, and this should be represented as different types at the language level. I think that having different conceptual kinds of strings be distinct types would eliminate a lot of bugs.

[…]

Swift’s String type takes a different approach. It has no canonical representation, and instead provides views on various representations of the string. This lets you use whichever representation makes the most sense for the task at hand.

[…]

Going from an arbitrary sequence of UTF-16 code units back to a String is pretty obscure. UTF16View has no public initializers and few mutating functions. The solution is to use the global transcode function, which works with the UnicodeCodecType protocol. There are three implementations of this protocol: UTF8, UTF16, and UTF32. The transcode function can be used to convert between them. It’s pretty gnarly, though. For the input, it takes a GeneratorType which produces the input, and for the output it takes a function which is called for each unit of output. This can be used to build up a string piece by piece by converting to UTF32, then converting each UTF-32 code unit to a UnicodeScalar and appending it to a String[…]

[…]

The various views are all indexable collections, but they are very much not arrays. The index types are weird custom structs. This means you can’t index views by number […] Instead, you have to start with either the collection’s startIndex or endIndex, then use methods like successor() or advancedBy() to move around […] Why not make it easier, and allow indexing with an integer? It’s essentially Swift’s way of reinforcing the fact that this is an expensive operation.

Comments RSS · Twitter

Leave a Comment