Archive for July 31, 2015

Friday, July 31, 2015

NSTaggedPointerString

Mike Ash (comments):

Thus we can see that the structure of the tagged pointer strings is:

  1. If the length is between 0 and 7, store the string as raw eight-bit characters.
  2. If the length is 8 or 9, store the string in a six-bit encoding, using the alphabet "eilotrm.apdnsIc ufkMShjTRxgC4013bDNvwyUL2O856P-B79AFKEWV_zGJ/HYX".
  3. If the length is 10 or 11, store the string in a five-bit encoding, using the alphabet "eilotrm.apdnsIc ufkMShjTRxgC4013"
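
The three rules above can be sketched in a few lines. This is only an illustration: the function names are hypothetical, the eight-bit case is assumed to be ASCII-only, and the bit order (first character in the most significant position) is an assumption rather than a claim about the real in-pointer layout.

```python
SIX_BIT_ALPHABET = "eilotrm.apdnsIc ufkMShjTRxgC4013bDNvwyUL2O856P-B79AFKEWV_zGJ/HYX"
FIVE_BIT_ALPHABET = SIX_BIT_ALPHABET[:32]  # the five-bit table is the first half

ENCODINGS = {
    "eight-bit": (8, None),
    "six-bit": (6, SIX_BIT_ALPHABET),
    "five-bit": (5, FIVE_BIT_ALPHABET),
}


def choose_encoding(s):
    """Return the encoding name for s, or None if no rule applies."""
    n = len(s)
    if n <= 7:
        # Assumption: "raw eight-bit characters" is treated as ASCII here.
        return "eight-bit" if all(ord(ch) < 128 for ch in s) else None
    if n in (8, 9):
        return "six-bit" if all(ch in SIX_BIT_ALPHABET for ch in s) else None
    if n in (10, 11):
        return "five-bit" if all(ch in FIVE_BIT_ALPHABET for ch in s) else None
    return None


def pack(s):
    """Pack s into (encoding name, integer payload), or None if it does not fit."""
    name = choose_encoding(s)
    if name is None:
        return None
    bits, alphabet = ENCODINGS[name]
    value = 0
    for ch in s:
        code = ord(ch) if alphabet is None else alphabet.index(ch)
        value = (value << bits) | code  # assumed order: first char is highest
    return name, value


def unpack(name, length, value):
    """Reverse pack(), given the encoding name and the original length."""
    bits, alphabet = ENCODINGS[name]
    mask = (1 << bits) - 1
    chars = []
    for _ in range(length):
        code = value & mask
        chars.append(chr(code) if alphabet is None else alphabet[code])
        value >>= bits
    return "".join(reversed(chars))
```

In every case the payload stays within 56 bits (7 × 8, 9 × 6, or 11 × 5), and a ten-character string containing “b” gets no packed form at all because that letter is missing from the five-bit table.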

[…]

The five-bit alphabet is extremely limited, and doesn’t include the letter b! That letter must not be common enough to warrant a place in the 32 hallowed characters of the five-bit alphabet.

[…]

Because this table is used for both six-bit and five-bit encodings, it makes sense that it wouldn’t be entirely in alphabetical order. Characters that are used most frequently should be in the first half, while characters that are used less frequently should go in the second half. This ensures that the maximum number of longer strings can use the five-bit encoding.

Address Sanitizer

WWDC 2015 Session 413 (video, PDF):

Address Sanitizer is an LLVM tool for C-based languages that serves the same purpose as Guard Malloc: it finds memory errors at runtime. However, it has a lot of benefits over the other tools.

It has much less runtime overhead and it also produces comprehensive, detailed diagnostics that are integrated directly into the Xcode UI.

What’s also very important is that it is the only such tool that works on iOS devices.

Nigel Timothy Barber:

The Xcode 7 Address Sanitizer is a revelation. Essential tool.

Mike Ash:

Address Sanitizer uses a simple but brilliant approach. It reserves a fixed section within the process’s address space called the shadow memory. In Address Sanitizer terms, a byte that is marked as forbidden is “poisoned,” and the shadow memory tracks which bytes are poisoned. A simple formula translates each address within the process’s address space into a spot in the shadow memory. Each eight-byte chunk of regular memory maps to a byte of shadow memory, which tracks the poison state of those eight bytes.
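
The quote doesn’t spell out the formula, but the eight-to-one mapping suggests a shift and an add. Here is a minimal sketch of that shape; the offset constant is an illustrative assumption (the real base is platform-specific), and the function name is hypothetical.

```python
SHADOW_SCALE = 3            # 2**3 = 8 application bytes per shadow byte
SHADOW_OFFSET = 0x7FFF8000  # assumption: illustrative base, varies by platform


def shadow_address(addr, offset=SHADOW_OFFSET):
    """Map an application address to the shadow byte that describes it."""
    return (addr >> SHADOW_SCALE) + offset


# All eight addresses of one aligned chunk share a single shadow byte,
# and the next chunk uses the shadow byte right after it.
chunk = [shadow_address(0x1000 + i) for i in range(8)]
next_chunk = shadow_address(0x1008)
```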

[…]

With this table structure in place, Address Sanitizer generates extra code in the program to check every read and write through a pointer, and throw an error if the memory in question is poisoned. This is the advantage of being integrated into the compiler and not merely existing as an external library or runtime environment: every pointer access can be reliably identified and the appropriate checks added into the machine code.

Compiler integration also allows neat tricks like the ability to poison and guard local and global variables, not just heap allocations. Locals and globals are allocated with a bit of extra padding in between them, and the padding is poisoned to catch any overflows. This is something that Guard Malloc can’t do, and that Valgrind has difficulty with.
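
To make the check-every-access and poisoned-padding ideas concrete, here is a toy model. It is a simplification, not how the real runtime works: the class and method names are hypothetical, a shadow byte is just a 0/1 flag for a whole eight-byte chunk, and the “instrumentation” is an explicit method call rather than compiler-generated code.

```python
GRANULE = 8  # bytes of memory described by one shadow byte


class PoisonedAccess(Exception):
    """Raised when a load or store touches poisoned memory."""


class ToyMemory:
    def __init__(self, size):
        self.data = bytearray(size)
        # Everything starts poisoned (1); alloc() unpoisons what it hands out.
        self.shadow = bytearray([1] * (size // GRANULE))
        self.next_free = 0

    def alloc(self, size, redzone=GRANULE):
        """Return the start of a block, leaving poisoned padding before it."""
        assert redzone % GRANULE == 0, "toy model needs chunk-aligned padding"
        rounded = -(-size // GRANULE) * GRANULE  # round up to whole chunks
        start = self.next_free + redzone
        end = start + rounded
        if end + redzone > len(self.data):
            raise MemoryError("toy arena exhausted")
        for chunk in range(start // GRANULE, end // GRANULE):
            self.shadow[chunk] = 0
        self.next_free = end  # the next block adds its own padding
        return start

    def _check(self, addr):
        # This is the test that would be inserted before each access.
        if not 0 <= addr < len(self.data) or self.shadow[addr // GRANULE]:
            raise PoisonedAccess(f"access to poisoned address {addr}")

    def store(self, addr, value):
        self._check(addr)
        self.data[addr] = value

    def load(self, addr):
        self._check(addr)
        return self.data[addr]
```

Two 16-byte blocks allocated back to back end up separated by eight poisoned bytes, so writing one byte past the end of the first raises PoisonedAccess instead of silently corrupting the second.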

Keith Harrison:

It is important to understand that the Address Sanitizer is a run-time tool so you need to exercise your code for it to find problems. This makes it a good candidate to enable when running unit and UI tests. Whilst Apple claims it has low overhead (2x-5x CPU and 2x-3x memory) it is not performance free so you may want to experiment if you have large test suites before leaving it enabled.

Objective-C Improvements and Swift Interoperability

WWDC 2015 Session 401 (video, PDF):

Go to any Objective-C header and choose the ‘show related items’ button in the top-left corner. This will bring down a menu of related items, one of which is ‘generated interface.’ And this will show you the Swift mapping for that header. This is the exact same view that you got in Xcode 6 using the ‘jump to definition’ feature, but now you can get it easily from any header in your target.

[…]

Now, on both of these examples, I have shown that these methods in Objective-C have multiple parameters, only one of which is the error parameter, but there are also cases where methods only have one parameter and you will get something like ‘check resource is reachable and return error.’ As you can see in Swift, since we already know that the method can return an error from that ‘throws’ keyword, we will chop off those last three words for you, just for you!
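
The renaming for the one-parameter case can be mimicked with a small string helper. This is only a sketch of the rule as described in the talk, not the importer’s actual algorithm; the function name is hypothetical, and it deliberately ignores selectors with more than one parameter.

```python
ERROR_SUFFIX = "AndReturnError"  # the "last three words" that get chopped off


def swift_name_for_error_selector(selector):
    """Sketch the Swift spelling of a one-parameter NSError-out selector."""
    if selector.count(":") != 1 or not selector.endswith(":"):
        raise ValueError("only single-parameter selectors are handled here")
    base = selector[:-1]
    if base.endswith(ERROR_SUFFIX):
        base = base[: -len(ERROR_SUFFIX)]
    return base + "() throws"


example = swift_name_for_error_selector("checkResourceIsReachableAndReturnError:")
```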

I guess that answers my question.

Jordan Morgan:

The first step to making this better is one of my favorite new Objective-C features, nullability annotations.

[…]

Though it took about thirty-two years, generics is now a thing in Objective-C.

[…]

Essentially, we’ve informed the compiler that the property and its collection will have some kind of a UIView. This allows for more information in the type contract that we’ve not had before.

The real treat? Implicit downcasting.

drekka:

There’s a couple of things going on here. Firstly you will notice that we are not using ObjectType anywhere. It appears that the name of a Generic reference is only usable in the header of a class. I’ve tried all sorts of syntaxes to see if there was a way to declare and use it in the implementation, but I’ve been unsuccessful. So I’ve concluded that Objective-C Generics simply don’t make the placeholders available to implementations. Unfortunately I cannot confirm against Apple’s code.

What I have concluded is that when a Generic is resolved by the compiler, it resolves to the top most class that can replace it. Normally this is id, however bounded Generics can make this more specific. See Restricting Generics below.

Joe Groff:

ObjC generics also can’t be part of the canonical type system without breaking C++ templates

Bitcode

drfuchs:

I managed to ask Chris Lattner this very question at WWDC (during a moment when he wasn’t surrounded by adoring crowds). “So, you’re signaling a new CPU architecture?” But, “No; think more along the lines of ‘adding a new multiply instruction’. By the time you’re in Bitcode, you’re already fairly architecture-specific” says he.

Wolf Rentzsch:

surprised most dev gripes about bitcode is unpredictable optimization effects. Folks, we’ve been living in emulated ISAs since the 90s

I think bitcode is a huge win. About only thing I’ll miss is cool ISA-specific instructions. Now reliant on OS vendor providing access.

Landon Fuller:

I’m a lot less worried about emulated ISAs given that the chances I have to debug one are pretty much nil.

Bitcode: non-reproducible Apple-internal toolchain bugs, emergent bugs from undefined behavior that previously worked, etc ...

dshirley:

When it becomes a requirement to submit apps in bitcode format, how will this impact architecture specific code (ie. assembly, or anything that is ifdef’d for that matter). It makes sense that assembly isn’t converted to bitcode, but doesn’t everything need to be in bitcode in order for an archive to be fully encoded in bitcode? I have an app that’s hitting a compile warning when archiving complaining that a specific 3rd party library doesn’t contain bitcode so the app cannot be archived with bitcode. That 3rd party library won’t emit bitcode ostensibly because it contains assembly (I could be wrong about the cause, though).

Rainer Brockerhoff:

I suppose this would also allow Swift ABIs to change at any time, without dylibs in the app.

See also Accidental Tech Podcast episodes 122, 123, and 124.

Update (2015-09-25): Alex Denisov:

This picture clearly demonstrates how communication between frontend and backend is done using IR. LLVM has its own format, which can be encoded using the LLVM bitstream file format: Bitcode.

Just to recall it explicitly - Bitcode is a bitstream representation of LLVM IR.

Frederic Jacobs:

Bitcode will enable better microarchitecture support but gets nowhere close to target independence. Applications compiled for the armv7 target could still run on armv7s devices but additional optimisations make applications faster if they contain an armv7s slice. The advantage that Bitcode provides on top of app thinning is negligible in my opinion since it will only provide a slight speed up until the developer uploads a new build with the optimized slice.

[…]

The centralization of the building and signing process is what worries me: an adversary could find a vulnerability in the LLVM backend to obtain remote code execution on Apple’s Bitcode compilation infrastructure to inject a compiler trojan that would affect every single app on the App Store that was submitted with Bitcode.

Falsehoods Programmers Believe

A bunch of links via Jeff Atwood:

Update (2018-07-26): Dave DeLong:

Your calendrical fallacy is thinking…

Update (2019-05-16): Alex Chan (via Hacker News):

These three facts all seem eminently sensible and reasonable, right?

  1. Unix time is the number of seconds since 1 January 1970 00:00:00 UTC
  2. If I wait exactly one second, Unix time advances by exactly one second
  3. Unix time can never go backwards

False, false, false.
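
One reason these fail is leap seconds. A leap second, 23:59:60 UTC, was inserted at the end of 31 December 2016, and Unix time has no separate value for it. The sketch below relies on calendar.timegm doing plain arithmetic on the tuple without validating the seconds field, so it should be read as an illustration of the arithmetic rather than of any system clock.

```python
import calendar

# The last ordinary second of 2016, the leap second, and the first second of 2017.
before_leap = calendar.timegm((2016, 12, 31, 23, 59, 59))
leap_second = calendar.timegm((2016, 12, 31, 23, 59, 60))
after_leap = calendar.timegm((2017, 1, 1, 0, 0, 0))

# Two real seconds pass between before_leap and after_leap, yet the
# timestamps differ by one, and the leap second shares a timestamp
# with the midnight that follows it.
elapsed_by_unix_time = after_leap - before_leap
```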

Apple Music Matches Files With Metadata Only

Kirk McElhearn (comments):

If you’ve used iTunes Match in the past, you may know that it matches music using acoustic fingerprinting, which means that iTunes scans the music, and matches it to the same music. It doesn’t matter what tags files have: you could have, say, a Grateful Dead song labeled as a song by 50 Cent, and iTunes Match will match the Grateful Dead song correctly. (Here’s how Wikipedia defines acoustic fingerprinting.)

Apple Music, however, works differently. It does not use the more onerous (in time and processing power) acoustic fingerprinting technique, but simply uses the tags your files contain. And it can lead to errors. Here’s an example of how this can be a bit surprising.
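
The difference between the two approaches is easy to see in a toy model. Everything here is made up for illustration: the catalog entries are hypothetical, and a SHA-256 of the bytes is only a stand-in for acoustic fingerprinting (a real fingerprint tolerates re-encoding, which a byte hash does not).

```python
import hashlib


def metadata_key(track):
    """Identify a track by its tags only."""
    return (track["artist"].lower(), track["title"].lower())


def content_key(track):
    """Stand-in for a fingerprint: identify a track by its audio bytes."""
    return hashlib.sha256(track["audio"]).hexdigest()


def find_matches(track, catalog, key):
    """Return the ids of all catalog entries that share the track's key."""
    wanted = key(track)
    return [entry["id"] for entry in catalog if key(entry) == wanted]


# Hypothetical catalog: two different recordings with identical tags.
catalog = [
    {"id": "studio", "artist": "Example Band", "title": "Song", "audio": b"studio take"},
    {"id": "live", "artist": "Example Band", "title": "Song", "audio": b"live take"},
]
my_file = {"artist": "Example Band", "title": "Song", "audio": b"live take"}

by_tags = find_matches(my_file, catalog, metadata_key)
by_audio = find_matches(my_file, catalog, content_key)
```

Matching by tags cannot tell the studio and live recordings apart, so which one you get back is up to the service; matching by content picks out the live one.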

Update (2015-07-31): Marco Arment:

This is embarrassing. No wonder people have had so many problems and so much data loss with Apple Music’s cloud-library features.

It’s as if nobody who made this implementation decision had ever encountered remasters, re-recordings, clean versions, live performances, or the many other extremely common reasons why two very different audio recordings might have the same artist and title.

[…]

Don’t let these cloud-matching “features” anywhere near your music collection.

Update (2015-08-01): Kirk McElhearn:

I’ve been unable to reproduce this issue, and my guess is that there was a glitch with Apple’s servers that has since been corrected. If you only subscribe to Apple Music, or are using it on a free trial, then your songs are matched using metadata only. If you subscribe to both iTunes Match and Apple Music, then iTunes matches your songs using digital fingerprinting.

Marco Arment:

That this happened at all (and I got reports from many other people who were affected) means that iTunes Match is less trustworthy as primary storage, and it never really was trustworthy as primary storage because it has always been buggy and inconsistent, so my recommendations remain to avoid letting these features integrate with your music collection.

Serenity Caldwell:

Apple’s been doing Match (as have other companies) for four years now and none of them have it anywhere near perfect.

Marco Arment:

I wonder if matching is still the right choice. I bet most people’s music libraries are smaller than their photo libraries.

Update (2015-08-03): Kirk McElhearn:

FYI, I was able to reproduce the change-the-metadata-and-match thing in iTunes this morning. I made a screen recording of it.

SQLite FTS5

SQLite Release Notes:

Added the experimental FTS5 extension. Note that this extension is experimental and subject to change in incompatible ways.

SQLite FTS5 Extension:

The principal difference between FTS3/4 and FTS5 is that in FTS3/4, each instance-list is stored as a single large database record, whereas in FTS5 large instance-lists are divided between multiple database records. This has the following implications for dealing with large databases that contain large lists:

  • FTS5 is able to load instance-lists into memory incrementally in order to reduce memory usage and peak allocation size. FTS3/4 very often loads entire instance-lists into memory.

  • When processing queries that feature more than one token, FTS5 is sometimes able to determine that the query can be answered by inspecting a subset of a large instance-list. FTS3/4 almost always has to traverse entire instance-lists.

  • If an instance-list grows so large that it exceeds the SQLITE_MAX_LENGTH limit, FTS3/4 is unable to handle it. FTS5 does not have this problem.
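
For anyone who wants to try it, here is a minimal sketch of a multi-token FTS5 query through Python’s sqlite3 module. The table and column names are hypothetical, and whether FTS5 is compiled into a given SQLite build is an assumption, so the sketch probes for it first.

```python
import sqlite3


def fts5_available(conn):
    """Return True if this SQLite build can create FTS5 tables."""
    try:
        conn.execute("CREATE VIRTUAL TABLE fts5_probe USING fts5(x)")
    except sqlite3.OperationalError:
        return False
    conn.execute("DROP TABLE fts5_probe")
    return True


def build_index(conn, docs):
    """Create a hypothetical 'notes' full-text table and fill it."""
    conn.execute("CREATE VIRTUAL TABLE notes USING fts5(title, body)")
    conn.executemany("INSERT INTO notes(title, body) VALUES (?, ?)", docs)


def search(conn, query):
    """Return titles of matching rows, best match first."""
    rows = conn.execute(
        "SELECT title FROM notes WHERE notes MATCH ? ORDER BY rank", (query,)
    )
    return [row[0] for row in rows]


DOCS = [
    ("asan", "shadow memory tracks poisoned bytes"),
    ("bitcode", "bitstream representation of LLVM IR"),
    ("music", "a fond memory of a song"),
]
```

A query such as "shadow memory" contains two tokens, and FTS5 treats them as an implicit AND, so only rows containing both tokens match.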

[…]

FTS5 has no matchinfo() or offsets() function, and the snippet() function is not as fully-featured as in FTS3/4. However, since FTS5 does provide an API allowing applications to create custom auxiliary functions, any required functionality may be implemented within the application code.

The set of built-in auxiliary functions provided by FTS5 may be improved upon in the future.