Friday, July 31, 2015


Mike Ash (comments):

Thus we can see that the structure of the tagged pointer strings is:

  1. If the length is between 0 and 7, store the string as raw eight-bit characters.
  2. If the length is 8 or 9, store the string in a six-bit encoding, using the alphabet "eilotrm.apdnsIc ufkMShjTRxgC4013bDNvwyUL2O856P-B79AFKEWV_zGJ/HYX".
  3. If the length is 10 or 11, store the string in a five-bit encoding, using the alphabet "eilotrm.apdnsIc ufkMShjTRxgC4013".
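The three rules above can be sketched as a small selection function. This is only an illustration in Python, not the actual Foundation code: the function name `choose_encoding` is hypothetical, and it assumes (the excerpt does not say so) that a string whose characters fall outside the relevant alphabet, or whose length exceeds 11, simply does not get a tagged pointer representation.

```python
# Alphabets copied from the list above; the five-bit one is the first 32 characters.
SIX_BIT_ALPHABET = "eilotrm.apdnsIc ufkMShjTRxgC4013bDNvwyUL2O856P-B79AFKEWV_zGJ/HYX"
FIVE_BIT_ALPHABET = SIX_BIT_ALPHABET[:32]


def choose_encoding(s):
    """Return the encoding the rules above would pick, or None.

    Assumption: None stands for "not stored as a tagged pointer",
    which is this sketch's guess at the fallback behavior.
    """
    n = len(s)
    if n <= 7:
        # Assumption: "raw eight-bit characters" is treated as ASCII here.
        return "eight-bit" if all(ord(c) < 128 for c in s) else None
    if n <= 9:
        return "six-bit" if all(c in SIX_BIT_ALPHABET for c in s) else None
    if n <= 11:
        return "five-bit" if all(c in FIVE_BIT_ALPHABET for c in s) else None
    return None


for sample in ["hello", "cocoalist", "translator", "bubblegums"]:
    print(repr(sample), "->", choose_encoding(sample))
```

The length cutoffs keep the character data about the same size in every case: 7 × 8 = 56 bits, 9 × 6 = 54 bits, and 11 × 5 = 55 bits.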


The five-bit alphabet is extremely limited, and doesn’t include the letter b! That letter must not be common enough to warrant a place in the 32 hallowed characters of the five-bit alphabet.


Because this table is used for both six-bit and five-bit encodings, it makes sense that it wouldn’t be entirely in alphabetical order. Characters that are used most frequently should be in the first half, while characters that are used less frequently should go in the second half. This ensures that the maximum number of longer strings can use the five-bit encoding.
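To make the shared-table idea concrete, here is a rough Python sketch that packs a string into an integer using one table for both widths. The helper names `pack` and `unpack` are hypothetical, and the bit order (first character in the most significant position) is an assumption chosen for readability; it does not reproduce the real tagged pointer layout.

```python
# The single table quoted above; indices 0-31 double as the five-bit alphabet.
TABLE = "eilotrm.apdnsIc ufkMShjTRxgC4013bDNvwyUL2O856P-B79AFKEWV_zGJ/HYX"


def pack(s, bits):
    """Pack s into an int, using `bits` bits (5 or 6) per character."""
    limit = 1 << bits  # 32 usable table entries for five bits, 64 for six
    value = 0
    for ch in s:
        index = TABLE.find(ch)
        if index < 0 or index >= limit:
            raise ValueError(f"{ch!r} does not fit in a {bits}-bit encoding")
        value = (value << bits) | index
    return value


def unpack(value, length, bits):
    """Inverse of pack: recover `length` characters from value."""
    mask = (1 << bits) - 1
    chars = []
    for _ in range(length):
        chars.append(TABLE[value & mask])
        value >>= bits
    return "".join(reversed(chars))


packed = pack("translator", 5)
print(unpack(packed, 10, 5))  # round-trips the original string
print(TABLE.index("b"))       # first index past the five-bit range
```

Because "b" sits at index 32, it is the first character that needs the sixth bit, so `pack("b", 6)` succeeds while `pack("b", 5)` raises `ValueError`.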

1 Comment

[…] them may be inlined. In contrast, Cocoa’s NSString has an internal representation that has evolved over time. This was possible because access was indirected through the Objective-C […]
