Friday, March 31, 2017

APFS to Add Case-Insensitive Variant for Mac

Apple has updated its APFS Guide (via Thomas Zoechling):

APFS has case-sensitive and case-insensitive variants. The case-insensitive variant of APFS is normalization-preserving, but not normalization-sensitive. The case-sensitive variant of APFS is both normalization-preserving and normalization-sensitive. Filenames in APFS are encoded in UTF-8 and aren’t normalized.

HFS+, by comparison, is not normalization-preserving. Filenames in HFS+ are normalized according to Unicode 3.2 Normalization Form D, excluding substituting characters in the ranges U+2000–U+2FFF, U+F900–U+FAFF, and U+2F800–U+2FAFF.

The first developer preview of APFS, made available in macOS Sierra in June 2016, offered only the case-sensitive variant. In macOS 10.12.4, the APFS developer preview was updated to also include a case-insensitive variant. In iOS 10.3, the case-sensitive variant of APFS is used.

[…]

Directory hard links are not supported by Apple File System. All directory hard links are converted to symbolic links or aliases when you convert from HFS+ to APFS volume formats on macOS.

[…]

Apple plans to document and publish the APFS volume format specification when Apple File System is released for macOS in 2017.
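The guide's normalization wording is easier to follow with a concrete case. Below is a minimal Python sketch showing why two visually identical names can be different byte sequences, and what an HFS+-style decomposition does to them. It approximates HFS+ with Python's `unicodedata.normalize("NFD", ...)`. That is an assumption: Python ships a newer Unicode version than 3.2, and it does not skip the excluded ranges listed above. The helper names are illustrative.

```python
import unicodedata

def utf8_bytes(name: str) -> bytes:
    """The raw bytes a byte-preserving file system such as APFS would store."""
    return name.encode("utf-8")

def hfs_plus_like_name(name: str) -> str:
    """Approximate HFS+ storage by decomposing to NFD.

    Assumption: Python's Unicode tables are newer than 3.2 and do not exclude
    U+2000-U+2FFF, U+F900-U+FAFF, or U+2F800-U+2FAFF, so this is only close.
    """
    return unicodedata.normalize("NFD", name)

precomposed = "caf\u00e9"   # é as U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301"   # e followed by U+0301 COMBINING ACUTE ACCENT

print(precomposed == decomposed)                      # False
print(utf8_bytes(precomposed))                        # b'caf\xc3\xa9'
print(utf8_bytes(decomposed))                         # b'cafe\xcc\x81'
print(hfs_plus_like_name(precomposed) == decomposed)  # True
```

Under this model, HFS+ stores both spellings as the decomposed form. APFS keeps whichever byte sequence it was given.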

Regarding the normalization issues that I raised last week:

Developers should be aware of behavior differences between normalization sensitivity and insensitivity which may arise when an iOS device upgrades to iOS 10.3 and migrates the filesystem from HFS+ to APFS. For example, attempting to create a file using one normalization behavior and opening that file using another normalization behavior may result in ENOENT, or “File Not Found” errors. Additionally, storing filenames externally, such as in the defaults database, CoreData, or iCloud storage may cause problems if the normalization scheme of the filename being stored is different from what exists on-disk.

But Apple doesn’t describe any solutions.
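Here is one possible workaround. It is a hedged sketch, not anything Apple recommends. When a filename comes back from an external store, don't open it directly. Instead, scan the directory and compare names after normalizing both sides to the same form. `find_entry` is a hypothetical helper. NFC is an arbitrary choice, and any single normalization form works as long as both sides use it.

```python
import os
import tempfile
import unicodedata
from typing import Optional

def find_entry(directory: str, stored_name: str) -> Optional[str]:
    """Return the on-disk spelling of stored_name, ignoring normalization.

    Hypothetical helper: both sides are converted to NFC before comparing, so
    a name saved in the defaults database, Core Data, or iCloud in one form
    still matches a directory entry created in another. Returns None if
    nothing matches.
    """
    wanted = unicodedata.normalize("NFC", stored_name)
    for entry in os.listdir(directory):
        if unicodedata.normalize("NFC", entry) == wanted:
            return entry  # open the file using this exact spelling
    return None

# Demo: the file is created with a decomposed name, but the app
# remembered the precomposed spelling.
with tempfile.TemporaryDirectory() as tmp:
    on_disk = "cafe\u0301.txt"
    with open(os.path.join(tmp, on_disk), "w") as f:
        f.write("hello")
    found = find_entry(tmp, "caf\u00e9.txt")
    missing = find_entry(tmp, "tea.txt")
    print(found == on_disk, missing)
```

A case-sensitive APFS directory can hold several entries that differ only in normalization. In that case, this returns whichever one `os.listdir` yields first, so a real app might need a stricter policy.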

It’s also not documented how long APFS filenames can be. It would be nice to have an API for this.

Update (2017-03-31): I think “normalization-preserving, but not normalization-sensitive” means that (like HFS+ on the Mac, unlike APFS on iOS) you cannot have multiple files whose names differ only in normalization. And you can look up a file using the “wrong” normalization and still find it. Additionally, beyond what HFS+ offers, if you create a file and then read the directory contents, you’ll see the filename listed using the same normalization that you used.
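A toy in-memory model may help pin down the three behaviors described above. This is a sketch under stated assumptions. Python's `str.casefold()` and NFD stand in for Apple's case-folding and normalization tables. `ToyDirectory` and its mode names are made up for illustration. The iOS mode follows the case-sensitive, normalization-sensitive APFS described in the guide.

```python
import unicodedata
from typing import Dict, List

class ToyDirectory:
    """Illustrative model of how a directory matches and lists filenames.

    "hfs+":     case-insensitive; names are stored decomposed (NFD here).
    "apfs-mac": case-insensitive APFS; names are stored exactly as given,
                but lookups ignore case and normalization.
    "apfs-ios": case-sensitive APFS as described for iOS 10.3; names are
                compared code point for code point (i.e. byte for byte).
    """

    MODES = ("hfs+", "apfs-mac", "apfs-ios")

    def __init__(self, mode: str) -> None:
        if mode not in self.MODES:
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self._entries: Dict[str, str] = {}  # lookup key -> name as listed

    def _key(self, name: str) -> str:
        if self.mode == "apfs-ios":
            return name  # no case folding, no normalization
        # Assumption: casefold + NFD approximates Apple's comparison rules.
        return unicodedata.normalize("NFD", name.casefold())

    def create(self, name: str) -> None:
        key = self._key(name)
        if key in self._entries:
            raise FileExistsError(name)
        if self.mode == "hfs+":
            self._entries[key] = unicodedata.normalize("NFD", name)
        else:
            self._entries[key] = name  # normalization-preserving

    def exists(self, name: str) -> bool:
        return self._key(name) in self._entries

    def listdir(self) -> List[str]:
        return sorted(self._entries.values())


NFC_NAME = "Caf\u00e9"   # precomposed é
NFD_NAME = "Cafe\u0301"  # e + combining acute accent

# Columns: mode, found via the other normalization, listed as created
for mode in ToyDirectory.MODES:
    d = ToyDirectory(mode)
    d.create(NFC_NAME)
    print(mode, d.exists(NFD_NAME), d.listdir() == [NFC_NAME])
# hfs+ True False
# apfs-mac True True
# apfs-ios False True
```

In the "apfs-mac" mode, a second `create` with a name that differs only in case or normalization raises `FileExistsError`. This matches the reading above that such names cannot coexist.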

Update (2017-04-02): Here’s a thread with someone confused because Apple’s guide said that using NSURL would handle the normalization issues, but it didn’t.

Update (2017-04-07): Howard Oakley:

The TL;DR is that both variants of APFS will cause problems – they are just different problems requiring different solutions. Either way, many current apps, tools, and scripts will perform strangely when run on APFS, and many will therefore need to be revised and updated to cope with it.

Update (2017-04-14): DropDMG 3.4.6 adds support for creating blank case-insensitive APFS disk images to help developers test their Mac apps with the new file system.

Update (2017-06-19): See also WWDC 2017 session 715.

10 Comments

For the benefit of those of us who are technically minded but not familiar with file systems, can you provide examples of each scenario?

@Ted Here’s an example:

A precomposed character (alternatively composite character or decomposable character) is a Unicode entity that can be defined as a sequence of one or more other characters. A precomposed character may typically represent a letter with a diacritical mark, such as é (Latin small letter e with acute accent). Technically, é (U+00E9) is a character that can be decomposed into an equivalent string of the base letter e (U+0065) and combining acute accent (U+0301). Similarly, ligatures are precompositions of their constituent letters or graphemes.

So with the “bag of bytes” you can have two filenames that look like “é” but are made up of different byte sequences. On iOS, if you try to read the file “e followed by acute accent” but it was saved as “Latin small letter e with acute accent”, you will not find the file. With APFS on the Mac, you will. With HFS+, no matter which name you use when saving the file, you’ll get “e followed by acute accent” when you list the directory. With APFS on the Mac, you’ll get the one that you used when creating the file.

> It’s also not documented how long APFS filenames can be. It would be nice to have an API for this.

This should be available via pathconf(2) with _PC_PATH_MAX or _PC_NAME_MAX.
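Here is a minimal sketch of that suggestion, using Python's `os.pathconf`, which wraps pathconf(2). Only the two POSIX names are queried. `_PC_NAME_CHARS_MAX` is a macOS-specific constant, and this sketch does not assume `os.pathconf_names` exposes it. `name_limits` is a hypothetical helper.

```python
import os
import tempfile
from typing import Dict, Optional

def name_limits(path: str) -> Dict[str, Optional[int]]:
    """Hypothetical helper: query POSIX name/path limits for path's volume."""
    limits: Dict[str, Optional[int]] = {}
    for key in ("PC_NAME_MAX", "PC_PATH_MAX"):
        if key not in os.pathconf_names:
            continue  # constant not known on this platform
        try:
            limits[key] = os.pathconf(path, key)
        except OSError:
            limits[key] = None  # query not supported for this file system
    return limits

limits = name_limits(tempfile.gettempdir())
print(limits)
```

The man page documents `_PC_NAME_MAX` in bytes, so the returned number still needs interpreting against the file system's own unit.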

@Mark Thanks, however I’m not sure I trust that API. HFS+ is supposed to allow 255 UTF-16 encoding units (kHFSPlusMaxFileNameChars) but pathconf(_PC_NAME_MAX) returns 255 for me for both HFS+ and APFS paths. Same with _PC_NAME_CHARS_MAX, although I’m not sure what that is supposed to mean. The man page says that _PC_NAME_MAX is bytes, so 255 is the wrong answer for HFS+.

Also, there doesn’t seem to be a pathconf() equivalent to kHFSMaxVolumeNameChars.

[…] default for macOS 10.13 will be case-insensitive APFS. It is normalization-preserving (unlike HFS+) but not normalization-sensitive. I expect this to be […]

Vivek Verma

> The man page says that _PC_NAME_MAX is bytes, so 255 is the wrong answer for HFS+

The explanation for this is going to sound language-lawyerly, but this answer is correct for HFS (and for APFS). It applies to the input name you can give to path-based system calls, and the input encoding for all path-based APIs on macOS and iOS is UTF-8.

The answer applies to pathnames that are constructed out of the "Portable Filename Character Set" as defined by POSIX (http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_282), which is a subset of ASCII. If you were to construct a pathname out of those characters only, 255 UTF-16 encoding units would take 255 bytes in UTF-8, and that is the maximum that you can provide as input to the APIs. If you use code points from outside the ASCII range, HFS (and APFS) will accept greater byte lengths, depending on how many bytes it takes to encode them in UTF-8. I suspect _PC_NAME_CHARS_MAX is intended to reflect the reality of 255 Unicode "characters" (code points/code units), no matter how many bytes they actually take in UTF-8, which is a variable-length encoding.
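The arithmetic behind this explanation can be checked with a short Python sketch. For ASCII names, UTF-16 code units and UTF-8 bytes coincide. Precomposed accented letters take two UTF-8 bytes each. Characters outside the Basic Multilingual Plane take two UTF-16 units and four UTF-8 bytes. The helper names are illustrative.

```python
def utf8_bytes(name: str) -> int:
    """Length of name in UTF-8, the encoding the path-based APIs accept."""
    return len(name.encode("utf-8"))

def utf16_units(name: str) -> int:
    """Number of UTF-16 code units (utf-16-le adds no BOM; 2 bytes per unit)."""
    return len(name.encode("utf-16-le")) // 2

samples = {
    "ASCII": "a" * 255,
    "precomposed e-acute": "\u00e9" * 255,
    "emoji (outside the BMP)": "\U0001F600",
}

# Columns: label, code points, UTF-16 units, UTF-8 bytes
for label, name in samples.items():
    print(label, len(name), utf16_units(name), utf8_bytes(name))
# ASCII 255 255 255
# precomposed e-acute 255 255 510
# emoji (outside the BMP) 1 2 4
```

A 255-byte `_PC_NAME_MAX` and a 255-unit UTF-16 limit describe the same name only in the first row.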

@Vivek Thanks for that explanation. I see what you’re saying about _PC_NAME_MAX. It makes sense but doesn't seem very helpful for real world use.

Regarding _PC_NAME_CHARS_MAX, are you saying that the APFS limit is the same as HFS+ (255 UTF-16 encoding units regardless of how many UTF-8 bytes that is)? Or has it been expanded to 255 Swift-style Unicode characters, each of which might be multiple code points?

Vivek Verma

Same as HFS: 255 UTF-16 code units, though potentially allowing more characters in some cases, since the HFS limit was 255 code units _after_ decomposition. But APFS has an additional restriction: it won't allow creation of filenames with "unassigned" code points.
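Based on that description, a rough pre-flight check might compare a name against both limits. This is a sketch, not Apple's algorithm. NFD from Python's `unicodedata` approximates HFS+ decomposition. General category `Cn` stands in for APFS's notion of "unassigned", and `Cn` also covers noncharacters. Apple's Unicode version may differ from Python's. `MAX_UNITS`, `fits_apfs`, and `fits_hfs_plus` are hypothetical names.

```python
import unicodedata

MAX_UNITS = 255  # UTF-16 code units, per the comment above

def utf16_units(name: str) -> int:
    return len(name.encode("utf-16-le")) // 2

def fits_hfs_plus(name: str) -> bool:
    # HFS+ applies the limit after decomposition; NFD approximates that.
    return utf16_units(unicodedata.normalize("NFD", name)) <= MAX_UNITS

def fits_apfs(name: str) -> bool:
    # APFS applies the limit to the name as given and rejects unassigned
    # code points. Assumption: Python's general category "Cn" is used as a
    # stand-in for Apple's definition of "unassigned".
    if any(unicodedata.category(ch) == "Cn" for ch in name):
        return False
    return utf16_units(name) <= MAX_UNITS

long_accented = "\u00e9" * 255   # 255 precomposed characters
print(fits_apfs(long_accented), fits_hfs_plus(long_accented))  # True False
print(fits_apfs("a\u0378"))      # U+0378 is unassigned: False
```

Under these assumptions, a name of 255 precomposed "é" characters passes the APFS check. Decomposed, it becomes 510 UTF-16 units, which exceeds the HFS+ limit. How a particular tool handles that overflow is a separate question.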

@Vivek Great—thanks so much for clarifying that. Sounds like my existing code for handling the max filename limit will still work.

Now I’m wondering what will happen if I create an APFS file with 255 precomposed characters and then let Time Machine try to back it up to HFS+…

[…] The APFS guide doesn’t even say what the filename limit is, although I was able to get an answer: 255 UTF-16 code units, which unlike HFS+ may be […]
