Tuesday, June 27, 2017

APFS Native Normalization

The iOS transition to APFS seems to have gone very smoothly except for some Unicode normalization issues. Apple never really explained to developers how they could make their code work properly, most were not aware that there were issues at all, and the necessary app modifications were difficult to develop and fully test. In my view, pushing this responsibility onto apps was a recipe for endless obscure bugs and poor performance.
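To make the problem concrete, here is a minimal Python illustration. Python is used only for demonstration; nothing here is an Apple API. The same visible filename can be spelled with a precomposed "é" (NFC) or with "e" plus a combining accent (NFD). A normalization-sensitive comparison treats these as two different names, so a file saved under one spelling can fail to be found under the other.

```python
import unicodedata

# Two spellings of the same visible name: "é" as one code point (U+00E9)
# versus "e" followed by a combining acute accent (U+0065 U+0301).
composed = "caf\u00e9.txt"
decomposed = "cafe\u0301.txt"

# A plain string/byte comparison says they are different names.
print(composed == decomposed)  # False
print(composed.encode("utf-8"), decomposed.encode("utf-8"))
# b'caf\xc3\xa9.txt' b'cafe\xcc\x81.txt'

# Once both are normalized to the same form, they compare equal.
print(unicodedata.normalize("NFC", composed) == unicodedata.normalize("NFC", decomposed))  # True
print(unicodedata.normalize("NFD", composed) == unicodedata.normalize("NFD", decomposed))  # True
```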

At WWDC 2017, Apple essentially admitted that they had made a mistake and told us how they are going to fix it. There is a short-term fix and also a long-term fix that will require another file system conversion. This is not yet documented in the APFS Guide, but here’s a summary of the different cases:

Update (2017-12-14): Despite the native normalization, I’m seeing problems with Git and accented filenames on macOS 10.13.2. If I edit a file with such a name, Git sees it as a new file, and therefore sees two files whose names differ only in normalization. It’s somewhat tricky to then remove the original entry.

16 Comments


What does “APFS now uses normalization-insensitive hashes” mean?
How can hashes work without normalization?


@Anonymous It means that the hash is calculated in such a way that two filenames that differ only in their Unicode normalization will get the same hash. In other words, it’s as if they are normalized and then hashed.
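A minimal Python sketch of the idea: normalize a copy of the name, then hash that copy. The choice of NFD and SHA-256 is an assumption for illustration; this comment doesn't say which normal form or hash function APFS actually uses.

```python
import hashlib
import unicodedata

def name_hash(name: str, case_insensitive: bool = False) -> str:
    # Normalize a *copy* of the name; the caller's string is not modified.
    # NFD is an arbitrary choice here: any fixed normal form works, as long
    # as every name is normalized the same way before hashing.
    key = unicodedata.normalize("NFD", name)
    if case_insensitive:
        # Case folding can produce unnormalized text, so normalize again
        # (Unicode's "canonical caseless matching" recipe).
        key = unicodedata.normalize("NFD", key.casefold())
    # SHA-256 is an illustrative stand-in, not the hash APFS uses.
    return hashlib.sha256(key.encode("utf-8")).hexdigest()

print(name_hash("caf\u00e9") == name_hash("cafe\u0301"))  # True: same name, different normalization
print(name_hash("doc") == name_hash("Doc"))                # False: case-sensitive
print(name_hash("doc", True) == name_hash("Doc", True))    # True: case-insensitive
```

The hash never sees the original bytes directly, so two spellings that normalize to the same string must land in the same bucket.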


Vivek Verma

To add some details:

- APFS stores the filename as it is received; no transformation is applied. As long as the filename is correctly encoded as UTF-8, it is accepted. The on-disk format is also UTF-8. (HFS expects to get UTF-8 and stores it on disk as UTF-16.)

To achieve normalization insensitivity, APFS does not store normalized filenames on disk. Instead, it stores each filename exactly as passed to it, along with a hash of the normalized and case-folded filename (for case-insensitive APFS) or of the normalized filename (for case-sensitive APFS).
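The scheme Vivek describes can be modeled with a small, hypothetical in-memory directory: names are kept byte-for-byte as created, and lookups go through a hash of the normalized (and, if requested, case-folded) name. `ToyDirectory`, the NFD form, and SHA-256 are all assumptions made for this sketch; it also ignores hash collisions, which a real file system using a short hash would have to resolve by comparing names.

```python
import hashlib
import unicodedata

def _key(name: str, case_insensitive: bool) -> bytes:
    # Lookup key: hash of the normalized (optionally case-folded) name.
    key = unicodedata.normalize("NFD", name)
    if case_insensitive:
        key = unicodedata.normalize("NFD", key.casefold())
    return hashlib.sha256(key.encode("utf-8")).digest()

class ToyDirectory:
    """Hypothetical model: stores names as given, indexed by a normalized hash."""

    def __init__(self, case_insensitive: bool = True):
        self.case_insensitive = case_insensitive
        self._entries = {}  # hash -> name exactly as it was created

    def create(self, name: str) -> None:
        name.encode("utf-8")  # raises if the name can't be encoded as UTF-8
        h = _key(name, self.case_insensitive)
        if h in self._entries:
            raise FileExistsError(f"{name!r} conflicts with {self._entries[h]!r}")
        self._entries[h] = name  # no transformation applied to the stored name

    def lookup(self, name: str):
        """Return the stored spelling of an equivalent name, or None."""
        return self._entries.get(_key(name, self.case_insensitive))

    def listdir(self):
        return list(self._entries.values())

d = ToyDirectory(case_insensitive=True)
d.create("Caf\u00e9.txt")          # created with a precomposed "é"
print(d.lookup("cafe\u0301.TXT"))  # Café.txt (the original spelling comes back)
```

Because lookups are normalization-insensitive while the stored name is untouched, `listdir` returns whatever spelling the creating app used.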


Vivek Verma

The documentation referenced in this blog post has now been updated for macOS High Sierra and iOS 11.

https://developer.apple.com/library/content/documentation/FileManagement/Conceptual/APFS_Guide/FAQ/FAQ.html


@Vivek Thanks. Is there any documentation for the new Fast Directory Sizing feature? The guide mentions that it exists, but not what the APIs are.


Vivek Verma

> Is there any documentation for the new Fast Directory Sizing feature

Unfortunately not yet. More documentation is in the works, but at this time it is hard to say when it will be available. Radars for issues that are affecting you will help.


[…] If you do anything with file names in iOS or macOS, make sure you read APFS Native Normalization […]


[…] directory sizing is also undocumented. The APFS guide doesn’t even say what the filename limit is, although I was able to get an […]



Sebastian Röder

I have the same problem you describe with git on macOS 10.13.2. Have you tried globally setting the git config option `core.precomposeunicode=false`? (It is true by default on macOS; see https://git-scm.com/docs/git-config#git-config-coreprecomposeUnicode for details.)

You can easily test the difference between the two options by running `git -c core.precomposeunicode=true status` and `git -c core.precomposeunicode=false status` inside the problematic repository.

The only thing that confuses me is that the same problem with git exists for me on both APFS and HFS+ on macOS 10.13.2. I would expect that APFS would need `precomposeunicode=false` and HFS+ would need `precomposeunicode=true`.


@Sebastian I set that to true after resolving the problem and have not encountered further problems since. I think you do want it to be true for APFS as well, in case other Macs using HFS+ share the repository or you have apps that use different normalizations.


The issue with git goes deeper than normalization. If you have a repo with a file named `doc` and another named `Doc`, a case-insensitive APFS volume (the macOS default) will treat them as a single file. This causes git to believe that one of them is deleted. There appears to be no way around this, which makes working with git on macOS 10.13+ rather challenging.
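Clashes like this can at least be spotted ahead of time. A minimal Python sketch, assuming a case-insensitive, normalization-insensitive volume: group paths by a case-folded, normalized key and report any group with more than one distinct spelling. The key follows Unicode's canonical caseless matching, NFD(casefold(NFD(x))); whether APFS folds case in exactly this way is an assumption, so treat the result as an approximation.

```python
import unicodedata
from collections import defaultdict

def fold(path: str) -> str:
    """Canonical caseless key: NFD(casefold(NFD(path)))."""
    return unicodedata.normalize("NFD", unicodedata.normalize("NFD", path).casefold())

def find_collisions(paths):
    """Return groups of paths that would map to the same name on a
    case- and normalization-insensitive file system (2+ spellings only)."""
    groups = defaultdict(set)
    for p in paths:
        groups[fold(p)].add(p)
    return [sorted(g) for g in groups.values() if len(g) > 1]

tracked = ["doc", "Doc", "caf\u00e9.md", "cafe\u0301.md", "README"]
print(find_collisions(tracked))  # two groups: the doc/Doc pair and the two café.md spellings
```

The list of paths could come from `git ls-files -z`, split on NUL characters. The same check also catches the two-spellings problem from the update above, since names differing only in normalization share a key.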


Hi, I have a question: does anybody know whether Apple has, in the meantime, applied the conversion to native normalization on a subsequent iOS update for all users?

I upgraded my iPhone from iOS 8 all the way up to iOS 12, following the "official" upgrade path (always using iTunes), and I don't know whether my device uses "runtime normalization" or "native normalization". In fact, I wasn't aware of the issue until recently.

Is there any way to determine which normalization mode APFS is currently using on iOS?
Or is it known among developers that some iOS version finally forced the conversion to native normalization during a normal upgrade too, and if so, which one?

If any of you very well-informed guys knows, please let me know! ;) Thanks a lot!!

best regards,
Fabian


@fabian I have not heard anything about Apple applying the conversion, so my guess is that if you kept upgrading and never erased/restored, you probably still have runtime normalization.


Hi Michael, thank you very much for that information!


[…] normalisation; this had to be resolved in later versions of macOS 10.13 and iOS 10, as explained here and […]
