APFS Native Normalization
The iOS transition to APFS seems to have gone very smoothly except for some Unicode normalization issues. Apple never really explained to developers how they could make their code work properly. Most were not aware that there were issues at all, and the necessary app modifications were difficult to develop and fully test. In my view, pushing this responsibility onto apps was a recipe for endless obscure bugs and poor performance.
At WWDC 2017, Apple essentially admitted that they had made a mistake and told us how they are going to fix it. There is a short-term fix and also a long-term fix that will require another file system conversion. This is not yet documented in the APFS Guide, but here’s a summary of the different cases:
The default for macOS 10.13 will be case-insensitive APFS. It is normalization-preserving (unlike HFS+) but not normalization-sensitive. I expect this to be highly compatible with existing Mac apps. The main difference is that when you read filenames they are no longer necessarily in Form D, but you shouldn’t have been relying on that, anyway.
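To make the Form D point concrete, here is a minimal Python sketch of one filename in two normalizations. The filename and the `same_filename` helper are made up for illustration. Python's `NFD` is standard Unicode decomposition, which I am assuming is close enough to what HFS+ stored for this purpose.

```python
import unicodedata

# The same visible name, composed (Form C) and decomposed (Form D).
name_nfc = unicodedata.normalize("NFC", "caf\u00e9.txt")
name_nfd = unicodedata.normalize("NFD", "caf\u00e9.txt")

# They look identical to a reader but are different strings and bytes.
print(name_nfc == name_nfd)
print(name_nfc.encode("utf-8"))
print(name_nfd.encode("utf-8"))


def same_filename(a, b):
    """Compare two names without assuming either one is in Form D."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


print(same_filename(name_nfc, name_nfd))
```

Code that compares a name read from disk against a name it built itself should go through something like `same_filename` rather than `==`, since a normalization-preserving file system hands back whatever form was originally written.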
macOS 10.13 will also support case-sensitive APFS, which will use native normalization. This is new in the developer beta. The filenames are still stored in the same way as prior APFS (not normalized like with HFS+), but APFS now uses normalization-insensitive hashes so that it can quickly and transparently find files without knowing their normalizations. If your code worked with case-sensitive HFS+ and works with case-insensitive APFS, there’s likely nothing new that you have to do for this case.
iOS 10.3 through 10.3.2 use the problematic version of APFS that is case-sensitive, normalization-preserving, and normalization-sensitive. You can write a lot of app code to make everything work, but anyone who hasn’t done this already probably won’t.
iOS 10.3.3 and iOS 11 will also be case-sensitive, normalization-preserving, and normalization-sensitive, but they will add runtime normalization. If you try to read a file but don’t have the right normalization in your path, the file system APIs will transparently look for the file using other normalizations. This should give the correct behavior but at a performance cost.
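As a rough sketch of what "look for the file using other normalizations" could mean, the Python below tries the path as given and then its NFC and NFD spellings. This is my own illustration, not Apple's implementation. The function name and the injectable `exists` parameter are hypothetical.

```python
import os
import unicodedata


def resolve_with_fallback(path, exists=os.path.exists):
    """Return a spelling of `path` that exists, or None.

    Tries the path exactly as given first, then its NFC and NFD forms.
    """
    candidates = [path]
    for form in ("NFC", "NFD"):
        alternative = unicodedata.normalize(form, path)
        if alternative not in candidates:
            candidates.append(alternative)
    for candidate in candidates:
        if exists(candidate):
            return candidate
    return None


# Pretend the disk holds only the decomposed spelling of the name.
on_disk = {unicodedata.normalize("NFD", "caf\u00e9.txt")}
found = resolve_with_fallback("caf\u00e9.txt", exists=on_disk.__contains__)
print(found is not None)
```

Each miss costs an extra lookup, which is where the performance penalty comes from. The sketch also would not find a name stored in a mixed form that is neither fully NFC nor fully NFD.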
If you get a new device or erase and restore, iOS 11 will use case-sensitive APFS with native normalization. This is what Apple should have done from the start. It should have basically the same user experience as with HFS+ but with better performance.
An unspecified future update will convert iOS devices using the “bad” APFS to case-sensitive with native normalization, thus completing the fix.
Update (2017-12-14): Despite the native normalization, I’m seeing problems with Git and accented filenames on macOS 10.13.2. If I edit a file with such a name, Git sees it as a new file, and therefore sees two files whose names differ only in normalization. It’s somewhat tricky to then remove the original entry.
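To spot this situation, one option is to group the tracked names by their normalized form and report any group with more than one spelling. The Python below is a minimal sketch. The function name is hypothetical, and I am assuming the list of names comes from somewhere like the output of `git ls-files`.

```python
import unicodedata
from collections import defaultdict


def normalization_collisions(names):
    """Return groups of names that differ only in Unicode normalization."""
    groups = defaultdict(list)
    for name in names:
        groups[unicodedata.normalize("NFC", name)].append(name)
    # Keep only the groups that contain more than one distinct spelling.
    return [group for group in groups.values() if len(set(group)) > 1]


tracked = ["caf\u00e9.txt", "cafe\u0301.txt", "readme.md"]
for group in normalization_collisions(tracked):
    print([name.encode("utf-8") for name in group])
```

Printing the UTF-8 bytes makes the two entries distinguishable, which they are not when displayed as text.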
16 Comments
What does “APFS now uses normalization-insensitive hashes” mean?
How can hashes work without normalization?
@Anonymous It means that the hash is calculated in such a way that two filenames that differ only in their Unicode normalization will get the same hash. In other words, it’s as if they are normalized and then hashed.
To add some details:
- APFS stores the filename as it is received. No transformation is applied. As long as the filename is correctly encoded as UTF-8, it is accepted. The on-disk format is also UTF-8 (HFS+ expects to get UTF-8 and stores UTF-16 on disk).
- To achieve normalization insensitivity, instead of storing normalized filenames on disk, APFS stores each filename as passed to it, along with a hash of the filename. For case-insensitive APFS, this is a hash of the normalized and case-folded filename. For case-sensitive APFS, it is a hash of the normalized filename.
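The scheme described above can be sketched as a toy directory index in Python. Everything here is an assumption made for illustration: the class and function names are hypothetical, SHA-256 stands in for whatever hash APFS really uses, and `str.casefold()` stands in for its case-folding rules.

```python
import hashlib
import unicodedata


def name_key(filename, case_sensitive):
    """Hash the normalized (and optionally case-folded) filename."""
    normalized = unicodedata.normalize("NFD", filename)
    if not case_sensitive:
        # Fold case, then normalize again in case folding changed the form.
        normalized = unicodedata.normalize("NFD", normalized.casefold())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class Directory:
    """Toy directory: keys are hashes, values are names exactly as given."""

    def __init__(self, case_sensitive):
        self.case_sensitive = case_sensitive
        self.entries = {}

    def create(self, filename):
        key = name_key(filename, self.case_sensitive)
        if key in self.entries:
            raise FileExistsError(self.entries[key])
        self.entries[key] = filename  # stored untransformed

    def lookup(self, filename):
        return self.entries.get(name_key(filename, self.case_sensitive))


sensitive = Directory(case_sensitive=True)
sensitive.create("Caf\u00e9.txt")
print(sensitive.lookup("Cafe\u0301.txt") is not None)  # other normalization
print(sensitive.lookup("cafe\u0301.txt") is not None)  # other case
```

The stored name keeps its original bytes, so the volume stays normalization-preserving, while lookups succeed regardless of which normalization the caller used.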
The documentation referenced in this blog post has now been updated for macOS High Sierra and iOS 11.
@Vivek Thanks. Is there any documentation for the new Fast Directory Sizing feature? The guide mentions that it exists, but not what the APIs are.
> Is there any documentation for the new Fast Directory Sizing feature
Unfortunately not yet. A larger documentation project is in the works, but at this time it is hard to say when it will be available. Filing Radars for the issues that are affecting you will help.
I have the same problem you describe with git on macOS 10.13.2. Have you tried globally setting the git config option `core.precomposeunicode=false`? (It is true by default on macOS; see https://git-scm.com/docs/git-config#git-config-coreprecomposeUnicode for details.)
You can easily test the difference between the two options by running `git -c core.precomposeunicode=true status` and `git -c core.precomposeunicode=false status` inside the problematic repository.
The only thing that confuses me is that the same problem with git exists for me on both APFS and HFS+ on macOS 10.13.2. I would expect that APFS would need `precomposeunicode=false` and HFS+ would need `precomposeunicode=true`.
@Sebastian I set that to true after resolving the problem and have not encountered further problems since. I think you do want it to be true for APFS as well, in case other Macs are using the repository with HFS+ or you have apps that use different normalizations.
The issue with git goes deeper than normalization. If you have a repo with a file named `doc` and another named `Doc`, the default case-insensitive APFS will treat them as a single file. This causes git to believe that one of them is deleted. There appears to be no way to get around this, making working with git on macOS 10.13+ rather challenging.
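A repository can be checked for such names before it is cloned onto a case-insensitive volume. The Python below is a minimal sketch with a hypothetical function name. It assumes that `str.casefold()` on the normalized name is a reasonable approximation of how the file system folds case.

```python
import unicodedata
from collections import defaultdict


def case_collisions(paths):
    """Return groups of paths that a case-insensitive volume would merge."""
    groups = defaultdict(list)
    for path in paths:
        key = unicodedata.normalize("NFD", path).casefold()
        groups[key].append(path)
    return [sorted(group) for group in groups.values() if len(set(group)) > 1]


print(case_collisions(["doc", "Doc", "README"]))
```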
Hi, I have a question: does anybody know whether Apple has, in the meantime, applied the conversion to native normalization on a subsequent iOS update for all users?
I upgraded my iPhone from iOS 8 all the way up to 12, following the "official" upgrade path (using iTunes always), and I just don't know whether my device runs on "runtime normalization" or on "native normalization". In fact, I wasn't aware of the issue until just recently.
Is there any way to determine in which normalization mode APFS is currently running on iOS?
Or is it known among developers that some iOS version finally forced the new APFS filesystem conversion to native normalization also on "normal upgrade", and if yes, which one is it?
If anybody of you very well-informed guys knows, please let me know! ;) Thanks a lot!!
best regards,
Fabian
@fabian I have not heard anything about Apple applying the conversion, so my guess is that if you kept upgrading and never erased/restored, you probably still have runtime normalization.