Thursday, December 31, 2020

NSURL/SMB Precomposed Character Bug

Thomas Tempelmann (tweet):

So, if a user on a Linux system creates a folder that contains an Umlaut such as “ü”, it’ll end up precomposed on the (ext4) file system.

Now, I can nicely access files in such a folder from macOS, as long as I only use POSIX functions (which includes the shells such as bash and zsh).

However, when I use NSURL operations, some work and others give me a -260 error. For instance, getting NSURLCanonicalPathKey fails, even in macOS 11.1. Other higher-level functions fail as well, such as trying to open the item with [NSWorkspace openURLs:…]. The NSURL’s path property does still hold the original precomposed name, and if I get the path and pass it to a POSIX function, it works. And some of the more basic getResource accessors work as well. Just not the more complex ones.

This is not a surprise because, before relenting and adding Unicode normalization support to APFS, Apple tried to solve the problem in the Cocoa layer. But that only ended up being a partial solution, the result being that some APIs never worked properly until file system support was added. Now it seems that same code is doing unwanted conversions before the paths make it down to the lower level APIs.
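For reference, the two forms of “ü” at issue look the same on screen but are different strings. The following is a minimal Python sketch (not anything from Tempelmann's report) that uses the standard `unicodedata` module to show the code points and UTF-8 bytes of each form; the `describe` helper is a hypothetical name introduced just for this illustration.

```python
import unicodedata

# "ü" as a single precomposed code point (NFC), as a Linux user would typically type it.
precomposed = "\u00fc"
# The decomposed form (NFD): "u" followed by a combining diaeresis.
decomposed = unicodedata.normalize("NFD", precomposed)


def describe(name: str) -> str:
    """Return the code points and UTF-8 bytes of a name (illustrative helper)."""
    points = " ".join(f"U+{ord(ch):04X}" for ch in name)
    return f"{points} | {name.encode('utf-8').hex()}"


print(describe(precomposed))      # U+00FC | c3bc
print(describe(decomposed))       # U+0075 U+0308 | 75cc88
print(precomposed == decomposed)  # False: a byte-wise lookup treats them as different names
```

A file system that compares names byte for byte therefore sees two unrelated names, which is why it matters which form a higher-level API hands down.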


2 Comments

For the record:

The issue can also be reproduced using only a Mac, when accessing a share on a NAS: mount the share via NFS on the Mac, create a file with accented characters or umlauts in its name, then mount the same share via SMB and try to access the file. It won't work.

Also, I suspect that Apple simply wanted to ensure that Mac apps can keep accessing files on SMB with the same normalization rules they can expect from local HFS and APFS volumes, i.e. that file lookup is normalization-_in_sensitive.

But SMB is usually normalization-sensitive, and when the server is a Linux system, it'll store files in precomposed format, whereas files on the Mac _usually_ use decomposed names. Without the SMB client normalizing the names (i.e. precomposing names when sending them to the server, and decomposing them when passing names from the server to Mac apps), we'd run into lots of trouble, because apps would not be able to handle files on an SMB share the same way they handle them on local file systems.

Problems with this procedure are:

1. NFS does not do this conversion, but should, at least in the case where it also accesses a share that uses precomposed names by default. So far, NFS is normalization-sensitive.
2. If a file ends up on a Linux SMB server with a decomposed name (which can be caused by creating it on a Mac via NFS, for instance, see my previous comment), it can't be accessed by a Mac any more: reading the file name from the server works and transmits the original (decomposed) name, but looking up the name (in order to open the file) fails. The SMB client precomposes the name when sending it to the server, and because the server's file system is normalization-sensitive, it won't find the decomposed name when looking for it with the precomposed one.
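The second problem can be sketched as a toy model in Python. This is only an illustration under stated assumptions: `SensitiveServer` and `PrecomposingClient` are hypothetical classes that mimic the behavior described above (exact-match lookup on the server, precomposing on the way out and decomposing on the way in on the client), not the actual SMB client code.

```python
import unicodedata


class SensitiveServer:
    """Toy stand-in for a normalization-sensitive server file system."""

    def __init__(self):
        self._files = {}

    def create(self, name, data=b""):
        self._files[name] = data  # the name is stored exactly as given

    def listing(self):
        return list(self._files)

    def exists(self, name):
        return name in self._files  # exact code-point match, no normalization


class PrecomposingClient:
    """Hypothetical model of the client-side translation described above."""

    def __init__(self, server):
        self.server = server

    def list_names(self):
        # Names coming from the server are decomposed before apps see them.
        return [unicodedata.normalize("NFD", n) for n in self.server.listing()]

    def can_open(self, name):
        # Names going to the server are precomposed before the lookup.
        return self.server.exists(unicodedata.normalize("NFC", name))


server = SensitiveServer()
server.create("f\u00fcr.txt")    # precomposed name, e.g. created on Linux
server.create("u\u0308ber.txt")  # decomposed name, e.g. created from a Mac via NFS

client = PrecomposingClient(server)
results = {name: client.can_open(name) for name in client.list_names()}
for name, ok in results.items():
    print(ascii(name), "opens" if ok else "NOT FOUND")
```

In this model both names show up in the listing, but only the one stored in precomposed form can be opened, which matches the failure described in point 2.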

The cleanest solution would be to use a file system on the server (NAS) that is normalization-insensitive. But that's not likely to happen. Instead, we'll have to hope that files on the server never use decomposed names. Making NFS behave like SMB with regard to normalization would be one step that Apple could take. But there remain cases where users can, outside of macOS's control, still create file names on a Linux file system in decomposed form. They're fairly rare, though (one example is using the NAS's web browser UI to rename a file, which may then use decomposed characters when they are typed on a Mac).
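Rather than only hoping, one could check a share for such names. The following is a rough Python sketch, assuming it is run directly on the server (or over a mount that passes names through unchanged); `non_nfc` and `find_non_nfc_names` are hypothetical helper names, and the script only reports names without renaming anything.

```python
import os
import sys
import unicodedata


def non_nfc(names):
    """Return the names that are not already in precomposed (NFC) form."""
    return [n for n in names if unicodedata.normalize("NFC", n) != n]


def find_non_nfc_names(root):
    """Walk root and return the paths whose last component is not in NFC form."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in non_nfc(dirnames + filenames):
            hits.append(os.path.join(dirpath, name))
    return hits


if __name__ == "__main__":
    # Usage (hypothetical script name): python3 find_decomposed.py /path/to/share
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for path in find_non_nfc_names(root):
        print(ascii(path))  # ascii() makes combining characters visible
```

Any path it prints is a candidate for the lookup failure described above; whether to rename such files to their precomposed form is a separate decision.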
