Monday, November 19, 2018

APFS in 2018

Howard Oakley:

Before explaining how Mojave’s version 3 APFS handles Fusion Drives, let me remind you how they work in good old HFS+.

Jonas Plum:

We developed different approaches to identify and recover (deleted) files on an APFS file system and published a paper about the methodologies used. Additionally, we implemented the open source recovery tool afro, which was released three months ago. Using afro, we evaluated and compared the different approaches, identified the method that so far delivers the best results, and compared it to photorec. This showed that afro outperforms photorec on the evaluated APFS dataset. In presentations of this research we were often asked whether other tools like BlackBag’s BlackLight already support this recovery process. So, we decided to compare the file recovery capabilities of BlackLight and afro. We wanted to compare afro to The Sleuth Kit as well, since adding APFS support to The Sleuth Kit framework was discussed at the DFRWS conference, but no implementations are public yet.

Gregory Szorc (via Peter Steinberger):

I sample profiled all processes on the system when running the Mercurial test harness. Aggregate thread stacks revealed a common pattern: readdir() being in the stack.

[…]

While the source code for APFS is not available for me to confirm, the profiling results showing excessive time spent in lck_mtx_lock_grab_mutex(), combined with the fact that execution time decreases when the parallel process count decreases, lead me to the conclusion that APFS obtains a global kernel lock during read-only operations such as readdir(). In other words, APFS slows down when attempting to perform parallel read-only I/O.

[…]

It is apparent that macOS 10.14 Mojave has received performance work relative to macOS 10.13! Overall kernel CPU time when performing parallel directory walks has decreased substantially - to ~50% of original on some invocations! Stacks seem to reveal new code for lock acquisition, so this might indicate generic improvements to the kernel’s locking mechanism rather than APFS-specific changes. Changes to file metadata caching could also be responsible for performance changes, although it is difficult to tell without access to the APFS source code. Despite those improvements, APFS is still spending a lot of CPU time in the kernel. And the kernel CPU time is still very high compared to Linux/EXT4, even for single process operation.
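
For anyone who wants to try a rough version of this measurement, here is a minimal Python sketch. It is not Szorc's harness (he profiled the Mercurial test suite). It only walks the same tree from several processes at once and reports wall-clock time, so you can see whether elapsed time grows as the process count goes up. The function names walk_count and time_parallel_walks are made up for this illustration, and os.scandir is assumed to end up in readdir() on macOS.

```python
import os
import time
from concurrent.futures import ProcessPoolExecutor


def walk_count(root):
    """Count every entry under root with an explicit stack and os.scandir."""
    count = 0
    stack = [root]
    while stack:
        path = stack.pop()
        try:
            with os.scandir(path) as entries:
                for entry in entries:
                    count += 1
                    # Descend into real directories only, not symlinks to them.
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
        except OSError:
            # Unreadable directory (permissions, races): skip it.
            continue
    return count


def time_parallel_walks(root, workers):
    """Walk the same tree in `workers` processes; return (seconds, counts)."""
    start = time.perf_counter()
    with ProcessPoolExecutor(max_workers=workers) as pool:
        counts = list(pool.map(walk_count, [root] * workers))
    return time.perf_counter() - start, counts
```

Call time_parallel_walks from under an `if __name__ == "__main__":` guard with worker counts such as 1, 2, 4, and 8 on a large tree. If the file system serializes read-only directory operations, wall time should climb roughly with the worker count instead of staying flat. Keep in mind that a warm cache and process startup cost will skew small runs.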

Jonathan Levin (PDF, via Objective-See):

APFS has become the de facto file system for MacOS and iOS as of 10.13/10.3, but what do we really know about it? Apple has promised the spec would be released “later this year” … over two years ago!

Reversing the complex filesystem structures, container blocks, snapshots and trees is a lousy job, but someone had to do it. Jonathan will present the unofficial APFS specification as it appears in Volume II of the “*OS Internals” trilogy, and present a free tool for inspecting and traversing APFS partitions and disk images for MacOS, iOS - and Linux.

Previously: macOS 10.14 Mojave Released, Apple File System Reference.

Update (2018-12-04): Monkeybread Software:

This is a blog article to remind everyone writing applications in C to no longer use the FSCreateFileUnicode function, because it is very slow on APFS in our tests.

Update (2019-02-07): Howard Oakley:

I turned then to the latest and much fuller APFS documentation, where there is no mention of Fast Directory Sizing at all.

[…]

Here it all gets more difficult, because macOS doesn’t appear to provide high level languages such as Swift or Objective-C with ready access to the j_inode_flags which can tell whether it is enabled, nor does it provide any single call to return the size of a folder which might take advantage of this new feature of APFS.

[…]

It’s time for FileManager to get a function to return a folder’s total size (and total number of items) making best use of APFS Fast Directory Sizing when it’s available.
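
Until something like that exists, the fallback is to walk the tree yourself. Here is a minimal Python sketch of that manual approach. It does not use APFS Fast Directory Sizing at all; it simply sums the logical sizes of everything it can reach and counts the items, which is exactly the work a dedicated call could avoid. The name folder_size is hypothetical.

```python
import os


def folder_size(root):
    """Return (total_bytes, item_count) for everything under root.

    Sizes are logical file sizes (st_size), not allocated blocks, so clones,
    sparse files, and compression are not accounted for.
    """
    total = 0
    items = 0
    stack = [root]
    while stack:
        path = stack.pop()
        try:
            with os.scandir(path) as entries:
                for entry in entries:
                    items += 1
                    try:
                        if entry.is_dir(follow_symlinks=False):
                            stack.append(entry.path)
                        else:
                            # Symlinks are sized as links, not as their targets.
                            total += entry.stat(follow_symlinks=False).st_size
                    except OSError:
                        # Entry vanished or is unreadable: counted, not sized.
                        continue
        except OSError:
            continue
    return total, items
```

The cost of this walk grows with the number of items in the folder, which is why a single call backed by the file system's own bookkeeping would be welcome.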

2 Comments

I'd like to point out that the good old "CatSearch" function, now only available at the BSD API level as "searchfs()", has become about 5 to 6 times slower when used on APFS vs. HFS+ (http://www.openradar.me/radar?id=4933961239232512). Basically, searchfs on APFS is just about as fast as iterating recursively using high-level directory scanning operations, even though the latter cross the kernel/userland barrier much more often and transport more data through it. No idea what's up with that, but that, too, suggests that APFS still has room for performance improvements.
