Apple File System is a new, modern file system for iOS, OS X, tvOS and watchOS. It is optimized for Flash/SSD storage and features strong encryption, copy-on-write metadata, space sharing, cloning for files and directories, snapshots, fast directory sizing, atomic safe-save primitives, and improved file system fundamentals.
A Developer Preview of Apple File System is included in OS X 10.12. The APFS on-disk volume format is pre-release and subject to change. Apple plans to document and publish the APFS volume format when Apple File System is released in 2017.
APFS also adds a copy-on-write metadata scheme that Apple calls “Crash Protection,” which aims to ensure that file system commits and writes to the file system journal stay in sync even if something happens during the write—like if the system loses power.
Snapshots and clones both are going to be available in APFS. Snapshots let you throw off a read-only instance of a file system at any given point in time; as the file system’s state diverges away from the snapshot, the changed blocks are saved as part of the snapshot. This is similar in concept to Microsoft’s shadow copies, and it’s an incredibly handy feature. There are obviously huge implications here in how Time Machine works—a true file system set of snapshots could totally replace the kludgy and aging mechanism of hard links that Time Machine builds and maintains.
Clones differ from snapshots in that clones are writable instead of read-only. According to the documentation, APFS can create file or directory clones—and like a proper next-generation file system, it does so instantly, rather than having to wait for data to be copied. A cloned file or directory stores the changes made between it and the original objects, giving you a writable, editable, point-in-time copy of a file or a directory. As the documentation points out, this is an easy way to create document revisions or do versioning of anything you might want to track.
Also interesting is the concept of “space sharing,” where multiple volumes can be created out of the same chunk of underlying physical space. This sounds on first glance a lot like enterprise-style thin provisioning, where you can do things like create four 1TB volumes on a single 1TB disk, and each volume grows as space is added to it. You can add physical storage to keep up with the volume’s growth without having to resize the logical volume.
Although APFS does checksum metadata blocks it does not do anything to provide resilience for data blocks. That is a huge omission in a modern filesystem, a point I tried to politely but forcefully make in the File System Lab directly to a responsible engineer. I got the feeling that the APFS team is divided on the necessity of this feature and some people on the team would appreciate some ammo to help win the argument internally. I would encourage anyone who agrees to file radars ASAP requesting this feature.
I did get confirmation that Apple intends to provide an open specification in 2017 when the design is finalized. They also emphasized that the data structures are designed to be extended in backward and forward compatible ways so things like parity checks should be easily implemented even if they don’t make the first release.
I love the sense of humor the low-level engineers have, as demonstrated by the
When it was first released, HFS+ did not support hard links, journaling, extended attributes, hot files, and online defragmentation. These features were gradually added with subsequent releases of Mac OS X. But they are basically hacked to death, which leads to a complicated, slow and not so reliable implementation.
Finally there is Bit Rot. Over time data stored on spinning hard disks or SSDs degrade and become incorrect. […] HFS+ lost a total of 28 files over the course of 6 years.
By the next WWDC it seemed that Sun had been forgiven. ZFS was featured in the keynotes, it was on the developer disc handed out to attendees, and it was even mentioned on the Mac OS X Server website. Apple had been working on their port since 2006 and now it was functional enough to be put on full display. I took it for a spin myself; it was really real. The feature that everyone wanted (but most couldn’t say why) was coming!
I’m told by folks in Apple at the time that certain leads and managers preferred to build their own rather adopting external technology—even technology that was best of breed. They pitched their own project, an Apple project, that would bring modern filesystem technologies to Mac OS X. The design center for ZFS was servers, not laptops—and certainly not phones, tablets, and watches—his argument was likely that it would be better to start from scratch than adapt ZFS. Combined with the uncertainty above and, I’m told, no shortage of political savvy their arguments carried the day. Licensing FUD was thrown into the mix; even today folks at Apple see the ZFS license as nefarious and toxic in some way whereas the DTrace license works just fine for them. Note that both use the same license with the same grants and same restrictions. Maybe the technical arguments really were overwhelming (note however that ZFS was working internally on the iPhone), and maybe the risks really were insurmountable. I obviously have my own opinions, and think this was a great missed opportunity for the industry, but I never had the burden of weighing the totality of the facts and deciding. Nevertheless Apple put an end to its ZFS work; Apple’s from-scratch filesystem efforts were underway.
In the 7 years since ZFS development halted at Apple, they’ve worked on a variety of improvements in HFS and Core Storage, and hacked at at least two replacements for HFS that didn’t make it out the door. This week Apple announced their new filesystem, APFS, after 2 years in development. It’s not done; some features are still in development, and they’ve announced the ambitious goal of rolling it out to laptop, phone, watch, and tv within the next 18 months. At Sun we started ZFS in 2001. It shipped in 2005 and that was really the starting line, not the finish line.
APFS, the Apple File System, was itself started in 2014 with Dominic as its lead engineer. It’s a stand-alone, from-scratch implementation (an earlier version of this post noted a dependency on Core Storage, but Dominic set me straight). I asked him about looking for inspiration in other modern file systems […] he was aware of them, but didn’t delve too deeply for fear, he said, of tainting himself.
ZFS includes snapshots and serialization mechanisms that make it efficient to backup file systems or transfer file systems to a remote location. Will APFS work like that? Probably not[…]
Space sharing is more like an operational detail than a game changing feature. You can think of it like special folders with snapshot and encryption controls…
Clones open the door for potential confusion. While copying a file may take up no space, so too deleting a file may free no space. Imagine trying to free space on your system, and needing to hunt down the last clone of a large file to actually get your space back.
Cloning fixes an important problem with hard links. The issue with hard links is that some applications will write “into” hard links, effectively changing both “copies” of the file, and other applications will “break” the link (copy on write).
It’s also worth pointing out that hard links typically share the same metadata (such as file permissions). APFS clones assumedly would be less strict about this (e.g., allow two clones of the same data have different file permissions).
Apple controls the full stack including the SSD, FTL, and file system; they could have built something differentiated, optimizing this components to work together. What APFS does, however, is simply write in patterns known to be more easily handled by NAND.
APFS also focuses on latency; Apple’s number one goal is to avoid the beachball of doom. APFS addresses this with I/O QoS (quality of service) to prioritize accesses that are immediately visible to the user over background activity that doesn’t have the same time-constraints.
APFS removes the most common way of a user achieving local data redundancy: copying files. A copied file in APFS actually creates a lightweight clone with no duplicated data. Corruption of the underlying device would mean that both “copies” were damaged whereas with full copies localized data corruption would affect just one.
APFS checksums its own metadata but not user data. The justification for checksumming metadata is strong: there’s relatively not much of it (so the checksums don’t consume much storage) and losing metadata can cast a potentially huge shadow of data loss. If, for example, metadata for a top level directory is corrupted then potentially all data on the disk could be rendered inaccessible. ZFS duplicates metadata (and triple duplicates top-level metadata) for exactly this reason.
Explicitly not checksumming user data is a little more interesting. […] Apple engineers I spoke with claimed that bit rot was not a problem for users of their devices, but if your software can’t detect errors then you have no idea how your devices really perform in the field. ZFS has found data corruption on multi-million dollar storage arrays; I would be surprised if it didn’t find errors coming from TLC (i.e. the cheapest) NAND chips in some of Apple’s devices.
This is disappointing. EagleFiler has detected lots of bit rot for me over the years, including on SSDs, and I’ve seen lots of unviewable old JPEG files.
Those are great goals that will benefit all Apple users, and based on the WWDC demos APFS seems to be on track (though the macOS Sierra beta isn’t quite as far along).
Update (2016-07-03): See also: Accidental Tech Podcast.
Stay up-to-date by subscribing to the Comments RSS Feed for this post.