Friday, June 17, 2016

Apple File System (APFS)

Apple File System Guide (Hacker News):

Apple File System is a new, modern file system for iOS, OS X, tvOS and watchOS. It is optimized for Flash/SSD storage and features strong encryption, copy-on-write metadata, space sharing, cloning for files and directories, snapshots, fast directory sizing, atomic safe-save primitives, and improved file system fundamentals.

[…]

A Developer Preview of Apple File System is included in OS X 10.12. The APFS on-disk volume format is pre-release and subject to change. Apple plans to document and publish the APFS volume format when Apple File System is released in 2017.

Lee Hutchinson (Slashdot):

APFS also adds a copy-on-write metadata scheme that Apple calls “Crash Protection,” which aims to ensure that file system commits and writes to the file system journal stay in sync even if something happens during the write—like if the system loses power.

[…]

Snapshots and clones both are going to be available in APFS. Snapshots let you throw off a read-only instance of a file system at any given point in time; as the file system’s state diverges away from the snapshot, the changed blocks are saved as part of the snapshot. This is similar in concept to Microsoft’s shadow copies, and it’s an incredibly handy feature. There are obviously huge implications here in how Time Machine works—a true file system set of snapshots could totally replace the kludgy and aging mechanism of hard links that Time Machine builds and maintains.

Clones differ from snapshots in that clones are writable instead of read-only. According to the documentation, APFS can create file or directory clones—and like a proper next-generation file system, it does so instantly, rather than having to wait for data to be copied. A cloned file or directory stores the changes made between it and the original objects, giving you a writable, editable, point-in-time copy of a file or a directory. As the documentation points out, this is an easy way to create document revisions or do versioning of anything you might want to track.

Also interesting is the concept of “space sharing,” where multiple volumes can be created out of the same chunk of underlying physical space. This sounds on first glance a lot like enterprise-style thin provisioning, where you can do things like create four 1TB volumes on a single 1TB disk, and each volume grows as space is added to it. You can add physical storage to keep up with the volume’s growth without having to resize the logical volume.

Russ Bishop:

Although APFS does checksum metadata blocks it does not do anything to provide resilience for data blocks. That is a huge omission in a modern filesystem, a point I tried to politely but forcefully make in the File System Lab directly to a responsible engineer. I got the feeling that the APFS team is divided on the necessity of this feature and some people on the team would appreciate some ammo to help win the argument internally. I would encourage anyone who agrees to file radars ASAP requesting this feature.

[…]

I did get confirmation that Apple intends to provide an open specification in 2017 when the design is finalized. They also emphasized that the data structures are designed to be extended in backward and forward compatible ways so things like parity checks should be easily implemented even if they don’t make the first release.

Mark Dalrymple:

I love the sense of humor the low-level engineers have, as demonstrated by the -IHaveBeenWarnedThatAPFSIsPreReleaseAndThatIMayLoseData option.

Aymeric Barthe (via Benjamin Encz):

When it was first released, HFS+ did not support hard links, journaling, extended attributes, hot files, and online defragmentation. These features were gradually added with subsequent releases of Mac OS X. But they are basically hacked to death, which leads to a complicated, slow and not so reliable implementation.

[…]

Finally there is Bit Rot. Over time data stored on spinning hard disks or SSDs degrade and become incorrect. […] HFS+ lost a total of 28 files over the course of 6 years.

Adam Leventhal (via Hacker News):

By the next WWDC it seemed that Sun had been forgiven. ZFS was featured in the keynotes, it was on the developer disc handed out to attendees, and it was even mentioned on the Mac OS X Server website. Apple had been working on their port since 2006 and now it was functional enough to be put on full display. I took it for a spin myself; it was really real. The feature that everyone wanted (but most couldn’t say why) was coming!

[…]

I’m told by folks in Apple at the time that certain leads and managers preferred to build their own rather adopting external technology—even technology that was best of breed. They pitched their own project, an Apple project, that would bring modern filesystem technologies to Mac OS X. The design center for ZFS was servers, not laptops—and certainly not phones, tablets, and watches—his argument was likely that it would be better to start from scratch than adapt ZFS. Combined with the uncertainty above and, I’m told, no shortage of political savvy their arguments carried the day. Licensing FUD was thrown into the mix; even today folks at Apple see the ZFS license as nefarious and toxic in some way whereas the DTrace license works just fine for them. Note that both use the same license with the same grants and same restrictions. Maybe the technical arguments really were overwhelming (note however that ZFS was working internally on the iPhone), and maybe the risks really were insurmountable. I obviously have my own opinions, and think this was a great missed opportunity for the industry, but I never had the burden of weighing the totality of the facts and deciding. Nevertheless Apple put an end to its ZFS work; Apple’s from-scratch filesystem efforts were underway.

[…]

In the 7 years since ZFS development halted at Apple, they’ve worked on a variety of improvements in HFS and Core Storage, and hacked at at least two replacements for HFS that didn’t make it out the door. This week Apple announced their new filesystem, APFS, after 2 years in development. It’s not done; some features are still in development, and they’ve announced the ambitious goal of rolling it out to laptop, phone, watch, and tv within the next 18 months. At Sun we started ZFS in 2001. It shipped in 2005 and that was really the starting line, not the finish line.

Previously: ZFS, The Loss of ZFS.

See also: WWDC Session 701: Introducing Apple File System, ZFS data integrity explained, Trust, But Verify, Belt and Suspenders, Computational Skeuomorphism.

Update (2016-06-20): Adam Leventhal has written a series of posts based on his talks with Apple engineers Dominic Giampaolo and Eric Tamura at WWDC. Overview (Hacker News):

APFS, the Apple File System, was itself started in 2014 with Dominic as its lead engineer. It’s a stand-alone, from-scratch implementation (an earlier version of this post noted a dependency on Core Storage, but Dominic set me straight). I asked him about looking for inspiration in other modern file systems […] he was aware of them, but didn’t delve too deeply for fear, he said, of tainting himself.

Encryption, Snapshots, and Backup:

ZFS includes snapshots and serialization mechanisms that make it efficient to backup file systems or transfer file systems to a remote location. Will APFS work like that? Probably not[…]

[…]

Space sharing is more like an operational detail than a game changing feature. You can think of it like special folders with snapshot and encryption controls…

Space Efficiency and Clones:

Clones open the door for potential confusion. While copying a file may take up no space, so too deleting a file may free no space. Imagine trying to free space on your system, and needing to hunt down the last clone of a large file to actually get your space back.

Aaron Meurer:

Cloning fixes an important problem with hard links. The issue with hard links is that some applications will write “into” hard links, effectively changing both “copies” of the file, and other applications will “break” the link (copy on write).

[…]

It’s also worth pointing out that hard links typically share the same metadata (such as file permissions). APFS clones assumedly would be less strict about this (e.g., allow two clones of the same data have different file permissions).

Performance:

Apple controls the full stack including the SSD, FTL, and file system; they could have built something differentiated, optimizing this components to work together. What APFS does, however, is simply write in patterns known to be more easily handled by NAND.

[…]

APFS also focuses on latency; Apple’s number one goal is to avoid the beachball of doom. APFS addresses this with I/O QoS (quality of service) to prioritize accesses that are immediately visible to the user over background activity that doesn’t have the same time-constraints.

Data Integrity:

APFS removes the most common way of a user achieving local data redundancy: copying files. A copied file in APFS actually creates a lightweight clone with no duplicated data. Corruption of the underlying device would mean that both “copies” were damaged whereas with full copies localized data corruption would affect just one.

[…]

APFS checksums its own metadata but not user data. The justification for checksumming metadata is strong: there’s relatively not much of it (so the checksums don’t consume much storage) and losing metadata can cast a potentially huge shadow of data loss. If, for example, metadata for a top level directory is corrupted then potentially all data on the disk could be rendered inaccessible. ZFS duplicates metadata (and triple duplicates top-level metadata) for exactly this reason.

Explicitly not checksumming user data is a little more interesting. […] Apple engineers I spoke with claimed that bit rot was not a problem for users of their devices, but if your software can’t detect errors then you have no idea how your devices really perform in the field. ZFS has found data corruption on multi-million dollar storage arrays; I would be surprised if it didn’t find errors coming from TLC (i.e. the cheapest) NAND chips in some of Apple’s devices.

This is disappointing. EagleFiler has detected lots of bit rot for me over the years, including on SSDs, and I’ve seen lots of unviewable old JPEG files.

Conclusions:

Those are great goals that will benefit all Apple users, and based on the WWDC demos APFS seems to be on track (though the macOS Sierra beta isn’t quite as far along).

Update (2016-06-27): See also: TidBITS, comments on ArsTechnica, comments on Hacker News, Accidental Tech Podcast.

Update (2016-07-03): See also: Accidental Tech Podcast.

12 Comments RSS · Twitter

At least a basic level of resiliency to corruption is pretty much the only thing I want from a new file system. The fact that Apple decided to create a whole new file system, and *not* tackle this, is pretty disappointing. This feels so much like Swift. Yay, they're finally doing it... in a way that is a bit underwhelming to everybody who knows what kinds of options are available outside of the Apple ecosystem.

Media coverage to this point has been mostly breathless elongations of Apple’s developer documentation. With a dearth of detail I decided to attend the presentation and Q&A with the APFS team at WWDC. Dominic Giampaolo and Eric Tamura, two members of the APFS team, gave an overview to a packed room; along with other members of the team, they patiently answered questions later in the day. With those data points and some first hand usage I wanted to provide an overview and analysis both as a user of Apple-ecosystem products and as a long-time operating system and file system developer.
[...]

APFS, the Apple File System, was itself started in 2014 with Dominic as its lead engineer. It’s a stand-alone, from-scratch implementation (an earlier version of this post noted a dependency on Core Storage, but Dominic set me straight). I asked him about looking for inspiration in other modern file systems such as BSD’s HAMMER, Linux’s btrfs, or OpenZFS (Solaris, illumos, FreeBSD, Mac OS X, Ubuntu Linux, etc.), all of which have features similar to what APFS intends to deliver. (And note that Apple built a fairly complete port of ZFS, though Dominic was not apparently part of the group advocating for it.) Dominic explained that while, as a self-described file system guy (he built the file system in BeOS, unfairly relegated to obscurity when Apple opted to purchase NeXTSTEP instead), he was aware of them, but didn’t delve too deeply for fear, he said, of tainting himself.

Dominic praised the APFS testing team as being exemplary. This is absolutely critical. A common adage is that it takes a decade to mature a file system. And my experience with ZFS more or less confirms this. Apple will be delivering APFS broadly with 3-4 years of development so will need to accelerate quickly to maturity.

http://dtrace.org/blogs/ahl/2016/06/19/apfs-part1/

So many people are talking about the lack of checksums that you have to imagine Apple is listening. I suspect it'll be there after they get the basics matured.

BTW Dominic's book on file system design is a must read. I used it as inspiration to create an object store for a database we'd written some years ago. Even if you don't care too much about such things, it really it illuminating in understanding how file systems work.

Lets try that for a couple of other things a lot of people talk about:

"So many people are talking about the bloat of iTunes that you have to imagine Apple is listening."
"So many people are talking about the lack of upgrade prices and demo versions in App Store that you have to imagine Apple is listening."
"So many people are talking about the need for battery life over thinness that you have to imagine Apple is listening."
"So many people are talking about ability to upgrade RAM and storage in MacBook Pro that you have to imagine Apple is listening."

Not working… ;)

(I'm not saying checksums aren't coming to APFS – on the contrary I do believe it's on their future agenda already – just that what many people talk about rarely makes Apple change their mind if they don't want to.)

Anyone have any experience reports with Synology's btrfs new implementation?

I'll go on record as stating it first: This will be a total technical disaster.

[…] think I would bet on either at this point. It’s true that we eventually got Swift and APFS. The difference is that both of those projects have direct benefits for […]

[…] Apple failed to tell developers that it was making this change when it announced AFPS at WWDC 2016. And, as far as I can tell, it’s not mentioned anywhere in the APFS guide. iOS 10.3 is […]

[…] last bit relates to APFS not checksumming its data […]

Leave a Comment