Wednesday, May 18, 2022

Time Machine Evolution and APFS

Howard Oakley:

You don’t have to prepare an APFS volume, as Time Machine will do that for you, but I prefer to make one ready using Disk Utility.

[…]

Time Machine will then prepare its backup volume by making another APFS volume using the case-sensitive option and designating it as a Backup volume.

Howard Oakley:

To distinguish between what are effectively two different systems, I’m going to refer to Time Machine backing up to APFS as TMA, and when backing up to HFS+ as TMH.

[…]

From its release, TMH has been dependent on features of the HFS+ file system to create its Finder illusion. Every hour the backup service examined the record of changes made to the file system since the last backup was made, using its FSEvents database. It worked out what had changed and needed to be copied into the backup. During the backup phase itself, it only copied across those files which had been created or changed since the last backup was made.
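A minimal Python sketch of that file-level scheme follows. It is a model only, not Apple's code. Each backup is assumed to be a dict of path to contents, the FSEvents record is reduced to a set of changed paths, and a hard link is represented by a flag. The names `Entry` and `tmh_backup` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """A hypothetical backup item: its contents, and whether this backup
    copied it or hard-linked it to the copy in the previous backup."""
    data: bytes
    linked: bool


def tmh_backup(source, previous, changed_paths):
    """Model one file-level incremental backup in the style described for TMH.

    source: dict mapping path -> bytes for the volume being backed up
    previous: dict mapping path -> Entry from the last backup ({} for the first)
    changed_paths: set of paths recorded as changed (standing in for FSEvents)
    Returns (new_backup, bytes_copied).
    """
    new_backup = {}
    bytes_copied = 0
    for path, data in source.items():
        if path in changed_paths or path not in previous:
            # New or changed file: copied in full, however small the change.
            new_backup[path] = Entry(data, linked=False)
            bytes_copied += len(data)
        else:
            # Unchanged file: just a hard link to the copy already in the backup.
            new_backup[path] = Entry(previous[path].data, linked=True)
    return new_backup, bytes_copied


# Example: the first backup copies everything; the second copies the changed
# 1000-byte file in full, even though only its first byte differs.
first, copied_first = tmh_backup({"notes.txt": b"hello", "big.db": b"x" * 1000}, {}, set())
second, copied_second = tmh_backup(
    {"notes.txt": b"hello", "big.db": b"y" + b"x" * 999}, first, {"big.db"}
)
print(copied_first, copied_second)  # 1005 1000
```

The model makes the cost of this design visible: a one-byte edit still means copying the whole file again.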

[…]

Apple later introduced what it called Mobile Time Machine, intended for laptops which could be away from their normal backup destination for some time. In around 10,000 lines of code, Mac OS X came to create something like a primitive snapshot, but on HFS+.

When Apple released the first version of APFS for macOS in High Sierra, its new snapshot feature was incorporated into TMH. Snapshots were initially used instead of the FSEvents database to determine what should be backed up.

[…]

Catalina introduced a more complicated scheme to replace snapshots as the normal means for determining what to back up. This was presumably because computing a snapshot delta proved slow, and because the introduction of the Volume Group brought specialist types of APFS volume for which snapshot deltas would be inappropriate or impossible.

[…]

TMA interestingly reverses the design of TMH in High Sierra: instead of using snapshots to determine what needs to be backed up before creating a backup using traditional hard links, most of the time TMA determines what’s changed using the traditional method with FSEvents, then creates its backup as a snapshot on the backup volume. The latter is essential, as without directory hard links, there’s no way of using the TMH method to make backups to an APFS volume.

Howard Oakley:

Once again, the changes required to enable Time Machine backups to be made to APFS volumes appear to have required modifications in the file system. Before macOS 11.0, we had always thought of snapshots as being straightforward copies of the metadata for a specific volume, something essentially generated by making a copy of those file system metadata. If that was what Time Machine relied on in Big Sur, then it could only back up whole volumes.

Instead, when backing up to an APFS volume, Time Machine now creates snapshots of individual folders when necessary, a feature which hasn’t yet been made available through the command-line tool tmutil. Not only that, but during the process of making a backup, Time Machine copies snapshots between volumes and seems able to assemble a backup snapshot from its file system metadata and constituent items, including changed blocks within a file.

Howard Oakley:

Early versions of Time Machine made backups exactly every hour, scheduled by launchd (rather than cron). When Apple introduced its new scheduling system with DAS (Duet Activity Scheduler) and CTS-XPC (Centralized Task Scheduling), Time Machine backups were among the first to take advantage of it. Since then, instead of being made at exactly hourly intervals, backups have been more flexible in their timing, accommodating environmental conditions such as thermal pressure and power state.

Howard Oakley:

One feature which appears to have been lost in the new Time Machine backups to APFS volumes is the ability to check the integrity of their contents, which now appears confined to backups made to HFS+ volumes.

Howard Oakley:

There’s a clear advantage to this new scheme in that it functions not just with whole files, but with changed blocks within files. Just as a snapshot references the data blocks which make up each file, so a snapshot-based backup can back up individual blocks which have changed, which can be significantly more efficient in the storage space required.
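To illustrate the idea, here is a hedged Python sketch that treats each snapshot's view of a file as a list of physical block numbers. It assumes, for illustration, that a modified block lands at a new physical address (copy-on-write, because the older snapshot still references the old block), so the blocks to back up are those whose numbers differ between the reference and stable snapshots. The block numbers, the 4096-byte block size and the function name are hypothetical; this is not a documented account of Time Machine's internals.

```python
def changed_blocks(reference_map, stable_map):
    """Return logical block indices that a block-level backup would need to copy.

    Each map is a hypothetical list of physical block numbers for one file,
    as referenced by the reference (last backup) and stable (current) snapshots.
    In this model a rewritten block gets a new physical number, so changed and
    appended blocks show up as numbers that differ from the reference map.
    """
    return [
        i
        for i, block in enumerate(stable_map)
        if i >= len(reference_map) or reference_map[i] != block
    ]


BLOCK_SIZE = 4096  # assumed block size, for illustration only

reference = [100, 101, 102, 103, 104]    # file as of the last backup
stable = [100, 101, 250, 103, 104, 251]  # block 2 rewritten, one block appended
delta = changed_blocks(reference, stable)
print(delta, len(delta) * BLOCK_SIZE)  # [2, 5] 8192
```

Only two blocks (8 KiB) need copying here, rather than the whole six-block file.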

This new scheme not only retains hourly snapshots on the source volume, which are still kept for up to 24 hours, but provides its backups in the form of snapshots on the backup volume, where the file system data are stored in addition to the snapshot itself.

The final piece of magic used by Time Machine backups to APFS volumes is that its snapshots can be made not only for whole volumes, but for individual folders within a volume. If you want to back up just your Home Documents folder, Time Machine will do that, rather than having to back up your complete Data volume.

Howard Oakley:

Creating the backup begins in earnest with a local snapshot, which is termed the stable snapshot. This appears to be a whole-volume snapshot, which is then mounted at a path based on /Volumes/com.apple.TimeMachine.localsnapshots/Backups.backupdb/[machine]/[datestamp]/[volume name]. The stable snapshot made in the last backup is located and mounted as the reference snapshot, at a similar path, but with a different datestamp, of course. TMA locates the “volume store”, into which the backed up files and other items will be copied. This should be at a path like /Volumes/[backup volname]/[datestamp].previous/.

[…]

TMA states the possible strategies which it could use to work out what is to be backed up[…]

[…]

Currently, with Big Sur only backing up Data and unroled general-purpose volumes, those are the four possibilities. The full list of what’s possible in Catalina includes:

  • full first backups
  • deep scans
  • FSEvents
  • snapshot diffing
  • consistency scans
  • cached events.

Howard Oakley:

Now that I’ve worked through the steps involved in an automatic Time Machine backup to APFS local storage (TMA), this article draws that out into a chart, and compares that against the processes used when backing up to HFS+ in Catalina (TMH).

Howard Oakley:

The backup store has a distinctive volume structure, consisting of hidden Spotlight indexes, any mounted backup snapshots listed by their datestamp, a property list containing details of all the backups in that store, and an optional property list giving information about any inheritance of those backups.

Backup snapshots themselves contain the eventdb constructed during their original backup process, a checkpoint file, a property list containing all the exclusions which applied to TMA backups when that backup was made, and the backup itself listed by volume name.

[…]

TMA backups provide good access to the user through the Time Machine app and the Finder. However, the user’s ability to manipulate them is severely limited. As these backups rely on snapshots, individual files can’t be removed from them, a feature restricted to TMH. Whole backups, complete with their snapshots, can be removed, though. That isn’t supported by the Time Machine app, but can be performed directly in the Finder.

Howard Oakley:

One day we’ll look back and remember fondly when it was possible to clone any Mac volume to an identical copy. Sadly, since the introduction of APFS in High Sierra, that has now stopped working for those volumes formatted using the new file system. Try making a clone of an APFS volume now and you may not even notice what’s missing from the copy: snapshots.

Howard Oakley:

Periodically, backup snapshots are indexed by Spotlight for the volume’s Spotlight-V100 indexes. This may require background mounting and unmounting of those snapshots, and often results in the spinning Time Machine icon being displayed against that volume in the Finder.

Howard Oakley:

Attention now needs to be devoted to addressing the problems identified here with changing source and destination volumes. It isn’t acceptable to have to start completely fresh backups whenever a disk needs to be changed, and Apple needs to document the processes required to inherit source and destination disks, without driving users to experiment with tmutil.

For all its ingenuity and sophisticated engineering, of the three backup utilities which I use, Time Machine is by far the most complex, and the only one which makes changes like these so difficult.

Howard Oakley:

This confirms that TMA is as efficient as possible in both the copying and storage of APFS sparse files and clones. This is far superior to TMH, which would of course have had to copy across almost 15 GB of extra data, and required a total of 15 GB space in the backups for these three files. With sparse files and clones being relatively common in APFS volumes, the efficiency of TMA can make a big difference to the time taken to make backups, and use of storage space on the backup volume.

Howard Oakley:

This article is a summary of the current benefits and limitations of TMA as it stands in macOS 11.2.3, at the end of which is a list of those detailed accounts. These should help you make a more informed decision as to whether to use TMA.

Howard Oakley:

One significant exclusion which has been added in Big Sur is the hidden and locked folder on each volume containing its local version database; this is to work around the bug which plagued TMH in Catalina when trying to back up large and complex version databases.

Howard Oakley:

This article looks at configuring and using TMA’s backup storage on a shared disk over a network – in this case, a shared APFS volume on another Mac running Big Sur.

Howard Oakley:

Being more complex and dependent on other systems, making Time Machine backups to shared storage on your network is more prone to fail. As I’ve spent much of the day sorting one such failure out, I thought it might be useful to discuss what went wrong and what went wronger.

Howard Oakley:

A quick check on one of my working folders, just a part of my extended Home folder, found almost 1600 clone files totalling over 10 GB in size. If I were still backing up to HFS+, every one of those files would have to be saved in full into my backup, and in every subsequent backup it would have to be present either in the form of a complete copy (if it had changed), a hard link, or one to a directory above it. Instead, my APFS backup just contains a directory entry for the clone.
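The space difference can be sketched in Python by counting blocks. This assumes a simplified model in which each file is a list of block IDs, a clone shares the IDs of its original, and an unallocated hole in a sparse file is `None`. The block size, file names and function names are all hypothetical.

```python
BLOCK_SIZE = 4096  # assumed allocation block size, for illustration only


def full_copy_bytes(files):
    """Space used if every file is stored at its full logical size, as the
    text describes for HFS+ backups (clones duplicated, holes filled in)."""
    return sum(len(blocks) for blocks in files.values()) * BLOCK_SIZE


def shared_block_bytes(files):
    """Space used if only distinct allocated blocks are stored: clones share
    their blocks and sparse holes (None) take no space."""
    unique = {b for blocks in files.values() for b in blocks if b is not None}
    return len(unique) * BLOCK_SIZE


# Hypothetical block maps: a clone shares all of the original's blocks, and a
# sparse file has only its first and last blocks allocated.
files = {
    "original.dat": [1, 2, 3, 4],
    "clone.dat": [1, 2, 3, 4],
    "sparse.img": [5, None, None, None, None, 6],
}
print(full_copy_bytes(files), shared_block_bytes(files))  # 57344 24576
```

If the clone is later edited, only the blocks where it diverges from the original add to the shared-block total.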

Howard Oakley:

With Time Machine it’s so tempting just to let it back up your entire Data volume, and be done with it. This article explains how you can do better than that, and exclude items from that backup which would merely waste space.

Howard Oakley:

In just over a month’s time, Time Machine will turn fourteen, making it one of the longest-lasting and externally almost unchanged features in macOS. As I’ve been trawling back through my archives preparing a talk for MacSysAdmins about Time Machine, I thought you might enjoy a stroll down memory lane, and return to autumn 2007 for a few moments.

Howard Oakley:

I have been honoured this year to be invited to present at MacSysAdmin Online 2021, talking about Time Machine.

You can watch my presentation and download my slides from here, where you’ll also find plenty of even better presentations.

Howard Oakley:

It’s that time of year when many of us are planning our upgrades, either to Big Sur or Monterey. One vital consideration is how to migrate our backups: this article looks at what you need to do for existing Time Machine backups, whether stored locally on an external disk, or on your network.

Howard Oakley:

If you’re going to use SMB as recommended by Apple, the most robust approach seems to be to disable AFP on any NAS that supports SMB. It may sometimes help if you’re not connected to the NAS in the Finder before you start setting it up as your TM backup destination.

Howard Oakley:

Earlier this month, I provided a presentation for the MacSysAdmins virtual conference. Now that’s done and dusted, I’m pleased to provide a copy of my slides in high quality PDF, together with my script, available from here: timemachinemacsysadmins

Howard Oakley:

If you’re intending to back up from Time Machine to network storage such as a NAS or another Mac, the above figures can be used to provide an estimate of how long those backups are likely to take. First, connect to a share on the server, and copy to it a single large file, such as the 10 GB test file used here. Measure the total time for that transfer, divide the file size by it to get the transfer rate, and take two-thirds (0.67) of that rate as the likely overall rate to be delivered by Time Machine. Then divide your expected average hourly backup size by that rate to estimate how long each backup is likely to take.
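That arithmetic, written out as a small Python function. The two-thirds factor is the one given above; the function name and the example figures are hypothetical, and the estimate is only as good as the test copy it is based on.

```python
def estimate_backup_seconds(test_file_gb, test_copy_seconds, hourly_change_gb,
                            efficiency=0.67):
    """Estimate how long a network Time Machine backup will take.

    test_file_gb, test_copy_seconds: size and duration of a manual copy of one
    large file to the share. efficiency: the two-thirds factor from the text.
    hourly_change_gb: expected amount of data in an average hourly backup.
    """
    if test_file_gb <= 0 or test_copy_seconds <= 0:
        raise ValueError("test copy size and time must be positive")
    copy_rate = test_file_gb / test_copy_seconds  # GB/s for a plain file copy
    tm_rate = copy_rate * efficiency              # likely overall Time Machine rate
    return hourly_change_gb / tm_rate


# Hypothetical figures: the 10 GB test file takes 250 s to copy (0.04 GB/s),
# so Time Machine should manage about 0.0268 GB/s, and a 2 GB hourly backup
# would take roughly 75 seconds.
print(round(estimate_backup_seconds(10, 250, 2)))  # 75
```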

[…]

Time Machine settings you should consider for improving performance include:

  • exclude items containing many small files, such as Xcode;
  • back up very large files such as VMs separately;
  • use default automatic hourly backups, rather than backing up less frequently;
  • back up to APFS, which works at a block rather than file level, and backs up clones and sparse files much more efficiently.

[…]

If you should encounter poor performance when making backups, use T2M2’s Check Speed feature to identify which items are causing most slowing.

Howard Oakley:

After four years in which it had offered frustratingly limited support for the new features of APFS, Disk Utility is now complete: this version has excellent support for snapshots, no matter which app created them.

Howard Oakley:

However, we know that snapshots are strictly read-only, and the only user experiences that I can discover confirm my suspicion that all fsck_apfs does when it finds an error in a snapshot is to throw its hands in the air, report an opaque error code, and not even attempt a repair.

For a backup snapshot, that’s fatal. All you can then do is delete the whole snapshot, knocking a hole in your backups which can never be replaced. Disk Utility’s typical response only rubs salt into the wound by telling the user to make a backup of the affected disk. As it’s currently impossible to copy backup snapshots to another disk, a single error on that storage compromises all your backups stored there: every single one of them, and there’s absolutely nothing that macOS offers to help that.

Howard Oakley:

Snapshots are one of the huge advances in APFS, but like other features, they can cause more problems than they’re worth. This article explains how they can go wrong, and what you can do to manage them so they don’t swallow all the free space on your storage.

[…]

Third-party backup products which incorporate snapshot features, like Carbon Copy Cloner (CCC), are more flexible and you can set custom snapshot retention policies for individual volumes. These operate independently, so CCC can’t change Time Machine’s policy, nor can Time Machine delete CCC’s snapshots automatically.

[…]

In most cases, apps should provide settings in their Preferences which let you store their temporary files on a volume which isn’t backed up, so avoiding them swelling the size of hourly snapshots.


Update (2022-05-20): Howard Oakley:

Time Machine manages its backups actively to ensure that, whenever possible, they remain well within the disk space available to the backup volume. It does this by a process of ‘thinning’ older backups to recover the space that they occupy. If the backup volume is inside a container which completely fills that disk, then Time Machine will normally fill the volume, thus the container, thus the whole disk. Simply adding another non-backup volume to the same container will put the two volumes into competition for the same free disk space, and will eventually result in your use of the non-backup volume being determined by Time Machine’s thinning of its backups.

One way to avoid that competition from occurring is to reserve space on the backup disk for your non-backup storage. There are two ways to do that: you can set a reserve size on a volume in the same container as the backup volume, or you can put your non-backup volume into a separate container. Neither of those is mentioned in Apple’s guide.

[…]

However, you can only set reserve and quota sizes when you create the volume. Even diskutil can’t change them once that volume has been created. The only way that I know to make such a change is to create a new volume with the desired sizes set, then move the contents of the original volume across to it.

12 Comments

> There’s a clear advantage to this new scheme in that it functions not just with whole files, but with changed blocks within files. Just as a snapshot references the data blocks which make up each file, so a snapshot-based backup can back up individual blocks which have changed, which can be significantly more efficient in the storage space required.

So starting with 11.0, Time Machine finally stores block-level differences? Hooray! I guess I can use it to back up VMs now?

@Sören: Absolutely correct.

But the actual utility for real-world files is questionable. When documents are edited, the length usually changes. If something is inserted/deleted near the start of a file, all the blocks will end up changing.

And the use of compressed-archive documents (e.g. Microsoft's XML-based Office documents: .docx, .xlsx, .pptx, etc.) has a similar effect: because each is a compressed archive of many files, any change is likely to change all of the blocks.

But it should work very well with things like database files, where records can usually be added/removed/updated without affecting the location or content of other records.
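A quick Python sketch of this point, assuming fixed 4 KiB blocks compared by content. That is a simplification: it says nothing about how Time Machine actually finds changed blocks. The helper names are hypothetical.

```python
def split_blocks(data, size):
    """Split data into fixed-size blocks (the last one may be short)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def count_changed_blocks(old, new, size=4096):
    """Count blocks of `new` that differ from the block at the same position
    in `old`, i.e. the blocks a block-level backup would have to store."""
    old_blocks = split_blocks(old, size)
    new_blocks = split_blocks(new, size)
    return sum(
        1
        for i, block in enumerate(new_blocks)
        if i >= len(old_blocks) or old_blocks[i] != block
    )


# A 16 KiB file (four 4 KiB blocks) filled with a repeating 0..250 byte
# pattern, so shifting it by one byte changes every position.
original = bytes(i % 251 for i in range(16384))

inserted = b"X" + original  # one byte inserted at the start
edited = bytearray(original)
edited[5000] ^= 0xFF        # one byte updated in place, inside block 1

print(count_changed_blocks(original, inserted))       # 5
print(count_changed_blocks(original, bytes(edited)))  # 1
```

The insertion touches every block (plus a new trailing one), while the in-place, database-style update touches only the block that holds the changed record.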

There's still one thing I don't get: how exactly are the diffs for a file calculated and then applied to the destination? Where is that metadata maintained until it is backed up? Otherwise, AFAICT, you have to read both source and destination to compute the differences, which is surely not what's happening as that would consume disk I/O.

@Sebby My guess is that it depends on the source drive still having the snapshot from the latest backup. Then the diffs would just be there naturally.

@ David: that sounds quite plausible indeed, but I'd still be curious to see real-world metrics of space savings. (It's unclear to me whether it's even possible to ask APFS for that information, given that it's inclined to serve you a full file either way.)

I'm guessing VMware *does* benefit, though: if the virtual disk shrunk, there will be free space, which can now simply be treated as empty blocks. If it grew, there will be added blocks, but existing blocks don't need to be touched. (You can already observe this, to a point, even without block-level behavior: if you set VMware to slice its disks into a series of 4 GiB files, then use a VM for a few weeks, you'll see that some of those files don't get touched at all for days. Presumably, the same behavior would now apply at the block level.)

@ David: and as an aside, I wonder how much thought efforts like Open Packaging Conventions (the Office 2007 and beyond zip archive plus metadata format), OpenDocument, etc. put into a zip alternative that block-aligns the data, avoiding shifting bytes around within the blocks (by deliberately leaving gaps).

@Sören From what I’ve heard, it still fills up the drive like crazy.

Ben Kennedy

@Sören re the zip format: How big's a block? That's an implementation detail of the local filesystem.

@ Ben: indeed. An implementation of the format might add filesystem metadata (extended attribute, etc.) with an array of segments.

Or, conversely, a file system might add format-specific comprehensions to figure out the segments itself.

The one hurdle that remains is cloning Time Machine backups to larger drives. Recently I heard that SuperDuper may have this functionality. Why this matters:

  • Preserving your archives before your archive drive fails.
  • Moving your existing Time Machine backups to a larger-capacity drive.
  • Avoiding having to use the Finder to copy folders to an appropriately authenticated hard drive, which loses the integrity of the archives on the destination for use within Time Machine and system restore. One possibility is to use system restore to restore to a similar or newer-versioned drive that is larger, but that only restores the most recent backup. Maintaining the archives in perpetuity needs to be done with usable cloning software.

@Abraham I agree that’s needed. Pretty sure SuperDuper doesn’t do it.

The BIGGEST flaw in TM with APFS is that you cannot clone a TM APFS-formatted drive to any other drive. You are stuck with the original drive, so you cannot migrate to another drive, period, even if it is larger. SuperDuper can do it with HFS-formatted drives, just like the Finder, but more robustly. For APFS-formatted drives, nothing (the Finder, Disk Utility, SuperDuper, CCC) will copy the backups. For that reason alone, I avoid APFS backups, so all the advantages in the article for APFS drives are moot until Apple provides a way to copy the backup files off an APFS backup!
