Tuesday, October 22, 2024

Time Machine in Sequoia

Der Teilweise:

Backing up to a NAS currently says 3 days (!) left, after having backed up ~160GB. Was using WiFi with TX rate 133MBit.

Now I've connected using Gigabit Ethernet; it does not seem to be faster.

Plus: CPU usage is ridiculously high, fans spinning up to medium/max speed several times per hour.

[…]

I did not change all files on my disk …

Maybe they removed the alert that was shown when the backup got corrupted? Seems to be the case because I did not get that alert and all my old backups (on the NAS) seem to be gone.

[…]

So it seems like Apple indeed removed the confirmation dialog that they showed when they delete corrupted backups, taking away the chance to manually repair it.

I wonder whether this happened pre-Sequoia? Even on Sonoma, I would regularly have Time Machine backups where I would imagine that less than 50 GB changed, but it took all day to back up to a local hard drive. (Yet it wasn’t so slow that it seemed to be starting from scratch.) I wish Time Machine were better at showing which files are being copied and how the space is being used. (I guess some of this can be figured out using BackupLoupe.)

Miguel Arroz:

There’s some annoying bug in Sequoia that makes a Time Machine backup fail with an error “The backup failed because some files were not available”.

How on earth can files “not be available” on something running locally? And whatever it is, why isn’t Time Machine dealing with it properly? It’s all made by the same company. And it had all night to do whatever it has to.

I needed to have a current backup fresh in the morning, and now I’m sitting here waiting.

Der Teilweise:

I wouldn’t be surprised if the opendir bug described by @cdfinder is a race condition that also happens (more rarely) for local filesystems.

Previously:

Update (2024-10-23): Adam Chandler:

I recently had Time Machine issues on the developer beta where I wasn't getting successful backups. I tried everything, and finally disabling the macOS firewall fixed it for me.

MBP wired Ethernet to Synology using SMB on x.1 developer beta.

Kurt:

I don’t use my MacBook Pro (M1 Max running Sequoia) daily, but last night I went to use it and it was warm to the touch and the fans were at full blast. The process “diskimagesiod” was using 800% CPU. I also get the “files were not available” all the time with Time Machine. Super annoying.

Howard Oakley:

You can see files and progress in the log, easily accessible from T2M2’s Speed button during a backup.
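
T2M2 is essentially a front end for the unified log, so the same detail is available directly; something like the following during a backup should show per-file progress, assuming backupd still logs under the com.apple.TimeMachine subsystem:

    log show --info --last 30m \
        --predicate 'subsystem == "com.apple.TimeMachine"'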

Previously:



Is this how it was in the "beleaguered" classic era?


@Manx No, everything felt less buggy back then. It’s just that when you did hit a crashing bug it could take down the whole system.


Diogo Tridapalli

I think the throttling is getting more aggressive in newer releases; this old trick, “sudo sysctl debug.lowpri_throttle_enabled=0”, still works great for me.

https://mjtsai.com/blog/2016/03/16/massively-speed-up-time-machine-backups/
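
For anyone trying this: the setting is not persistent, so it can be flipped off for a big backup and restored afterwards (a reboot also resets it to the default). A minimal sketch:

    # Turn off low-priority I/O throttling while the backup runs.
    sudo sysctl debug.lowpri_throttle_enabled=0

    # Restore the default behavior once the backup finishes.
    sudo sysctl debug.lowpri_throttle_enabled=1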


One of the original advantages of Time Machine was that it was relatively simple and non-proprietary, especially compared to its contemporaries on Windows. Your backup was just an HFS+ partition, and you could browse the files easily with Finder or a terminal. Redundancy was done using hard links. At least at first everything seemed pretty rock solid, too. (Perhaps nostalgia goggles are blinding me, but I don't recall having any serious issues with it in the first several years after it came out.)

Over the last decade, Apple seems to keep making it more and more complex and proprietary. Now it's using lots of whiz-bang-wow features of APFS to do... something, but nothing it wasn't already able to do, and those features make it very hard to work with, especially for third-party tools. And unsurprisingly it falls on its face a lot more often. I noticed things got a bit better around macOS 12 or 13 -- at the very least I would no longer get that error message when backing up to a network drive telling me the whole thing needed to be blown away and started over for no good reason. But these sorts of reports in macOS 14 and 15 strongly disincline me to upgrade past macOS 13, which has already been far buggier and more aggravating than 10.14. But at least all of the fundamental stuff I need, like backups, works in it.


Time Machine backup never worked properly for me over WiFi, no matter how fast it was; I would constantly get a corrupted-backup error and have to start again. I even changed from QNAP to Asustor and later to Synology - all the same issues.

The only fix I managed to find was to disable hourly backups and just do daily ones - that way, the system didn't create snapshots all the time while trying to back up.

Overall, Time Machine needs to be rewritten and fixed properly.


Sequoia seems to have trouble with repeated file access requests: the excellent Arq backup app has also been fighting failures since the release of Sonoma and has published multiple updates to cover various edge cases, some of which seem to be related to the improper enforcement of permissions (spurious “Access Denied” errors in spite of Arq having Full Disk Access) and some to files being unaccountably “unavailable” in spite of being stored on the local boot drive. I wonder whether Time Machine might be bumping into these problems as well…


"Is this how it was in the "beleaguered" classic era?"

Software was much simpler then. Your whole hard drive was smaller than just today's Photoshop. I think it's fair to say that there were a lot less of these weird edge cases, and a lot less that could go wrong in general on your system (unless you installed a lot of system extensions). There was just a lot less of everything. For example, a lot less concurrency (apps froze when you opened a menu!).

But like Michael says, when something did go wrong, your whole system usually just froze or gave you the bomb dialog, forcing you to restart.

This did mean that you didn't run into these situations where errors kept compounding and things got wonkier and wonkier, because at some point, everything came to a halt, and you had to start fresh.


Time Machine has been in a weird position ever since Time Capsule was cancelled. External drives with laptops are a hassle, and only nerds have local servers, be it a Mac mini or a NAS, so there's no real story around the feature. It would make no sense to release it as a new feature in its current state.

I guess they feel like they can't just remove it, but it also doesn't make sense to invest in it, so it's in a limbo where it doesn't receive sufficient resources. I hope they have a plan for its future, but it doesn't much look like it, given how long it has been like this.


Time Machine is arguably one of the reasons I moved to macOS, when it first appeared in Leopard. Here, I thought, was finally a serious, grown-up operating system ...

And yet networked backups haven't really ever worked reliably since about macOS 10.14 Mojave. It's been local backups ever since, even for my server Mini, which is otherwise well-placed to do backups for my MBP, which now uses Arq because that works. On Linux, I still use rsnapshot sometimes, albeit nowadays it's worthwhile looking at combining that with LVM or a filesystem snapshot. Hopefully bcachefs will be finished soon so people have a reasonable choice besides ZFS.
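
The rsnapshot-plus-snapshot combination is only a handful of commands; a rough sketch, assuming a volume group vg0, a logical volume home, and an rsnapshot config whose backup point is the hypothetical mount point /mnt/home-snap:

    # Freeze a point-in-time view of the volume so files can't
    # change mid-backup.
    lvcreate --snapshot --name home-snap --size 2G /dev/vg0/home
    mount -o ro /dev/vg0/home-snap /mnt/home-snap

    # Back up from the frozen copy.
    rsnapshot daily

    # Tear the snapshot down afterwards.
    umount /mnt/home-snap
    lvremove -f /dev/vg0/home-snap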


I know Apple baked some tricks into Time Machine, but wouldn't rsync largely make this work without all the problems associated with Time Machine? When I was a more or less full-time Mac user, I swapped from Time Machine to Carbon Copy Cloner and was quite happy with the results, in particular the way CCC handled network backups. To be fair, this predated APFS, and I believe CCC was largely just using rsync for backups over the network to another Mac.

Assuming one is backing up to an APFS drive (or an APFS disk image on a non-APFS drive over the network???), only changed bits need to be backed up and not whole files, right??? Rsync definitely forces one to back up whole files, which is why I would exempt VMs and such, but it seems like Macs should be even better at this whole backup thing, rather than worse, all these years given the upgrades to the file system itself.

To be fair, I never thought Time Machine was reliable over the network and found it choked on locally attached drives too (how many Time Machine backups were corrupted over the years, Oy vey), whereas SuperDuper and CCC were far better. Any reason so many people haven't just switched to one of the many alternatives since Time Machine is so pathetic???
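
For reference, the hard-link rotation that rsnapshot and older CCC setups used is only a few lines of rsync; a minimal sketch, with made-up paths and an SSH-reachable host called nas:

    # Each run becomes a new dated directory; unchanged files are
    # hard links into the previous run, so every backup browses
    # like a full copy but only costs the delta.
    NEW="$(date +%Y-%m-%d-%H%M%S)"
    rsync -a --delete --exclude 'VMs/' \
          --link-dest=../latest \
          "$HOME/" "nas:/volume1/backups/$NEW/"

    # Repoint "latest" at the run that just finished.
    ssh nas "ln -snf $NEW /volume1/backups/latest"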


Wait, is it confirmed that Apple will silently delete all old backups if Time Machine thinks it detected a problem?!!?


Note that the Howard Oakley utilities mentioned require you to run with an admin account, unfortunately… :/


@Nathan_RETRO Rsync *would* provide incremental transfers; that's the idea. This would be true whether or not a disk image is used as the destination.

The real issue here IMO is that TM backs the files with a disk image on a network filesystem instead of transferring the files themselves to the target using a dedicated protocol (like rsync or the S3 API). That is just asking for trouble: filesystem corruption is trivial to foresee if there is any interruption of the network, and you get all the problems caused by network filesystems in the face of unreliability, plus the UX consequences of that (including failed restarts caused by hard stalls of the network filesystem driver). WTF Apple made such a decision at the outset is beyond me (maybe they were all using Ethernet), but APFS, even though I find it to be less error-prone in general, seems to have made networked Time Machine *worse*, not better.

I recommend people needing network backup think about using a tool like Arq, which generates encrypted incremental backups and can store them on a network target using direct mounts or the S3 or SFTP protocols, among other specific supported cloud providers. Or, yes, there's always rsync, if you're comfortable with using that and don't need any encryption. Arq's main downside is that it stores gobs of cached data on your boot disk, seemingly without end, and I'm not really sure how to deal with that except blasting the caches from time to time.

On other (civilised) operating systems that have working filesystem snapshots, you can (and should) take backups by streaming deltas, rebasing from time to time when the backup set gets big enough (you can Google for the scripts you'll need to do this).
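
For the curious, the delta-streaming approach looks roughly like this with ZFS (dataset and host names are made up):

    # Snapshot, then send only the blocks that changed since the
    # previous snapshot; the backup host replays the delta into
    # its own copy of the dataset.
    zfs snapshot tank/home@2024-10-23
    zfs send -i tank/home@2024-10-22 tank/home@2024-10-23 | \
        ssh backuphost zfs receive -F pool/home-backup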


I had the same issue as Miguel Arroz. Found out thanks to Reddit that it was caused by the Find My widget. Removed it from my Mac Control Center and TM backups completed without error.

https://www.reddit.com/r/applehelp/comments/z6m2bf/comment/lp2h3rc/


@Sebby
Thank you for the follow-up!!! I rsync over SSH, so there's encryption in transit, and the drive itself can be encrypted on the other end if I desire. So that works well enough for me. I shouldn't be surprised Time Machine is a flaky mess given my own experiences with it over the years, but I don't understand how it can be getting worse…

To clarify: Time Machine connects to these shares over Samba (or some other SMB implementation, to be fair) and then creates an APFS sparse bundle (or whatever Apple is calling these images now) as the backup target, using local file-system copy commands? If I understand things, the theory is that Time Machine is choking because SMB is not robust enough to host this configuration, where the backup is essentially acting as a local copy, as compared to a dedicated protocol that is meant for traffic over networks? I use SMB-mounted shares as rsync destinations for some clients, but again, it's using rsync whether it's piping over a network or connecting locally, and the protocol is designed to handle operations to shares that can disconnect at any given time.

Honestly, I never felt like Time Machine handled local backups all that well either, I really did have more trouble with Time Machine than all my other tools combined.


@Nathan_RETRO Sorry for delays, and no worries, I enjoy the vibes on here.

Yes, exactly—the disk image is mounted locally, with the sparsebundle pieces themselves accessed over the network using SMB requests. So, operations don't fail on files themselves, but on the files that back a *mounted filesystem*. Anything that goes wrong in the network therefore represents an opportunity for catastrophic failure in the filesystem, because even in the absolute best case of all operations being correctly synchronised with barriers, access could be lost at any time and there'd be no realistic way to recover from this situation but to keep retrying requests, even when there's a deadlock. It really is madness. I've no doubt the engineers who built it had complete confidence in their design, but as so often with Apple, they reckoned without human (and technical) frailty. SMB (or, really, any network filesystem) is arguably already a highly dubious abstraction (so says Joel Spolsky), but at least you can count on it to work when you treat it like any other filesystem; putting a whole other filesystem on top of it, in a disk image, is just totally asking for it. Rsync (or another like protocol) would have been the right choice here, because it's differential, and it works reliably over all sorts of network conditions, and it's aggressively pipelined so it's speedy to boot. But it would have required the files to be received by a receiver at the far end that would store them. Alternatively the client could use a more general protocol, like HTTP, pre-encrypting and pre-chunking the data, and again, the synchronisation points would be the transactions themselves for the backup files. The point is, Apple didn't do any of this, and we're still living with the consequences. :(

What's really sad, though, is that APFS *should* have made things better. Snapshot deltas that are streamable could have made TM wicked fast. As it stands, it looks like the main benefits of APFS have been that snapshots are taken before the backup, for consistent backups (yes, good), and that changes are sent in such a way that only the changed blocks are written, while retaining browsability and indexing (yes, also good). However, none of these things benefited network backups very meaningfully, and "just use a local disk" is now the best advice to give people. Sad. :(
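
The snapshot half is even scriptable today; what's missing is any supported way to stream the delta between two snapshots to a remote target, ZFS-send style:

    # Take an APFS snapshot of the data volume right now.
    tmutil localsnapshot

    # See which local snapshots Time Machine is keeping.
    tmutil listlocalsnapshots /

    # Diff the latest backup against the current state to see
    # what actually changed.
    tmutil compare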


Sequoia is more buggy than any macOS ever. Time Machine is doing a full backup again and again and again after upgrading to Sequoia 15.1.
https://www.reddit.com/r/MacOS/comments/1gk4mmq/time_machine_doing_full_backup_again_and_again/
