Thursday, May 22, 2014

What Backblaze Doesn’t Back Up

Arq recently reported hundreds of GB of missing files, across multiple backup targets. This is so at odds with Amazon Glacier’s reputed 11-nines durability that I’m guessing it’s due to an application bug. It would not surprise me if the files are still there; Arq just isn’t seeing them. In any event, my strategy is to have multiple cloud backups—Arq and CrashPlan (which has been working very well recently)—so this got me thinking about possibly adding a third.

The obvious choice is Backblaze. It has a native Mac app, is developed by ex-Apple engineers, and sponsors many fine podcasts.

I’d previously been hesitant about Backblaze because of the way it handles external drives. I’ve read about problems with large bzfileids.dat files sucking RAM and preventing backups entirely once they get too large. It’s also worrisome that it only retains deleted files for 30 days—meaning that a file is truly lost if I don’t notice that it’s missing right away. And if, for some reason, my Mac doesn’t back up for 6 months, Backblaze will expunge all my data, even if my subscription is still paid-up. The situations in which my Mac is not able to back up for a while are exactly the ones in which I (or my survivors) would want to be able to depend on a cloud backup!
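To make the two retention rules concrete, here is a minimal sketch that checks whether a deleted file, or a whole backup, would still be restorable on a given day. It assumes the windows exactly as described above: 30 days for deleted files, and "6 months" of inactivity, approximated here as 183 days. The function names are hypothetical, and this is not Backblaze's code.

```python
from datetime import date, timedelta

# Windows as described in the post; 183 days is an assumed stand-in for "6 months".
DELETED_FILE_RETENTION = timedelta(days=30)
INACTIVITY_LIMIT = timedelta(days=183)


def deleted_file_recoverable(deleted_on: date, today: date) -> bool:
    """A deleted file can be restored only while it is inside the retention window."""
    return today - deleted_on <= DELETED_FILE_RETENTION


def account_data_retained(last_backup: date, today: date) -> bool:
    """All data is expunged once the Mac has gone too long without backing up."""
    return today - last_backup <= INACTIVITY_LIMIT


if __name__ == "__main__":
    today = date(2014, 5, 22)
    print(deleted_file_recoverable(today - timedelta(days=29), today))  # True
    print(deleted_file_recoverable(today - timedelta(days=31), today))  # False
    # A Mac that last backed up in November 2013 has lost everything by now.
    print(account_data_retained(date(2013, 11, 1), today))  # False
```

Treating day 30 itself as still inside the window is also an assumption; the point is that noticing a missing file on day 31 is too late.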

My other concern is that Backblaze doesn’t actually back up everything. It fails all but one of the Backup Bouncer tests, discarding file permissions, symlinks, Finder flags and locks, creation dates (despite claims), modification date (timezone-shifted), extended attributes (which include Finder tags and the “where from” URL), and Finder comments. Arq, CrashPlan (as of version 3), SuperDuper, and Time Machine all support all of these. Dropbox supports all of them except creation dates, locks, and symlinks.
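Symlinks are the easiest of these to check by hand. Here is a minimal sketch, assuming you have the original folder and a restored copy side by side (both paths are placeholders). It walks the original without following links and reports any symlink that was dropped, came back as a regular file or folder, or points somewhere else.

```python
import os
from pathlib import Path


def symlink_problems(original: Path, restored: Path) -> list[str]:
    """Report symlinks in `original` that were not restored as identical symlinks."""
    problems = []
    # os.walk does not follow links by default; linked folders show up in dirnames.
    for dirpath, dirnames, filenames in os.walk(original):
        for name in dirnames + filenames:
            src = Path(dirpath) / name
            if not src.is_symlink():
                continue
            dst = restored / src.relative_to(original)
            if not dst.is_symlink():
                problems.append(f"{dst}: not a symlink (missing or restored as a copy)")
            elif os.readlink(dst) != os.readlink(src):
                problems.append(f"{dst}: target changed")
    return problems
```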

As a programmer, I especially care about metadata. But I think most users would as well, if they knew to think about it. For example, losing dates can make it harder to find your files (e.g., they disappear from smart folders or sort incorrectly) and can even lead to errors (e.g., not finding the correct set of invoices for a time period). You would never use a backup app that didn’t remember which folders your files were in, so I don’t know why people consider it acceptable to lose their Finder tags. (If you use EagleFiler, it can restore the tags for you.)
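To see why lost dates matter, consider what a date-based query, like a smart folder for one quarter's invoices, actually does. This minimal sketch selects files by modification date; the folder and date range are placeholders. If a restore stamps every file with the date of the restore, the same query quietly returns nothing, even though every invoice is present.

```python
from datetime import datetime
from pathlib import Path


def files_modified_between(folder: Path, start: datetime, end: datetime) -> list[Path]:
    """Return files in `folder` whose modification time falls in [start, end)."""
    lo, hi = start.timestamp(), end.timestamp()
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and lo <= p.stat().st_mtime < hi
    )
```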

Some people don’t care much about metadata. Macworld’s survey of online backup services didn’t mention it. Neither did Walt Mossberg. (He also told readers that Backblaze automatically includes “every user-created file”; in fact, it skips files over 4 GB by default.) [Update (2014-05-25): The Sweet Setup doesn’t mention metadata, either (via Nick Heer).]

Backblaze has a stock answer when asked about Backup Bouncer:

This actually tests disk imaging products, a bad test for backup as items we fail on shouldn’t be backed up by data backup service.

Some people accept this explanation. I think it’s misguided and borderline nonsensical. True, Backup Bouncer tests some rather esoteric features, but Backblaze fails the basic tests, too. It would be one thing to say that there’s a limitation whereby dates, tags, comments, etc. aren’t backed up, but they’re actually saying that these shouldn’t be backed up. As if products that do back them up are in error. So presumably Backblaze doesn’t consider this a bug and won’t be fixing it.

Lastly, it’s a shame that Backblaze isn’t up front about what metadata it supports. Some users are technical enough to investigate these things themselves. Others will have read the excellent Take Control of Backing Up Your Mac and seen its appendixes, which give Backblaze a C for metadata support. But most Backblaze users won’t know that a poor choice has been made for them until they need to restore from their backup.

Update (2017-08-23): A Backblaze employee responded to this post:

Backblaze absolutely backs up and restores the “file creation date” and “file last modified date”. With these two caveats: Backblaze is only accurate down to Milliseconds (1/1,000ths of a single second) if you restore by USB hard drive restore, and only accurate to the second if you prepare a ZIP file restore. The latter is because that is a limitation of the ZIP file format.

The tool “Backup Bouncer” fails Backblaze on this test, and it irritates me. I feel “wronged” by this. The new APFS Macintosh file system has the ability to set the file creation date down to one BILLIONTH of a second, and I assume that just to be totally difficult Backup Bouncer gleefully sets every last bit.
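The ZIP limitation is easy to demonstrate, and the basic format is actually coarser than one second. The standard ZIP headers store an MS-DOS date/time field with two-second resolution and have no field for a creation date. Extra fields, such as Info-ZIP's extended timestamp, can carry one-second Unix times, but Python's `zipfile` uses the DOS field. In this sketch, a member stamped 7:28:07 reads back as 7:28:06.

```python
import io
import zipfile


def roundtrip_zip_timestamp(date_time: tuple[int, int, int, int, int, int]) -> tuple:
    """Store a member with `date_time` in an in-memory ZIP and return what reads back."""
    info = zipfile.ZipInfo("test.txt", date_time=date_time)
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(info, "hello")
    # Reopen the same buffer for reading; the DOS time keeps seconds in 2-second steps.
    with zipfile.ZipFile(buf) as zf:
        return zf.getinfo("test.txt").date_time


if __name__ == "__main__":
    print(roundtrip_zip_timestamp((2017, 8, 23, 7, 28, 7)))  # (2017, 8, 23, 7, 28, 6)
```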

I’ve asked for clarification, but as far as I can tell the response spreads incorrect information and seems to misunderstand several of the issues involved.
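Whether a restored timestamp "matches" depends on the resolution you compare at. APFS stores timestamps in nanoseconds, so an exact comparison will flag a restore that kept only milliseconds, while a coarser comparison will not. Here is a minimal sketch that compares nanosecond integers (such as `os.stat().st_mtime_ns`) after truncating both to a chosen resolution; the resolutions are parameters, not values any particular tool is known to use.

```python
NS_PER_MS = 1_000_000
NS_PER_S = 1_000_000_000


def timestamps_match(original_ns: int, restored_ns: int, resolution_ns: int = 1) -> bool:
    """True if both timestamps agree once truncated to `resolution_ns`."""
    return original_ns // resolution_ns == restored_ns // resolution_ns


if __name__ == "__main__":
    original = 1_503_473_280_123_456_789  # full nanosecond precision
    restored = 1_503_473_280_123_000_000  # the same instant kept to milliseconds
    print(timestamps_match(original, restored))             # False (exact)
    print(timestamps_match(original, restored, NS_PER_MS))  # True (milliseconds)
```

Truncation can split two nearly equal values across a unit boundary; checking `abs(original_ns - restored_ns) < resolution_ns` is the alternative. Neither tolerance excuses a creation date that has been replaced by the modification date.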

I started a Backblaze trial in order to verify the claim that the creation date is preserved, but at first I couldn’t get an answer: 4 hours after Backblaze said that it had backed up my test files, they were still not showing up in the restore interface, even though it purports to show the latest files as of this minute. After 5 hours, the files were available. I restored them, and the file creation dates had been lost, replaced with the modification dates. The Backblaze restore also messed up the files’ modes, making them executable when they had not been before.
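A check like this can be scripted. Here is a minimal sketch, assuming the original and restored files are both available locally (the paths are placeholders). It compares permission bits, the modification date, and, where the platform exposes it, the creation date. `st_birthtime` exists on macOS and the BSDs but not on most Linux builds, so the sketch skips that field when it is missing.

```python
import os
import stat
from pathlib import Path


def metadata_differences(original: Path, restored: Path) -> list[str]:
    """List the metadata fields that differ between an original and a restored file."""
    a, b = os.stat(original), os.stat(restored)
    diffs = []
    # Permission bits, e.g. 0o644 versus 0o755 if the restore made the file executable.
    mode_a, mode_b = stat.S_IMODE(a.st_mode), stat.S_IMODE(b.st_mode)
    if mode_a != mode_b:
        diffs.append(f"mode {oct(mode_a)} -> {oct(mode_b)}")
    # Creation date, only where the platform reports it.
    birth_a = getattr(a, "st_birthtime", None)
    birth_b = getattr(b, "st_birthtime", None)
    if birth_a is not None and birth_b is not None and birth_a != birth_b:
        diffs.append("creation date changed")
    # Compared at whole seconds, since some restore paths keep nothing finer.
    if int(a.st_mtime) != int(b.st_mtime):
        diffs.append("modification date changed")
    return diffs
```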

Update (2017-08-24): Backblaze support explained to me that it’s normal for there to be a delay of anywhere from 1 to 8 hours before the files are actually available for restore. This is because, although the file data has been sent to the server, the server can’t access the files until the client has sent the index that describes the changes. The client typically waits a few hours before doing this. What this means is that, during those hours, the Backblaze client reports that the backup is complete (“You are backed up as of: Today, 7:28 AM”), but it’s actually not. If your Mac breaks or goes offline (e.g., you pack up your MacBook for a trip) before the index has been uploaded, it’s as if the backup never happened. I assume the delay before sending the index is some sort of optimization, so perhaps it’s justified, but I consider it a major bug that the client reports the files as backed up when you can’t actually restore them (no matter how long you wait).
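The gap support describes is between two separate events: the file data reaching the server, and the index that makes that data restorable. Here is a minimal model, with hypothetical names that do not correspond to Backblaze's actual client, in which a file only counts as backed up once both have happened.

```python
from dataclasses import dataclass, field


@dataclass
class BackupState:
    """Hypothetical model of a client that uploads file data before its index."""
    uploaded: set[str] = field(default_factory=set)  # data is on the server
    indexed: set[str] = field(default_factory=set)   # server knows where it is

    def upload_data(self, path: str) -> None:
        self.uploaded.add(path)

    def upload_index(self) -> None:
        # Only now can the server map uploaded data back to files.
        self.indexed |= self.uploaded

    def restorable(self, path: str) -> bool:
        """The honest meaning of "backed up": the file can actually be restored."""
        return path in self.indexed

    def pending(self) -> set[str]:
        """Uploaded but not restorable: the window where "backed up" is misleading."""
        return self.uploaded - self.indexed


if __name__ == "__main__":
    state = BackupState()
    state.upload_data("invoice.pdf")
    print(state.restorable("invoice.pdf"))  # False, although the data is uploaded
    state.upload_index()
    print(state.restorable("invoice.pdf"))  # True
```

A status line computed from `restorable` rather than from `uploaded` would never claim a backup that a lost laptop could not restore.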

The Backblaze employee replied about the file creation date issue. The gist of it is that the dates are not preserved when restoring via the network. However, you can pay $99 (flash drive) or $189 (hard drive) for them to mail you your data, and in this format the dates will be preserved. If you mail the drive back (it sounds like you have to pay shipping), they will refund the cost. I have not verified that this method works; however, I can confirm that the index file that’s sent to the Backblaze server contains the correct information for the creation dates.

Update (2017-09-03): See also: Accidental Tech Podcast.

Update (2018-02-05): aikinai:

I started getting emails warning that all of my external drives were offline and my data would be soon deleted. Instead of “Very sorry about that, here’s how to fix the issue,” I got this long response about the ways their system looks for new files in serial and it can get jammed and start ignoring everything, with no apology, no acknowledgement this was their issue, and no solution. I had to go fishing for solutions and drag the information out of them to finally figure out what I needed to do. Which it turns out is to get back an internal drive (totally unrelated to the other drives Backblaze abandoned) I had physically removed and repurposed, put it back the computer, wait a long time for Backblaze to see it, then uncheck that drive in Backblaze and remove it again.

[…]

The client will lie to you and you never know what’s really backed up. Even if you use the secret alt-click to force a full drive scan, it can still miss files and tell you fully backed up when files from days ago are still nowhere to be found. Luckily I’ve never actually needed to do a restore, but I almost thought I did one time and would have been furious at all the missing files I noticed.

Update (2022-03-07): Backblaze (via Adam Engst, Hacker News):

Backblaze has always kept a 30-day version history of your backed up files to help in situations like these, but today we’re giving you the option to extend your version history to one year or forever.

Daniel Jalkut:

I recently learned that @Backblaze’s 1 year extended backup doesn’t work the way I (or my brother, who ran into this) expected. I thought it was “everything just like 30 days plan - but for a year”. Instead, if you haven’t attached a drive for 30+ days you have to RE-UPLOAD it.

So the day is RETAINED for 365 days, but if you have a slow or expensive bandwidth connection, you have to make sure to re-attach drives <30 days or else you have to re-upload. Even though they have the files? Disappointing. My laziness was counting on it behaving otherwise.

Tim Wood:

I love* it when @backblaze says I’m fully backed up and I double-check a (non-excluded) file and see it doesn’t show up in the restore interface. Cool.

18 Comments

Unfortunately, I have had the same thing happen to me multiple times with Arq. It lost track of a couple of backup sets, and Stefan couldn't really explain why that happened. Then it lost track of my Glacier backup, which was over 500 GB of stuff. Deleting it or fixing it on Glacier takes days or weeks, and it would take me about a month to push that much data back up over Comcast. So, reluctantly, I ditched Arq and went to Backblaze, which appears to have been backing up perfectly well for the past 16+ months.

The one time I had a real issue with a hard drive that affected my Aperture library, though, I considered restoring from Backblaze. They basically give you a big disk image, and it takes time to "assemble" it. With my (now larger) 700 GB Aperture library, it took them roughly 36 hours to do this, and it was going to be a 700 GB disk image! Given this wasn't a "your house burned down" situation, I elected instead to use Time Machine, which restored things perfectly. So I'd probably only restore from Backblaze as an absolute last resort, and I'd be wary of its inconveniences for large restores. Maybe they can mail a hard drive; that's probably what I'd look into first in a disaster recovery situation. In a real DR situation, I could recreate tags and the like; it wouldn't be a huge deal to me. And my file names have date info in them for the files that are critical (because OS X loses creation dates if you use Dropbox or iCloud and have the files on 2 machines); I could use a script to reset the creation dates based on file names if need be.
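The script mentioned above could look something like this minimal sketch. It assumes file names begin with an ISO-style date such as `2013-04-05 Invoice.pdf` (a hypothetical convention) and sets the modification date with `os.utime`. Python's standard library cannot set the creation date itself; on macOS that needs a separate tool, such as the deprecated `SetFile -d` from Apple's developer tools.

```python
import os
import re
from datetime import datetime
from pathlib import Path

# Hypothetical naming convention: names begin with YYYY-MM-DD.
DATE_PREFIX = re.compile(r"^(\d{4})-(\d{2})-(\d{2})")


def reset_mtime_from_name(path: Path) -> bool:
    """Set the file's access and modification times to the date in its name."""
    match = DATE_PREFIX.match(path.name)
    if not match:
        return False
    try:
        when = datetime(*map(int, match.groups())).timestamp()  # local midnight
    except ValueError:  # e.g. "2013-13-40" looks like a date but isn't one
        return False
    os.utime(path, (when, when))
    return True


def reset_folder(folder: Path) -> int:
    """Apply reset_mtime_from_name to each file in `folder`; return how many changed."""
    return sum(reset_mtime_from_name(p) for p in folder.iterdir() if p.is_file())
```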

CrashPlan's big downside for me is Java. No thanks.

"CrashPlan's big downside for me is Java. No thanks."

Can you elaborate? What is it about Java that's such a turn off?

These issues with Backblaze were the reason I switched to CrashPlan two years ago. The issue with external drives was my main concern, but so were metadata and control over what is being backed up. Also, as I remember it, Backblaze would not allow backups of network-attached storage, while CrashPlan did.

Looking into it more, I found more advantages with CrashPlan. The family plan is great. The ability to also back up to an external drive, a drive on the LAN, or even a drive over the WAN simultaneously meant I could kiss Time Machine goodbye (no more start-over-because-of-corrupt-Time-Machine-backup, no more fans kicking in every hour). Also, there are a lot of settings for those who want more control, like which network interface to back up over. For the security-minded, there is the option to use your own encryption keys, separate from the CrashPlan user account.

CrashPlan is far from perfect (the initial backup takes forever, at least from Sweden), but I found it far better than Backblaze, even though Backblaze isn't really bad, just not as good. I really don't see Java as a big problem with CrashPlan, and I certainly don't think it's a deal breaker. A lot of other factors are more important.

The main disadvantage of Java (and CrashPlan) is memory usage. It depends on your backup set sizes, but on my family's Macs it often uses the best part of 1 GB (currently 828 MB RPRVT on my stepmom's Mac backing up all user data, versus 162 MB on my Mac where CrashPlan is just backing up VM images). It doesn't use much CPU, and it has good throttling controls to avoid backing up while you're using your Mac, while you're on battery, on a particular wireless network, etc., if you wish.

How much RAM does Backblaze use? I've never tried it but just assumed it was a lot less.

On the other hand, memory is cheap and unless you're extremely cost-constrained I see no reason not to get the maximum RAM on every Mac you buy. CrashPlan is by far the most reliable piece of Mac backup software I've ever used. I've seen a few small glitches here and there, mostly network backups stalling for no apparent reason, but they've only affected one of multiple backup destinations, they've all resolved themselves in a day or two, and there's absolutely never been any associated data loss.

Personally, I think I'm finally ready to dump Time Machine for CrashPlan for user files + a weekly SuperDuper! clone for everything else. I've been meaning to write a blog post on this for months.

@Nicholas The best CrashPlan tip I have is to turn off the live filesystem watching. It slows things down (especially if you don’t have an SSD) for little benefit. CrashPlan’s using 672 MB of memory for me right now to back up about 1.4 million files. I used to have lots of problems with CrashPlan not being able to connect to the server, for weeks at a time, but lately it’s been working well. Never had any problems with it for other family members.

Based on what I read about the bzfileids.dat files, it sounded like Backblaze also uses lots of RAM, and has limitations with large numbers of files. Arq seems to be the most efficient for lots of files.

"Arq recently reported hundreds of GB of missing files, across multiple backup targets. This is so at odds with Amazon Glacier’s reputed 11-nines durability that I’m guessing it’s due to an application bug. It would not surprise me if the files are still there; Arq just isn’t seeing them."

Hasn't Arq had intermittent problems with Glacier since they first implemented support for it? Or put another way, doesn't Glacier perhaps have hacky support for things like Arq?

Personally, I've avoided Glacier with Arq, despite the compelling price savings.

So, rather than add a third cloud backup, might you not do better by just using Arq with regular S3?

@Chucky Arq and Glacier have worked well for me in the past. When I had problems last year with Arq it was with the folders that were on S3.

I consider all of Arq to be a single cloud backup because when Arq is stuck or stalled with one backup target it stops updating the others.

Thanks for putting this all in one article. I've known for a while about Backblaze's metadata problems, but not with a recent summation like this one.

You might want to check out the new Arq 4 Glacier backup method. I believe the author is using a new Glacier backup method that is less error-prone. I haven't moved my files over to it yet, but it may be better than what you experienced.

@Jesse I’ve been using the new Glacier method for months. For whatever reason, the old one worked better for me.

So the new Glacier method is what gave you problems? I haven't updated mine yet. Should I stick to the old Glacier method?

Obviously global warming issues. Amazon's glaciers are melting faster than expected, and Arq can't compensate.

(The whole idea of cheaply encoding data in ice cores was ingenious, except for the whole chaotic CO2 effect thing...)

@Jesse Yes, the new Glacier method is what gave me problems, but I don’t know if the problems were due to that, or an Arq update, or something on Amazon’s side.

[…] service to CrashPlan Home, but there doesn’t seem to be one that’s good. (See the Backblaze and Carbonite caveats below.) Right now, the leading contender is probably Arq, which is a bit more […]

I picked up the link to this thread from a podcast and I'm seeing now that it is pretty old…

I'll post my thoughts about Backblaze nevertheless.

Volatile backups

The absolute showstopper with Backblaze is indeed the fact that it forgets your backups when you haven't backed up for some weeks, for example with external disks. WTH!? This giant flaw isn't even worth discussing.

The metadata

The last time I seriously tried Backblaze, it still didn't respect the com_apple_backup_excludeItem extended attribute. This attribute works with Time Machine, CrashPlan, and Arq. It might seem finicky, but I expect a backup system to respect the rules of the OS it runs on.

So it doesn't surprise me that Backblaze doesn't (didn't) honor Finder tags either.

The Mac-nativeness

On their page you find cheesy claims like "Made by ex-Apple Employees" and "Native and Integrated". Obviously they are aiming at CrashPlan's Java nature. And indeed, when you first open the Backblaze PrefPane, you really do get the impression of an app written for the Mac (unlike CrashPlan).

But this first impression is short-lived:

When you start to include/exclude files/folders you are already facing an interface that is absolutely inferior to CrashPlan's Java app, and even inferior to Arq's pretty clunky interface.

Besides that, have you noticed the default exclusions? .dmg and .sparseimage are excluded by default. Heck, a good part of my data is stored as dmg or sparseimage! By the same logic, you could exclude all zip or tar archives by default. I guess these guys don't even know what these extensions mean.
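If you want to know what rules like these would skip on your own disk, a short audit helps. Here is a minimal sketch, assuming only the two defaults mentioned on this page: the disk image extensions above and the 4 GB size cap from the post (treated here as 4 GiB, an assumption). A real client's default list may differ, and the function name is hypothetical.

```python
import os
from pathlib import Path

# Defaults taken from this page; a real client's list may differ.
EXCLUDED_SUFFIXES = frozenset({".dmg", ".sparseimage"})
SIZE_LIMIT = 4 * 1024**3  # 4 GB, assumed to mean GiB


def would_be_skipped(root: Path, suffixes=EXCLUDED_SUFFIXES,
                     size_limit: int = SIZE_LIMIT) -> list[tuple[Path, str]]:
    """Walk `root` and report files these rules would silently leave out."""
    skipped = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if path.suffix.lower() in suffixes:
                skipped.append((path, "excluded extension"))
            elif path.is_file() and path.stat().st_size > size_limit:
                skipped.append((path, "larger than size limit"))
    return sorted(skipped)
```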

And then the above-mentioned meta-data issues. So, "Native and integrated"? Yeah, right.

– Tom

@Tom Agreed—I was seriously unimpressed by Backblaze’s interface and default file/folder selections, especially considering that they make it sound like it backs up everything by default.

[…] in at $50/device/year for unlimited data with no weird file restrictions, but there’s some wonkiness about file permissions and time stamps, and it also only retains old file versions/deleted files for 30 […]
