Thursday, May 22, 2014

What Backblaze Doesn’t Back Up

Arq recently reported hundreds of GB of missing files, across multiple backup targets. This is so at odds with Amazon Glacier’s reputed 11-nines durability that I’m guessing it’s due to an application bug. It would not surprise me if the files are still there; Arq just isn’t seeing them. In any event, my strategy is to have multiple cloud backups—Arq and CrashPlan (which has been working very well recently)—so this got me thinking about possibly adding a third.

The obvious choice is Backblaze. It has a native Mac app, is developed by ex-Apple engineers, and sponsors many fine podcasts.

I’d previously been hesitant about Backblaze because of the way it handles external drives. I’ve read about problems with large bzfileids.dat files sucking RAM and preventing backups entirely once they get too large. It’s also worrisome that it only retains deleted files for 30 days—meaning that a file is truly lost if I don’t notice that it’s missing right away. And if, for some reason, my Mac doesn’t back up for 6 months, Backblaze will expunge all my data, even if my subscription is still paid-up. The situations in which my Mac is not able to back up for a while are exactly the ones in which I (or my survivors) would want to be able to depend on a cloud backup!

My other concern is that Backblaze doesn’t actually back up everything. It fails all but one of the Backup Bouncer tests, discarding file permissions, symlinks, Finder flags and locks, creation dates (despite claims to the contrary), modification dates (timezone-shifted), extended attributes (which include Finder tags and the “where from” URL), and Finder comments. Arq, CrashPlan (as of version 3), SuperDuper, and Time Machine all support all of these. Dropbox supports all of them except creation dates, locks, and symlinks.
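It’s easy to spot-check this kind of loss yourself. Here’s a small Python sketch of my own (nothing Backblaze or Backup Bouncer ships) that compares the basic metadata a restore should preserve—permissions, modification dates, and symlink status—between an original file and its restored copy:

```python
import os
import stat

def metadata_diffs(original, restored):
    """Compare basic metadata between an original file and a restored copy.

    Returns a list of human-readable mismatches. Uses lstat() so that a
    symlink restored as a regular file is caught rather than silently
    followed.
    """
    a, b = os.lstat(original), os.lstat(restored)
    diffs = []
    if stat.S_ISLNK(a.st_mode) != stat.S_ISLNK(b.st_mode):
        diffs.append("symlink flattened to a regular file")
    if stat.S_IMODE(a.st_mode) != stat.S_IMODE(b.st_mode):
        diffs.append("permissions: %o -> %o" %
                     (stat.S_IMODE(a.st_mode), stat.S_IMODE(b.st_mode)))
    if int(a.st_mtime) != int(b.st_mtime):
        diffs.append("modification date changed")
    return diffs
```

Even Python’s own standard library illustrates the distinction: `shutil.copy()` preserves permissions but drops the modification date, while `shutil.copy2()` keeps both—so the checker flags the former and not the latter. (Extended attributes and creation dates would need platform-specific calls and aren’t covered by this sketch.)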

As a programmer, I especially care about metadata. But I think most users would as well, if they knew to think about it. For example, losing dates can make it harder to find your files (e.g. they disappear from smart folders or sort incorrectly), and can even lead to errors (e.g. not finding the correct set of invoices for a time period). You would never use a backup app that didn’t remember which folders your files were in, so I don’t know why people consider it acceptable to lose their Finder tags. (If you use EagleFiler, it can restore the tags for you.)

Some people don’t care much about metadata. Macworld’s survey of online backup services didn’t mention it. Neither did Walt Mossberg. (He also told readers that Backblaze automatically includes “every user-created file”; in fact, it skips files over 4 GB by default.) [Update (2014-05-25): The Sweet Setup doesn’t mention metadata, either (via Nick Heer).]

Backblaze has a stock answer when asked about Backup Bouncer:

This actually tests disk imaging products, a bad test for backup as items we fail on shouldn’t be backed up by data backup service.

Some people accept this explanation. I think it’s misguided and borderline nonsensical. True, Backup Bouncer tests some rather esoteric features, but Backblaze fails the basic tests, too. It would be one thing to say that there’s a limitation whereby dates, tags, comments, etc. aren’t backed up, but they’re actually saying that these shouldn’t be backed up. As if products that do back them up are in error. So presumably Backblaze doesn’t consider this a bug and won’t be fixing it.

Lastly, it’s a shame that Backblaze isn’t up front about what metadata it supports. Some users are technical enough to investigate these things themselves. Others will have read the excellent Take Control of Backing Up Your Mac and seen its appendixes, which give Backblaze a C for metadata support. But most Backblaze users won’t know that a poor choice has been made for them until they need to restore from their backup.

13 Comments

Unfortunately, I have had the same thing happen to me multiple times with Arq. It lost track of a couple of backup sets, and Stefan couldn't really explain why that happened. Then it lost track of my Glacier backup, which was over 500 GB of stuff. Deleting it or fixing it on Glacier takes days or weeks to resolve, and it would take me about a month to push that much data back up over Comcast. So, reluctantly, I ditched Arq and went to Backblaze, which appears to have been backing up perfectly well for the past 16+ months.

The one time I had a real issue with a hard drive, though, that affected my Aperture library, I considered restoring from Backblaze. They basically give you a big disk image, and it takes time to "assemble" it. With my (now larger) 700 GB Aperture library, it took them roughly 36 hours to do this, and it was going to be a 700 GB disk image! Given this wasn't a "your house burned down" situation, I elected instead to use Time Machine, which restored things perfectly. So I'd probably only be restoring from Backblaze as an absolute last resort, and I'd be wary of the inconveniences it has for "big" things where you want to do large restores. Maybe they can mail a hard drive; that's probably what I'd look into first in a disaster-recovery situation. If it were a DR situation, I could recreate tags and the like; it wouldn't be a huge deal to me. And my file names have date info in them for those that are critical (because OS X loses creation dates if you use Dropbox or iCloud and have the files on 2 machines); I could use a script to reset the creation dates based on file names if need be.
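A script like the one described above is easy to sketch. This is a hypothetical illustration, assuming the filenames carry an ISO-style date (e.g. `Invoice 2014-05-22.pdf`); it resets the modification time with `os.utime`, since the true creation (birth) date can't be set portably from Python—on OS X you'd shell out to `SetFile -d` for that:

```python
import os
import re
import time

# Matches an embedded YYYY-MM-DD date, e.g. "Invoice 2014-05-22.pdf".
DATE_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

def date_from_name(path):
    """Extract a YYYY-MM-DD date embedded in the filename, as a Unix
    timestamp (local time), or None if the name contains no date."""
    m = DATE_RE.search(os.path.basename(path))
    if not m:
        return None
    return time.mktime(time.strptime("-".join(m.groups()), "%Y-%m-%d"))

def restamp(path):
    """Reset the file's access/modification times to the date in its name.

    Returns the timestamp used, or None if no date was found. Setting the
    creation date as well would require `SetFile -d` on OS X.
    """
    ts = date_from_name(path)
    if ts is not None:
        os.utime(path, (ts, ts))
    return ts
```

You'd run this over a folder of restored files; anything without a date in its name is simply left alone.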

CrashPlan's big downside for me is Java. No thanks.

"CrashPlan's big downside for me is Java. No thanks."

Can you elaborate? What is it about Java that's such a turn off?

These issues with Backblaze were the reason I switched to CrashPlan two years ago. The issue with external drives was my main concern, but so were metadata and control over what is being backed up. Also, as I remember it, Backblaze would not allow backups of network-attached storage, while CrashPlan did.

Looking more into it, I found more advantages with CrashPlan. The family plan is great. The ability to also back up to an external drive, a drive on the LAN, or even a drive over the WAN simultaneously meant I could kiss Time Machine goodbye (no more starting over because of a corrupt Time Machine backup, no more fans kicking in every hour). Also, there are a lot of settings for those who want more control, like which network interfaces to back up over. For the security-minded, there is the possibility of using your own encryption key, separate from the CrashPlan user account.

CrashPlan is far from perfect (the initial backup takes forever, at least from Sweden), but I found it far better than Backblaze, even though Backblaze isn't really bad, just not as good. I really don't see Java as a big problem with CrashPlan, and I certainly don't think it's a deal-breaker. A lot of other factors are more important.

The main disadvantage of Java (and CrashPlan) is memory usage. It depends on your backup set sizes, but on my family's Macs it often uses the best part of 1 GB (currently 828 MB RPRVT on my stepmom's Mac backing up all user data, versus 162 MB on my Mac, where CrashPlan is just backing up VM images). It doesn't suck a lot of CPU, and it has good throttling controls to avoid backing up while you're using your Mac, or while you're on battery, on a particular wireless network, etc., if you wish.

How much RAM does Backblaze use? I've never tried it but just assumed it was a lot less.

On the other hand, memory is cheap and unless you're extremely cost-constrained I see no reason not to get the maximum RAM on every Mac you buy. CrashPlan is by far the most reliable piece of Mac backup software I've ever used. I've seen a few small glitches here and there, mostly network backups stalling for no apparent reason, but they've only affected one of multiple backup destinations, they've all resolved themselves in a day or two, and there's absolutely never been any associated data loss.

Personally, I think I'm finally ready to dump Time Machine for CrashPlan for user files + a weekly SuperDuper! clone for everything else. I've been meaning to write a blog post on this for months.

@Nicholas The best CrashPlan tip I have is to turn off the live filesystem watching. It slows things down (especially if you don’t have an SSD) for little benefit. CrashPlan’s using 672 MB of memory for me right now to back up about 1.4 million files. I used to have lots of problems with CrashPlan not being able to connect to the server, for weeks at a time, but lately it’s been working well. Never had any problems with it for other family members.

Based on what I read about the bzfileids.dat files, it sounded like Backblaze also uses lots of RAM, and has limitations with large numbers of files. Arq seems to be the most efficient for lots of files.

"Arq recently reported hundreds of GB of missing files, across multiple backup targets. This is so at odds with Amazon Glacier’s reputed 11-nines durability that I’m guessing it’s due to an application bug. It would not surprise me if the files are still there; Arq just isn’t seeing them."

Hasn't Arq had intermittent problems with Glacier since they first implemented support for it? Or put another way, doesn't Glacier perhaps have hacky support for things like Arq?

Personally, I've avoided Glacier with Arq, despite the compelling price savings.

So, rather than add a third cloud backup, might you not do better by just using Arq with regular S3?

@Chucky Arq and Glacier have worked well for me in the past. When I had problems last year with Arq it was with the folders that were on S3.

I consider all of Arq to be a single cloud backup because when Arq is stuck or stalled with one backup target it stops updating the others.

[...] Michael Tsai on What Backblaze Doesn't Back Up: [...]

Thanks for putting this all in one article. I've known for a while about Backblaze's metadata problems, but not with a recent summation like this one.

You might want to check out the new Arq 4 Glacier backup method. I believe the author is using a new Glacier backup method that is less error-prone. I haven't moved my files over to it yet, but it may be better than what you experienced.

@Jesse I’ve been using the new Glacier method for months. For whatever reason, the old one worked better for me.

So the new Glacier method is what gave you problems? I haven't updated mine yet. Should I stick to the old Glacier method?

Obviously global warming issues. Amazon's glaciers are melting faster than expected, and Arq can't compensate.

(The whole idea of cheaply encoding data in ice cores was ingenious, except for the whole chaotic CO2 effect thing...)

@Jesse Yes, the new Glacier method is what gave me problems, but I don’t know if the problems were due to that, or an Arq update, or something on Amazon’s side.
