Monday, December 13, 2010

One Man’s Realistic Backup Strategy

Michael B. Johnson (via Daniel Jalkut):

[For] less than $300, you can have a pretty fool proof backup scheme for all but the craziest home setup. I have, I like to think, one of the crazier home Mac setups, and so I thought I’d share my current specifics.

I also recommend the Western Digital Caviar Green drives for backups. With 2 TB bare drives now at $90 and a variety of good “toaster” docks to choose from, there’s little excuse not to rotate through several hard drive clones.


Alternatively, that'd be 6 years of unlimited online backups with CrashPlan, with absolutely no attention required beyond the initial installation, no increased home power costs, etc.

I say that as a professional sysadmin with plenty of experience doing this sort of thing; the main take-home lesson these days is that getting backups right is *HARD* (particularly when you consider the way data can silently rot, disks lie about media health, etc.), and it's difficult to justify the infrastructure costs on a small scale since your bootstrap costs are so high.

@Chris CrashPlan is great, but it’s no substitute for having a recent clone that you can switch to, instantly, when your drive dies. I shudder at the thought of how long it would take to restore; simply copying a terabyte of data locally takes long enough. You would need a bootable spare drive on-hand, anyway, if you want to restore from CrashPlan. So I see the two approaches as complementary.

"Alternately, that'd be 6 years of unlimited online backups with CrashPlan, with absolutely no attention required beyond the initial installation, increased home power costs, etc."

Huh? The way I calculate it, CrashPlan is "as low as $6/mo", which means 6 years of CrashPlan is "as low as" $432.

Two 2TB drives and a toaster cost me $160. (Newegg regularly has deals on 2TB drives for $70 shipped, if you check every so often.)

So I have two ways to get offsite backups for a multi-Mac site.

I went with the "two 2TB drives and a toaster" route over CrashPlan for a bunch of reasons:

- Cost
- Definitely don't want to run the CrashPlan software on all the Macs on my site.
- Don't want to spend the time figuring out if I can trust CrashPlan on reliability.
- Don't want to spend the time figuring out if I can trust CrashPlan on security.

CrashPlan costs me more money, costs me far more in system resources on the Macs on my site, and puts crucial aspects of backup out of my control.

If I had a lower level of technical capability, I'd go with CrashPlan simply because I couldn't successfully implement a "two 2TB drives and a toaster" method. But given the choice...

Chucky: it's $50/year for unlimited online storage or completely free if you simply back up running systems to each other (i.e. your other systems, friends, etc.), using local storage, etc. That also means that e.g. your local systems can back up (realtime or on your preferred schedule) much faster than the off-site transfer, all with zero manual cost.

The main benefit is that they've done a lot of the work for things like encryption, integrity checks, etc., which are completely missing in the simple disk-based approach. For example, I saw nothing in that post which protects against silent bitrot: no Apple filesystem does integrity checks, and the disks, network, or software are certain to lose data over time - without some sort of checksumming, you won't even know that happened until you need to restore something. RAID only helps slightly with this, especially if it's Apple's software RAID, which hasn't been particularly reliable during hardware failures in my experience (one of the main reasons we tried to avoid Mac servers). Similarly, Time Machine doesn't even attempt to handle this problem, which is one of the reasons why it's not a serious contender for important data - it's a shame that ZFS didn't make it into 10.6, because it would have been a great fit, with robust integrity checks and built-in snapshots and replication.
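
To make the checksumming idea concrete, here's a minimal sketch in Python of what a stored manifest buys you. The path /Volumes/BackupClone is a made-up example, and this is not how CCC, SuperDuper, or CrashPlan work internally - it just illustrates "hash everything when the clone is made, then re-verify when the drive rotates back":

import hashlib
import json
import os
import sys

def sha256_of(path, bufsize=1 << 20):
    """Stream a file through SHA-256 so large files don't need to fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(bufsize), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(root):
    """Walk the backup volume and record a checksum for every regular file."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            if os.path.islink(full) or rel == "manifest.json":
                continue  # skip symlinks and the manifest itself
            manifest[rel] = sha256_of(full)
    return manifest

def verify_manifest(root, manifest):
    """Return relative paths whose current checksum no longer matches the manifest."""
    bad = []
    for rel, expected in manifest.items():
        full = os.path.join(root, rel)
        if not os.path.exists(full) or sha256_of(full) != expected:
            bad.append(rel)
    return bad

if __name__ == "__main__":
    # Usage: create|verify <backup root>, e.g. /Volumes/BackupClone (hypothetical path)
    mode, root = sys.argv[1], sys.argv[2]
    manifest_path = os.path.join(root, "manifest.json")
    if mode == "create":  # run right after making the clone
        with open(manifest_path, "w") as f:
            json.dump(build_manifest(root), f, indent=2)
    else:  # "verify": run when the drive rotates back on-site
        with open(manifest_path) as f:
            stored = json.load(f)
        for rel in verify_manifest(root, stored):
            print("missing or possibly rotted:", rel)

If something like this were saved as, say, bitrot_check.py (a made-up name), the verify pass is cheap enough to run every time a drive comes back on-site, which is the monitoring step that catches rot before every copy has inherited it.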

My point wasn't that you can't do this but that it's hard to get right and it takes a lot of exhaustive testing and regular monitoring to work out some of the details. There's no way the total time doing that - or just managing drives & mailing stuff - wouldn't cost more than $50/year in time, which is the cost I'm more concerned with these days.

"Chucky: it's $50/year for unlimited online storage"

Huh. Thanks for getting me to look at the fine print.

"As low as $6/mo" actually = $10 per month. It's only $6 per month if you're willing to pre-pay for the next four years.

So for a multi-Mac site, CrashPlan is actually $120/year.

But, as I say, I have several other reasons beyond just cost for avoiding CrashPlan as my primary backup method.

(As far as "silent bitrot" goes, if multiple hard drives, each containing multiple clones and multiple Time Machine AFP-created sparse images, all simultaneously fail due to silent bitrot, I guess I'll wish I'd followed your path. It just doesn't seem like the most likely pitfall to me.)

"My point wasn't that you can't do this but that it's hard to get right and it takes a lot of exhaustive testing and regular monitoring to work out some of the details. "

See, I don't think that's really true. Apple, along with CCC or SD, gives you pretty much all the tools you need. Only a moderate level of expertise and a small amount of time is needed to set things up on a close-to-automatic basis. Then you just need to pop a drive in the toaster and click a button every so often.

"There's no way the total time doing that - or just managing drives & mailing stuff - wouldn't cost more than $50/year in time"

Again, it's $120/yr for a multi-Mac site, but on the cost point alone, I agree with you. You will spend at least a few hours a year on offsite backup via a hard drive swap method. But I think you get much better value via the hard drive swap method than CrashPlan...

I don't really care about the service used, but I do feel it's worth reiterating: if your backups do not include checksums (either explicitly using a tool or implicitly using a filesystem like ZFS) you are likely to lose data and not realize it until you need it most. Your restore will appear to work but without checking all of the files it's unlikely that you'd notice, say, a corrupt image file - and if you were doing that because of, say, a home disaster it'd be really unpleasant to find that your offsite backups weren't trustworthy.
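
Continuing the sketch from earlier in the thread (same made-up bitrot_check.py module and made-up paths), checking a restore rather than just trusting that it "appeared to work" is one extra pass over the restored files:

import json
# Hypothetical module name: assumes the earlier manifest sketch was saved as bitrot_check.py
from bitrot_check import verify_manifest

RESTORED_ROOT = "/Users/me/Restored"              # made-up restore destination
MANIFEST = "/Volumes/OffsiteClone/manifest.json"  # manifest written when the clone was made

with open(MANIFEST) as f:
    stored = json.load(f)

damaged = verify_manifest(RESTORED_ROOT, stored)
if damaged:
    print(len(damaged), "file(s) did not survive the round trip:")
    for rel in damaged:
        print(" ", rel)
else:
    print("every file in the manifest restored intact")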

"if your backups do not include checksums (either explicitly using a tool or implicitly using a filesystem like ZFS) you are likely to lose data and not realize it until you need it most."

Well, here is the always trustworthy Mike Bombich on the tools used by CCC:

To provide this level of integrity, rsync will calculate a checksum of each chunk of data that is transferred, on both sides of the transfer. This checksum calculation is the performance bottleneck you're running into.

"Your restore will appear to work but without checking all of the files it's unlikely that you'd notice, say, a corrupt image file - and if you were doing doing that because of, say, a home disaster it'd be really unpleasant to find that your offsite backups weren't trustworthy."

Well, again, that's why I rely on redundancy. To lose archival data, I'd need multiple failures of multiple disk images on multiple hard drives in multiple locations.

Now, let's say that an image file gets silently corrupted on a user Mac's hard drive somehow. (I've never seen such a thing happen in practice, but it's certainly within the realm of the possible.) And I start cycling through backups for years before ever trying to access that image file, only to suddenly learn 10 years later that the image file is now corrupted on all of my backups. Well, that's a bad outcome. But it seems both far less likely and far less catastrophic than several other types of data loss I might encounter by relying on CrashPlan as my primary backup method for the next 10 years, no?

I agree that the backup should be checksummed. But I don’t think that’s what the Bombich quote is getting at. rsync uses checksums as an optimization to know which parts of the files it can skip during an incremental backup, but it cannot, as far as I know, actually verify a backup that’s sitting on the disk.
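
A rough sketch of the distinction, with made-up paths and no claim about rsync's internals beyond what's quoted above: comparing the backup against the live source - roughly the guarantee a checksummed transfer gives you - only tells you the two copies differ, and only works while a source still exists. Verifying against a manifest recorded when the data was known-good (as in the sketch further up) works on the backup alone and tells you which side rotted.

import hashlib
import os

def file_hash(path, bufsize=1 << 20):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(bufsize), b""):
            h.update(chunk)
    return h.hexdigest()

def compare_trees(source_root, backup_root):
    """Report files that differ between source and backup, or are missing from the backup."""
    for dirpath, _dirs, files in os.walk(source_root):
        for name in files:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, source_root)
            dst = os.path.join(backup_root, rel)
            if not os.path.exists(dst):
                print("missing from backup:", rel)
            elif file_hash(src) != file_hash(dst):
                # Differ -- but this alone can't tell you which copy is the damaged one.
                print("copies differ:", rel)

compare_trees("/Users/me", "/Volumes/BackupClone/Users/me")  # hypothetical paths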

Anyway, I think it’s more likely for data loss to occur through damage to the source. It does no good to have backups if you’re copying files that have already been damaged. I’ve had lots of bitrot over the years, not finding out until later that image and music files (and, occasionally, other types) had been corrupted. To prevent this, I checksum my important files using Git, EagleFiler, and IntegrityChecker (just one, depending on the type of file, not all three at once). Periodic validation lets me nip any problems in the bud, and the added benefit is that if I do a full restore from a clone drive I can validate the important files even though the backup itself wasn’t checksummed.
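
For what the Git part of that gives you, here's a rough sketch (hypothetical repository path, and not how EagleFiler or IntegrityChecker work): every tracked file already has a blob hash recorded in Git's index, so re-hashing the working copies and comparing catches silent changes. A mismatch could be a legitimate edit since the last check-in or it could be bitrot, but either way you find out now rather than at restore time.

import hashlib
import os
import subprocess

def git_blob_sha1(path):
    """The SHA-1 Git records for a file's contents: "blob <size>", a NUL byte, then the data."""
    data = open(path, "rb").read()
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()

def validate_repo(repo):
    """Compare each tracked file on disk against the blob hash in Git's index."""
    listing = subprocess.run(
        ["git", "-C", repo, "ls-files", "-s"],
        capture_output=True, text=True, check=True,
    ).stdout
    for line in listing.splitlines():
        meta, rel = line.split("\t", 1)
        recorded = meta.split()[1]
        full = os.path.join(repo, rel)
        if os.path.exists(full) and git_blob_sha1(full) != recorded:
            print("changed or corrupted since last check-in:", rel)

validate_repo(os.path.expanduser("~/Documents/archive"))  # hypothetical repo path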

"I’ve had lots of bitrot over the years, not finding out until later that image and music files (and, occasionally, other types) had been corrupted. To prevent this, I checksum my important files"

I learn something new every day. My file storage dates back twenty years, and I've never personally noticed such a thing happening to me, but I take this as useful knowledge moving forward.
