Tuesday, November 6, 2012

Arq 3.0

My favorite backup program has added support for Amazon Glacier. This is exciting news, as S3 is great for backups of my frequently changing files, but I want to use the less expensive Glacier to archive large quantities of mostly static files.

With Arq 3, when you choose a folder to back up you can select whether to use S3 or Glacier. The two services can have different backup schedules. Unlike with S3, Arq does not let you set a target budget for Glacier; however, it does help calculate the Glacier retrieval costs and set the restoration transfer rate to maximize speed or minimize cost. Since it takes hours to retrieve files from Glacier, Arq stores the bookkeeping for the Glacier data on S3.

In my opinion, Arq is one of the best new apps of the last few years (along with Tower and Hibari). That said, due to some unfortunate interface design, I haven’t actually been able to get Arq to use Glacier yet. Each folder must be set to use either S3 or Glacier, and Arq prevents me from creating two folders that overlap. So, for example, I currently have my Photos folder backing up to S3, with all but the last few months excluded. I would like to back up the entire folder to Glacier, but Arq won’t let me create another entry for it.


Woo-hoo! Snowy support!

A nice upgrade I can actually buy.

"My favorite backup program has added support for Amazon Glacier."

Most excellent news. I've got cause to use that feature.

Good dev, so I assume they'll fix the edge case bugs.

Glacier pricing is just so damn weird and non-intuitive for my intended use-case scenario of backup.

So, say I store 100GB in Glacier.

I only pay $12 a year for as long as I don't want my data back. Super!

But if I ever want to retrieve that 100GB in one fell swoop, my best calculation is that it'd cost me over $200 to get my data back over a 5 day period. Weird. (If I was willing to wait 2 weeks to get my data back, it'd only cost me $85 or so, but 2 weeks?)

It's definitely much cheaper than S3 for backup, but that large distinction starts to really disappear if you actually need to retrieve large amounts of what you've stored. And we are talking backup for the purposes of a possible retrieval here, of course.

So the use-case-scenario of backing up 2TB bare drives really doesn't make economic sense. But I guess tossing up clones of OS X systems along with associated User folders purged of A/V data every four months could well make economic sense.
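The back-of-the-envelope math above can be sketched in code. What follows is my own rough model of Amazon's 2012-era peak-rate pricing as commonly understood (5% of stored data retrievable free per month, prorated daily; $0.01/GB applied to the peak hourly retrieval rate over a 720-hour month; roughly $0.12/GB outbound transfer); the function name and defaults are illustrative assumptions, not Arq's actual calculator, and actual bills depended heavily on how retrieval jobs overlapped within each hour — part of why the pricing felt so opaque.

```python
# Rough sketch of 2012-era Glacier retrieval pricing. All names and
# defaults are illustrative assumptions, not Amazon's or Arq's code.

def glacier_retrieval_cost(stored_gb, retrieved_gb, days,
                           retrieval_rate=0.01,   # $/GB applied to peak hourly rate
                           transfer_rate=0.12,    # $/GB outbound to the internet
                           hours_in_month=720,
                           job_hours=4.0):        # a retrieval job spans ~4 hours
    """Estimate the cost of spreading a retrieval evenly over `days` days."""
    daily_gb = retrieved_gb / days
    # Free allowance: 5% of stored data per month, prorated daily.
    free_daily_gb = stored_gb * 0.05 / 30
    # Billable peak rate: assume one job per day, billed over its ~4-hour window.
    billable_peak_rate = max(0.0, daily_gb - free_daily_gb) / job_hours
    retrieval_fee = billable_peak_rate * retrieval_rate * hours_in_month
    transfer_fee = retrieved_gb * transfer_rate
    return retrieval_fee + transfer_fee
```

Under these assumptions, the 100GB-over-5-days case comes out well under $200, and stretching the same retrieval over more days lowers the peak rate and thus the fee — but the spread between estimates like these and the figures above is exactly why having Arq do the calculation is welcome.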

@Chucky I’m thinking of using Glacier for two scenarios:

(1) To restore a pristine copy of the occasional file that goes bad. In this case, the retrieval may even be free.

(2) Catastrophic recovery, if for some reason my bare drives don’t work, are inaccessible, or I mess up and propagate the mistake to the clones before catching my mistake. In this case, I would gladly pay a few hundred dollars to get my data back.

And, yes, Stefan has made some mistakes, and there have been a few serious bugs, but (contra CrashPlan) he seems to be good about fixing them.

I'm also excited about Arq 3 with Glacier support. And I also find the inability to have overlapping folders disappointing, but for a different reason. I would like to keep my Photos backup in S3 until Arq finishes with a Photos backup to Glacier. I'd hate to have to delete the S3 backup, then wait a month or so before the Glacier backup is done.

"(1) To restore a pristine copy of the occasional file that goes bad. In this case, the retrieval may even be free."

That one doesn't work for me, since I'd rather rely on my own encrypted disk images than Arq's encryption. So pulling one file for me means pulling the whole disk image.

(Though I suppose I could reconsider trusting Arq's encryption...)

"(2) Catastrophic recovery, if for some reason my bare drives don’t work, are inaccessible, or I mess up and propagate the mistake to the clones before catching my mistake. In this case, I would gladly pay a few hundred dollars to get my data back."

Yup. That's exactly how I'm thinking of using it. Regularly toss up clones of all my machines, and keep older ones up there for quite a while for your precise reasoning. In case of catastrophic local failure, I very likely won't have to pull all the data down, and costs will be reasonable. (And if I do have to pull several older clones down to get my proper data back, I'll be happy to pay.)

But it certainly won't economically work for retrieval of a whole bare drive that goes bad. Retrieval of a single 2TB drive that fails would cost something like $1,000 to $5,000, depending on speed. Much cheaper to dupe them locally and store off-site.

"I'd hate to have to delete the S3 backup, then wait a month or so before the Glacier backup is done."

Glacier uploads aren't slow, only downloads. (Unless my understanding is badly mistaken somewhere.) So your Glacier backup won't take a month, or even a day...

@Chucky — No, Glacier's upload isn't slow. What's slow is my 1990's-quality ADSL connection here in southern Spain. :-(

"No, Glacier's upload isn't slow. What's slow is my 1990's-quality ADSL connection here in southern Spain"

A-ha! But on the upside, you do get to live in southern Spain.

(Waaaay off-topic note: if you get a free moment, inform the political leadership there that they should drop the Euro and bring back the Peseta. The Pain in Spain would get better a lot faster. Many years faster...)

@Matt To avoid deleting the S3 backup, perhaps you could create a new backup set. Leave the old one around until your Glacier upload is done (and you no longer need the historical archive).

@Michael Tsai

"there have been a few serious bugs, but (contra CrashPlan) he seems to be good about fixing them."

Just for my own education -- what serious CrashPlan bugs have you run into? Also, I assume you've tried BackBlaze as well?

Thanks much.

@Sean There were problems where CrashPlan was unable to restore any of my files due to a “backup archive I/O error.” I guess this was due to a transient problem with their data center, as it eventually resolved itself, but that was scary.

I had problems—never resolved despite full reinstalls, etc.—where CrashPlan would get stuck using 100% CPU for weeks and not let me back up or restore anything. This might be an issue of having too many files for it, as it seems to work fine for my parents.

I never actually tried BackBlaze because my initial investigation showed that it was unsuitable. Sorry, I don’t recall what the issues were.

"however it does help calculate the Glacier retrieval costs and set the restoration transfer rate to maximize speed or minimize cost"

And major kudos to Stefan simply for putting up that page showing sample Glacier retrieval pricing / speed.

I'd never been able to even get a vague sense of the pricing and speed from reading Amazon's pricing pages, due to either my own stupidity or Amazon's deliberate opaqueness on the topic...

Another Arq advantage is that Amazon’s servers seem to be much faster at receiving uploads than CrashPlan’s.

[...] been seriously using Arq since version 2, and version 3 was one of my favorite apps. Version 4 so far seems to be better still. The app itself has been [...]
