Tuesday, December 20, 2011

Backups - why you need two methods and abundant paranoia

I can't say I feel good about my backups. I believe data wants to die; it wants to be free of the burden of order. Against the despair of data, even the best backup is barely adequate.

Consider tonight, when everything almost failed - Time Capsule and Carbon Copy Cloner alike.

The Time Capsule serves all the machines in our home over a wireless network. I was surprised at first that backup would work over wireless, but it does. Each machine has its own unencrypted disk image: one is on the TC's old internal 500 GB drive, and the other two are on an external 2 TB drive. The TC sits in a closet upstairs; it's unlikely to be stolen, but a fire would destroy it. I have restored a file or two with Time Machine from that image, so I know it can work. The only test of a backup, of course, is a restore.

I don't trust Time Machine as much as old-time Dantz Retrospect, but it seems Apple has gotten most of the bugs out.

I trust Carbon Copy Cloner [3] more. Each day it clones my server, where all the important data lives. It's more than a cloner; CCC keeps copies of changed or deleted files in "_CCC Archives". I've configured CCC to use an encrypted image that it mounts automatically every night. Since that backup is encrypted, I can take it offsite, which I do every few weeks. OK, every month or two. Offsite rotation relies on me, so it's prone to failure. Still, even if the house burns, I am unlikely to lose more than a month of images and videos. I can live with that.

So I have two backup methods, both fully automated, both relatively independent [2]. If each is 95% reliable each day, then the chance that both fail on a given day is 1/20 × 1/20 = 1/400. If the daily chance of a server drive failure is 1/1000, the odds of all three failing on the same day are about 1/400,000 [2], [4].
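The arithmetic above is just multiplying per-day failure probabilities, which is only valid if the failures are independent (footnote [2] points out they really aren't). Here is a minimal Python sketch of that calculation, using exact fractions so the results match the 1/400 and 1/400,000 figures; the helper name `all_fail` is hypothetical, and the 95% and 1/1000 rates are the illustrative numbers from the paragraph, not measurements.

```python
from fractions import Fraction


def all_fail(*daily_failure_probs: Fraction) -> Fraction:
    """Probability that every component fails on the same day.

    Assumes the failures are independent, so the probabilities simply multiply.
    """
    result = Fraction(1)
    for p in daily_failure_probs:
        result *= p
    return result


# Illustrative rates from the text: each backup method is 95% reliable per day,
# and the server drive has an assumed 1/1000 chance of failing on a given day.
backup_fail = 1 - Fraction(95, 100)  # 1/20
drive_fail = Fraction(1, 1000)

both_backups = all_fail(backup_fail, backup_fail)
everything = all_fail(backup_fail, backup_fail, drive_fail)

print(both_backups)  # 1/400
print(everything)    # 1/400000
```

Any shared failure point (me, a bad memory module, a forgotten cable) breaks the independence assumption and makes the real odds worse than this product suggests.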

Tonight, though, my data got within a few miles of the cliff it wants to meet.

My server has been having worrisome memory exception (EXC_BAD_ACCESS) crashes, and a TV show I recently downloaded had a file error [1]. Something is wrong with my two-year-old i5 iMac; I need to run Apple Hardware Test (again). So I know my server data is at risk.

Time Capsule has had problems too -- it's reporting a "communications error" periodically. I think that error message is a scarlet herring related to the iMac issues, but clearly I can't trust that backup.

Happily, there's good old CCC -- but when I restarted my server for the first time in weeks, it reported a problem: the backup drive didn't mount. That was easy to diagnose -- I'd unplugged it, probably while debugging my Aperture crash three weeks ago. Why didn't CCC report the error? Maybe it had crashed.

I wasn't that close to data loss -- but I was in a bad neighborhood. As paranoid as I am, I'm almost not paranoid enough.

It's good to have two fully automatic and completely independent backup methods. Data wants to die, and backup is still an unsolved problem.


[1] Incidentally, you can't easily report a purchase problem to Apple until they process a charge, and to reduce transaction costs they wait a few days before they process. This is very annoying! Also, the UI for reporting a purchase problem is suspiciously clumsy. More on that experience when I see what they do.
[2] In reality they have common failure points, of course: me, computer memory, etc. There is the older offsite backup, though, so the chance of complete and total data loss is probably less than 1/1,000,000.
[3] Donationware. I donated. I wish donationware apps would let us set a "reminder" so I could donate yearly. I suppose I should just make donationware donations every year on my birthday for the apps I use.
[4] I'd love to have automated offsite backup too, but I've never found an offsite vendor I trusted, and I expect ISPs to eventually charge for bandwidth use.


Update 12/21/2011: I was closer to the cliff than I realized.
