Wednesday, October 10, 2007

Why we need something better than HFS+: bit errors are cumulative

I hope this analysis is not correct, but if it is then there's no debating that we need an HFS+ replacement. The geeks I read generally favor ZFS, perhaps with Apple contributing improvements.

Recording Artist: ZFS Hater Redux

Here's a fairly typical Seagate drive with a capacity of ~150GB = ~1.2 x 10^12 bits. The recoverable error rate is listed as 10 bits per 10^12 bits. Let's put those numbers together. That means that if you read the entire surface of the disk, you'll typically get twelve bits back that are wrong and which a retry could have fixed.
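
For anyone who wants to check the arithmetic, here is a minimal sketch in Python. The capacity and error rate are the figures quoted above. The constant names, the function name, and the assumption that 1 GB means 10^9 bytes are mine, not the drive spec's.

```python
# Figures quoted in the post; the decimal-gigabyte conversion is an assumption.
CAPACITY_GB = 150
BITS_PER_GB = 8 * 10**9        # assumes 1 GB = 10^9 bytes
ERRORS_PER_INTERVAL = 10       # recoverable bit errors listed in the spec
INTERVAL_BITS = 10**12         # ... per this many bits read


def expected_bit_errors(capacity_gb):
    """Expected recoverable bit errors for one full read of the drive."""
    total_bits = capacity_gb * BITS_PER_GB
    return total_bits * ERRORS_PER_INTERVAL / INTERVAL_BITS


print(expected_bit_errors(CAPACITY_GB))
```

Doubling the capacity at the same listed rate doubles the expected count, which is the "today's high end is tomorrow's low end" worry in numeric form.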

Yes, really. Did you catch the implications of that? Silent single-bit errors are happening today. They happen much more often at high-end capacities and utilizations, and we often get lucky because some types of data (video, audio, etc) are resistant to that kind of single-bit error. But today's high end is tomorrow's medium end, and the day after tomorrow's low end. This problem is only going to get worse.

Worse, bit errors are cumulative. If you read and get a bit error, you might wind up writing it back out to disk too. Oops! Now that bit error just went from transient to permanent.

Still think end-to-end data integrity isn't worth it?
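
To make the "transient becomes permanent" point concrete, here is a rough sketch of the general idea behind checksummed reads: keep a checksum alongside each block and refuse to use, or write back, data that no longer matches it. This is only an illustration. The function names are hypothetical, SHA-256 is simply a convenient choice, and none of this is meant to describe how ZFS or any other real filesystem lays things out.

```python
import hashlib


def checksum(block: bytes) -> bytes:
    """Checksum stored separately from the data block (SHA-256 here)."""
    return hashlib.sha256(block).digest()


def flip_bit(block: bytes, bit_index: int) -> bytes:
    """Simulate a single-bit read error."""
    corrupted = bytearray(block)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


def verified_read(raw: bytes, expected: bytes) -> bytes:
    """Return the block only if it still matches its stored checksum."""
    if checksum(raw) != expected:
        raise IOError("checksum mismatch: do not use or rewrite this block")
    return raw


original = b"important data" * 4
stored_sum = checksum(original)

# A bad read is caught before it can be written back to disk.
damaged = flip_bit(original, 13)
try:
    verified_read(damaged, stored_sum)
    detected = False
except IOError:
    detected = True

print(detected)
```

Without the check, the damaged block would look like any other data and could be written straight back out.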

I wonder how NTFS compares? Too bad it's not open source :-)!
