Files can still be written to the Vault when a Storage Pod is down, with two parity shards still protecting the data. Even in the extreme, and unlikely, case where three Storage Pods in a Vault are offline, the files in the Vault are still available because they can be reconstructed from the 17 available pieces.
We use Reed-Solomon erasure coding. It’s a proven technique used in Linux RAID systems, by Microsoft in its Azure cloud storage, and by Facebook too. The Backblaze Vault Architecture is capable of delivering 99.99999% annual durability thanks in part to our Reed-Solomon erasure coding implementation.
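To make the "any 17 of 20 pieces" property concrete, here is a minimal sketch of a systematic Reed-Solomon code in Java. It is not our production library. It assumes a toy construction: the 17 data values are treated as points on a polynomial over the prime field GF(257), and the 3 parity values are extra points on that same polynomial. Production coders typically work in GF(2^8) so that every shard value fits in a byte; here a parity value can be 256, which is why the sketch uses `int`. The class name `ErasureSketch` and all of its methods are hypothetical.

```java
import java.util.Arrays;

/** Toy systematic Reed-Solomon code over GF(257). Illustration only. */
class ErasureSketch {
    static final int P = 257;      // prime modulus (assumption made for simplicity)
    static final int DATA = 17;    // data shards
    static final int PARITY = 3;   // parity shards
    static final int TOTAL = DATA + PARITY;

    /** Reduces v into the range 0..P-1, including for negative v. */
    static int mod(long v) { return (int) (((v % P) + P) % P); }

    /** Modular exponentiation by repeated squaring. */
    static int pow(int base, int exp) {
        long result = 1, b = mod(base);
        while (exp > 0) {
            if ((exp & 1) == 1) result = result * b % P;
            b = b * b % P;
            exp >>= 1;
        }
        return (int) result;
    }

    /** Multiplicative inverse via Fermat's little theorem (a must be nonzero mod P). */
    static int inverse(int a) { return pow(a, P - 2); }

    /** Lagrange interpolation: value at x = target of the polynomial through (xs[i], ys[i]). */
    static int interpolate(int[] xs, int[] ys, int target) {
        long sum = 0;
        for (int i = 0; i < xs.length; i++) {
            long num = 1, den = 1;
            for (int j = 0; j < xs.length; j++) {
                if (j == i) continue;
                num = num * mod(target - xs[j]) % P;
                den = den * mod(xs[i] - xs[j]) % P;
            }
            sum = (sum + ys[i] * num % P * inverse((int) den)) % P;
        }
        return (int) sum;
    }

    /** Returns 20 shards: the 17 data values unchanged, followed by 3 parity values. */
    static int[] encode(int[] data) {
        int[] xs = new int[DATA];
        for (int i = 0; i < DATA; i++) xs[i] = i;
        int[] shards = Arrays.copyOf(data, TOTAL);
        for (int x = DATA; x < TOTAL; x++) shards[x] = interpolate(xs, data, x);
        return shards;
    }

    /** Rebuilds the 17 data values from any 17 shards marked present. */
    static int[] reconstruct(int[] shards, boolean[] present) {
        int[] xs = new int[DATA];
        int[] ys = new int[DATA];
        int found = 0;
        for (int x = 0; x < TOTAL && found < DATA; x++) {
            if (present[x]) { xs[found] = x; ys[found] = shards[x]; found++; }
        }
        if (found < DATA) {
            throw new IllegalStateException("need " + DATA + " shards, have " + found);
        }
        int[] data = new int[DATA];
        for (int x = 0; x < DATA; x++) {
            data[x] = present[x] ? shards[x] : interpolate(xs, ys, x);
        }
        return data;
    }

    public static void main(String[] args) {
        int[] data = new int[DATA];
        for (int i = 0; i < DATA; i++) data[i] = (i * 31 + 7) % 256; // arbitrary byte values
        int[] shards = encode(data);

        boolean[] present = new boolean[TOTAL];
        Arrays.fill(present, true);
        present[2] = present[9] = present[18] = false; // simulate three pods offline
        shards[2] = shards[9] = shards[18] = 0;        // their contents are unavailable

        System.out.println(Arrays.equals(data, reconstruct(shards, present)));
    }
}
```

In this sketch, losing a fourth shard makes `reconstruct` throw, which mirrors the limit described above: 17 pieces are enough, 16 are not.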
We developed our Reed-Solomon implementation as a Java library. Why? When we first started this project, we assumed that we would need to write it in C to make it run as fast as we needed. It turns out that the modern Java virtual machines running on our servers are great, and their just-in-time compilers produce code that runs pretty quickly.
Yes, we do plan on expanding to more datacenters, and we do have emergency plans in place, though we choose our datacenters carefully to make sure that we avoid any natural-disaster-prone areas. As for backing up our own data: we certainly do make backups of our core info and necessary data. As for the user data that we store, that’s backed up across the Storage Pods in a Vault, as discussed in that post. We do not replicate customer data across multiple datacenters. At our price point, that’s just not feasible.