The probability of data loss in large clusters

Martin Kleppmann · Jan. 26, 2017
Many distributed storage systems (e.g. Cassandra, Riak, HDFS, MongoDB, Kafka, …) use replication to make data durable. They are typically deployed in a “Just a Bunch of Disks” (JBOD) configuration – that is, without......