BellJar: A new framework for testing system recoverability at scale

Facebook · May 5, 2022, 7:07 p.m.
Building infrastructure that can easily recover from outages, particularly outages involving adjacent infrastructure, too often becomes a murky exploration of nuanced fate-sharing between systems. Untangling dependencies and uncovering the side effects of unavailability has historically been time-consuming work. The lack of purpose-built tooling and the rarity of infrastructure outages make reasoning about them [...]