Last February, I wrote about a really bad bug that was randomly crashing my CPUs. In a virtual machine (VM) environment, if one CPU crashes, the VM can keep running in a crippled state. When the last CPU crashes, the VM is dead and needs to be restarted. This had been going on for nearly a year. I ended up building a huge monitoring infrastructure that could notify me when a problem developed. It was meant to catch the root cause, but it only got close enough to narrow down the possibilities. At th...