Release the Monkeys: Testing Using the Netflix Simian Army
The cloud is all about redundancy and fault tolerance. Since no single component can guarantee 100 percent uptime, we have to design architectures where individual components can fail without affecting the availability of the entire system. But designing a fault-tolerant architecture is not enough. We have to constantly test our ability to actually survive these “once in a blue moon” failures, and the best way to do that is to test in an environment that matches production as closely as possible or, ideally, in production itself. This is the philosophy behind Netflix's Simian Army, a group of tools that randomly induce failures in individual components to make sure that the overall system can survive. Gareth Bowles introduces the main members of the Simian Army (Chaos Monkey, Latency Monkey, and Conformity Monkey) and provides practical examples of how to use them in your test process and, if you're brave enough, in production.
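
To make the idea of randomly induced failure concrete, here is a minimal sketch in Python. It is not the Simian Army's code or API: `Replica`, `Service`, and `release_monkey` are hypothetical names invented for this illustration, and the "system" is just three in-memory replicas behind a simple failover loop. The sketch terminates one randomly chosen replica and then checks that the service as a whole still answers.

```python
import random


class Replica:
    """One redundant instance of a hypothetical service."""

    def __init__(self, name):
        self.name = name
        self.alive = True

    def handle(self, request):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} served {request}"


class Service:
    """Routes each request to the first healthy replica (simple failover)."""

    def __init__(self, replicas):
        self.replicas = replicas

    def handle(self, request):
        for replica in self.replicas:
            try:
                return replica.handle(request)
            except ConnectionError:
                continue  # this replica is down, try the next one
        raise RuntimeError("all replicas are down")


def release_monkey(service, rng):
    """Terminate one randomly chosen live replica; return it, or None if none are left."""
    live = [r for r in service.replicas if r.alive]
    if not live:
        return None
    victim = rng.choice(live)
    victim.alive = False
    return victim


rng = random.Random(42)  # seeded so the experiment is repeatable
service = Service([Replica(f"replica-{i}") for i in range(3)])

victim = release_monkey(service, rng)       # induce a failure in one component
response = service.handle("GET /titles")    # the overall system should still respond
```

If `service.handle` raised here, the design would have a single point of failure. A real test of this kind would run the same check against actual instances rather than in-memory objects.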