The Eroding Agile Test Pyramid
The test pyramid is the ideal model for agile teams to use when designing their test portfolio. Unit tests form a solid foundation for understanding whether new code is working correctly:
- They cover code easily: The developer who wrote the code is uniquely qualified to verify that their tests cover their code. It’s easy for the responsible developer to understand what’s not yet covered and create test methods that fill the gaps.
- They are fast and cheap: Unit tests can be written quickly, execute in seconds, and require only simple test harnesses (versus the more extensive test environments needed for system tests).
- They are definitive: When a unit test fails, it’s relatively easy to identify what code must be reviewed and fixed. It’s like looking for a needle in a handful of hay versus trying to find a needle in a heaping haystack.
However, there’s a problem with this model: The bottom falls out when you shift from progression testing (checking that the newly added functionality works correctly) to regression testing (checking that this functionality isn’t broken by later changes). Your test pyramid often becomes a diamond, with a shrinking base of working unit tests beneath a wider middle.
At least, that’s what surfaced in the data we recently collected when examining unit testing practices across mature agile teams. In each sprint, developers are religious about writing the tests required to validate each user story. Typically, it’s unavoidable: Passing unit tests are a key part of the definition of done. By the end of most sprints, there’s a solid base of new unit tests that are critical in determining if the new code is implemented correctly and meets expectations. Our data says these tests typically cover approximately 70 percent of the new code.
From the next sprint on, these tests become regression tests. In many agile approaches, little by little, they start failing—eroding the number of working unit tests at the base of the test pyramid, as well as the level of confidence the test suite once provided.
After a few iterations, the same unit tests that once achieved 70 percent coverage provide only about 50 percent coverage of that original functionality. Our data says this drops to 35 percent after several more iterations, and it typically degrades to 25 percent after six months.
This subtle erosion can be dangerous if you’re fearlessly changing code, expecting your unit tests to serve as a safety net.
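To put the coverage figures quoted above in perspective, it can help to express each one as a share of the coverage the tests provided when they were first written. The short sketch below does only that arithmetic. The function and variable names are my own, chosen for illustration, and the input numbers are simply the ones reported above.

```python
# Coverage figures quoted in the article (percent of the original functionality).
reported_coverage = [
    ("end of sprint", 70),
    ("after a few iterations", 50),
    ("after several more iterations", 35),
    ("after six months", 25),
]


def retained_share(checkpoints):
    """Express each checkpoint as a percentage of the first (baseline) value."""
    baseline = checkpoints[0][1]
    return [(label, round(100 * pct / baseline)) for label, pct in checkpoints]


for label, share in retained_share(reported_coverage):
    print(f"{label}: {share}% of the original safety net remains")
```

Read this way, roughly a third of the original safety net is left after six months (25 out of 70 percentage points).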
Why Unit Tests Erode
Unit tests erode for a number of reasons. Even though unit tests are theoretically more stable than other types of tests, such as UI tests, they too will inevitably start failing over time.
Code gets extended, refactored, and repaired as the application evolves. In many cases, the implementation changes are significant enough to warrant unit test updates. Other times, the code changes expose the fact that the original test methods and test harness were too tightly coupled to the technical implementation—again, requiring unit test updates.
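As a rough illustration of the coupling problem, the sketch below contrasts a check that asserts how a result is computed with one that asserts only what is returned. Everything in it (`PriceCalculator`, `RefactoredPriceCalculator`, and the helper functions) is hypothetical and exists only for this example.

```python
from unittest import mock


class PriceCalculator:
    """Hypothetical class: the original implementation."""

    def total(self, prices, tax_rate):
        return round(self._sum(prices) * (1 + tax_rate), 2)

    def _sum(self, prices):
        return sum(prices)


class RefactoredPriceCalculator:
    """Same behavior, but the private helper was inlined during a refactor."""

    def total(self, prices, tax_rate):
        return round(sum(prices) * (1 + tax_rate), 2)


def coupled_test(calculator_cls):
    """Asserts HOW the result is computed: the private helper must be called."""
    calc = calculator_cls()
    with mock.patch.object(calc, "_sum", return_value=30.0) as helper:
        calc.total([10.0, 20.0], 0.1)
    helper.assert_called_once_with([10.0, 20.0])


def behavior_test(calculator_cls):
    """Asserts WHAT is computed: only the public result is checked."""
    assert calculator_cls().total([10.0, 20.0], 0.1) == 33.0


def passes(test, calculator_cls):
    """Return True if the given check succeeds against the given class."""
    try:
        test(calculator_cls)
        return True
    except (AssertionError, AttributeError):
        return False
```

Both checks pass against `PriceCalculator`. Against `RefactoredPriceCalculator`, `behavior_test` still passes, while `coupled_test` fails because `_sum` no longer exists, even though the behavior is unchanged. That second kind of failure is the one that forces a unit test update.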
However, those updates aren’t always made. After developers check in the tests for a new user story, they’re under pressure to pick up and complete another user story. And another. And another. Each of those new user stories needs passing unit tests to be considered done. But what happens if the tests for the old user stories start failing?
Usually, nothing. The failures get ignored, or—if all tests must pass to clear a CI/CD quality gate—the offending tests get disabled. Since the developer who wrote that code will have moved on, appropriately resolving the failures would require them to get reacquainted with the long-forgotten code, diagnose why the test is failing, and figure out how to fix it. This isn’t trivial, and it can disrupt progress on the current sprint.
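One way to make this visible is to report disabled tests alongside the pass/fail result. The sketch below uses Python's standard `unittest` module. The test class and the `run_summary` helper are hypothetical, and a real pipeline would more likely read these counts from its own test report.

```python
import unittest


class LegacyStoryTests(unittest.TestCase):
    """Hypothetical tests left over from an earlier user story."""

    def test_still_valid(self):
        self.assertEqual(2 + 2, 4)

    @unittest.skip("disabled to clear the quality gate")
    def test_disabled_after_refactor(self):
        self.fail("would fail against the current code")


def run_summary(test_case):
    """Run a TestCase class and count how many tests still check anything."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(test_case)
    result = unittest.TestResult()
    suite.run(result)
    skipped = len(result.skipped)
    failed = len(result.failures) + len(result.errors)
    return {
        "gate_passes": result.wasSuccessful(),  # skipped tests do not fail the run
        "run": result.testsRun,                 # includes skipped tests
        "skipped": skipped,
        "active": result.testsRun - skipped - failed,
    }


print(run_summary(LegacyStoryTests))
```

Here the gate reports success, yet only one of the two tests is still verifying anything. Tracking the `skipped` count over time is a simple way to notice the erosion described above.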
Frankly, unit test maintenance often presents a burden that many developers resent. Just scan Stack Overflow and similar communities to read developer frustrations related to unit test maintenance.
How to Stabilize the Erosion
I know that some exceptional organizations require unit test upkeep and even allocate appropriate resources for it. However, these tend to be organizations with the luxury of SDETs and other development resources dedicated to testing. Many enterprises are already struggling to deliver the volume and scope of software that the business expects, and they simply can’t afford to shift development resources to additional testing.
If your organization lacks the development resources required for continuous unit test maintenance, what can you do?
One option is to have testers compensate for the lost coverage through resilient tests that they can create and control. Professional testers recognize that designing and maintaining tests is their primary job and that they are ultimately evaluated by the success and effectiveness of the test suite. Let’s be honest—who’s more likely to keep tests current: the developers who are pressured to deliver more code faster, or the testers who are rewarded for finding major issues (or blamed for overlooking them)?
In the most successful organizations we studied, testers offset the risk of eroding unit tests by adding integration-level tests, primarily at the API level, when feasible. This enables them to restore the degrading “change-detection safety net” without disrupting developers’ progress on the current sprint.
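To make the idea of an API-level test concrete, here is a minimal sketch using only the Python standard library. The endpoint `/orders/total`, the payload shape, and the in-process `OrderApiHandler` are all assumptions made for illustration. In practice the test would target a deployed test instance of the real service, and the organizations we studied may use quite different tooling.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class OrderApiHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the service under test."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        order = json.loads(self.rfile.read(length))
        total = round(sum(order["prices"]) * (1 + order["tax_rate"]), 2)
        body = json.dumps({"total": total}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep test output quiet


def post_order(host, port, order):
    """Send one order to the API and return (status, decoded JSON body)."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request(
            "POST",
            "/orders/total",
            body=json.dumps(order),
            headers={"Content-Type": "application/json"},
        )
        response = conn.getresponse()
        return response.status, json.loads(response.read())
    finally:
        conn.close()


def run_api_test():
    """Start the stand-in service, exercise it over HTTP, then shut it down."""
    server = HTTPServer(("127.0.0.1", 0), OrderApiHandler)  # port 0: any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        status, payload = post_order(
            "127.0.0.1", server.server_port, {"prices": [10.0, 20.0], "tax_rate": 0.1}
        )
        assert status == 200
        assert payload == {"total": 33.0}
        return status, payload
    finally:
        server.shutdown()
        server.server_close()
```

The test depends only on the request and response contract, so it keeps working when the code behind the endpoint is refactored. That independence from the implementation is what makes this kind of test more resilient than a tightly coupled unit test.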
This article was originally published February 20, 2019, on StickyMinds.com.
Wolfgang Platz is the Founder and Chief Product Officer of Tricentis. Wolfgang is the force behind innovations such as model-based automation and the linear expansion test design methodology. The technology he developed drives Tricentis’ Continuous Testing Platform, which is recognized as the industry’s #1 solution by all top analysts. Today, he is responsible for advancing Tricentis’ vision to make resilient enterprise automation a reality across Global 2000 organizations. His most recent book is Enterprise Continuous Testing: Transforming Testing for Agile and DevOps.