Agile + DevOps East 2020 Concurrent Session: Metric-driven CI Stability


Thursday, November 12, 2020 - 5:00pm to 6:00pm

Metric-driven CI Stability

When I joined our Developer Experience team, we had very little visibility into how well our CI tools were serving other engineers, and no hard evidence to back up any claims. We identified what we could and should measure. Then we established SLIs/SLOs to formalize those concepts and conducted an experiment to improve Buildkite stability. In the end, we reached our stability goal, raising the share of builds that didn't fail because of something we had control over from ~95% to 99.5%+. Now that we have hard data about the job we are doing, we no longer have to debate when to focus on CI and when to work on other tasks. If the SLO isn't being met, we work on CI. If the SLO is being met, we can work on improving our other, slightly less critical tools. The audience should walk away with some understanding of how to identify what can have an SLO applied to it, how to gather data for the SLI, and why this is good for their team’s productivity.
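
As a rough illustration of the kind of SLI described above, the following Python sketch computes the share of builds that did not fail for a reason within the CI team's control and compares it with a 99.5% target. The `Build` record, its `infra_failure` flag, and both function names are hypothetical and are not taken from the talk or from Buildkite's API; how a failure gets classified as within the team's control is assumed to be decided elsewhere.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Build:
    """Hypothetical build record; the field names are assumptions."""
    passed: bool
    # True when the failure was caused by something the CI team controls
    # (for example, an agent or infrastructure problem), not by the code under test.
    infra_failure: bool = False


def stability_sli(builds: Iterable[Build]) -> float:
    """Fraction of builds that did NOT fail for a reason the CI team controls."""
    builds = list(builds)
    if not builds:
        # Assumption: with no builds in the window, treat the SLI as fully met.
        return 1.0
    infra_failures = sum(1 for b in builds if not b.passed and b.infra_failure)
    return (len(builds) - infra_failures) / len(builds)


def slo_met(builds: Iterable[Build], target: float = 0.995) -> bool:
    """Compare the SLI with the SLO target (99.5% here, matching the abstract)."""
    return stability_sli(builds) >= target
```

Under these assumptions, a window of 1,000 builds with 50 team-attributable failures yields an SLI of 0.95 and misses the target, while ordinary test failures caused by the code under test do not count against the SLI at all.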


Lindsey Whitley is a Developer Experience Software Engineer at Gusto in New York City. Until September 2019 she was on Gusto’s Data Platform Engineering team in Denver. Over the course of a month, she applied the techniques and knowledge gained from the Data team to develop and stabilize SLOs for Gusto's Continuous Integration via Buildkite. Now that the decision of whether CI (one of her team's core offerings) needs attention has been simplified, her team has been able to focus on building and improving other tools.