The Advantages of a Risk-Based Approach to Testing at Freshly
Interview with Derek Campbell, Engineering Manager, DevOps at Freshly
Key Takeaways
Risk-based testing is all about prioritizing functionality with the highest probability and impact of failure over less risky functionality.
The core tool that Derek uses for risk analysis is a risk matrix.
Derek recommends concrete steps for introducing risk-based testing and getting buy-in from the organization at large.
In this interview with Derek Campbell, we talk about the advantages of a risk-based approach to testing. Derek manages the DevOps and test automation teams at Freshly. At Freshly they are going through a bit of a DevOps transformation. One of the first steps in that journey was revamping their test suite so that they could move faster.
When they began this work, tests took almost an hour to run and many were flaky. By taking a risk-based approach to testing, Derek and his team were able to rework the test suite, jettisoning flaky tests and focusing on the most critical areas of the product. Tests now run in minutes, and the engineering team is able to move much faster.
Hosted by John Long & Alastair Wilkes.
• • •
Full transcript
John: Thank you for joining us. I'm John Long from Launchable, and I'm here with Alastair Wilkes (also from Launchable) and Derek Campbell. Derek works at Freshly on the test automation team. We ran into Derek a couple of weeks ago and were chatting with him a little bit about a risk-based approach to testing, and he had some really interesting thoughts. So we thought we'd record this and share it for other people to benefit from. So, first of all, Derek, welcome. Could you just tell us a little bit about your background? It's a little bit unconventional, right?
Derek: Yeah. A little bit. So I manage the DevOps and test automation teams here at Freshly, but in my past life, I actually taught first and second grade until my path led me to software development. I've actually got a pretty interesting article on LinkedIn where I kind of detail what that journey looked like. I believe that my success in engineering as a whole has really been due to tenacity, luck, and integrity. I really believe that those tenets have guided me to success in this field. I've specialized exclusively in software testing and leadership for most of my career in engineering, with a focus on QA transformation, staffing and engagement, process improvement, implementation of risk-based testing, and design and development of test data management solutions. So I've come up with some creative solutions, both for Freshly and for previous companies I've worked with. Test automation and DevOps are areas of specialization for me, and lastly, agile delivery. I'm really passionate about the agile process, especially as far as Kanban is concerned. Overall I think of myself as being pretty easygoing, and I try not to take things too seriously.
John: So how did you get from elementary school teacher to tech?
Derek: Yeah. As I said, I really just kind of lucked out, but in a nutshell, I was working as a teacher teaching first and second grade at an amazing charter school. I was very lucky to get hired right out of college. And I had always promised myself that I would teach children in the way that I knew was right based on my own values and based on contemporary science. The science detailed that we should teach children in a certain way and that they learn best in this way. And unfortunately, our education system here in the United States is very outdated. It's based on centuries-old practices, and much of our legislation today that guides education does not coincide with current research trends. So I was kind of being forced to teach in a way that wasn't congruent with my values, in addition to dealing with some life stuff along with some health issues related to standing for long periods of time and a very toxic relationship.
So it was a perfect storm for me to not feel satisfied with what I was doing. And I'll never forget the day that I decided to leave teaching. I was in the car getting a ride home from a fellow teacher. She was taking her three kids home and me because I couldn't afford a car and didn't want to wait for the bus. I confided in her how scared I was about having this massive amount of student loan debt, and that I felt I would never be able to pay it off. And she looked over to me - this 50-year-old woman - she looked over to me, smiled, and said, "Oh, sweetie, I'm still paying mine off."
And it was at that point that I said, okay, I don't think this is going to be for me. I don't want to have this cloud over my head for the next few decades as I try to find success and take care of a family and all of that. So I said, all right, I'm gonna leave education. I didn't really have a plan. I just knew that that was no longer in my path. And so I left and I went the extreme route and moved to a different state and went into a completely different career, actually medicine. So I got certified as a nursing assistant and managed a care room in Georgia, where I took care of the elderly for three or four months until I realized that wasn't my calling either.
So I had a lot of indecision in my early days, but I learned a lot from that experience dealing with people, specifically people dealing with Alzheimer's and diseases of that nature, and I learned to be much more compassionate and even more patient than when dealing with first and second graders. So I learned a lot from that experience. It was great, but I decided it was time to try something new. And I had stumbled across a book called "How Computers Work." I saw it at Barnes and Noble, and I thought, man, computers are so cool. They're so cool. But I could never do anything with computers because I'm not a math whiz. I'm just some Joe Schmoe. And I said, but you know what, I'm curious, I'm gonna learn about it anyways.
So I grabbed the book and it just sparked my curiosity. I was so excited about it that I decided to take a risk and go back to school. I took a semester of community college courses in the good standards of Java, object-oriented programming, and SQL, just the real bare-bones stuff, and struggled through it. Oh man, it's so hard learning to program, and anyone that tells you otherwise... they must be a natural! It was hard. I had to dedicate myself to it. And many times I thought, this isn't going to work out. Like, there's no way I can do this for a career. And luckily for me, that tenacity paid off: I kept persevering and kept pushing through and kept pouring myself into learning. And during a capstone presentation for one of my classes, I was lucky enough that a headhunter for American Express was looking for interns, and they happened to walk into the class for only two presentations. One of those was mine. They liked what they saw. I put a lot of time and energy into that project, and they appreciated it. So they went over to my teacher, whispered in their ear, and left the classroom. And then later my teacher came over to me and said, how would you like to intern for American Express?
John: And that's how it started!
Derek: And of course I said, yes. And here we are.
John: Wow! I was just going to say, I think a lot of us come at tech from unconventional backgrounds. I haven't met the conventional tech background yet! That's kind of funny. 😄
Al: I'm curious how that led into a career in not just in software, but in software testing?
Derek: Sure. Absolutely! So I worked at American Express for a little over a year as a systems engineer. I worked on a few of the internal applications for American Express, including their own source control management system for one of their big mainframes. And that was great. I got a ton of experience writing Java code and writing JavaScript code, doing the whole HTML/CSS thing, lots of on-the-job experience. And I just got to a point where I wasn't feeling like I enjoyed it as much as I thought I would.
I just didn't feel like that was going to be a good fit for me in the long term. And so I left American Express, and it was at that time that a small company reached out to me about joining their team as a QA engineer. And I told them, look, I've not done QA before, so I'm not sure how good I'll be at it, but I've coded in JavaScript and I've done some stuff in Java, so I'm sure I can pick it up. And they took a chance on me, and I got in there.
Their test automation suite was written by interns with even less experience than I had! So we scrapped that, and I was able to build a framework from the ground up. That's a very rare experience in test automation, period, to have the opportunity to build a suite from the ground up, and especially that early in your career. So I really lucked out having that opportunity. I made a lot of mistakes, but I learned a lot from that experience. I learned a lot about what's right and what's wrong in test automation. And my next assignment came in as a test automation engineer again and started working up the ranks. So I was recognized as being pretty good at it.
And I've got good attention to detail, I suppose you could say, and I picked up on the QA stuff very quickly because of that. I found that it was something I was pretty good at and more passionate about than strictly software development. I liked the idea of risk mitigation, but I didn't like the strictness with which people adhered to testing everything. It didn't make much sense to me. You know, we've got this idea of delivering software as quickly as possible to our clients and customers so that we can get their feedback and iterate on it. That's the whole point of agile, so we can stay competitive.
And it didn't make sense to me then why we would require a month's worth of regression testing on an application, or two days' worth of automated regression tests.
Why are we doing this if our goal is to deliver to the client as quickly as possible? So I was lucky enough then to be able to create a new test automation framework within that organization for one of the other internal tools. And then I joined and created a center of excellence with that company. I'm not sure how familiar you all are with centers of excellence, but it's a small core group of domain experts that leads architecture, process, standards, et cetera, for an organization in that domain, and also has the responsibility of disseminating that knowledge and pursuing those standards across the organization.
So they're kind of the experts in that domain. So I helped to create that group in my previous role and was able to develop many frameworks as we were transitioning many of the tools that we used internally. We were trying to give them automated tests and integration tests to reduce their manual regression times. We found ourselves building a similar framework over and over, so we standardized the framework that we were building and created something that was very portable. And as we were scaling out these automation frameworks, we had thousands of manual test cases that needed to be converted into automated test cases. This is when I first started getting into the idea of risk-based testing.
I thought, I don't like the idea of having a 30-day regression test suite. Looking at the content of some of these tests, I thought to myself, how much value are these tests really adding? You know, if we have a hundred tests on a feature that's used by one person a week, is that really providing a lot of value for the time that it's taking that individual or automated suite to run? There's gotta be a way to find out. And so that's when I came to risk-based testing.
Risk-based testing is all about prioritizing functionality with the highest probability and impact of failure over less risky functionality.
So you prioritize things that are higher risk, with risk being determined by two vectors: impact and probability. Impact is the effect of a failure on the business, whether that's operational, financial, legal, or reputational; probability is the likelihood of that failure occurring.
So the probability can come from frequency of use. If this is a feature that's used every day, that's high frequency of use. If it's used every day by a hundred users, that's very different than being used every day by a million users. We might have a feature that maybe isn't really high impact, but it's on the front page of the website. Maybe it's static content, you can't interact with it. And the impact of that page being down might be kind of low. It doesn't have any functional value, but because of its visibility, it is higher risk. If that's the face of your company, then you're talking about your reputation. And with probability, you've also got to factor in complexity. Some features are so complex that they are by their very nature more prone to risk. That's just the high-level introduction to risk-based testing.
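As a concrete illustration of those probability factors, here's a minimal Ruby sketch; this is our example rather than Freshly's actual tooling, and the field names and thresholds are assumptions.

```ruby
# Illustrative only: derive a 1-5 probability/exposure rank for a
# feature from the factors mentioned above (frequency of use, audience
# size, visibility, complexity). Thresholds are made-up assumptions.
Feature = Struct.new(:name, :daily_users, :front_page, :complex, keyword_init: true)

def probability_rank(feature)
  rank =
    case feature.daily_users      # frequency of use / audience size
    when 0...100      then 1
    when 100...10_000 then 2
    else                   3
    end
  rank += 1 if feature.front_page # visibility raises exposure
  rank += 1 if feature.complex    # complex features are more failure-prone
  rank.clamp(1, 5)                # stay on the five-point scale
end

signup = Feature.new(name: "join now", daily_users: 50_000,
                     front_page: true, complex: true)
probability_rank(signup) # => 5
```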
Risk-based Approach to Testing
John: So this risk-based approach, is this something that you had done a lot of reading on? Are there books or articles that people could read on this?
Derek: To be honest, it's a little harder to find information on risk-based testing in software than you would imagine. I mean, the approach that I take is more from risk analysis and other sectors and other industries. I can't really point anyone to any good articles just off the cuff, but I could probably come up with some stuff and send them to you after this interview.
John: Yeah. I was just sort of curious, how were you exposed to this? I mean, you've mentioned working in centers of excellence? Were there people that you were working with talking about risk-based testing? Or was this a kind of an idea that had been brewing in your head as you were looking at Agile and testing and all of that?
Derek: Yeah. So it's kind of an idea that was brewing in my head. You know, I thought, this can't be the right way to do it; having a month-long regression can't be the right way. And both of my parents are in mortgage, and my father, at the time, was doing risk analysis for mortgage loans.
Al: When you mentioned just in practice and other industries, I was curious which industries you were referring to? Sounds like that's one of them? (The mortgage industry.)
Derek: Exactly. So risk analysis on loans. And it's like, okay, why don't we do that for our tests? You know, if that works for other industries and in other ways, why can't we apply the same thing? And so I just started scouring the internet for risk analysis and risk testing. And eventually I came upon this and kind of curated my own short-form version of risk-based testing in software.
Al: Let's double-click into that a little bit. I'm curious, we talked a bit about the reasoning behind this, but I was wondering if you could expand on the actual techniques or tooling that you might use either in your current role or in past roles to achieve this in your teams.
Derek: The core tool that I use for risk analysis is a risk matrix. You can look that up online pretty quickly and find some information on that. And basically what it does is visualize risk outcomes for a feature or project. The great thing about risk-based testing is it can be implemented at any level. If you're at the ideation stages in the C-suite, then you can do a risk analysis at that level. And you can continue to do risk analysis all the way down to the individual contributor.
So this risk matrix, imagine it as a table that's got impact on the top and probability on the side. With probability, there are basically five rankings. I've seen all types of versions of this online, but you've got different probability outcomes: something is 'rare', 'unlikely', 'likely', 'almost certain', or 'definitely certain'. And with impact, you've got 'insignificant' (would this matter at all?), 'minor', 'moderate', 'major', and 'catastrophic'. Whatever nomenclature you want to use is totally fine. The idea is to decide on risk: based on the probability and impact, you get a risk score.
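For readers who want to see that matrix concretely, here's a minimal Ruby sketch using the labels Derek lists. The cut-offs between risk levels are our assumption; teams pick their own.

```ruby
# A 5x5 risk matrix: probability rank times impact rank gives a score,
# which is bucketed into a risk level. Bucket boundaries are illustrative.
PROBABILITY = %i[rare unlikely likely almost_certain definitely_certain].freeze
IMPACT      = %i[insignificant minor moderate major catastrophic].freeze

def risk_score(probability, impact)
  (PROBABILITY.index(probability) + 1) * (IMPACT.index(impact) + 1)
end

def risk_level(probability, impact)
  case risk_score(probability, impact)
  when 1..4   then :low
  when 5..9   then :medium
  when 10..14 then :high
  else             :critical
  end
end

risk_level(:rare, :moderate)        # => :low      (1 * 3)
risk_level(:almost_certain, :major) # => :critical (4 * 4)
```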
The risk can be decided at the project level and then again at the user story level if you're using an Agile methodology. At the project level, you might say, Hey, this is a really risky project. And maybe it's got a risk outcome of significant or high. So we want to make sure that we have a good amount of testing on this, manual and automated. Or it may come out that, Hey, this is something that we want to introduce to our customer service reps, and it might not have a lot of exposure. It's going to be good for them, it's going to be useful, save them a little bit of time, but not super high risk. The amount of testing that those features would receive would be, and should be, vastly different. And that's the idea of risk-based testing.
John: Got it, got it. So what does that look like? Say you're starting at a new company and they've asked you to come in and help them apply this risk-based approach. I mean, this actually happened with you at Freshly, right? You came in and helped them convert their tests to more of a risk-based approach. Could you describe the mechanics of that? How did that work for you? How did you get started?
Derek: Yeah, definitely. It's going to be different with every organization. You've gotta take into account the culture and the readiness to adopt risk-based testing. I was lucky in that I was kind of hired on as the "expert," so I had a lot of credibility upfront. People were just like, well, our test automation suite sucks, so it can't really get much worse! Whatever it's going to be, it's going to be better. And so I introduced this idea of risk-based testing to them. The first step is to introduce the idea of risk-based testing and to start getting buy-in from the organization at large, to prove the efficacy of something like this. And you could say, well, we've got 10,000 manual test cases that take our QA group 30 days to execute when we're doing a full regression; let's do an audit of that.
So after introducing the idea, you start conducting an audit on existing feature tests and sifting through them. And that's what we did as we were migrating from one automated testing suite to a newer one. We reduced coverage on features that are no longer used, because having coverage over something that isn't used is always a bad idea. There were even some tests that would turn on a feature that was not used in production, test it, and then turn it off again. With things like that, you have to be very detail-oriented and pay attention as you see them. As we migrated features over, we would say, okay, this is either not being used or not being used very much, so we can get rid of most of these scenarios and just leave the happy path and maybe a couple of common negative case scenarios.
And then for features - like the "join now" flow as an example, or our funnel for new customers, that's high visibility. That's where we get money from people. We need to test that. And that needs to have as many scenarios as we can possibly envision. And that also happened to be missing a lot of coverage. So for us, a big priority was to start adding coverage to this because it is high risk. And that's kind of the approach that we took with Freshly.
The way that I've done it in the past is by implementing it into the Agile process, which you can do once you get buy-in from the organization. Then it becomes partnering up with your Agile group, if you have an Agile coach or scrum masters and product owners, and saying, "Hey, look, this is an easy way for us to identify how much testing needs to go into this feature and how much time needs to be spent on it, because this could also be applied to development efforts." And so you analyze as part of your story grooming: you include a risk analysis score for each story, and then that guides the development and testing efforts. So that's another way to do it.
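A minimal sketch of how a story's risk level, assigned during grooming, could translate into testing effort. The mapping below is our assumption, not Freshly's exact policy.

```ruby
# Illustrative: a story's risk level drives how much testing it gets.
TEST_EFFORT = {
  low:      "happy path only",
  medium:   "happy path plus a couple of common negative cases",
  high:     "broad scenario coverage, automated and manual",
  critical: "exhaustive scenarios plus exploratory testing and sign-off"
}.freeze

def plan_testing(story)
  "#{story[:title]} -> #{TEST_EFFORT.fetch(story[:risk])}"
end

puts plan_testing({ title: "CS rep dashboard tweak", risk: :low })
puts plan_testing({ title: "Join-now checkout flow", risk: :critical })
```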
John: So I'm kind of visualizing, for example, that you're doing Kanban: as the cards move from dev to QA, if one has a high risk score, then you're spending more time on it. Is that what you're talking about?
Derek: Yeah, exactly. You use that to guide the amount of effort that goes into the manual and automated testing, even the development of those features.
Al: So you mentioned lots and lots of tests being migrated. You do this audit process, what's the outcome? How much time are you saving? What did that look like?
Derek: Yeah, I'm really pleased to say that the outcome was spectacular. I mean, we went from a test suite that could take upwards of an hour to run, and inconsistently; it was incredibly flaky. One of the advantages of risk-based testing is that you usually end up with fewer, higher-quality tests. So the test run time is always going to be shorter, and the quality of the tests is going to be better. So when you do have a failure, it's a real failure, and not just because of bad data or because the test was poorly written or otherwise messed up in some way.
So the outcome for us is that our test suite now runs in less than 15 minutes, with a pass rate of up to 100%, and at its lowest 97%, across at least 500 tests.
Al: I like your point about flakiness. You have this opportunity to really focus on the right tests. You've reduced that number of tests based on risk, which means you can really focus on those tests, make sure they're not flaky, make sure that they're testing the right things, which should hopefully increase the confidence that developers have that running these tests is worthwhile. So I was wondering if you had received feedback or anything from the teams that you're working with about this new approach.
Derek: Yes, definitely. The feedback has been overwhelmingly positive. Before, the test suite would take upwards of an hour to run. And if you were unlucky enough for it to flake, which it did very often, you could be looking at days' worth of clicking the button, rolling the dice, and hoping that it passed this time. Going from that to this risk-based approach, with a new framework that's been well-architected from the ground up, is far more consistent. Even the language that the engineers use around the test suite has changed. Before, when red was seen on a build, when a failure was seen on a build, that was called a 'flake.' The thought that it could be related to the code itself didn't even occur to them. It's like, that's a flake, that's a flake, that's a flake. Over the last year and a half, as we've been actively migrating, and now that we're fully migrated, that language has changed. Now it's, that's a 'failure.' Why did this fail? What's the cause of this failure? They have begun to trust this suite a lot more because it's far more stable.
Al: That's great to hear. It sounds like this kind of migration and introduction of this methodology is somewhat complete or is complete. I'm wondering what's next for you and your team? What's the next bottleneck for your team to go and resolve?
Derek: I feel like we've achieved becoming a high-performance team. The test automation folks are exceptional. The suite is very stable. It's very fast by most standards. I mean, a 15-minute test run is pretty nice. Now our efforts are more focused on how we can improve the quality of the tests even more, and how we can improve the determinism of our tests so that the idea that a flake exists just disappears entirely and that thought no longer crosses anyone's mind. Instead it's: that is definitely a failure; what did I do?
We're also looking at new approaches to test data management. How can we enable QA to adopt this risk-based approach, and how can we help give them higher-quality test data so that they can take advantage of it? We're looking at solutions like creating a machine learning model that evaluates the production database and then clones data, based on its structure, into a separate database, so engineering, test automation, and QA are all using a similar dataset and we can all speak the same language about the scenarios being used.
So that's kind of the next direction: how do we become elite within our organization? And then the next step will be, how do we share that with the community? How do we give back to the community by creating open-source tools? One of our approaches to test data management is actually an open-source gem called stockpot, which allows us to dynamically generate test data using factories in Ruby on Rails. It's like a special controller, without getting too much into the details. It's a gem; you install it into your Gemfile, and it exposes a few HTTP routes that let you dynamically generate data based on your factories.
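We haven't verified stockpot's exact route names or payloads here, so the sketch below is a hypothetical illustration of the pattern Derek describes (an endpoint that turns an HTTP request into a factory call) rather than stockpot's actual API.

```ruby
# Hypothetical illustration, not stockpot's verified API: a Rails
# controller, mounted in test/staging environments only, that builds
# records through FactoryBot factories on demand.
#
# config/routes.rb (test/staging only):
#   post "/test_data/records", to: "test_data#create"
#
# Assumes the factory_bot_rails gem is available in this environment.
class TestDataController < ActionController::API
  def create
    # e.g. POST /test_data/records {"factory":"customer","traits":["with_subscription"]}
    traits = Array(params[:traits]).map(&:to_sym)
    record = FactoryBot.create(params[:factory].to_sym, *traits)
    render json: record, status: :created
  end
end
```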
John: Got it. Awesome.
Al: Cool. I can't wait to have another conversation about those projects once that work is done.
Derek: Likewise!
John: Well, this about wraps it up for our time together. Is there anything that you want to leave us with? Closing thoughts?
Derek: Software development is super cool. Test automation is excellent. Risk-based testing. You should be doing it. And if you're not, then you should be looking at Launchable to help you get there.
John: Haha. Love that! Thanks very much!
Derek: Yeah, absolutely! Awesome. Thanks, gentlemen! Have a great day.