A constant refrain in the red team community, particularly in the internal red team community of which I am a part, is the question of red team success. What makes a red team successful? How do we signal that success to leadership? What metrics should we be producing?

Whether you’re on a red team, work closely with one, or have ever read a red team report, think about the last time you saw a report that didn’t have any interesting findings. When was the last time the red team didn’t “win”?

Red team got to their objective? That’s a win.

Red team got detected in the first day and didn’t reach their objective? That’s also a win.

It seems like the only way to not win as a red team is to not play the game. If you reach your objective, you have a detailed attack path to share with the organization and can make a series of recommendations around risk mitigation and ways to further harden that particular attack path. That’s a win for the organization. If you don’t reach your objective, then you’ve successfully demonstrated the detection and response capabilities of your organization and shown what is working well. That’s also a win for the organization.

I am reminded of a schoolyard trick, where someone offers to settle a problem with the flip of a coin and then says “Heads I win, tails you lose.”

So how, then, does the red team determine their success?

Metrics as Signals

Before we talk about specific approaches to measuring our success and their corresponding pros and cons, it’s important to understand what a metric is and what it isn’t. At its most basic, we should consider the metrics we produce to be a signal to ourselves and our management about whether or not we are successful.

Note the order I used there, as I believe it to be important: “signal to ourselves and our management.” If we produce metrics of any sort purely because management requested it of us, it won’t take long for us to realize that management has no idea what they want from us and is taking wild guesses just as much as we are. Our managers are told “You need to show us metrics about the improvement and/or impact of your team.” So our managers, who are often not involved in the daily experiences of the team, come up with a few things they know they can measure based on our previous work: the number of engagements we completed, the number of bugs we found, the average time it took to complete our objective.

If our management gives us a set of metrics they want us to produce, then what they are telling us is that these particular things are what we need to be considered successful. Even if they tell us that’s not the case, they will report the numbers up, and that’s when they no longer get to control the assumed context of the metric. Management-provided metric requirements signal to the team what the organization considers important.

Maybe you and your team are comfortable with these numbers being your metrics. Maybe you’re able to assign a meaning to these numbers that you all agree on: “Yup, we are successful because our number of engagements went up.” Meanwhile, you allow yourselves to ignore that the size, scope, and impact of your engagements may have gone down as a result.

Are these the types of things that you want to be your signals of success? Remember that if they are your success signals, they are also your failure signals. If the number going up means you’re successful, then the number going down means you’re unsuccessful.

If you look introspectively at your team and have conversations to determine what the team feels makes them successful or unsuccessful, can you turn those thoughts into data you can capture? Can you take these new signals to your management and say “These are the things that we believe are important to our team’s success”? Instead of relying on management to signal to you what the organization thinks is important, you can signal to management, and in turn to the organization, about what you think is important about your job. Is there anyone more qualified to determine what is important about your work than you?

The Quantitative Metric

I recently left a red team role at a very large organization. That organization, like many others, developed an obsession with ensuring that every team was delivering quantitative metrics to show their value, to signal their successes. Number of bugs found this quarter. Number of engagements completed. Number of new detections written. Number of detections exercised. Number of this, number of that. I hate hate hate “Number of” metrics as applied to the security domain, and I hate the pervasive belief, held by engineers and management alike, that the only way to show value is with “objective” metrics that measure some quantity of something done.

Let’s take “Number of bugs found this quarter” as an example metric to explore. Before reading on, consider what this metric signals to you. A team tells you that the number of bugs they found is up 18% quarter-over-quarter. Wow, much bug very hacker wow.

You know what that number doesn’t tell us? Did the team find more bugs because they got better at finding bugs, or did they find more bugs because the organization created more bugs that quarter? Did the organization create more bugs that quarter because the organization is rapidly growing? Did the team find more bugs because they are rapidly growing? Is the number of bugs found quarter-over-quarter a positive signal that the security team is doing well, or a negative signal that the developers are writing more bugs?

It’s probably a mix of both, and simply saying “18% more bugs quarter-over-quarter” lacks any meaningful level of context. It signals good things or bad things, entirely dependent on who reads it and what their state of mind is at the time they read it.

Numbers lie. That doesn’t mean numbers can’t be useful. But it means that reporting numbers without context and without intention is likely not doing you or your management any favors. Consider the following context that can be added to our example metric above.

  • Average number of bugs found per service assessed
  • Number of services assessed vs number of services released
  • Average number of bugs found year-over-year in a particular service

What these additions do is provide just enough context to the numbers that it becomes clearer which direction these metrics should be trending to be considered successful. We want the average number of bugs per service to go down. We want the ratio of services assessed to services released to go up. We want the number of bugs found in a particular service to go down year-over-year (or whatever your chosen measurement period is). It doesn’t matter if we’re in the security organization or in the development organization; these metrics have a clear intent that we can agree on.
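To make that concrete, here is a minimal sketch of how these contextualized numbers might be computed. Everything in it, from the record shape to the service names and counts, is invented for illustration.

```python
from collections import defaultdict

# Hypothetical assessment records: (year, service, bugs_found).
# The names and numbers are invented for illustration only.
assessments = [
    (2023, "billing", 7),
    (2023, "auth", 3),
    (2024, "billing", 4),
    (2024, "auth", 2),
    (2024, "search", 5),
]
services_released = {2023: 12, 2024: 20}  # assumed org-wide totals

bugs_by_year = defaultdict(list)
for year, service, bugs in assessments:
    bugs_by_year[year].append(bugs)

for year in sorted(bugs_by_year):
    counts = bugs_by_year[year]
    avg_bugs = sum(counts) / len(counts)              # bugs per service assessed
    coverage = len(counts) / services_released[year]  # assessed vs released
    print(f"{year}: {avg_bugs:.1f} avg bugs/service, "
          f"{coverage:.0%} of released services assessed")
```

With even this little structure, the direction each number should move is unambiguous, which is exactly what a bare “18% more bugs” figure lacks.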

But these are all terrible for measuring red team success. Red teams aren’t compliance functions; they exist outside of the SDLC, and they aren’t likely to hit every service in an organization every year, or perhaps ever.

The Time-based Metric

Another common set of metrics discussed in the security domain is “Time-to-” metrics. Time to detect. Time to respond. Time to remediate. I even previously wrote about applying MTBF to the security domain. These also look great on the surface, but they carry certain inherent dangers.

Time-based metrics for defensive teams are great. Obviously we want our time to detect an intrusion to go down. Obviously we want our time to remediate a problem to go down. Obviously. But as soon as the success of the team is gauged by these time-based metrics, it creates a perverse incentive for the defenders to take shortcuts in order to keep the time down. Of course teams will be guided not to do this, but I think it’s pretty normal for people to want to optimize for the things they know they will be graded on. We’ve had it baked into us our whole lives, ever since elementary school.

Time-based metrics for offensive teams are challenging. Not only do offensive teams want their organization to get stronger, and therefore win whether they reach their objective faster or slower than before, but the broad nature of offensive work also means they will often take varied routes to their objective. If a team runs 12 engagements in a year, you could have 12 wildly different time-to-objective measurements, not trending in any particular direction. This is particularly true in larger organizations, where the attack surface is probably growing significantly faster than your team can conduct engagements. If the time to objective goes up, does it mean the security got better, or just that we picked a bad route on the attack graph?
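As a quick illustration of that variance problem, here is a sketch with invented numbers. When the spread of time-to-objective is on the same order as its average, a single trend line tells you very little.

```python
import statistics

# Hypothetical time-to-objective values, in days, for 12 engagements
# run in one year. The values are invented to show the dispersion.
time_to_objective = [2, 14, 5, 30, 3, 9, 21, 4, 45, 7, 12, 6]

print(f"mean:   {statistics.mean(time_to_objective):.1f} days")
print(f"median: {statistics.median(time_to_objective):.1f} days")
print(f"stdev:  {statistics.stdev(time_to_objective):.1f} days")
# The standard deviation rivals the mean: the variation between
# attack paths dominates any real period-over-period trend.
```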

Adding context to time-based metrics is much harder, I think. And since I think that context is vital for any good metric, I don’t know that I would recommend time-based metrics for offensive teams looking for ways to signal their success. Do you have ways to contextualize time-based metrics, reduce the perverse incentives, or otherwise make time-based metrics more valuable? Let’s chat!

The Qualitative Metric

There I was, daydreaming of tables and charts and graphs and such, when I began to see stars. No, I didn’t trip and fall face-first into a telescope. Instead, I had just ordered food for delivery, and the app was reminding me to rate the quality of my food and my delivery. The proverbial light bulb in my head started flickering on, slowly illuminating the piles of TPS reports that surrounded me. I still don’t know what a TPS report is, I think to myself.

Businesses very routinely rely on qualitative metrics to indicate success. What’s our rating on Yelp? What’s the rating of our latest product on Amazon? What are the results of our latest quarterly email blast asking customers for a little more of their time and subtly reminding them to give us a little more of their money? These are numbers, and businesses love numbers, but they reflect subjectivity. They reflect the emotions and the opinions of the customers. They reflect the experiences of the people who are most important to your business – your customers.

Businesses even routinely rely on qualitative metrics from employees in order to determine who gets raises, who gets promoted, who gets put on improvement plans, etc. We are asked to provide our subjective feedback about working with those around us, and that feedback is accumulated, aggregated, and turned into that 1.8% cost of living adjustment you got this year.

So why does security obsess over quantitative metrics? Well, I think when you hire engineers to solve problems and you promote engineers to manage engineers, you end up with engineers’ biases integrated into the business processes. You end up with the people who believe they can create algorithms without bias also trying to objectively measure organizational health and success. But we can do more.

If quantitative metrics seek to measure performance in terms of output, I believe qualitative metrics seek to measure performance in terms of trust. Quantitative metrics help us to measure the tangible outputs of our work, and qualitative metrics help us to measure the intangible outputs of our work.

It’s easy to think of qualitative data in the form of impact statements and testimonials, and while those are both useful, they are not metrics and we cannot measure them. Instead, qualitative metrics should be collected in the form of structured ratings. For a red team, some of those qualitative metrics might look like this:

  • On a scale of 1 to 10, with 10 being the most positive, how was your experience interacting with the red team during this engagement?
  • On a scale of 1 to 10, with 10 being the highest impact, how impactful to the broader organizational security do you feel this engagement was?
  • On a scale of 1 to 10, with 1 representing too little and 10 representing too much, how would you rate the amount of communication from the red team during this engagement?
  • On a scale of 1 to 10, with 10 representing the most confident, how confident were you in the security of the organization prior to this engagement?
  • On a scale of 1 to 10, with 10 representing the most confident, how confident are you in the security of the organization after this engagement?

These are just a few examples of methods to collect data after an engagement. You’ll notice that they focus on how the stakeholders and participants feel about the work that was done. You’ll also probably notice that these may not translate well into metrics to report up the chain. But what they do reflect is the relationship between the red team and the people they work with, and how satisfied those people are with it. Impact on the organization is difficult to measure, but collecting feedback on other people’s perceptions of your impact is much easier.
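Structured ratings do aggregate, though. Here is a minimal sketch, with invented question keys and scores, of how per-engagement responses might roll up into numbers a team can track over time for its own use.

```python
import statistics

# Hypothetical post-engagement survey responses: one list per
# question, one entry per respondent. Keys and scores are made up.
responses = {
    "experience":        [8, 9, 7, 10, 8],
    "perceived_impact":  [7, 8, 9, 6, 8],
    "communication":     [6, 7, 5, 8, 7],
    "confidence_before": [8, 7, 9, 8, 7],
    "confidence_after":  [5, 6, 6, 5, 6],
}

for question, scores in responses.items():
    print(f"{question}: mean {statistics.mean(scores):.1f} (n={len(scores)})")

# For a red team, confidence dropping after an engagement is not
# necessarily a failure; it can mean the organization's picture of
# its own risk became more accurate.
shift = (statistics.mean(responses["confidence_after"])
         - statistics.mean(responses["confidence_before"]))
print(f"confidence shift: {shift:+.1f}")
```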

This leads us to the titular section.

The Purpose Driven Red Team

I may be biased, having been on a red team until very recently, but I believe that the role of the red team is extremely important to a healthy, resilient organization. I don’t believe that the value of the red team role can be distilled down into the number of engagements run, the number of risks identified, the number of bugs found, etc. I believe the value that the red team brings to an organization is largely organic and difficult to measure in terms of the red team in isolation.

When asked “What is the purpose of the red team?”, you’ll likely get as many unique answers as people you ask. It’s a common joke amongst the red team community. I think most people will agree that, in the broadest possible terms, the red team’s purpose is to improve the organization’s understanding of its security and its risks. Note that this is different from “the red team’s purpose is to improve the organization’s security.” While we may incidentally do the latter, it is by and large a byproduct of our work and is not a reliable indication of success for a red team. The red team generates work for other teams, and it is those other teams that improve the security of the organization.

I’ve talked a lot about metrics of various types in order to get to this point. The reason for this is that I want to challenge red teams to reconsider how they are measuring their success, and to think about the pros and cons of whichever mechanisms they are currently measuring by. I’ve also talked about the difference between management-provided metrics, signaling to the red team what the organization believes to be important, and red team-provided metrics, signaling to the organization what the red team believes to be important. It’s this last thing that I want to really drive home.

If you do not work on a red team, and in many cases even if you do work on a red team, you are not likely to understand the exact value that the red team is bringing to your organization. You probably know that there is value, and you probably have some vague idea of whether or not the team is currently successful, but you may not be able to articulate exactly what that means. Maybe you see that the numbers are trending up and to the right in the latest all-hands meeting and think “Wow, great success!” Or maybe you feel successful because the last engagement you were on was successful.

But the red team has its hands in a lot of different pies. Is the relationship with the blue team healthy and successful? Is the red team helping the blue team improve their technology or process? Is the relationship with service teams healthy or do service teams hate interacting with the red team? Do the people who get read into the reports at the end of an engagement have a firm understanding of the risks identified? Do the individual team members feel like they are contributing to the success of the team or are they feeling isolated or less impactful? Are the risks being reported actually being acted upon with the right level of resourcing or are they falling by the wayside after reporting?

I’m not on a red team anymore. But if I were, I would really drive towards metrics that help us, as a team, determine whether or not we feel successful. That means, before we even start thinking about collecting metrics, the team needs to sit down and determine what our purpose is, and then what types of work activities contribute to that purpose. Maybe that purpose is to really help level up the blue team and work closely with the blue team to improve detection fidelity, improve coverage, improve confidence. Maybe that purpose is to bring light to the biggest risks in the organization and get resources dedicated to managing those risks. Maybe that purpose is to be the most elite hacking squad, imitating top tier threat actors in order to tell a story of what organizational compromise looks like. All of these are admirable purposes for a team, but they also have very little overlap with one another and success in one does not necessarily contribute to success in the others.

Once the team is clear on purpose, it’s time to put together the right combination of metrics that reflect that purpose. This is going to be a long process, and it’s going to involve metrics that don’t work, metrics that aren’t accurate, metrics that get in the way of the purpose, and many other failings. Determining which metrics to use isn’t going to be easy, either - they need to be relevant, they need context, and they need a particular goal, whether that’s trending up, trending down, or staying in a specific range. I would talk with my management about whether they have reasons the team agrees with for the particular metrics they may be asking for, and then talk about what metrics the team feels are relevant to their success. Whether management wants to accept those new red team-provided metrics or not is irrelevant. We’re not collecting metrics for management; we are collecting them for our own improvement. We are collecting them as a means of getting feedback on our progress towards our purpose.
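One possible way to keep that intent attached to each metric is to record the goal alongside the measurement. This is a sketch of one shape that could take, not a prescription; every name and number in it is invented.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Metric:
    name: str
    goal: str  # "up", "down", or "range"
    bounds: Optional[Tuple[float, float]] = None  # used only for "range"

    def on_track(self, previous: float, current: float) -> bool:
        if self.goal == "up":
            return current > previous
        if self.goal == "down":
            return current < previous
        low, high = self.bounds
        return low <= current <= high

# Invented examples of purpose-aligned metrics with (previous, current)
# values from two measurement periods.
checks = [
    (Metric("avg bugs per service assessed", "down"), 5.0, 3.7),
    (Metric("stakeholder experience rating", "up"), 7.5, 8.4),
    (Metric("communication rating", "range", (6.0, 8.0)), 7.1, 6.8),
]
for metric, previous, current in checks:
    print(f"{metric.name}: on track = {metric.on_track(previous, current)}")
```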

The purpose driven red team determines its success based on improvement in pursuit of the purpose. Whether that purpose is to produce a certain number of engagements or bugs per quarter, or something deeper and more difficult to capture, I encourage you to pursue the purpose before you pursue the metrics.