MTBF As a Security Metric

Editor’s Note: Per Nicole Forsgren, by way of Kelly Shortridge, MTBF may result in perverse incentives that optimize for maximizing the time between failures by focusing entirely on preventing failures, an ultimately futile exercise. I believe that every metric encourages, to some degree, the gaming of that metric in order to take the path of least resistance towards “progress.” I leave the following blog post intact; just read it with care and consideration for how this metric could be gamed.

For those of you who are not my close friends and colleagues, mind readers, private investigators, stalkers, or giant advertising companies, it may behoove you to understand that I spent my first years out of school working in the quality and reliability realm. This alternate universe is one of spreadsheets, databases, graphs about graphs, and bathtub curves. I particularly worked on solid state drives, which further entailed hundreds of thousands of dollars of extremely specialized equipment and a gratuitous overuse of “SMART”, which is often anything but.

One metric that I found particularly interesting while working in this space was that of the Mean Time Between Failures, or MTBF for those in a rush who can’t afford those extra two syllables. One might think to oneself that surely this time can be no more mean than any other time, even if this time is sandwiched between two failures. To those I would simply respond “Quite right, you are, dear reader.” I’ve consulted an esteemed colleague, the Honorable Wik Ipedia (pronounced wick ippida, not to be confused with the free online encyclopedia Wikipedia) to bring you the definition, as follows:

Mean Time Between Failures is the predicted elapsed time between inherent failures of a mechanical or electronic system, during normal system operation.

As you can see here, this time is no more or less mean than any other time, but rather an arithmetic average. Unfortunately, in a world short on syllables, AABTF just simply will not suffice.
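For the spreadsheet-inclined, here is a minimal sketch of that arithmetic average in Python. The failure timestamps are invented for illustration and the function name is my own. It also ignores repair time and treats the gap between consecutive failures as the whole story, which a real reliability engineer would squint at.

```python
from datetime import datetime
from statistics import mean


def mean_time_between_failures(failure_times):
    """Return the arithmetic mean of the gaps between consecutive failures, in hours."""
    ordered = sorted(failure_times)
    if len(ordered) < 2:
        raise ValueError("need at least two failures to have a time between them")
    # Pair each failure with the one that follows it and measure the gap in hours.
    gaps = [
        (later - earlier).total_seconds() / 3600
        for earlier, later in zip(ordered, ordered[1:])
    ]
    return mean(gaps)


# Hypothetical failure log: gaps of 24 hours and 72 hours.
failures = [
    datetime(2020, 1, 1),
    datetime(2020, 1, 2),
    datetime(2020, 1, 5),
]
mtbf_hours = mean_time_between_failures(failures)
print(f"MTBF: {mtbf_hours} hours")
```

Two gaps, one average, zero meanness.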

“But dade,” you may be saying, “this is clearly talking about mechanical or electronic systems.” To which I, again, would simply respond “Quite right, you are, dear reader.” But alas we have been given a gift, the very same gift that allows us to speak of war rooms, weaponization, kill chains, and other silly things. We can take this information from one realm of expertise and we can apply it to another, in the hope of gaining new insights. So let us apply it to software, or “information systems” if you have a business degree.

In security, we see a constant need to invent new metrics by which we’ll measure ourselves, as is evidenced by this totally real and not at all made up quote from the [current year] Relentless Security Advertising conference.

“Oh you only have mean time to remediation? You don’t even have Nice Time To Remediation? n00b”

We don’t know if the metrics we collect are accurate. We don’t know if the metrics indicate good things or bad things. We mostly just don’t know anything. But we’ve collected them and created graphs of them and now we must show the world our art. To that end, I have helpfully created a single chart for you that reflects all current and future metrics.

[Figure: dadeco super scientific metrics metric]

But alas, let us return to meaner times. Aforementioned metrics such as Mean Time To Remediate (MTTR) are examples of relatively good metrics. Certainly more meaningful than NOBF (Number of Bugs Filed) or AOTSFOJTTDAPTTHTYTTR (amount of time spent filling out JIRA tickets that don’t actually pertain to the thing that you’re trying to report). Measuring the time it takes from when a vulnerability or incident is detected to the moment it is remediated provides a rough way to determine process efficiency. Then we can break that down even further, tracking the Mean Time To First Response (MTTFR), the Mean Time To Ticket (MTTT), and the Mean Time To First Curse (MTTFC), allowing us to get a better understanding of where the pain points are in our process - be they technical or political.

These metrics are, by and large, very inward focusing for security organizations. As if the problems will go away as long as we can efficiently smash that “Resolve” button. A metric that I’ve not seen, in my limited experience as a (RFC)1918 detective, is one which tracks a vulnerability back in time to its conception. You see, when a software engineer and a Certified Corporate Agility Training Coach Scrum Certification Guru Jedi Master hype each other up very much, vulnerabilities are sometimes created where there was once nothing. It gets created, spends 1-9 days in an incubation period known amongst experts as “code review”, and then it’s ready to venture out into the big scary world, all by its lonesome. It’s a scary world out there for a vulnerability, these days. Not because they are being hunted to extinction, but rather because they can go unnoticed and unnamed for months, years, and even decades.

If we start analyzing and aggregating information about the mean time between when a vulnerable section of code was deployed (whether that be the initial time it was committed to the code base, or yolo swag 360 no scope yeeted onto prod) and when that vulnerable section of code was removed or patched, we can start to uncover patterns in engineering organizations. Instead of all metrics being focused on the reactive nature of security, we could start to track metrics that allow us to make proactive decisions and reduce the number of bugs shipped. I’ve heard rumors that some call this merging of developers, security, and operations “DevSecOps”; however, there is only one thing I know for certain: knowing the true name of something does not, in fact, give you power over that thing. I mean, honestly, what was Christopher Paolini even thinking?
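As a rough sketch of what that aggregation might look like, assume you can recover (from version control, deploy logs, or archaeology) the date a vulnerable section of code shipped and the date it was removed or patched. The records, identifiers, and the mean_exposure_days name below are all hypothetical.

```python
from datetime import date
from statistics import mean

# Hypothetical records: (identifier, date introduced, date removed or patched).
vulnerabilities = [
    ("VULN-1", date(2019, 1, 1), date(2019, 1, 31)),  # exposed for 30 days
    ("VULN-2", date(2019, 3, 1), date(2019, 5, 30)),  # exposed for 90 days
]


def mean_exposure_days(records):
    """Average number of days each vulnerability lived between introduction and fix."""
    return mean((fixed - introduced).days for _, introduced, fixed in records)


print(f"Mean exposure: {mean_exposure_days(vulnerabilities)} days")
```

Grouping the same calculation by team, service, or language is where the patterns would start to show up, assuming the dates can be trusted in the first place.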

If you go hunting for bugs and you find one buried body, you were probably looking for the wrong type of bug.

If we find one vulnerability, it could be one person’s mistake. If we find a series of vulnerabilities, it is more likely an organizational mistake. The level of interest in a particular developer’s output scales with the frequency of vulnerabilities in that output. Notice that I said frequency, not number. Finding ten vulnerabilities in ten million lines of code? Eh, these things happen. Finding ten vulnerabilities in ten lines of code? We’re going to have to stay after school and write (statistically vulnerable) lines on the chalkboard.
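To make the frequency-versus-number distinction concrete, here is a small sketch that normalizes a vulnerability count by the amount of code it was found in. Vulnerabilities per thousand lines is my assumption for a unit, not an industry commandment, and the function name is hypothetical.

```python
def vulns_per_kloc(vuln_count, lines_of_code):
    """Return vulnerabilities per thousand lines of code."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return vuln_count * 1000 / lines_of_code


# The two scenarios from the paragraph above: same count, wildly different frequency.
sprawling = vulns_per_kloc(10, 10_000_000)
tiny = vulns_per_kloc(10, 10)
print(f"ten million lines: {sprawling} per KLOC")
print(f"ten lines: {tiny} per KLOC")
```

Same ten vulnerabilities, but the second case is a million times denser.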

This isn’t to say that security teams should be out there making enemies of developers - quite the contrary. If we’re a security organization and our developers are shipping a lot of vulnerabilities, it is we who have failed them, not the other way around. Perhaps we need to be investing more in security architecture meetings with development teams, or maybe we need to be allocating additional time to code review. Perhaps we need to consider whether or not we’re using the right tools for the job in the first place.

We should be tracking the Mean Time Between Failures of the code we ship - be it reliability, usability, vulnerability, or any otherability. In the early days, this MTBF may be small, but as the product matures, so too should the MTBF. If we’re shipping a ten-year-old product and we’re finding a dozen high or critical vulnerabilities every month, we have failed not only ourselves, but our customers as well. Imagine if the hardware you’re running had an AFR (annualized failure rate, not to be confused with AFI, the band that performed Miss Murder in 2006) remotely similar to that of the software you’re running. Well… I guess if we started considering security failures as failures instead of magical money juice, then maybe you don’t have to imagine after all, nudge nudge wink wink.

I must go now, for there is a task force forming beneath my window, of which I must be a part. Please, I implore you: consider the happy times, consider the Mean Times.