The Trouble with Threat Modeling

Adam Shostack here.

I said recently that I wanted to talk more about what I do. The core of what I do is help Microsoft’s product teams analyze the security of their designs by threat modeling. So I’m very concerned about how well we threat model, and how to help folks I work with do it better. I’d like to start by talking about some of the things that make the design analysis process difficult, then what we’ve done to address those things. As each team starts a new product cycle, they have to decide how much time to spend on the tasks that are involved in security. There’s competition for the time and attention of various people within a product team. Human nature is that if a process is easy or rewarding, people will spend time on it. If it’s not, they’ll do as little of it as they can get away with. So the process evolves, because, unlike Dr. No, we want to be aligned with what our product groups and customers want.

There have been a lot of variants of things called “threat modeling processes” at Microsoft, and a lot more in the wide world. People sometimes want to argue because they think Microsoft uses the term “threat modeling” differently than the rest of the world. This is only a little accurate. There is a community which uses questions like “what’s your threat model” to mean “which attackers are you trying to stop?” Microsoft uses “threat model” to mean “which attacks are you trying to stop?” There are other communities whose use is more like ours. (In this paragraph, I’m attempting to mitigate a denial of service threat, where prescriptivists try to drag us into a long discussion of how we’re using words.) The processes I’m critiquing here are the versions of threat modeling that are presented in the books Writing Secure Code, Threat Modeling, and The Security Development Lifecycle.

In this first post of a series on threat modeling, I’m going to talk a lot about problems we had in the past. In the next posts, I’ll talk about what the process looks like today, and why we’ve made the changes we’ve made. I want to be really clear that I’m not critiquing the people who have been threat modeling, or their work. A lot of people have put a tremendous amount of work in, and gotten some good results. There are all sorts of issues that our customers will never experience because of that work. I am critiquing the processes and saying we can do better. In places we already are doing better, and I intend to ensure we continue to do better.

We ask feature teams to participate in threat modeling, rather than having a central team of security experts develop threat models. There’s a large trade-off associated with this choice. The benefit is that everyone thinks about security early. The cost is that we have to be very prescriptive in how we advise people to approach the problem. Some people are great at “think like an attacker,” but others have trouble. Even for the people who are good at it, putting a process in place is great for coverage, assurance and reproducibility. But a process used only by experts doesn’t expose its cracks the way asking everyone to participate does.

Getting Started 

The first problem with ‘the threat modeling process’ is that there are a lot of processes.   People, eager to threat model, had a number of TM processes to choose from, which led to confusion.  If you’re a security expert, you might be able to select the right process.  If you’re not, judging and analyzing the processes might be a lot like analyzing cancer treatments.  Drugs?  Radiation?  Surgery?  It’s scary, complex, and the wrong choice might lead to a lot of unnecessary pain.   You want expert advice, and you want the experts to agree.

Most of the threat modeling processes previously taught at Microsoft were long and complex, having as many as 11 steps.  That’s a lot of steps to remember.  There are steps which are much easier if you’re an expert who understands the process.  For example, ‘asset enumeration.’  Let’s say you’re threat modeling the GDI graphics library.  What are the assets that GDI owns?  A security expert might be able to answer the question, but anyone else will come to a screeching halt, and be unable to judge if they can skip this step and come back to it.  (I’ll come back to the effects of this in a later post.)

I wasn’t around when the processes were created, and I don’t think there’s a lot of value in digging deeply into precisely how they got where they are. I believe the core issue is that people tried to bring proven techniques to a large audience, and didn’t catch some of the problems as the audience changed from experts to novices.

The final problem people ran into as they tried to get started was an overload of jargon, and terms imported from security. We tossed around terms like repudiation as if everyone should know what they meant, and sometimes implied people were stupid if they didn’t. (Repudiation is claiming that you didn’t do something. For example, “I didn’t write that email!” or “I don’t know what got into me last night!” You can repudiate something you really did, and you can repudiate something you didn’t do.) Using jargon sent several unfortunate messages:

  1. This is a process for experts only
  2. You’re not an expert
  3. You can tune out now
  4. We don't really expect you to do this well

Of course, that wasn’t the intent, but it often was the effect.

The Disconnected Process

Another set of problems is that threat modeling can feel disconnected from the development process.  The extreme programming folks are fond of only doing what they need to do to ship, and Microsoft shipped code without threat models for a long time.  The further something is from the process of building code, the less likely it is to be complete and up to date.  That problem was made worse because there weren’t a lot of people who would say “let me see the threat model for that.”   So there wasn’t a lot of pressure to keep threat models up to date, even if teams had done a good job up front with them.  There may be more pressure with other specs which are used by a broader set of people during development.

Validation

Once a team had started threat modeling, they had trouble knowing if they were doing a good job. Had they done enough? Was their threat model a good representation of the work they had done, or were planning to do? When we asked people to draw diagrams, we didn’t tell them when they could stop, or what details didn’t matter. When we asked them to brainstorm about threats, we didn’t guide them as to how many they should find. When they found threats, what were they supposed to do about them? This was easier when there was an expert in the room to provide advice on how to mitigate the threat effectively. How should they track them? Threats aren’t quite bugs: you can never remove a threat, only mitigate it. So perhaps it didn’t make sense to track them like bugs, but that left threats in limbo.
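To make the tracking question concrete, here is a minimal sketch in Python of a threat record that is kept separate from a bug record. Everything in it is an assumption for illustration: the names `Threat`, `ThreatStatus`, `open_threats` and `example_threats` are hypothetical and do not come from any Microsoft tool or process. The one idea it tries to capture is that a threat never becomes "fixed"; it is either unmitigated, mitigated, or knowingly accepted.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class ThreatStatus(Enum):
    # Deliberately no FIXED or REMOVED state: a threat can only be
    # mitigated or accepted, never made to disappear.
    UNMITIGATED = "unmitigated"
    MITIGATED = "mitigated"
    ACCEPTED = "accepted"


@dataclass
class Threat:
    element: str                      # diagram element the threat applies to
    description: str                  # what an attacker might do
    mitigations: List[str] = field(default_factory=list)
    accepted_reason: Optional[str] = None

    @property
    def status(self) -> ThreatStatus:
        # Status is derived from the record rather than set by hand.
        if self.mitigations:
            return ThreatStatus.MITIGATED
        if self.accepted_reason:
            return ThreatStatus.ACCEPTED
        return ThreatStatus.UNMITIGATED


def open_threats(threats: List[Threat]) -> List[Threat]:
    """Return the threats that still need a decision."""
    return [t for t in threats if t.status is ThreatStatus.UNMITIGATED]


# Hypothetical example data, not taken from a real threat model.
example_threats = [
    Threat("log file", "attacker tampers with log entries"),
    Threat("login form", "attacker guesses passwords",
           mitigations=["account lockout"]),
]

for threat in open_threats(example_threats):
    print(threat.element, "-", threat.description)
```

A list like this at least gives a team something to review: every threat is either mitigated, accepted with a stated reason, or still waiting for a decision.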

"Return on Investment"

The time invested often didn’t seem like it was paying off. Sometimes it really didn’t pay off. (David LeBlanc makes this point forcefully in “Threat Modeling the Bold Button is Boring.”) Sometimes it just felt that way. Larry Osterman made that point, unintentionally, in “Threat Modeling Again, Presenting the PlaySound Threat Model,” where he said “Let's look at a slightly more interesting case where threat modeling exposes an issue.” Youch! But as I wrote in a comment on that post, “What you've been doing here is walking through a lot of possibilities. Some of those turn out to be uninteresting, and we learn something. Others (as we've discussed in email) were pretty clearly uninteresting.” It can be important to walk through those possibilities so we know they’re uninteresting. Of course, we’d like to reduce the time it takes to look at each uninteresting issue.

Other Problems

Larry Osterman lays out some other reasons threat modeling is hard in a blog post: http://blogs.msdn.com/larryosterman/archive/2007/08/30/threat-modeling-once-again.aspx

One thing that was realized very early on is that our early efforts at threat modeling were quite ad-hoc.  We sat in a room and said "Hmm, what might the bad guys do to attack our product?" It turns out that this isn't actually a BAD way of going about threat modeling, and if that's all you do, you're way better off than you were if you'd done nothing. 

Why doesn't it work?  There are a couple of reasons:

It takes a special mindset to think like a bad guy.  Not everyone can switch into that mindset.  For instance, I can't think of the number of times I had to tell developers on my team "It doesn't matter that you've checked the value on the client, you still need to check it on the server because the client that's talking to your server might not be your code."

Developers tend to think in terms of what a customer needs.  But many times, the things that make things really cool for a customer provide a superhighway for the bad guy to attack your code. 

It's ad-hoc.  Microsoft asks every single developer and program manager to threat model (because they're the ones who know what the code is doing).  Unfortunately that means that they're not experts on threat modeling. Providing structure helps avoid mistakes.
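The client/server point in the first reason above can be shown with a small sketch. It is only an illustration in Python under stated assumptions: `parse_quantity` and `MAX_QUANTITY` are hypothetical names, and the limit of 100 is made up. The server repeats the check no matter what the client claims to have validated, because the request might not come from your client at all.

```python
MAX_QUANTITY = 100  # hypothetical business limit, assumed for illustration


def parse_quantity(raw):
    """Validate a quantity on the server, ignoring any client-side checks.

    `raw` is whatever arrived in the request; treat it as untrusted.
    """
    try:
        quantity = int(raw)
    except (TypeError, ValueError):
        # Not a number at all (or missing): reject rather than guess.
        raise ValueError("quantity must be an integer") from None
    if not 1 <= quantity <= MAX_QUANTITY:
        raise ValueError("quantity must be between 1 and %d" % MAX_QUANTITY)
    return quantity


# A well-behaved client sends "5"; a hand-crafted request might send "-1".
for raw in ["5", "-1"]:
    try:
        print(raw, "->", parse_quantity(raw))
    except ValueError as err:
        print(raw, "-> rejected:", err)
```

The client-side check still has value for usability; the sketch only shows that it cannot be the sole line of defense.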

With all these problems, we still threat model, because it pays dividends.  In the next posts, I’ll talk about what we’ve done to improve things, what the process looks like now, and perhaps a bit about what it might look like either in the future, or adopted by other organizations.

Comments
  • At Microsoft, we have been using various forms of threat modeling for years now, and we're always learning

  • I found Threat Modeling (TM) to be puzzling and confusing at first.  What helped in understanding TM was realizing that it is built for some specific conditions at MS.  I'm posting an explanation, should it be of help, and also, to get feedback.  I’m investigating SDL and TM for use in an open-source system being developed at North Carolina State University.

    TM focuses on finding security problems in an existing system, or perhaps in a new system whose design has largely been worked out.  In the MS books on SDL and TM, it seems TM is the central component in the SDL.  For instance, TM is the largest section in the SDL book, and there is an entire TM book.  However, it seems security design should be a more central and prominent component of an SDL than TM.  Ideally, security should be correctly put in the system when it is designed.  TM is largely a form of testing, used to find security problems in the design.  SDL does have a security-design step.  It is SDL’s third step, and TM is the fourth.  I found it puzzling and confusing for TM to have such a central role in security development, and that it is given more attention than security design.

    Another thing that was confusing about TM is the way it overlaps with SDL’s security-design step:  the security-design step involves TM activities, and TM involves security-design activities.  For instance, during the security-design step, attacks should be considered.  However, considering attacks is what the TM process does.  Further, the TM process involves "mitigation of threats".  However, this mitigation of threats is really security design.  I found it puzzling and confusing for TM and security-design activities to be mixed between the two steps.  Why not have a single security-design step, and it would compose a design based on assets to be protected, system vulnerabilities, potential attacks (e.g., STRIDE), and available defensive countermeasures (e.g., that provide CIA)?

    What helped in making sense of SDL’s security-design and TM  was considering MS’s environment.  This explanation is speculative, but it seems plausible:

    1)  In general, MS’s developers are not security experts.  Apparently, during the system design stage, MS developers do the best job they can on security design.  Having a separate TM stage provides a way for the security department’s experts to review the security design and find problems.  If MS had a single security-design step, then all the system designers would need to have security expertise, and this is not feasible.

    In general, it’s not practical to expect that most developers be security experts.  Security engineering is a unique skill, just as technical writing is a unique skill.  It’s well known that programmers typically dislike technical writing and many lack writing aptitude or skill.  Similarly, security engineering requires a certain savvy, and not all developers have aptitude for security engineering.  Further, acquiring security skills takes time and experience.

    2)  SDL grew out of MS’s efforts to reduce the high incidence of security bugs in its products.  TM is largely a form of testing, used to find security problems in the design.  SDL’s focus on TM may have been motivated by MS’s goal of dramatically reducing security bugs in the short term.

    Another reason for separate security-design and TM steps could be complexity.  Security design is complex, and although it involves consideration of attacks, it could be helpful to have an additional step that focuses exclusively on attacks.

    In summary, I found it puzzling and confusing for TM to be the central step in SDL.  Also, it was puzzling why SDL’s security-design and TM steps overlapped.  It would seem best to have a single security-design step and for it to be central in the SDL.  I found SDL and TM made much more sense when we considered them in the context of Microsoft’s development and security teams and Microsoft’s security objectives.

    Jim Yuill
