September, 2007

Larry Osterman's WebLog

Confessions of an Old Fogey
  • Larry Osterman's WebLog

    Threat Modeling Again, Threat Modeling Rules of Thumb

    • 12 Comments

    I wrote this piece up for our group as we entered the most recent round of threat models.  I've cleaned it up a bit (removing some Microsoft-specific stuff), and there's stuff that's been talked about before, but the rest of the document is pretty relevant. 

     

    ---------------------------------------

    As you go about filling in the threat model threat list, it’s important to consider the consequences of entering threats and mitigations.  While it can be easy to find threats, it is important to realize that all threats have real-world consequences for the development team.

    At the end of the day, this process is about ensuring that our customers’ machines aren’t compromised. When we’re deciding which threats need mitigation, we concentrate our efforts on those where the attacker can cause real damage.

     

    When we’re threat modeling, we should ensure that we’ve identified as many of the potential threats as possible (even if we think they’re trivial). At a minimum, the threats we list but choose not to mitigate will remain in the document to provide guidance for the future.

     

    Remember that the feature team can always decide that we’re ok with accepting the risk of a particular threat (subject to the SDL security review process). But we want to make sure that we mitigate the right issues.

    To help you guide your thinking about what kinds of threats deserve mitigation, here are some rules of thumb that you can use while performing your threat modeling.

    1. If the data hasn’t crossed a trust boundary, you don’t really care about it.

    2. If the threat requires that the attacker is ALREADY running code on the client at your privilege level, you don’t really care about it.

    3. If your code runs with any elevated privileges (even if your code runs in a restricted svchost instance) you need to be concerned.

    4. If your code invalidates assumptions made by other entities, you need to be concerned.

    5. If your code listens on the network, you need to be concerned.

    6. If your code retrieves information from the internet, you need to be concerned.

    7. If your code deals with data that came from a file, you need to be concerned (these last two are the inverses of rule #1).

    8. If your code is marked as safe for scripting or safe for initialization, you need to be REALLY concerned.

     

    Let’s take each of these in turn, because there are some subtle distinctions that need to be called out.

    If the data hasn’t crossed a trust boundary, you don’t really care about it.

    For example, consider the case where a hostile application passes bogus parameters into our API. In that case, the hostile application lives within the same trust boundary as the application, so you can simply certify the threat. The same thing applies to window messages that you receive. In general, it’s not useful to enumerate threats within a trust boundary. [Editor's Note: Yesterday, David LeBlanc wrote an article about this very issue - I 100% agree with what he says there.]

    But there’s a caveat (of course there’s a caveat, there’s ALWAYS a caveat). Just because your threat model diagram doesn't have a trust boundary on it doesn't mean that the data being validated hasn't crossed a trust boundary on the way to your code.

    Consider the case of an application that takes a file name from the network and passes that filename into your API. And further consider the case where your API has an input validation bug that causes a buffer overflow. In that case, it’s YOUR responsibility to fix the buffer overflow – an attacker can use the innocent application to exploit your code. Before you dismiss this issue as being unlikely, consider CVE-2007-3670. The Firefox web browser allowed the user to execute scripts passed in on the command line, and registered a URI handler named “firefoxurl” with the OS, with the start action being “firefox.exe %1” (this is a simplification). The attacker simply embedded a “firefoxurl:<javascript>” URL in a web page and was able to take ownership of the client machine. In this case, the Firefox browser assumed that there was no trust boundary between firefox.exe and its invoker, but the developers didn’t realize that registering the “firefoxurl” URI handler introduced exactly such a boundary.
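
    To make the “shell”-style registration concrete, here’s a hedged sketch (my own illustration, not Mozilla’s installer code) of what registering such a URL protocol handler looks like; the “myapp” scheme and the executable path are hypothetical. The key point is that the shell substitutes the raw, attacker-controlled URL for %1, so whatever parses that command line is parsing untrusted input.

    #include <windows.h>

    // Hedged sketch: register a "shell"-style URL protocol handler, roughly the
    // way firefoxurl was registered.  "myapp" and the path are hypothetical.
    static LONG RegisterShellUrlHandler()
    {
        // The empty "URL Protocol" value is the documented marker that tells the
        // shell to treat "myapp:" as a URL scheme.
        LONG result = RegSetKeyValueW(HKEY_CLASSES_ROOT, L"myapp", L"URL Protocol",
                                      REG_SZ, L"", sizeof(L""));
        if (result != ERROR_SUCCESS)
            return result;

        // The command the shell launches for a myapp: URL.  The shell replaces %1
        // with the URL taken straight from the web page - a trust boundary has
        // just been introduced onto MyApp.exe's command line.
        const wchar_t command[] =
            L"\"C:\\Program Files\\MyApp\\MyApp.exe\" -url \"%1\"";
        return RegSetKeyValueW(HKEY_CLASSES_ROOT, L"myapp\\shell\\open\\command",
                               nullptr, REG_SZ, command, sizeof(command));
    }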

    If the threat requires that the attacker is ALREADY running code on the client at your privilege level, you don’t really care about it.

    For example, consider the case where a hostile application writes values into a registry key that’s read by your component. Writing those keys requires that there be some application currently running code on the client, which requires that the bad guy first be able to get code to run on the client box.

    While the threats associated with this are real, it’s not that big a problem and you can probably state that you aren’t concerned by those threats because they require that the bad guy run code on the box (see Immutable Law #1: “If a bad guy can persuade you to run his program on your computer, it’s not your computer anymore”).

    Please note that this item has a HUGE caveat: it ONLY applies if the attacker’s code is running at the same privilege level as your code. If that’s not the case, you have the next rule of thumb:

    If your code runs with any elevated privileges, you need to be concerned.

    We DO care about threats that cross privilege boundaries. That means that any data communication between an application and a service (which could be an RPC, it could be a registry value, it could be a shared memory region) must be included in the threat model.

    Even if you’re running in a low privilege service account, you still may be attacked – one of the privileges that all services get is the SE_IMPERSONATE_NAME privilege. This is actually one of the more dangerous privileges on the system because it can allow a patient attacker to take over the entire box. Ken “Skywing” Johnson wrote about this in a couple of posts (1 and 2) on his excellent blog Nynaeve. David LeBlanc has a subtly different take on this issue (see here), but the reality is that both David and Ken agree more than they disagree on this issue. If your code runs as a service, you MUST assume that you’re running with elevated privileges. This applies to all data read – rule #2 (requiring an attacker to run code) does not apply when you cross privilege levels, because an attacker could be running code under a low privilege account to mount an elevation of privilege attack.
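
    As an illustration of why the service case is different, here’s a hedged sketch (my own, not from the original post) that reports whether the token your code is running under has SeImpersonatePrivilege enabled. It’s purely diagnostic; holding the privilege is the cue to treat every bit of data read from a lower-privilege source as a potential elevation-of-privilege vector.

    #include <windows.h>
    #include <stdio.h>

    // Hedged sketch: report whether SeImpersonatePrivilege is enabled in the
    // current process token.  Diagnostic only - knowing you hold the privilege
    // is the cue to threat model your inputs accordingly.
    static bool ImpersonatePrivilegeEnabled()
    {
        HANDLE token = nullptr;
        if (!OpenProcessToken(GetCurrentProcess(), TOKEN_QUERY, &token))
            return false;

        bool enabled = false;
        LUID luid;
        if (LookupPrivilegeValueW(nullptr, L"SeImpersonatePrivilege", &luid))
        {
            PRIVILEGE_SET required = {};
            required.PrivilegeCount = 1;
            required.Control = PRIVILEGE_SET_ALL_NECESSARY;
            required.Privilege[0].Luid = luid;

            BOOL result = FALSE;
            if (PrivilegeCheck(token, &required, &result))
                enabled = (result != FALSE);
        }
        CloseHandle(token);
        return enabled;
    }

    int main()
    {
        printf("SeImpersonatePrivilege enabled: %s\n",
               ImpersonatePrivilegeEnabled() ? "yes" : "no");
        return 0;
    }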

    In addition, if your component has a use scenario that involves running the component elevated, you also need to consider that in your threat modeling.

    If your code invalidates assumptions made by other entities, you need to be concerned

    The reason that the firefoxurl problem listed above was such a big deal was that the firefoxurl handler invalidated some of the assumptions made by the other components of Firefox. When the Firefox team threat modeled Firefox, they made the assumption that Firefox would only be invoked in the context of the user. As such, it was totally reasonable to add support for executing scripts passed in on the command line (see rule of thumb #1). However, when they threat modeled the firefoxurl: URI handler implementation, they didn’t consider that they had now introduced a trust boundary between the invoker of Firefox and the Firefox executable.

    So you need to be aware of the assumptions of all of your related components and ensure that you’re not changing those assumptions. If you are, you need to ensure that your change doesn’t introduce issues.

    If your code retrieves information from the internet, you need to be concerned

    The internet is a totally untrusted resource (no duh). But this has profound consequences when threat modeling. All data received from the Internet MUST be treated as totally untrusted and must be subject to strict validation.

    If your code deals with data that came from a file, then you need to be concerned.

    In the previous section, I talked about data received over the internet. Microsoft has issued several bulletins this year for vulnerabilities that required the attacker to trick a user into downloading a specially crafted file over the internet; as a consequence, ANY file data must be treated as potentially malicious. For example, MS07-047 (a vulnerability in WMP) required the attacker to convince the user to view a specially crafted WMP skin. The consequence of this is that ANY file parsed by our code MUST be treated as coming from a lower level of trust.

    Every single file parser MUST treat its input as totally untrusted – MS07-047 is only one example of an MSRC vulnerability; there have been others. Any code that reads data from a file MUST validate the contents. It also means that we need to work to ensure that we have fuzzing in place to validate our mitigations.
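
    Here’s a hedged sketch of what the simplest form of that fuzzing looks like – a dumb mutation loop, nothing like the real SDL fuzzing tools, with a hypothetical ParseWavFile standing in for the parser under test:

    #include <windows.h>
    #include <cstdlib>
    #include <vector>

    // Hypothetical parser under test - assumed to exist elsewhere.  It must
    // reject (or safely handle) corrupted input without crashing or hanging.
    bool ParseWavFile(const BYTE* data, size_t cb);

    // Hedged sketch of a bare-bones mutational fuzz loop: take a known good
    // file, flip a few random bytes, and feed it to the parser, many times.
    void FuzzParser(const std::vector<BYTE>& goodFile, int iterations)
    {
        for (int i = 0; i < iterations; i++)
        {
            std::vector<BYTE> mutated(goodFile);

            // Corrupt a handful of random bytes in the copy.
            for (int j = 0; j < 8 && !mutated.empty(); j++)
                mutated[rand() % mutated.size()] ^= static_cast<BYTE>(rand() & 0xFF);

            // Any crash, access violation, or hang here is a bug in the parser.
            ParseWavFile(mutated.data(), mutated.size());
        }
    }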

    And the problem goes beyond file parsers directly. Any data that can possibly be read from a file cannot be trusted. <A senior developer in our division> brings up the example of a codec as a perfect example. The file parser parses the container and determines that the container isn't corrupted. It then extracts the format information and finds the appropriate codec for that format. The parser then loads the codec and hands the format information and file data to the codec.

    The only thing that the codec knows is that the format information that’s been passed in is valid. That’s it. Beyond the fact that the format information is of an appropriate size and has a verifiable type, the codec can make no assumptions about the contents of the format information, and it can make no assumptions about the file data. Even though the codec doesn’t explicitly parse the file, it’s still dealing with untrusted data read from the file.

    If your code is marked as “Safe For Scripting” or “Safe for Initialization”, you need to be REALLY concerned.

    If your code is marked as “Safe For Scripting” (or if your code can be invoked from a control that is marked as Safe For Scripting), it means that your code can be executed in the context of a web browser, and that in turn means that the bad guys are going to go after your code. There have been way too many MSRC bulletins about issues with ActiveX controls.

    Please note that some of the issues with ActiveX controls can be quite subtle. For instance, in MS02-032 we had to issue an MSRC fix because one of the APIs exposed by the WMP OCX returned a different error code depending on whether a path passed into the API referred to a file or a directory – that constituted an Information Disclosure vulnerability, and an attacker could use it to map out the contents of the user’s hard disk.

    In conclusion

    Vista raised the security bar for attackers significantly. As Vista adoption spreads, attackers will be forced to find new ways to exploit our code. That means it’s more and more important that we do a good job of ensuring they have as few opportunities as possible to make life difficult for our customers. The threat modeling process helps us understand the risks associated with our features and understand where we need to look for potential issues.

  • Larry Osterman's WebLog

    Threat Modeling Again, STRIDE

    • 9 Comments

    As has been mentioned elsewhere, when we're threat modeling at Microsoft we classify threats using the acronym STRIDE. 

    STRIDE stands for "Spoofing", "Tampering", "Repudiation", "Information disclosure", "Denial of service", and "Elevation of privilege".

    Essentially the idea is that you can classify all your threats according to one of the 6 STRIDE categories.  Since each category has a specific set of potential mitigations, once you've analyzed the threats and categorized them, you should know how to mitigate them.

    A caveat: as David points out in his "Dreadful" post, STRIDE is not a rigorous classification mechanism - there's a ton of overlap between the various categories (a successful Elevation of Privilege attack could result in Tampering with data, for instance).  But that doesn't change the fact that it's an extremely useful mechanism for analyzing threats to a system.

    So what are each of the STRIDE categories?

    Spoofing

    A spoofing attack occurs when an attacker pretends to be someone they're not.  So an attacker using DNS hijacking and pretending to be www.microsoft.com would be an example of a "spoofing" attack.  Spoofing attacks can happen locally, too.  For instance, as I mentioned in "Reapplying the decal", one mechanism by which the Decal plugin framework injects itself into the Asheron's Call process is to spoof one of the COM objects that Asheron's Call uses.

    Tampering

    Tampering attacks occur when the attacker modifies data in transit.  An attacker that modified a TCP stream by predicting the sequence numbers would be tampering with that data flow.  Obviously data stores can also be tampered with - that's what happens when the attacker writes specially crafted data into a file to exploit a vulnerability.

    Repudiation

    Repudiation occurs when someone performs an action and then claims that they didn't actually do it.  Primarily this shows up on operations like credit card transactions - a user purchases something and then claims that they didn't do it.  Another way that this shows up is in email - if I receive an email from you, you can claim that you never sent it.

    Information disclosure

    Information Disclosure threats are usually quite straightforward - can the attacker view data that they're not supposed to view?  So if you're transferring data from one computer to another, if the attacker can sniff the data on the wire, then your component is subject to an information disclosure threat.  Data Stores are also subject to information disclosure threats - if an unauthorized person can read the contents of the file, it's an information disclosure.

    Denial of service

    Denial of service threats occur when an attacker can degrade or deny service to users.  So if an attacker can crash your component or redirect packets into a black hole, or consume all the CPU on the box, you have a Denial of service situation.

    Elevation of privilege

    Finally, there's Elevation of privilege.  An elevation of privilege threat occurs when an attacker has the ability to gain privileges that they'd not normally have.  One of the reasons that classic buffer overflows are so important is that they often allow an attacker to raise their privilege level - for instance, a buffer overflow in any internet facing component allows an attacker to elevate their privilege level from anonymous  to the local user (or whatever account is hosting the vulnerable component). 

     

    Please note, these are only rough classifications of threats (not vulnerabilities).  And many of them aren't relevant in every circumstance.  For instance, if your component is like PlaySound, you don't need to worry about information disclosure threats to the data flows between the Application and PlaySound.  On the other hand, if you're writing an email server, you absolutely DO care about information disclosure threats.

    UPDATE: Adam Shostack over on the SDL team has posted an enhanced definition of the STRIDE categories on the Microsoft SDL blog.  You can read that list here: http://blogs.msdn.com/sdl/archive/2007/09/11/stride-chart.aspx

    Next: STRIDE mitigations

     

    Edit: Larry can't count to 6.

     

  • Larry Osterman's WebLog

    Threat Modeling Again, Threat modeling and the firefoxurl issue.

    • 26 Comments

    Yesterday I presented my version of the diagrams for Firefox's command line handler and the IE/URLMON's URL handler.  To refresh, here they are again:

     Here's my version of Firefox's diagram:

     And my version of IE/URLMON's URL handler diagram:

     

    As  I mentioned yesterday, even though there's a trust boundary between the user and Firefox, my interpretation of the original design for the Firefox command line parsing says that this is an acceptable risk[1], since there is nothing that the user can specify via the chrome engine that they can't do from the command line.  In the threat model for the Firefox command line parsing, this assumption should be called out, since it's important.

     

    Now let's think about what happens when you add the firefoxurl URL handler to the mix.

     

    For that, you need to go to the IE/URLMON diagram.  There's a clear trust boundary between the web page and IE/URLMON.  That trust boundary applies to all of the data passed in via the URL, and all of that data should be considered "tainted".  If your URL handler is registered using the "shell" key, then IE passes the URL to the shell, which launches the program listed under the "command" verb, replacing the %1 placeholder with the URL specified (see this for more info)[2].  If, on the other hand, you've registered an asynchronous protocol handler, then IE/URLMON will instantiate your COM object and will give you the ability to validate the incoming URL and to change how IE/URLMON treats the URL.  Jesper discusses this in his post "Blocking the Firefox".

    The key thing to consider is that if you use the "shell" registration mechanism (which is significantly easier than using the asynchronous protocol handler mechanism), IE/URLMON is going to pass that tainted data to your application on the command line.

     

    Since the firefoxurl URL handler used the "shell" registration mechanism, it means that the URL from the internet is going to be passed to Firefox's command line handler.  But this violates the assumption that the Firefox command line handler made - it assumes that the command line was authored with the same level of trust as the user invoking Firefox.  And that's a problem, because now you have a mechanism for any internet site to execute code on the browser client with the privileges of the user.

     

    How would a complete threat model have shown that there was an issue?  The Firefox command line threat model showed that there was a potential issue, and the threat analysis of that potential issue showed that the threat was an accepted risk.

    When the firefoxurl feature was added, the threat model analysis of that feature should have looked similar to the IE/URLMON threat model I called out above - IE/URLMON took the URL from the internet, passed it through the shell and handed it to Firefox (URL Handler above).  

     

    So how would threat modeling have helped to find the bug?

    There are two possible things that could have happened next.  When the firefoxurl handler team[3] analyzed their threat model, they would have realized that they were passing high risk data (all data from the internet should be treated as untrusted) to the command line of the Firefox application.  That should have immediately raised red flags because of the risk associated with the data.

    At this point in their analysis, the firefoxurl handler team needed to confirm that their behavior was safe, which they could do either by asking someone on the Firefox command line handling team or by consulting the Firefox command line handling threat model (or both).  At that point, they would have discovered the important assumption I mentioned above, and they would have realized that they had a problem that needed to be mitigated (the actual form of the mitigation doesn't matter - I believe that the Firefox command line handling team removed their assumption, but I honestly don't know, and it doesn't matter for the purposes of this discussion).

     

    As I mentioned in my previous post, I love this example because it dramatically shows how threat modeling can help solve real world security issues.

    I don't believe that anything in my analysis above is contrived - the issues I called out above directly follow from the threat modeling process I've outlined in the earlier posts. 

    I've been involved in the threat modeling process here at Microsoft for quite some time now, and I've seen the threat model analysis process find this kind of issue again and again.  The threat model either exposes areas where a team needs to be concerned about their inputs or it forces teams to ask questions about their assumptions, which in turn exposes potential issues like this one (or confirms that in fact there is no issue that needs to be mitigated).

     

    Next: Threat Modeling Rules of thumb.

     

    [1] Obviously, I'm not a contributor to Firefox and as such any and all of my comments about Firefox's design and architecture are at best informed guesses.  I'd love it if someone who works on Firefox or has contributed to the security analysis of Firefox would correct any mistakes I'm making here.

    [2] Apparently IE/URLMON doesn't URL-encode the string that it hands to the URL handler - I don't know why that is (probably for compatibility reasons), but it isn't actually relevant to this discussion (especially since all versions of Firefox before 2.0.0.6 seem to have had the same behavior as IE).  Even if IE had URL-encoded the URL before handing it to the handler, Firefox is still being handed untrusted input, which violates a critical assumption made by the Firefox command line handler developers.

    [3] Btw, I'm using the term "team" loosely.  It's entirely possible that the same individual did both the Firefox command line handling work AND the firefoxurl protocol handler - it doesn't actually matter.

  • Larry Osterman's WebLog

    Threat Modeling Again, STRIDE Mitigations

    • 14 Comments

    I described the 6 STRIDE categories the other day.  In that post, I mentioned that there are "well understood" mitigations for each of the STRIDE categories.  Of course the list below isn't exhaustive - many of these mitigations are obvious, and some don't apply in every situation - but when you're looking at providing mitigations to the threats that your threat modeling discovers, they provide a good place to start looking.

    Spoofing

    As I mentioned the other day, a spoofing attack occurs when an attacker pretends to be someone they're not.  So how do you stop a spoofing attack?  You require authentication (yeah, I did say that some of these are obvious :)).  Authentication takes many forms - there are a boatload of authentication mechanisms available (basic auth, digest, Kerberos, PKI systems, IPsec, etc.).  Most of these apply to data transferred over the wire, but there are other mechanisms to ensure validity.  For instance, the Authenticode mechanism provides a way of validating that code has been signed.  Sometimes authentication isn't the right mitigation.  For instance, if your data flow diagram has a client DLL that is making an RPC into a service that you own, an attacker can spoof the client DLL - they can generate the RPC calls directly from their own code, bypassing your client DLL.  The mitigation for that type of attack is to add additional validation of the data transferred by the RPC in the server.

    Tampering

    Again, tampering attacks occur when the attacker modifies data in transit.  The standard mitigations for tampering attacks include digital signatures and message authentication codes.  Those work great for data transmitted on the wire, and are also valid for data stored in files on the disk.  One other mitigation for Tampering attacks is ACLs - for instance, if only administrators need to write to a file or registry key, ACL it so that only administrators can write to the file/key.  Another way is validation of input read from the data source.  You need to be careful in this case to make sure that the validation doesn't introduce the possibility of a DoS attack (we had a bug in an early beta of Windows Vista where a corrupted registry key could prevent the audio service from starting - we had validation which correctly detected that a particular key was corrupted and failed to start because of it).
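
    As one concrete (and hedged) illustration of the message authentication code option, here's a sketch that computes an HMAC-SHA256 over a blob using CNG before the blob is written to disk or put on the wire.  Key management is out of scope here, and this is my own illustration rather than code from any particular Windows component.

    #include <windows.h>
    #include <bcrypt.h>
    #pragma comment(lib, "bcrypt.lib")

    // Hedged sketch: compute an HMAC-SHA256 over a blob with CNG.  Store the
    // 32-byte MAC alongside the data; on read, recompute and compare to detect
    // tampering.  Error handling is collapsed for brevity.
    static bool ComputeHmacSha256(const BYTE* key, ULONG cbKey,
                                  const BYTE* data, ULONG cbData,
                                  BYTE mac[32])
    {
        BCRYPT_ALG_HANDLE alg = nullptr;
        BCRYPT_HASH_HANDLE hash = nullptr;
        bool ok = false;

        if (BCryptOpenAlgorithmProvider(&alg, BCRYPT_SHA256_ALGORITHM, nullptr,
                                        BCRYPT_ALG_HANDLE_HMAC_FLAG) == 0 &&
            BCryptCreateHash(alg, &hash, nullptr, 0,
                             const_cast<BYTE*>(key), cbKey, 0) == 0 &&
            BCryptHashData(hash, const_cast<BYTE*>(data), cbData, 0) == 0 &&
            BCryptFinishHash(hash, mac, 32, 0) == 0)
        {
            ok = true;
        }

        if (hash) BCryptDestroyHash(hash);
        if (alg)  BCryptCloseAlgorithmProvider(alg, 0);
        return ok;
    }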

    Repudiation

    The standard mitigations for repudiation attacks include secure logs and audit records, coupled with using a strong authentication mechanism. 

    Information disclosure

    Information Disclosure attacks occur when the bad guy can see stuff they're not supposed to be able to see.  Standard mitigations include encryption, especially for data transmitted on the wire - for example, RPC provides a fairly robust encryption mechanism if you specify the RPC_C_AUTHN_LEVEL_PKT_PRIVACY authentication level when establishing an RPC connection.  Other mitigations include ACLs (again).
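
    For RPC specifically, the relevant knob is the authentication level on the binding handle.  A hedged sketch follows; the binding handle and principal name are assumed to come from elsewhere in your client code.

    #include <windows.h>
    #include <rpc.h>
    #pragma comment(lib, "rpcrt4.lib")

    // Hedged sketch: ask the RPC runtime to authenticate, integrity-check, and
    // encrypt every packet on an existing client binding.  This addresses the
    // information disclosure threat on that data flow; it does not remove the
    // need for the server to validate what it receives.
    RPC_STATUS ProtectBinding(RPC_BINDING_HANDLE binding, RPC_WSTR serverPrincipal)
    {
        return RpcBindingSetAuthInfoW(binding,
                                      serverPrincipal,
                                      RPC_C_AUTHN_LEVEL_PKT_PRIVACY,  // encrypt every packet
                                      RPC_C_AUTHN_GSS_NEGOTIATE,      // Negotiate (Kerberos/NTLM)
                                      nullptr,                        // use the caller's credentials
                                      RPC_C_AUTHZ_NONE);
    }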

    Denial of service

    It can be difficult to mitigate some classes of DoS attacks, but again, there are mechanisms that can mitigate many of the classes of DoS attacks.  For instance, you can use ACLs (again) to protect the contents of files from being removed or modified (which also protects against tampering attacks), you can use firewall filter rules (both internal and external) to protect against some network based attacks, and you can use disk and processor quotas to prevent excess disk or CPU consumption.  In addition, there are design patterns that allow for high availability even in the face of active attackers (you'd have to ask server people for details, but they DO exist).

    Elevation of privilege

    To mitigate against EoP attacks, once again, you can use ACLs and other forms of permission checks.  But (IMHO) by far the most effective source of protection against EoP attacks is input validation - if the input is verified to be correct, it's harder to cause problems (not impossible, but harder).  On the other hand, you also need to be very careful about your validation logic - it's quite easy to get it wrong.

     

    As I said at the beginning of this discussion, these are just rough outlines.  Many of them don't apply.  Since I'm working on building the PlaySound threat model, I'll take two examples from that threat model:

    • For the PlaySound API, repudiation threats aren't particularly applicable.  As such, Repudiation threats are considered to be an acceptable risk.
    • Tampering threats aren't particularly relevant to any of the data flows, because they're all in-proc.  The only way that an attacker could manipulate the data flows is if they had injected code into the current process, and in order for them to do that, they need to be running at either the same or a higher privilege level - the Win32 process object model protects us from those threats.

     

    Next: How do we use STRIDE?

  • Larry Osterman's WebLog

    Threat Modeling Again, Threat Modeling in Practice

    • 11 Comments

    I've been writing a LOT about threat modeling recently but one of the things I haven't talked about is the practical value of the threat modeling process.

    Here at Microsoft, we've totally drunk the threat modeling Kool-Aid.  One of Adam Shostack's papers on threat modeling has the following quote from Michael Howard:

    "If we had our hands tied behind our backs (we don't) and could do only one thing to improve software security... we would do threat modeling every day of the week."

    I want to talk about a real-world example of a security problem where threat modeling would have hopefully avoided a potential problem.

    I happen to love this problem, because it does a really good job of showing how the evolution of complicated systems can introduce unexpected security problems.  The particular issue I'm talking about is known as CVE-2007-3670.  I seriously recommend people go to the CVE site and read the references to the problem; they provide an excellent background on the problem.

    CVE-2007-3670 describes a vulnerability in the Mozilla Firefox browser that uses Internet Explorer as an exploit vector. There's been a TON written about this particular issue (see the references on the CVE page for most of the discussion), and I don't want to go into the pros and cons of whether this is an IE or a Firefox bug.  I only want to discuss this particular issue from a threat modeling standpoint.

    There are four components involved in this vulnerability, each with their own threat model:

    • The Firefox browser.
    • Internet Explorer.
    • The "firefoxurl:" URI registration.
    • The Windows Shell (explorer).

    Each of the components in question play a part in the vulnerability.  Let's take them in turn.

    • The Firefox browser provides a command line argument "-chrome" which allows you to load the chrome specified at a particular location.
    • Internet Explorer provides an extensibility mechanism which allows 3rd parties to register specific URI handlers.
    • The "firefoxurl:" URL registration, which uses the simplest form of URL handler registration: it simply instructs the shell to execute "<firefoxpath>\firefox.exe -url "%1" -requestPending".  Apparently this was added to Firefox to allow web site authors to force the user to use Firefox when viewing a link.  I believe the "-url" switch (which isn't included in the list of Firefox command line arguments above) instructs Firefox to treat the contents of %1 as a URL.
    • The Windows Shell which passes on the command line to the firefox application.

    I'm going to attempt to draw the relevant part of the diagrams for IE and Firefox.  These are just my interpretations of what is happening; it's entirely possible that the dataflow is different in real life.

    Firefox:

    [diagram: flow of control from the user into Firefox]

    This diagram shows the flow of control from the user into Firefox (remember: I'm JUST diagramming a small part of the actual component diagram).  One of the things that makes Firefox's chrome engine so attractive is that it's easy to modify the chrome, because the Firefox chrome is simply javascript.  Since the javascript runs with the same privileges as the current user, this isn't a big deal - there's no opportunity for elevation of privilege there.  But there is one important thing to remember here: Firefox makes a security assumption that the -chrome command line switch is only provided by the user - because it executes javascript with full trust, it effectively accepts executable code from the command line.

     

    Internet Explorer:

    [diagram: how IE/URLMON handles incoming URLs]

    This diagram describes my interpretation of how IE (actually urlmon.dll in this case) handles incoming URLs.  It's just my interpretation, based on the information contained here (at a minimum, I suspect it's missing some trust boundaries).  The web page hands IE a URL, IE looks the URL up in the registry and retrieves a URL handler.  Depending on how the URL handler was registered, IE either invokes the shell on the path portion of the URL or, if the URL handler was registered as an async protocol handler, it hands the URL to the async protocol handler.

    I'm not going to do a diagram for the firefoxurl handler or the shell, since they're either not interesting or are covered in the diagram above - in the firefoxurl case, the handler is registered as being handled by the shell.  In that case, Internet Explorer will pass the URL into the shell, which will happily pass it on to the URL handler (which, in this case, is Firefox).

     

    That's a lot of text and pictures; tomorrow I'll discuss what I think went wrong and how using threat modeling could have avoided the issue.  I also want to look at BOTH of the threat models and see what they indicate.

     

    Obviously, the contents of this post are my own opinion and in no way reflect the opinions of Microsoft.

  • Larry Osterman's WebLog

    Threat Modeling Again, Presenting the PlaySound Threat Model

    • 7 Comments

    It's been a long path, but we're finally at the point where I can present the threat model for PlaySound.  None of the information in this post is new; all of it is pulled from previous posts.

     ----------------

    PlaySound Threat Model

    The PlaySound API is a high level multimedia API intended to render system sounds ("dings").  It has three major modes of operation:

    • It can play the contents of a .WAV file passed in as a parameter to the API.
    • It can play the contents of a Win32 resource or other memory location passed in as a parameter to the API.
    • It can play the contents of a .WAV file referenced by an alias.  If this mode is chosen, it reads the filename from the registry under HKCU\AppEvents.

     For more information on the PlaySound API and its options, see: The MSDN documentation for PlaySound.
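
    For the third mode above, here's a hedged sketch of the alias lookup (my own illustration; the subkey path is simplified exactly as it is in the description here, and the real layout has an additional per-application level).  The important thing from a threat modeling perspective is that the returned file name is data the user - or an attacker running as the user - controls:

    #include <windows.h>

    // Hedged sketch: look up the WAV file name for a sound alias under
    // HKCU\AppEvents.  The subkey layout is simplified as in the text above; the
    // point is that the returned string is under the user's control and must be
    // validated before it's used as a file name.
    static bool LookupSoundAlias(const wchar_t* alias,
                                 wchar_t* path, DWORD cchPath)
    {
        wchar_t subKey[MAX_PATH];
        swprintf_s(subKey, MAX_PATH,
                   L"AppEvents\\Schemes\\.Default\\%s\\.Current", alias);

        DWORD cbPath = cchPath * sizeof(wchar_t);
        return RegGetValueW(HKEY_CURRENT_USER, subKey, nullptr,
                            RRF_RT_REG_SZ, nullptr, path, &cbPath) == ERROR_SUCCESS;
    }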

    PlaySound Diagram

    The PlaySound API's data flow can be represented as follows.

    PlaySound Elements

     

    1. Application: External Interactor - The application which calls the PlaySound API.
    2. PlaySound: Process - The code that represents the PlaySound API
    3. WAV file: Data Store - The WAV file to be played, on disk or in memory
    4. HKCU Sound Aliases: Data Store - The Windows Registry under HKCU\AppEvents which maps from aliases to WAV filenames
    5. Audio Playback APIs: External Interactor - The audio playback APIs used for PlaySound.  This could be MediaFoundation, waveOutXxxx, DirectShow, or any other audio rendering system.
    6. PlaySound Command (Application->PlaySound): DataFlow (Crosses Threat Boundary) - The data transmitted in this data flow represents the filename to play, the alias to look up or the resource ID in the current executable to play.
    7. WAVE Header (WAV file-> PlaySound) : DataFlow (Crosses Threat Boundary) - The data transmitted in this data flow represents the WAVEFORMATEX structure contained in the WAV file being played.
    8. WAV file Data (WAV file-> PlaySound) : DataFlow (Crosses Threat Boundary) - The data transmitted in this data flow represents the actual audio samples contained in the WAV file being played.
    9. WAV filename (HKCU Sound Aliases -> PlaySound) : DataFlow (Crosses Threat Boundary) - The data transmitted in this data flow represents the contents of the HKCU\AppEvents\Schemes\.Default\<sound>\.Current[1]
    10. WAV file Data (PlaySound -> Audio Playback APIs): DataFlow - The data transmitted in this data flow represents both the WAVEFORMATEX structure read from the WAV file and the audio samples read from the file.

    PlaySound Threat Analysis

    Data Flows

    WAVE Header (WAV file-> PlaySound) : DataFlow (Crosses Threat Boundary) - Tampering
    WAVE Header (WAV file-> PlaySound) : DataFlow (Crosses Threat Boundary) - Information Disclosure
    WAVE Header (WAV file-> PlaySound) : DataFlow (Crosses Threat Boundary) - Denial of Service
    WAV file Data (WAV file-> PlaySound) : DataFlow (Crosses Threat Boundary) - Tampering
    WAV file Data (WAV file-> PlaySound) : DataFlow (Crosses Threat Boundary) - Information Disclosure
    WAV file Data (WAV file-> PlaySound) : DataFlow (Crosses Threat Boundary) - Denial of Service
    WAV filename (HKCU Sound Aliases -> PlaySound) : DataFlow (Crosses Threat Boundary) - Tampering
    WAV filename (HKCU Sound Aliases -> PlaySound) : DataFlow (Crosses Threat Boundary) - Information Disclosure
    WAV filename (HKCU Sound Aliases -> PlaySound) : DataFlow (Crosses Threat Boundary) - Denial of Service
    WAV file Data (PlaySound -> Audio Playback APIs): DataFlow - Tampering
    WAV file Data (PlaySound -> Audio Playback APIs): DataFlow - Information Disclosure
    WAV file Data (PlaySound -> Audio Playback APIs): DataFlow - Denial of Service

    Because all the data flows are within a single process boundary, there are no meaningful threats to those dataflows - the Win32 process model protects against those threats.

    External Interactors

    Application: External Interactor - Spoofing

    It doesn't matter which application called the PlaySound API, so we don't care about spoofing threats to the application calling PlaySound.

    Application: External Interactor - Repudiation

    There are no requirements that the PlaySound API protect against Repudiation attacks.

    Audio Playback APIs: External Interactor - Spoofing

    The system default APIs are protected by Windows Resource Protection so they cannot be replaced.  If an attacker does successfully inject his logic (by overriding the COM registration for the audio APIs or by some other means), the attacker is running at the same privilege level as the user, and so can do nothing that the user can't do.

    Audio Playback APIs: External Interactor - Repudiation

    There are no requirements that the PlaySound API protect against Repudiation attacks.

    Data Stores

    Since the data stores involved in the PlaySound API are under the control of the user, we must protect against threats to those data stores.

    WAV file: Data Store - Tampering

    An attacker can modify the contents of the WAV file data store.  To mitigate this attack, we will validate the WAVE header information; we're not going to check the actual WAV data, since it's just raw audio samples.  Bug #XXXX filed to validate this mitigation.
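
    A hedged sketch of what "validate the WAVE header information" might look like - these particular checks are illustrative, not the checks the actual PlaySound implementation performs:

    #include <windows.h>
    #include <mmsystem.h>

    // Hedged sketch: basic "reasonableness" validation of a WAVEFORMATEX read
    // from an untrusted WAV file before it's handed to the playback APIs.
    // The specific limits are illustrative only.
    static bool IsPlausibleWaveFormat(const WAVEFORMATEX* fmt)
    {
        if (fmt->nChannels == 0 || fmt->nChannels > 64)
            return false;
        if (fmt->nSamplesPerSec == 0 || fmt->nSamplesPerSec > 384000)
            return false;
        if (fmt->nBlockAlign == 0)
            return false;

        // For plain PCM the derived fields must be internally consistent; a
        // tampered file can put arbitrary values here.
        if (fmt->wFormatTag == WAVE_FORMAT_PCM)
        {
            if (fmt->wBitsPerSample != 8 && fmt->wBitsPerSample != 16)
                return false;
            if (fmt->nBlockAlign != fmt->nChannels * (fmt->wBitsPerSample / 8))
                return false;
            if (fmt->nAvgBytesPerSec != fmt->nSamplesPerSec * fmt->nBlockAlign)
                return false;
        }
        return true;
    }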

    WAV file: Data Store - Information Disclosure

    The WAV file is protected by NT's filesystem ACLs which prevent unauthorized users from reading the contents of the file.

    WAV file: Data Store - Repudiation

    Repudiation threats don't apply to this store.

    WAV file: Data Store - Denial of Service

    The PlaySound API will check for errors when reading from the store and will return an error indication to its caller (if possible). When PlaySound is running in the "resource or memory location" mode and the SND_ASYNC flag is specified, the caller may unmap the virtual memory associated with the WAV file.  In that case, PlaySound may access violate while rendering the contents of the file[2].  Bug #XXXX filed to validate this mitigation.

    HKCU Sound Aliases: Data Store - Tampering

    An attacker can modify the contents of the sound aliases registry key.  To mitigate this attack, we will validate the contents of the key. Bug #XXXX filed to validate this mitigation.

    HKCU Sound Aliases: Data Store - Information Disclosure

    The aliases key is protected by the registry ACLs which prevent unauthorized users from reading the contents  of the key.

    HKCU Sound Aliases: Data Store - Repudiation

    Repudiation threats don't apply to this store.

    HKCU Sound Aliases: Data Store - Denial of service

    The PlaySound API will check for errors when reading from the store and will return an error indication to its caller (if possible). Bug #XXXX filed to validate this mitigation.

    Processes

    PlaySound: Process - Spoofing

    Since PlaySound is the component we're threat modeling, spoofing threats don't apply.

    PlaySound: Process - Tampering

    The only tampering that can happen to the PlaySound process directly involves modifying the PlaySound binary on disk; if the user has the rights to do that, we can't stop them.  For PlaySound, the file is protected by Windows Resource Protection, which should protect the file from tampering.

    PlaySound: Process - Repudiation

    We don't care about repudiation threats to the PlaySound API.

    PlaySound: Process - Information Disclosure

    The NT process model prevents any unauthorized entity from reading the process memory associated with the Win32 process playing Audio, so this threat is out of scope for this component.

    PlaySound: Process - Denial of Service

    Again, the NT process model prevents unauthorized entities from crashing or interfering with the process, so this threat is out of scope for this component.

    PlaySound: Process - Elevation of Privilege

    The PlaySound API runs at the same privilege level as the application calling PlaySound, so it is not subject to EoP threats.

    PlaySound: Process - "PlaySound Command" crosses trust boundary: Elevation of Privilege/Denial of Service / Tampering

    The data transmitted by the incoming "PlaySound Command" data flow comes from an untrusted source.  Thus the PlaySound API will validate the data contained in that dataflow for "reasonableness" (mostly checking to ensure that the string passed in doesn't cause a buffer overflow).  Bug #XXXX filed to validate this mitigation.
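
    A hedged sketch of the kind of check described above (the MAX_PATH limit is an assumption for illustration, not a limit PlaySound actually enforces):

    #include <windows.h>
    #include <strsafe.h>

    // Hedged sketch: before using the caller-supplied sound name, make sure it
    // is a non-empty, bounded, properly terminated string so that later copies
    // into fixed-size buffers can't overflow.
    static bool IsReasonableSoundName(LPCWSTR soundName)
    {
        if (soundName == nullptr)
            return false;

        size_t length = 0;
        // StringCchLengthW fails if no terminator is found within the limit.
        if (FAILED(StringCchLengthW(soundName, MAX_PATH, &length)))
            return false;

        return length > 0;
    }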

    PlaySound: Process - "WAV file Data" data flow crosses trust boundary: Information Disclosure

    It's possible that the contents of the WAV file might be private, so if some attacker can somehow "snoop" the contents of the data they might be able to learn information they shouldn't.  Another way that this "I" attack shows up is described in CVE-2007-0675 and here.  So how do we mitigate that threat (and the corresponding threat associated with someone spoofing the audio APIs)?

    The risk associated with CVE-2007-0675 is out-of-scope for this component (if the threat is to be mitigated, it's more appropriate to handle that either in the speech recognition engine or the audio stack), so the only risk is that we might be handing the audio stack data that can be misused. 

    Since the entire purpose of the API is to play the contents of the WAVE file, this particular threat is considered to be an acceptable risk.

    ---

    [1] The actual path is slightly more complicated because of SND_APPLICATION flag, but that doesn't materially change the threat model.

    [2] The DOS issues associated with this behavior are accepted risks.

    --------------

    Next: Let's look at a slightly more interesting case where threat modeling exposes an issue.

  • Larry Osterman's WebLog

    Threat Modeling Again, Threat Modeling PlaySound

    • 7 Comments

    Finally it's time to think about threat modeling the PlaySound API.

    Let's go back to the DFD that I included in my earlier post, since everything flows from the DFD.

     

    This dataflow diagram contains a number of elements, they are:

    1. Application: External Interactor
    2. PlaySound: Process
    3. WAV file: Data Store
    4. HKCU Sound Aliases: Data Store
    5. Audio Playback APIs: External Interactor
    6. PlaySound Command (Application->PlaySound): DataFlow (Crosses Threat Boundary)
    7. WAVE Header (WAV file-> PlaySound) : DataFlow (Crosses Threat Boundary)
    8. WAV file Data (WAV file-> PlaySound) : DataFlow (Crosses Threat Boundary)
    9. WAV filename (HKCU Sound Aliases -> PlaySound) : DataFlow (Crosses Threat Boundary)
    10. WAV file Data (PlaySound -> Audio Playback APIs): DataFlow

    Now that we've enumerated the elements, we apply the STRIDE/Element methodology, which allows us to enumerate the threats that this component faces:

    1.  Application: External Interactor - Spoofing
    2.  Application: External Interactor - Repudiation
    3. PlaySound: Process - Spoofing
    4. PlaySound: Process - Tampering
    5. PlaySound: Process - Repudiation
    6. PlaySound: Process - Information Disclosure
    7. PlaySound: Process - Denial of Service
    8. PlaySound: Process - Elevation of Privilege
    9. WAV file: Data Store - Tampering
    10. WAV file: Data Store - Information Disclosure
    11. WAV file: Data Store - Repudiation
    12. WAV file: Data Store - Denial of Service
    13. HKCU Sound Aliases: Data Store - Tampering
    14. HKCU Sound Aliases: Data Store - Information Disclosure
    15. HKCU Sound Aliases: Data Store - Repudiation
    16. HKCU Sound Aliases: Data Store - Denial of service
    17. Audio Playback APIs: External Interactor - Spoofing
    18. Audio Playback APIs: External Interactor - Repudiation
    19. PlaySound Command (Application->PlaySound): DataFlow (Crosses Threat Boundary) - Tampering
    20. PlaySound Command (Application->PlaySound): DataFlow (Crosses Threat Boundary) - Information Disclosure
    21. PlaySound Command (Application->PlaySound): DataFlow (Crosses Threat Boundary) - Denial of Service
    22. WAVE Header (WAV file-> PlaySound) : DataFlow (Crosses Threat Boundary) - Tampering
    23. WAVE Header (WAV file-> PlaySound) : DataFlow (Crosses Threat Boundary) - Information Disclosure
    24. WAVE Header (WAV file-> PlaySound) : DataFlow (Crosses Threat Boundary) - Denial of Service
    25. WAV file Data (WAV file-> PlaySound) : DataFlow (Crosses Threat Boundary) - Tampering
    26. WAV file Data (WAV file-> PlaySound) : DataFlow (Crosses Threat Boundary) - Information Disclosure
    27. WAV file Data (WAV file-> PlaySound) : DataFlow (Crosses Threat Boundary) - Denial of Service
    28. WAV filename (HKCU Sound Aliases -> PlaySound) : DataFlow (Crosses Threat Boundary) - Tampering
    29. WAV filename (HKCU Sound Aliases -> PlaySound) : DataFlow (Crosses Threat Boundary) - Information Disclosure
    30. WAV filename (HKCU Sound Aliases -> PlaySound) : DataFlow (Crosses Threat Boundary) - Denial of Service
    31. WAV file Data (PlaySound -> Audio Playback APIs): DataFlow - Tampering
    32. WAV file Data (PlaySound -> Audio Playback APIs): DataFlow - Information Disclosure
    33. WAV file Data (PlaySound -> Audio Playback APIs): DataFlow - Denial of Service

     Phew.  You mean that the PlaySound API can be attacked in 33 different ways?  That's unbelievable.

    It's true.  There ARE 33 ways that you can attack the PlaySound API; however, many of them are identical, and some of them are irrelevant.  That's the challenge of the next part of the process, which is the analysis phase.

    As I mentioned in the first STRIDE-per-element post, STRIDE-per-element is a framework for analysis.  That's where common sense and your understanding of the system come into play.

    And that's the next part in the series: Analyzing the threats enumerated by STRIDE-per-element.  This is the point at which all the previous articles come together.

  • Larry Osterman's WebLog

    What's wrong with this code, part 21, a psychic debugging example

    • 48 Comments

    Over the weekend, one of the developers in my group sent me some mail - he was seeing one of the registers in his code getting corrupted across a procedure call.  He was quite surprised to see this, and asked me for any suggestions.

    With the help of the info he gave me, I was able to figure out what had gone wrong with his code, and I realized that it'd make a great "what's wrong with this code" example.

    There are three parts to the code associated with this "what's wrong".  The first is an interface definition:

    class IPsychicInterface
    {
    public:
        virtual bool DoSomeOperation(int argc, _TCHAR *argv[]) = 0;
    };

    Next, you have a tiny test application:

    int _tmain(int argc, _TCHAR* argv[])
    {
        register int value1 = 1;
        IPsychicInterface *psychicInterface = GetPsychicInterface();
        register int value2 = 2;

        psychicInterface->DoSomeOperation(argc, argv);
        assert(value1 == 1);
        assert(value2 == 2);
        return 0;
    }

    The failure happened when the caller returned from psychicInterface->DoSomeOperation - upon the return, the ESI register, which is supposed to be preserved across the call, got trashed.  Further debugging showed that the reason that ESI was trashed was that the stack was imbalanced after the call to DoSomeOperation.

    There's one more piece of information that I was given that let me immediately realize the root cause of the problem.

     

    I know that if I include that information, what went wrong should be blindingly obvious, so I'm going to be mean and ask you to tell me what the one additional piece of information was.   The reason that the other developer in my group didn't find it was simply that he was looking at too much data - if I had pointed out that one additional piece of data, he'd have instantly figured it out too.

     

    So the "answer" to this part of the "What's wrong" problem is "What single additional piece of information was I given that made this problem simple to solve?"

  • Larry Osterman's WebLog

    Threat Modeling Again, What does STRIDE have to do with threat modeling?

    • 5 Comments

    In my last couple of posts, I've talked about the STRIDE categories.  As I mentioned, STRIDE provides a convenient classification mechanism for threats, and threat modeling is all about trying to identify the threats to your feature/component/whatever.

    When we first started threat modeling, we already had the idea of STRIDE categories, but we really didn't know how to apply them.  We'd go into the big threat modeling meeting and look at each of the pieces of our diagram and ask "what is the spoofing (or tampering, or whatever) threat against this component"?  We were thinking about the STRIDE categories as discrete elements, not as categories in which to collect threats.

    After a while, it became obvious that not only doesn't this work (again, it's very ad hoc), but it misses the point.  The point is to identify the threats and put them in the appropriate bucket so you can help to understand how to mitigate the threat.

    One of the interesting aspects of threats is that they are permanent.  For a given design, the threats against that design are static: for any data flow diagram, you have a static set of threats that apply to that diagram.  There may be more than one threat in a particular category for a particular element, but every element is subject to certain threats.

    Once we had this mindset shift, we started thinking about how the STRIDE categories applied to various elements, and we came to an interesting realization.

    It turns out that some STRIDE threats only apply to particular types of elements.  If you think about it, it makes sense - for instance, an Elevation of Privilege threat doesn't apply to data stores (since a data store simply holds data, it operates at no privilege level).

     

    Remember that we consider four types of elements in a threat model: External Entities, Processes, Data Stores and Data Flows.  For each element type, the following threats are considered valid:

    External Entities: Spoofing, Repudiation.  Since an external entity could be anything, including the human being interacting with the component, Tampering, Information Disclosure, Denial of Service and Elevation of Privilege threats don't really make sense.  On the other hand, you can absolutely spoof a human being, and human beings can repudiate operations.

    Processes: Processes are subject to all of the STRIDE threats (Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, and Elevation of Privilege).

    Data Stores: Tampering, Information Disclosure, Denial of Service (as I mentioned above, EoP etc don't really apply to static stores), and repudiation.

    Data Flows: Tampering, Information Disclosure, Denial of Service. 

    Repudiation threats against data stores require special mention.  Data stores often come under attack to allow a repudiation attack to work (if you have a log located in a data store, the attacker might try to flood the data store with log entries to enable a repudiation attack).  In addition, logs held in data stores are almost always the mitigation against a repudiation threat.
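
    Putting the mapping into a form a tool could consume makes the idea concrete.  This is just my own sketch of the table described above, not code from any Microsoft threat modeling tool:

    // Hedged sketch: the STRIDE-per-element mapping above, encoded as a lookup
    // table.  The mapping is taken from the text; the code is illustrative.
    enum Stride : unsigned {
        Spoofing              = 1 << 0,
        Tampering             = 1 << 1,
        Repudiation           = 1 << 2,
        InformationDisclosure = 1 << 3,
        DenialOfService       = 1 << 4,
        ElevationOfPrivilege  = 1 << 5,
    };

    enum class ElementType { ExternalEntity, Process, DataStore, DataFlow };

    unsigned ApplicableThreats(ElementType element)
    {
        switch (element) {
        case ElementType::ExternalEntity:
            return Spoofing | Repudiation;
        case ElementType::Process:
            return Spoofing | Tampering | Repudiation | InformationDisclosure |
                   DenialOfService | ElevationOfPrivilege;
        case ElementType::DataStore:
            // Repudiation is included because logs usually live in data stores.
            return Tampering | Repudiation | InformationDisclosure | DenialOfService;
        case ElementType::DataFlow:
            return Tampering | InformationDisclosure | DenialOfService;
        }
        return 0;
    }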

    And with that final realization, all the pieces have been brought together to describe Microsoft's current methodology for threat modeling, which we call STRIDE-per-element.

    Next: STRIDE-per-element.

  • Larry Osterman's WebLog

    Threat Modeling Again, STRIDE per Element

    • 6 Comments

    As I mentioned the other day, we had three big realizations as we've been doing more and more threat models.  The first (which we've known all along) was that threats are permanent - the threats that apply to the elements of your component don't change over time (assuming that your diagram is accurate and that the design doesn't change[1]). The second (again, which we've known all along) is that threats can be categorized according to the 6 STRIDE categories and that there are standard mitigations that can be applied for each of those categories.  The final piece of the puzzle was that for each type of element, there is a limited set of STRIDE categories that apply to that element.

    When you put those pieces together, it turns out that you can build a modeling process based on your diagram (remember diagrams?  I started out this series talking about them - there was a reason for it :))

    Since each element in your diagram has a series of threats that apply to it, you can build your threat model simply by thinking about how each threat applies.  That means that every dataflow in your document has Tampering, Information Disclosure, and Denial of Service threats.  It's entirely possible that you don't actually care about those threats, but the dataflows ARE subject to them.

    We call this process of considering the threats to each element in the DFD "STRIDE-per-Element" (it's not the most original name, but it IS concise and highly descriptive). 

     

    The way that STRIDE-per-element shines (and why it's been so effective here at Microsoft) is that it's a flexible paradigm. 

    A group can use STRIDE/element as a framework that leads them to consider threats that they hadn't considered before; they can use STRIDE/element as a way of guiding a brainstorming session; or they can use STRIDE/element as a way of thinking about which mitigations might be appropriate for the various threats that their elements face.  The STRIDE/element methodology allows for all of these.

    And the best part about it (as far as I'm concerned) is that STRIDE/element allows people to produce competent threat models with relatively little training.  Which, as I mentioned at the beginning of the series, is important.  At Microsoft, every person on the development team participates in the threat modeling process, because they understand their feature better than anyone else does.

     

    The STRIDE/element methodology ends up creating a fair number of threats, even a component with a relatively tiny diagram like:

    [diagram: an Application external interactor connected to a Component process by a "Do Something" data flow]

    has at least 18 different threats according to the STRIDE/element methodology.  The good news is that many/most of those threats aren't meaningful threats - for example, if "Component" in the example above is an API, the "spoofing" threat against "Application" isn't particularly meaningful - you don't care WHO calls your API, one caller is as good as another.  On the other hand, if "Do Something" involved communicating data over the network, you probably DO care about the T, I and D threats associated with that data flow.  It's the analysis of each threat that lets you know how to handle that.

    This analysis is the core of the threat model, and it's where the real work associated with the process takes place (and where the value of the process shows up).  The analysis of the threats associated with your component lets you know (a) how you need to change the component to protect against various classes of attacks and (b) what your test team needs to do to ensure that your mitigations are in place and are correct.

     

    Next: Let's start building the PlaySound threat model!

     

     

    [1] I'll come back to this particular point a couple of posts from now, it's important.

  • Larry Osterman's WebLog

    Got Tetris?

    • 7 Comments

    I just wanted to take a quick break from threat modeling to point to a video that Valorie passed on to me that's nothing short of remarkable (from the blog of one of my favorite magazines, Mental Floss):

    Advanced Tetris Gameplay:

    The magic happens about 3 minutes into the video, when the game speed kicks up from Unbelievable to Utterly Insane.

     

    And then, once the player's beat Utterly Insane mode, the blocks become invisible.  And the player STILL manages to beat the game.

    Wow.

     

  • Larry Osterman's WebLog

    Threat Modeling Again, Pulling the threat model together

    • 9 Comments

    So I've been writing a LOT of posts about the threat modeling process and how one goes about doing the threat model analysis for a component.

    The one thing I've not talked about is what a threat model actually is.

    A threat model is a specification, just like your functional specification (a Program Management spec that defines the functional requirements of your component), your design specification (a development spec that defines the architecture that is required to implement the functional specification), and your test plan (a test spec that defines how you plan on ensuring that the design as implemented meets the requirements of the functional specification).

    Just like the functional, design and test specs, a threat model is a living document - as you change the design, you need to go back and update your threat model to see if any new threats have arisen since you started.

    So what goes into the threat model document?

    • Obviously you need the diagram and an enumeration and description of the elements in your diagram. 
    • You also need to include your threat analysis, since that's the core of the threat model.
• For each mitigated threat that you call out in the threat analysis, you should include the bug # associated with the mitigation.
• You should probably have a one- or two-paragraph description of your component and what it does (it helps an outsider understand your diagram).  Similarly, a list of contacts for questions and the like is also quite useful.

    The third item I called out there reflects an important point about threat modeling that's often lost.

    Every time your threat model indicates that you have a need to mitigate a particular threat, you need to file at least one bug and potentially two.  The first bug goes to the developer to ensure that the developer implements the mitigation called out in the threat model, and the second bug goes to a tester to ensure that the tester either (a) writes tests to verify the mitigation or (b) runs existing tests to ensure that the mitigation is in place.

This last bit is really important.  If you're not going to follow through on the process and ensure that the threats you identified are mitigated, then you're just wasting your time doing the threat model - except as an intellectual exercise, it won't actually help you improve the security of your product.

     

    Next: Presenting the PlaySound threat model!

  • Larry Osterman's WebLog

    Threat Modeling Again, Analyzing the threats to PlaySound

    • 4 Comments

In my last post, I enumerated a bewildering array of threats that the PlaySound API is subject to; today I want to work through the analysis process for each of those threats.

    To refresh, here's the DFD and the list of threats:

    1.  Application: External Interactor - Spoofing
    2.  Application: External Interactor - Repudiation
    3. PlaySound: Process - Spoofing
    4. PlaySound: Process - Tampering
    5. PlaySound: Process - Repudiation
    6. PlaySound: Process - Information Disclosure
    7. PlaySound: Process - Denial of Service
    8. PlaySound: Process - Elevation of Privilege
    9. WAV file: Data Store - Tampering
    10. WAV file: Data Store - Information Disclosure
    11. WAV file: Data Store - Repudiation
    12. WAV file: Data Store - Denial of Service
    13. HKCU Sound Aliases: Data Store - Tampering
    14. HKCU Sound Aliases: Data Store - Information Disclosure
    15. HKCU Sound Aliases: Data Store - Repudiation
    16. HKCU Sound Aliases: Data Store - Denial of service
    17. Audio Playback APIs: External Interactor - Spoofing
    18. Audio Playback APIs: External Interactor - Repudiation
    19. PlaySound Command (Application->PlaySound): DataFlow (Crosses Threat Boundary) - Tampering
    20. PlaySound Command (Application->PlaySound): DataFlow (Crosses Threat Boundary) - Information Disclosure
    21. PlaySound Command (Application->PlaySound): DataFlow (Crosses Threat Boundary) - Denial of Service
    22. WAVE Header (WAV file-> PlaySound) : DataFlow (Crosses Threat Boundary) - Tampering
    23. WAVE Header (WAV file-> PlaySound) : DataFlow (Crosses Threat Boundary) - Information Disclosure
    24. WAVE Header (WAV file-> PlaySound) : DataFlow (Crosses Threat Boundary) - Denial of Service
    25. WAV file Data (WAV file-> PlaySound) : DataFlow (Crosses Threat Boundary) - Tampering
    26. WAV file Data (WAV file-> PlaySound) : DataFlow (Crosses Threat Boundary) - Information Disclosure
    27. WAV file Data (WAV file-> PlaySound) : DataFlow (Crosses Threat Boundary) - Denial of Service
    28. WAV filename (HKCU Sound Aliases -> PlaySound) : DataFlow (Crosses Threat Boundary) - Tampering
    29. WAV filename (HKCU Sound Aliases -> PlaySound) : DataFlow (Crosses Threat Boundary) - Information Disclosure
    30. WAV filename (HKCU Sound Aliases -> PlaySound) : DataFlow (Crosses Threat Boundary) - Denial of Service
    31. WAV file Data (PlaySound -> Audio Playback APIs): DataFlow - Tampering
    32. WAV file Data (PlaySound -> Audio Playback APIs): DataFlow - Information Disclosure
    33. WAV file Data (PlaySound -> Audio Playback APIs): DataFlow - Denial of Service

     

    Let's start with the dataflows, mostly because there are so many of them.

For each of the dataflows (PlaySound Command, WAVE Header, WAV file Data, WAV filename and WAV file Data), because the dataflows are all within a single process boundary, there are no meaningful threats to those dataflows.  So with one simple assertion, I've dealt with almost half the threats associated with PlaySound (this isn't strictly true - there ARE threats associated with the dataflows, I'll get to the details when I discuss the threats to the PlaySound process).

     

    Next, let's look at the external interactors.

    For the Application interactor, we don't care about spoofing threats - we don't care which application calls PlaySound, so once again, we can simply assert that Spoofing threats don't apply. We can make a similar assertion about the Repudiation threat - for PlaySound, there is no risk of the interactor repudiating their action (nobody's going to come along and claim that they didn't really make that noise).

For the audio playback APIs, things are a smidge more complicated.  Once again, repudiation isn't an issue, so that particular threat's taken care of.  Spoofing is a more interesting issue.  What happens if someone spoofs the audio engine?  At one level, this is an "if it looks like a duck, walks like a duck and quacks like a duck, it's a duck" type of problem - if the entity claiming to be the audio engine follows the contracts of the audio engine, do we really care whether it's the real audio engine or not?  Maybe, maybe not.  For now, let's table this particular threat (I'll come back to it later).

     

    Ok, we've mostly handled the interactors and data flows.  Let's discuss the two data stores.

Each of the two data stores is subject to Tampering, Repudiation, Information Disclosure and Denial of Service threats.

    An attacker can tamper with the contents of both stores, and we need to have mitigations in place for that: The PlaySound API will validate the contents of the data read from either data store for "reasonableness".

    There are a number of ways that Denial of Service attacks show up for files and registry keys - first off, the file/key requested might not be present.  Secondly, the file might be on a removable storage medium which is removed during the read (registry keys also suffer from this - if you are using roaming profiles, then the registry hives might be located over the network).  There are others, but I believe these two are the big ones.  For PlaySound, all we need to do is to check error returns when reading from the data stores.  There are no performance guarantees associated with PlaySound so the fact that it may take a long time to read the data doesn't constitute a threat.

    So the analysis for the data stores is:

    WAV file: Data Store - Tampering: An attacker can modify the contents of the WAV file data store.  To mitigate this attack, we will validate the WAVE header information; we're not going to check the actual WAV data, since it's just raw audio samples. 

    WAV file: Data Store - Information Disclosure: The WAV file is protected by NT's filesystem ACLs which prevent unauthorized users from reading the contents of the file.

    WAV file: Data Store - Repudiation: Repudiation threats don't apply to this store.

    WAV file: Data Store - Denial of Service: The PlaySound API will check for errors when reading from the store and will return an error indication to its caller (if possible).

    HKCU Sound Aliases: Data Store - Tampering: An attacker can modify the contents of the sound aliases registry key.  To mitigate this attack, we will validate the contents of the key.

    HKCU Sound Aliases: Data Store - Information Disclosure: The aliases key is protected by the registry ACLs which prevent unauthorized users from reading the contents  of the key.

    HKCU Sound Aliases: Data Store - Repudiation: Repudiation threats don't apply to this store.

    HKCU Sound Aliases: Data Store - Denial of service: The PlaySound API will check for errors when reading from the store and will return an error indication to its caller (if possible).
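Roughly speaking (and this is just a sketch - it's not the actual PlaySound source, and the header structure here is deliberately simplified), the "validate the header" and "check your error returns" mitigations for the WAV file store might look something like this:

// Sketch only - not the real PlaySound code.  The header structure is simplified;
// real RIFF parsing walks the chunk list.  What matters is that every read is
// error-checked (the Denial of Service mitigation) and nothing read from the
// store is trusted until it's been sanity-checked (the Tampering mitigation).
#include <windows.h>
#include <string.h>

#pragma pack(push, 1)
struct SimpleWavHeader
{
    char  riffTag[4];     // should be "RIFF"
    DWORD riffSize;
    char  waveTag[4];     // should be "WAVE"
};
#pragma pack(pop)

bool ReadAndValidateWavHeader(const wchar_t *fileName)
{
    HANDLE file = CreateFileW(fileName, GENERIC_READ, FILE_SHARE_READ, NULL,
                              OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (file == INVALID_HANDLE_VALUE)
        return false;                       // file missing or unreadable - fail gracefully

    SimpleWavHeader header;
    DWORD bytesRead = 0;
    BOOL readOk = ReadFile(file, &header, sizeof(header), &bytesRead, NULL);
    CloseHandle(file);
    if (!readOk || bytesRead != sizeof(header))
        return false;                       // media removed / short read - fail gracefully

    // Don't trust what came out of the data store until it looks reasonable.
    if (memcmp(header.riffTag, "RIFF", 4) != 0 ||
        memcmp(header.waveTag, "WAVE", 4) != 0)
        return false;

    return true;
}

The registry store gets the same treatment: check the return code from the registry read and treat anything unexpected as "don't play the sound" rather than as a fatal error.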

     

    That's everything, except for the actual PlaySound API.  I'm going to take each of the threats to PlaySound in turn:

     PlaySound: Process - Spoofing: Since PlaySound is the component we're threat modeling, spoofing threats don't apply.

PlaySound: Process - Tampering: The only tampering that can happen to the PlaySound process directly involves modifying the PlaySound binary on disk; if the user has the rights to do that, we can't stop them.  For PlaySound, the file is protected by Windows Resource Protection, which should protect the file from tampering.  For non-Windows components, if you follow Microsoft's recommendations, the ACLs on the filesystem usually protect your component from tampering.

    PlaySound: Process - Repudiation: We don't care about repudiation threats to the PlaySound API.

    PlaySound: Process - Information Disclosure: The NT process model prevents any unauthorized entity from reading the process memory associated with the Win32 process playing Audio, so this threat is out of scope for this component.

    PlaySound: Process - Denial of Service: Again, the NT process model prevents unauthorized entities from crashing or interfering with the process, so this threat is out of scope for this component.

    PlaySound: Process - Elevation of Privilege: The PlaySound API runs at the same privilege level as the application calling PlaySound, so it is not subject to EoP threats.

     

But that's not the complete set of threats associated with the PlaySound API - there are two more sets of threats that need to be mitigated.

    The first is a threat that comes from the "PlaySound Command" data flow.  The data transmitted over that data flow comes from an untrusted source, and thus it represents a Tampering and Denial of Service threat.  Since the data conceivably comes from an anonymous source (it could come from a web page in some circumstances), it's also potentially subject to an EoP threat.

    The mitigation to all three of these threats is the same:

    PlaySound: Process - Elevation of Privilege/Denial of Service/Tampering: The data transmitted by the incoming "PlaySound Command" data flow comes from an untrusted source.  Thus the PlaySound API will validate the data contained in that dataflow for "reasonableness" (mostly checking to ensure that the string passed in doesn't cause a buffer overflow).
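As a rough illustration (this is a sketch, not the actual implementation, and MAX_ALIAS_LENGTH is a made-up cap), the "reasonableness" check boils down to never trusting the length of the incoming string:

// Sketch only - not the real PlaySound code.  MAX_ALIAS_LENGTH is a hypothetical
// cap on how long an alias or filename we're willing to accept.
#include <windows.h>
#include <wchar.h>

static const size_t MAX_ALIAS_LENGTH = MAX_PATH;

bool ValidateSoundParameter(const wchar_t *soundName)
{
    if (soundName == NULL)
        return false;

    // Never assume the caller's string is terminated within a sane length;
    // wcsnlen stops scanning after MAX_ALIAS_LENGTH + 1 characters.
    size_t length = wcsnlen(soundName, MAX_ALIAS_LENGTH + 1);
    if (length > MAX_ALIAS_LENGTH)
        return false;

    // From here on, any copy into a local buffer can be bounded by 'length',
    // so a hostile (or merely careless) caller can't overflow it.
    return true;
}

The point isn't that the immediate caller is hostile - it's that the string it hands us may have originated somewhere far less trustworthy.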

The last threat is an Information Disclosure threat associated with the "WAV file Data (PlaySound -> Audio Playback APIs): DataFlow" data flow.  It's possible that the contents of the WAV file might be private, so if some attacker can somehow "snoop" the contents of the data, they might be able to learn information they shouldn't.  Another way that this "I" attack shows up is described in CVE-2007-0675.  So how do we mitigate that threat (and the corresponding threat associated with someone spoofing the audio APIs)?

    First the spoofing issue against the audio APIs.  For that, I assert that it doesn't matter.  Here's the logic: There are two ways that the audio APIs can be spoofed.  The first is if the attacker somehow modifies or replaces the files that constitute the audio subsystem.  That threat is mitigated by the ACLs on those files - only administrators have the ability to modify those files, and you can't protect against an administrator.  The second way that the APIs can be spoofed is if someone overrides the COM registration for the audio stack that's located in HKLM - again, that can't be done because you have to be an administrator to modify those keys.  There IS a possible attack caused by modifying HKCU\Classes, but that threat isn't particularly relevant - it (a) requires that the user run a program to modify those registry keys and (b) doesn't give the user any abilities that they don't already have.

    Finally, the information disclosure vulnerability associated with sending the audio data to the Playback APIs.  The risk associated with CVE-2007-0675 is out-of-scope for this component (if the threat is to be mitigated, it's more appropriate to handle that either in the speech recognition engine or the audio stack), so the only risk is that we might be handing the audio stack data that can be misused.

Since the entire API's purpose is to play the contents of the WAVE file, this particular threat is considered to be an acceptable risk.

     

Wow, that's a LOT of typing for such a small feature - it ended up being much more than I had anticipated.

     

    Next: Pulling the threat model together.

  • Larry Osterman's WebLog

    What's wrong with this code, Part 21 - A psychic debugging example: The missing piece

    • 18 Comments

    As I mentioned yesterday, one of the other developers in my group had hit a sticky problem, and he asked me for my opinion on what was going wrong.

There were 3 pieces of information that I needed to use to diagnose the problem; I gave you two of them yesterday:

    The interface:

    class IPsychicInterface
    {
    public:
        virtual bool DoSomeOperation(int argc, _TCHAR *argv[]) = 0;
    };

    And the test application:

    int _tmain(int argc, _TCHAR* argv[])
    {
        register int value1 = 1;
        IPsychicInterface *psychicInterface = GetPsychicInterface();
        register int value2 = 2;

        psychicInterface->DoSomeOperation(argc, argv);
        assert(value1 == 1);
        assert(value2 == 2);
        return 0;
    }

Originally the problem was that the ESI register was being trashed.  Since the C and C++ calling conventions require that the ESI register be preserved, and the ESI register was trashed, that narrowed the failure down to three possible causes:

    • Somewhere inside DoSomeOperation, there was a stack overflow that caused the saved version of ESI to be corrupted.  This was actually my first thought.
    • Somewhere inside DoSomeOperation, there was a stack imbalance, which would cause garbage to be restored when the ESI register was popped off the stack.  Normally the compiler catches these errors, so I originally discounted this possibility.
    • There was a horrible compiler bug or OS bug which caused the register to be trashed (which is extraordinarily unlikely (but has happened)).

The other developer had chased the problem down further and realized that there was a stack imbalance on the call to DoSomeOperation.  There are basically two things that can cause a stack imbalance; most of the people who left comments on the original post caught one of them, and some caught the other:

    • A calling convention mismatch.
    • A parameter declaration mismatch.

    But I didn't have enough information to figure out which of the two it was.  That's when he gave me the final piece that let me accurately figure out what was going wrong.

    The final piece was the last bit of assembly language in the DoSomeOperation function:

    0040106B  pop         edi 
    0040106C  pop         ebx 
    0040106D  mov         al,1
    0040106F  pop         esi 
    00401070  ret         0Ch 

    Now that you have the last piece, what was the bug?  Be specific - we already know that the problem is a stack imbalance, but what's the root cause?

    For a bonus, why didn't the compiler catch it?

  • Larry Osterman's WebLog

    What's wrong with this code, part 21 - A Psychic Debugging Example - The answers.

    • 10 Comments

    So for the past couple of posts, I've been walking through a psychic debugging experience I had over the weekend.

As I presented the problem, there were three pieces of information needed to debug it.

    An interface:

    class IPsychicInterface
    {
    public:
        virtual bool DoSomeOperation(int argc, _TCHAR *argv[]) = 0;
    };

    A test application:

    int _tmain(int argc, _TCHAR* argv[])
    {
        register int value1 = 1;
        IPsychicInterface *psychicInterface = GetPsychicInterface();
        register int value2 = 2;

        psychicInterface->DoSomeOperation(argc, argv);
        assert(value1 == 1);
        assert(value2 == 2);
        return 0;
    }

    and some assembly language code:

    0040106B  pop         edi 
    0040106C  pop         ebx 
    0040106D  mov         al,1
    0040106F  pop         esi 
    00401070  ret         0Ch 

    As I mentioned in my last post, the problem was tracked down to a stack imbalance when calling the DoSomeOperation method, and when I saw the postamble for DoSomeOperation, I quickly realized the answer to the problem.

There are essentially 4 separate calling conventions supported by Microsoft's compilers - it turns out that you can figure out several of them just by looking at the code.  For the stdcall and thiscall calling conventions, input parameters are pushed onto the stack, and the callee is responsible for cleaning them off the stack (this contrasts with the cdecl calling convention, where the caller is responsible for cleaning the stack).  From the postamble, we know that this function is either a stdcall or a thiscall function, since the "ret" instruction adjusts the stack.
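To make the distinction concrete, here's a two-line sketch (hypothetical functions, compiled for x86 with the Microsoft compiler) of the two cleanup styles:

// Hypothetical functions, x86 only.  With __cdecl the caller removes the arguments
// after the call returns; with __stdcall the callee's "ret 8" removes them, which is
// why a declaration that disagrees with the implementation unbalances the stack.
int __cdecl   AddCallerCleans(int a, int b) { return a + b; }   // caller: add esp,8 after the call
int __stdcall AddCalleeCleans(int a, int b) { return a + b; }   // callee: ret 8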

I've already stated that this is x86 code, and the RET 0CH indicates that the routine pops 12 bytes of arguments off the stack.  This is clearly a problem, because the DoSomeOperation method, as declared in the test application, only takes two parameters (which would take 8 bytes).  The RET 0CH implies that the implementation of DoSomeOperation takes 3 parameters!

     

This implies that we're dealing with a violation of the one definition rule (ODR).  The One Definition Rule is a part of the C++ standard (section 3.2) which states: "No translation unit shall contain more than one definition of any variable, function, class type, enumeration type or template."  In other words, when a function or class is used from more than one translation unit, every translation unit needs to see the same definition of it.

Most commonly, ODR violations show up when you change a header file but don't rebuild all the source files that depend on that file - there's a ton of work that's been done to automatically manage dependencies to avoid this particular issue.

    And if you look at the source code for the PsychicInterface logic, you'll see the problem immediately:

    class IPsychicInterface
    {
    public:
        virtual bool DoSomeOperation(int argc, _TCHAR *argv[], _TCHAR *envp[]) = 0;
    };

    bool CPsychicInterface::DoSomeOperation(int argc, _TCHAR *argv[], _TCHAR *envp[])
    {
        int count = argc;
        while (count--)
        {
            printf("%S", argv[count]);
        }
        return true;
    }

The PsychicInterface code has its own private definition of IPsychicInterface which doesn't match the definition in the test application.

Obviously this is an utterly contrived example.  The real problem was much more complicated than this - the violation was in an export from a DLL and involved external components, which made it much harder to track down.  In many ways, it was similar to a problem that Raymond has talked about (except in this case, we're in a position to fix the code involved).
