Larry Osterman's WebLog

Confessions of an Old Fogey
  • Larry Osterman's WebLog

    Larry and the "Ping of Death"

    • 34 Comments

    Also known as "Larry mounts a DDOS attack against every single machine running Windows NT"

    Or: No stupid mistake goes unremembered.

     

    I was recently in the office of a very senior person at Microsoft debugging a problem on his machine.  He introduced himself, and commented "We've never met, but I've heard of you.  Something about a ping of death?"

    Oh. My. Word.  People still remember the "ping of death"?  Wow.  I thought I was long past the ping of death (after all, it's been 15 years), but apparently not.  I'm not surprised when people who were involved in the PoD incident remember it (it was pretty spectacular), but to have a very senior person who wasn't even working at the company at the time remember it is not a good thing :).

    So, for the record, here's the story of Larry and the Ping of Death.

    First I need to describe my development environment at the time (actually, it's pretty much the same as my dev environment today).  My primary development machine was running a version of NT, with a kernel debugger connected to my test machine over a serial cable.  When my test machine crashed, I would use the kernel debugger on my dev machine to debug it.  There was nothing debugging my dev machine, because NT was pretty darned reliable at that point and I didn't need a kernel debugger 99% of the time.  In addition, the corporate network wasn't a switched network - as a result, each machine received datagram traffic from every other machine on the network.

     

    Back in the day, I was working on the NT 3.1 browser (I've written about the browser here and here before).  As I was working on some diagnostic tools for the browser, I wrote a tool to manually generate some of the packets used by the browser service.

    One day, as I was adding some functionality to the tool, my dev machine crashed, and my test machine locked up.

    *CRUD*.  I can't debug the problem to see what happened because I lost my kernel debugger.  Ok, I'll reboot my machines, and hopefully whatever happened will hit again.

    The failure didn't hit, so I went back to working on the tool.

    And once again, my machine crashed.

    At this point, everyone in the offices around me started to get noisy - there was a great deal of cursing going on.  What I'd not realized was that every machine had crashed at the same time as my dev machine had crashed.  And I do mean EVERY machine.  Every single machine in the corporation running Windows NT had crashed.  Twice (after allowing just enough time between crashes to allow people to start getting back to work).

     

    I quickly realized that my test application was the cause of the crash, and I isolated my machines from the network and started digging in.  I quickly root caused the problem - the broadcast that was sent by my test application was malformed and it exposed a bug in the bowser.sys driver.  When the bowser received this packet, it crashed.

    I quickly fixed the problem on my machine and added the change to the checkin queue so that it would be in the next day's build.

     

    I then walked around the entire building and personally apologized to every single person on the NT team for causing them to lose hours of work.  And 15 years later, I'm still apologizing for that one moment of utter stupidity.

  • Larry Osterman's WebLog

    Running Non Admin

    • 38 Comments

    There’s been a fascinating process going on over here behind the curtains.  With the advent of XP SP2, more and more people are running as non administrative users.  Well, it’s my turn to practice what I preach: I’ve taken the plunge on my laptop and my home machine, and I’m now running as a non admin user (I can’t do it on my development machine at work for the next few weeks for a variety of reasons).

    The process so far has been remarkably pain free, but there have been some “interesting” idiosyncrasies.  First off, I’ve been quite surprised at the number of games that have worked flawlessly.  I was expecting to have major issues, but none so far, with the exception of Asheron’s Call.  Annoyingly, the problem with AC isn’t the game itself, it’s with Microsoft’s Gaming Zone software, which insists on modifying files in the C:\Program Files directory. 

    Aaron Margosis’ blog posts about running as a limited user have been extremely helpful as well.

    Having said that, there are some oddities I’ve noticed.  First off: There seem to be a lot of applications that “assume” that they know what the user’s going to do.  For instance, if you double click on the time in the system tray, it pops up with “You don’t have the proper privilege level to change the System Time”.  This is a completely accurate statement, since modifying the time requires the SeSystemTime privilege, which isn’t granted to limited users.  But it assumes that the reason I was clicking on the time was to change the time.  Maybe I wanted to use the date&time control panel as a shortcut to the calendar?  I know of a bunch of users who refer to double clicking on the time in the taskbar as invoking the “cool windows calendar”; they don’t realize that they’re just bringing up the standard date&time applet.  If I don’t have the SeSystemTime privilege, then why not just grey out the “OK” button?  Let me navigate the control but just prevent me from changing things.

    Similarly, the users control panel applet prompts you with a request to enter your credentials.  Why?  There are lots of things a limited user can do with the users control panel applet (enumerating groups, enumerating users, enumerating group membership, setting user information).  But the control panel applet ASSUMES that the user wants to manipulate the state of the other users.  It’s certainly true that most of the useful functionality of that control panel applet requires admin access.  But it should have waited until the user attempted to perform an action that was denied before it prompted the user for admin credentials.

    From my standpoint, these examples violate two of the principles of designing interfaces that involve security: 

    1) Don’t tell the user they can’t do something until you’ve proven that they can’t do it.

    2) Don’t assume what access rights the user needs to perform an operation. 

    The date&time control panel violates the first principle.  The user might be interacting with the control panel for reasons other than changing the time.  It turns out that the reason for this is that the date&time applet violates the principle of least privilege by enabling the SeSystemTime privilege up front, running the control panel applet, and then disabling the privilege.  If the control panel applet had waited until the user clicked on the “Apply” button before it enabled the privilege (and reported an error if enabling the privilege failed), it would have better served the user IMHO.
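
    For what it’s worth, the “enable, act, disable” dance is cheap enough to do at Apply time.  Here’s a rough sketch of what that might look like (my sketch of the standard AdjustTokenPrivileges pattern, not the applet’s actual code):

    #include <windows.h>

    // Enable SeSystemTime only for the duration of the SetSystemTime call.
    // If the user doesn't hold the privilege, fail the operation so the UI
    // can report it - instead of refusing to open the applet at all.
    BOOL SetTimeWithLeastPrivilege(const SYSTEMTIME *newTime)
    {
        HANDLE token;
        TOKEN_PRIVILEGES tp = {0};
        BOOL ok = FALSE;

        if (!OpenProcessToken(GetCurrentProcess(),
                              TOKEN_ADJUST_PRIVILEGES | TOKEN_QUERY, &token))
        {
            return FALSE;
        }

        tp.PrivilegeCount = 1;
        tp.Privileges[0].Attributes = SE_PRIVILEGE_ENABLED;
        if (LookupPrivilegeValue(NULL, SE_SYSTEMTIME_NAME,
                                 &tp.Privileges[0].Luid))
        {
            AdjustTokenPrivileges(token, FALSE, &tp, 0, NULL, NULL);
            if (GetLastError() == ERROR_SUCCESS)  // Privilege really enabled.
            {
                ok = SetSystemTime(newTime);

                tp.Privileges[0].Attributes = 0;  // Disable it again.
                AdjustTokenPrivileges(token, FALSE, &tp, 0, NULL, NULL);
            }
        }
        CloseHandle(token);
        return ok;
    }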

    The users control panel applet violates the second principle.  In the case of the users control panel, it assumed that I was going to do something that required admin access.   This may in fact be a reasonable assumption given the work that the users control panel applet does (its primary task is to manage local group membership).  But the applet assumes up front that the user has to be an administrator to perform the action.  There may in fact be other classes of users that can access the information in the users control panel – as an example, members of the domain’s “account operators” group may very well be able to perform some or all of the actions that the users control panel applet performs.  But the control panel applet doesn’t check for that – it assumes that the user has to be a member of the local administrators group to use the control panel applet.  Interestingly enough, this behavior only happens on XP Pro when joined to a domain.  If you’re not joined to a domain, the users control panel applet allows you to change your user information without prompting you – even as a limited user.   Peter Torr also pointed out that the computer management MMC snap-in (compmgmt.msc) does the “right” thing – you can interact with the UI, perform actions (adding users to groups, etc), and it’s only when you click the “Apply” button that it fails.  The snap-in doesn’t know what’s allowed or not, it just tries the operation, and reports the failure to the user.

    This is a really tough problem to solve from a UI perspective – you want to allow the user to do their work, but it’s also highly desirable that you not elevate the privilege of the user beyond the minimum that’s required for them to do their job.  The good news is that with more and more developers (both at Microsoft and outside Microsoft) running as non administrative users, more and more of these quirks in the system will be fixed.

     

    Edit: Thanks Mike :)
  • Larry Osterman's WebLog

    How does COM activation work anyway?

    • 6 Comments

    One of my co-workers came to me the other day and asked why on earth COM had this "dwClsContext" parameter. In particular, he was concerned about the various CLSCTX_XXX_SERVER options.

    In order to explain the point of the CLSCTX_XXX_SERVER options, you need to understand how COM activation works. In general, there are two cases: CLSCTX_INPROC_SERVER and CLSCTX_LOCAL_SERVER (there’s a 3rd option, CLSCTX_INPROC_HANDLER, which is half-way between the two as well).

    So what happens when you call CoCreateInstance() to instantiate a class? It turns out that this is documented with the documentation for the CLSCTX parameter (go figure that one out), but in a nutshell (ignoring several issues like COM object activation security), the following happens:

    For both cases, the first thing that COM does is to open HKEY_CLASSES_ROOT(HKCR)\CLSID\{<STRINGIZED-GUID-OF-RCLSID>}.

    If the user specified CLSCTX_INPROC_SERVER (or CLSCTX_INPROC_HANDLER), COM looks for an InprocServer32 key under the classid. If the InprocServer32 key is found, then the default value of the key specifies the DLL to be loaded. The ThreadingModel value in that key specifies the desired threading model for the object. If the threading model for the object is compatible with the threading model for the thread (see "What are these "Threading Models" and why do I care?" for more info on threading models), then the DLL is loaded. Once the DLL’s been loaded, COM calls GetProcAddress() to find the address of the DllGetClassObject routine for the DLL. The DllGetClassObject routine’s job is to return an instance of an object that implements IClassFactory for all the classes supported by that DLL.
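
    The exported routine has a well-known signature.  A minimal in-proc server entry point looks something like this (CLSID_MyClass and CMyClassFactory are placeholders for whatever the DLL actually implements, defined elsewhere in the DLL):

    #include <objbase.h>
    #include <new>

    // Exported from the DLL named by the InprocServer32 key.  COM calls
    // this to retrieve a class factory for the requested class.
    STDAPI DllGetClassObject(REFCLSID rclsid, REFIID riid, void **ppv)
    {
        *ppv = NULL;
        if (rclsid != CLSID_MyClass)
        {
            return CLASS_E_CLASSNOTAVAILABLE;
        }

        // CMyClassFactory implements IClassFactory.  Hand back whichever
        // interface the caller asked for (typically IID_IClassFactory).
        CMyClassFactory *factory = new (std::nothrow) CMyClassFactory();
        if (factory == NULL)
        {
            return E_OUTOFMEMORY;
        }
        HRESULT hr = factory->QueryInterface(riid, ppv);
        factory->Release();
        return hr;
    }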

    If the user specified CLSCTX_LOCAL_SERVER, it means that the client wants to contact an out-of-proc COM server. For an out-of-proc object, there are again two possible choices – the COM server could be implemented by an NT service, or it could be implemented by a local executable.

    In both cases, COM needs to know activation information about the object, so it looks to see if there’s an APPID value associated with the class. If there IS an APPID value, then COM opens HKCR\AppID\{<STRINGIZED-GUID-OF-APPID>}. Under the APPID key, a number of values are located – the first is the AccessPermission value, which contains the security descriptor that’s used to determine if the caller is allowed to access the COM object. The second is the LaunchPermission, which determines if the application is allowed to load the class factory for this class. Also located under the APPID is the RunAs key which specifies the security principal under which the server should be run. At some point in the future I’ll talk about these keys, but they’re beyond the scope of this article. The last piece of information that’s retrieved from the APPID is the LocalService value, which indicates the service that will handle the COM object.

    To handle the first case, the first thing that COM does is to look for a LocalService key. If the LocalService Key is found (or if a LocalService key was found under the APPID), then COM attempts to start the service specified (if it’s not already started).

    If there’s no LocalService key, then, COM looks for the LocalServer32 key. If it finds the key, again, the default value of the key specifies an executable name, and the ThreadingModel value specifies the threading model for the object. COM then launches the application (this is where COM checks for CLSCTX_DISABLE_AAA and CLSCTX_ENABLE_AAA).

    Once the service (or executable) comes up, it calls CoRegisterClassObject(). The call to CoRegisterClassObject informs the DCOM service (via RPC) of the address of a class factory for a particular ClassID.

    In both out-of-proc cases, the COM client next attempts to connect to the DCOM service, which looks up the class object in its table of class factories and returns a pointer to that class factory to the client.

    Now that the client has a class factory for the COM object (either by retrieving it locally or via the DCOM service), the client can call into the class factory to retrieve an instance of the class in question. It calls IClassFactory::CreateInstance which will then instantiate the object for the application. And finally CoCreateInstance will return that pointer to the client application.

    One thing to note: some COM objects have both in-proc and out-of-proc implementations (NetMeeting is an example of this). In those cases, a client could request that the class run either in-proc OR out-of-proc, transparently.

    Oh, one thing I didn’t mention above is the CLSCTX_ALL definition, which is declared in objbase.h – CLSCTX_ALL is simply a combination of CLSCTX_INPROC_SERVER|CLSCTX_INPROC_HANDLER|CLSCTX_LOCAL_SERVER – it tries all of the above and goes with the one that works.
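
    Putting the client side together, a typical activation looks something like this (the CLSID here is a made-up placeholder, not a real class):

    #include <objbase.h>

    // Placeholder CLSID, for illustration only.
    static const CLSID CLSID_SomePlaceholderClass =
        {0x12345678, 0x1234, 0x1234,
         {0x12, 0x34, 0x12, 0x34, 0x12, 0x34, 0x56, 0x78}};

    // Assumes the caller has already called CoInitialize/CoInitializeEx.
    HRESULT CreateTheObject(IUnknown **ppunk)
    {
        *ppunk = NULL;

        // Try an in-proc server first, falling back to an out-of-proc
        // server - which is most of what CLSCTX_ALL buys you.  COM walks
        // the registry as described above to satisfy the request.
        return CoCreateInstance(CLSID_SomePlaceholderClass,
                                NULL,        // No aggregation.
                                CLSCTX_INPROC_SERVER | CLSCTX_LOCAL_SERVER,
                                IID_IUnknown,
                                (void **)ppunk);
    }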

     

  • Larry Osterman's WebLog

    What does style look like, part 4

    • 33 Comments
    Continuing the discussion of "style"...

    Yesterday, I showed the code reformatted into "BSD" style format, which looks like:

    #include "list.h"
    main(C cArg, SZ rgszArg[])
    {
        I iNode;
        I cNodes = atoi(rgszArg[1]);
        I cNodesToSkip = atoi(rgszArg[2]);
        PNODE pnodeT;
        PNODE pnodeCur;
        InitNodes(cNodes);
        for (iNode = 2, pnodeCur = PnodeNew(1); iNode <= cNodes ; iNode++)
        {
            pnodeT = PnodeNew(iNode);
            InsertNext(pnodeCur, pnodeT);
            pnodeCur = pnodeT;
        }
        while (pnodeCur != PnodeNext(pnodeCur))
        {
            for (iNode = 1; iNode < cNodesToSkip ; iNode++)
            {
                pnodeCur = PnodeNext(pnodeCur);
            }
            FreeNode(PnodeDeleteNext(pnodeCur));
        }
        printf("%d\n", Item(pnodeCur));
    }

    I chose one of the variants that I personally find most attractive; any one of them could work. The thing about this example is that there are no comments.  That's reasonable given that it's an example from a textbook, and as such the words in the textbook that accompany the example suffice as the documentation.  But all code begs to be documented, so let's see what happens.

    When you look at documenting a function, there are a couple of things to keep in mind when considering style (there are other important things, like the relevance of the comments, etc, but this post's about style).

    The first question that comes to mind almost immediately is "What commenting form should you use?"  For C/C++, there are a number of options.

    First off, C and C++ support two different styles of comment.  The first is the traditional /* */ style of comment.  And there's the C++ style // comment.  Each of the two comment styles has different flavors.

    When I was in college, I loved using some of the options that /* */ comments enabled.  My code was just full of things like:

             :
             :
        }

        /*************************************************************/
        /*                                                           */
        /*                                                           */
        /*   Loop through the nodes looking for the current node.    */
        /*                                                           */
        /*                                                           */
        /*************************************************************/
        while (pnodeCur != PnodeNext(pnodeCur))
        {
            for (iNode = 1; iNode < cNodesToSkip ; iNode++)
             :
             :
     

    I also had this thing about putting a comment on every line, especially for assignments.  And every comment lined up on column 40 if possible.

             :
             :
        }

        /*************************************************************/
        /*                                                           */
        /*                                                           */
        /*   Loop through the nodes looking for the current node.    */
        /*                                                           */
        /*                                                           */
        /*************************************************************/
        while (pnodeCur != PnodeNext(pnodeCur))    // Keep looping until PnodeNext == pnodeCur
        {
            for (iNode = 1; iNode < cNodesToSkip ; iNode++) // Skip through cNodesToSkip nodes
             :                             // A comment for this line.
             :                             // A comment for that line.
     

    When I look back at my old code, the code screams "1980's", just like the hairdos of "A Flock of Seagulls" do.  But there are times when big honking block comments like that are appropriate, like when you're pointing out something that's REALLY important - like when you're about to do something that would cause Raymond Chen to swear like a sailor when he runs into your application.

    Also, while I'm on the subject of comments on every line: there ARE situations where you want a comment on every line.  For example, if you're ever forced to write assembly language (any assembly language), you really MUST comment every line.  It's far too easy to forget which registers are supposed to contain what contents, so adding comments on every line is essential to understanding the code.

    /* */ comments have the disadvantage that they take up a fair amount of space.  This means that they increase the effective line length.  // comments don't have this problem.  On the other hand, /* */ comments stand out more.

    Personally I feel that both comment styles should be used.  One fairly effective form that I've seen is:

             :
             :
        }
        /*
         * Loop through the nodes until you hit the current node.
         */
        while (pnodeCur != PnodeNext(pnodeCur))
        {
            for (iNode = 1; iNode < cNodesToSkip ; iNode++)
            {
                pnodeCur = PnodeNext(pnodeCur);  // Skip to the next node.
            }
             :
             :
     

    Of course that's not very different from:

             :
             :
        }

        // Loop through the nodes until you hit the current node.
        while (pnodeCur != PnodeNext(pnodeCur))
        {
            for (iNode = 1; iNode < cNodesToSkip ; iNode++)
            {
                pnodeCur = PnodeNext(pnodeCur);  // Skip to the next node.
            }
             :
             :
     

    On the other hand, the first form (/**/) stands out more, simply because it has more white space.  You can achieve the same whitespace effect with // comments:

             :
             :
        }
        //
        // Loop through the nodes until you hit the current node.
        //
        while (pnodeCur != PnodeNext(pnodeCur))
        {
            for (iNode = 1; iNode < cNodesToSkip ; iNode++)
            {
                pnodeCur = PnodeNext(pnodeCur);  // Skip to the next node.
            }
             :
             :
    Another aspect of commenting style is the effective line length.  When I learned to program, I learned on 80 character wide terminals, which imposed a fairly strict limit on the number of columns in the code - anything longer than 80 characters was not OK.  Nowadays, that isn't as important, but it's still important to limit source code lines to a reasonable length; somewhere around 100 characters works for me, but of course your experience might be different.

    The other major thing about comments is their content.  Comments can be the bane of a maintainer's existence or their savior, depending on the comment.

    I don't know how many times I've come across the following and cringed:

        //
        // Increment i.
        //
        i = i + 1;
     

    Yes, the line's commented, but how useful is the comment?  Especially since it's a block comment (block comments have more importance than in-line comments by virtue of the amount of real estate they occupy - the very size of the comment gives it weight).  In general, you want to use in-line comments for little things and block comments for the big stuff.  I almost never use in-line comments; I prefer to use block comments at the appropriate locations and let the code speak for itself.

    I've mentioned white space as an aspect of style before, but this is the time to bring it up.  Consider the example routine with some white space added to stretch the code out a bit:

    #include "list.h"
    main(C cArg, SZ rgszArg[])
    {
        I iNode;
        I cNodes = atoi(rgszArg[1]);
        I cNodesToSkip = atoi(rgszArg[2]);
        PNODE pnodeT;
        PNODE pnodeCur;

        InitNodes(cNodes);

        for (iNode = 2, pnodeCur = PnodeNew(1); iNode <= cNodes ; iNode++)
        {
            pnodeT = PnodeNew(iNode);
            InsertNext(pnodeCur, pnodeT);
            pnodeCur = pnodeT;
        }

        while (pnodeCur != PnodeNext(pnodeCur))
        {
            for (iNode = 1; iNode < cNodesToSkip ; iNode++)
            {
                pnodeCur = PnodeNext(pnodeCur);
            }
            FreeNode(PnodeDeleteNext(pnodeCur));
        }

        printf("%d\n", Item(pnodeCur));
    }

    It's not a major change, but to me, by just inserting 4 empty lines, the code's been made clearer. 

    The other aspect of white space is intra-expression white space.  This is one of the aspects of style that tends to get religious.  I've seen people who do:

        for ( iNode = 2 , pnodeCur = PnodeNew( 1 ) ; iNode <= cNodes ; iNode ++ )
    And others who prefer:

        for (iNode=2,pnodeCur=PnodeNew(1);iNode<=cNodes;iNode++)
    Or:

        for (iNode=2, pnodeCur=PnodeNew(1); iNode<=cNodes; iNode++)
    Or:

        for ( iNode=2, pnodeCur=PnodeNew(1) ; iNode<=cNodes ; iNode++ )
    The reality is that it doesn't really matter which form you use, as long as you're consistent.

    Let's see what happens to the sample routine if you insert some block comments...

    #include "list.h"
    main(C cArg, SZ rgszArg[])
    {
        I iNode;
        I cNodes = atoi(rgszArg[1]);
        I cNodesToSkip = atoi(rgszArg[2]);
        PNODE pnodeT;
        PNODE pnodeCur;

        InitNodes(cNodes);

        //
        // Create a list of cNodes nodes. 
        // 
        for (iNode = 2, pnodeCur = PnodeNew(1); iNode <= cNodes ; iNode++)
        {
            pnodeT = PnodeNew(iNode);
            InsertNext(pnodeCur, pnodeT);
            pnodeCur = pnodeT;
        }

        //
        // Walk the list of nodes, freeing the node that occurs at every cNodesToSkip nodes in the list.
        //
        while (pnodeCur != PnodeNext(pnodeCur))
        {
            for (iNode = 1; iNode < cNodesToSkip ; iNode++)
            {
                pnodeCur = PnodeNext(pnodeCur);
            }
            FreeNode(PnodeDeleteNext(pnodeCur));
        }

        //
        // Print out the value of the current node.
        //
        printf("%d\n", Item(pnodeCur));
    }

    To me, that's a huge improvement.  By stretching out the code and adding some comments, it's already starting to look better.

    Again, just for grins, here's how it would look in my "1980's style":

    #include "list.h"
    main(C cArg, SZ rgszArg[])
    {
        I iNode;                              // Node index.
        I cNodes = atoi(rgszArg[1]);          // Set the number of nodes
        I cNodesToSkip = atoi(rgszArg[2]);    // and the nodes to skip.
        PNODE pnodeT;                         // Declare a temporary node.
        PNODE pnodeCur;                       // And the current node.

        InitNodes(cNodes);                    // Initialize the nodes database
                                              // with cNodes elements

        /*************************************************************/
        /*                                                           */
        /*                                                           */
        /*               Create a list of cNodes nodes.              */
        /*                                                           */
        /*                                                           */
        /*************************************************************/
        for (iNode = 2, pnodeCur = PnodeNew(1); iNode <= cNodes ; iNode++)
        {
            pnodeT = PnodeNew(iNode);        // Allocate a new node.
            InsertNext(pnodeCur, pnodeT);    // Insert it after the current node
            pnodeCur = pnodeT;               // Current = New.
        }

        /*************************************************************/
        /*                                                           */
        /*                                                           */
        /*   Walk the list of nodes, freeing the node that occurs    */
        /*   at every cNodesToSkip nodes in the list.                */
        /*                                                           */
        /*                                                           */
        /*************************************************************/
        while (pnodeCur != PnodeNext(pnodeCur))    // While not at the current node
        {
            for (iNode = 1; iNode < cNodesToSkip ; iNode++)
            {
                pnodeCur = PnodeNext(pnodeCur); // Current = current->Next
            }
            FreeNode(PnodeDeleteNext(pnodeCur)); // Free current->Next
        }

        /*************************************************************/
        /*                                                           */
        /*                                                           */
        /*          Print out the value of the current node.         */
        /*                                                           */
        /*                                                           */
        /*************************************************************/
        printf("%d\n", Item(pnodeCur));
    }

    Yech.

    Tomorrow: function and file headers.

  • Larry Osterman's WebLog

    One in a million is next Tuesday

    • 9 Comments

    Back when I was a wee young lad, fresh from college, I thought I knew everything there was to know.

     

    I’ve since been disabused of that notion, rather painfully.

    One of the best lessons happened very early on, back when I was working on DOS 4.  We ran into some kind of problem (I’ll be honest and say that I don’t remember what it was). 

    I was looking into the bug with Gordon Letwin, the architect for DOS 4.  I looked at the code and commented “Maybe this is what was happening?  But if that were the case, it’d take a one in a million chance for it to happen”.

    Gordon’s response was simply: “In our business, one in a million is next Tuesday”.

    He then went on to comment that at the speeds which modern computers operate (4.77 MHz remember), things happened so quickly that something with a one in a million chance of occurrence is likely to happen in the next day or so.

    I’m not sure I’ve ever received better advice in my career. 

    It has absolutely stood the test of time – no matter how small the chance of something happening, with modern computers and modern operating systems, essentially every possible race condition or deadlock will be found within a reasonable period of time.

    And I’ve seen some absolute doozies in my time – race conditions on MP machines where a non-interlocked increment occurred (one variant of Michael Grier’s “i = i + 1” bug).   Data corruptions because you have one unprotected access to a data structure.  I’m continually amazed at the NT scheduler’s uncanny ability to context switch my application at just the right time to expose my data synchronization bug.  Or to show just how I can get my data structures deadlocked in hideous ways.
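
    If you’ve never seen the “i = i + 1” bug, here’s the shape of it (a contrived sketch, obviously):

    #include <windows.h>

    LONG g_counter = 0;

    DWORD WINAPI Worker(LPVOID parameter)
    {
        for (int i = 0; i < 1000000; i++)
        {
            g_counter = g_counter + 1;  // Load, add, store - NOT atomic.
                                        // Two threads can load the same value,
                                        // and one of the increments is lost.
            // InterlockedIncrement(&g_counter); is the correct form.
        }
        return 0;
    }

    Run two of those workers on an MP machine and g_counter comes up short.  Not on every run – but at the speeds modern processors run, “one in a million” arrives very quickly indeed.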

    So nowadays, whenever anyone comments on how unlikely it is for some event to occur, my answer is simply: “One in a million is next Tuesday”.

    Edit: To fix the spelling of MGrier's name.

    Edit:  My wife pointed out the following and said it belonged with this post: http://www.jumbojoke.com/000036.html

  • Larry Osterman's WebLog

    Larry goes to Layer Court

    • 29 Comments
    Two weeks ago, my boss, another developer in my group, and I had the opportunity to attend "Layer Court".

    Layer Court is the end product of a really cool part of the quality gate process we've introduced for Windows Vista.  This is a purely internal process, but the potential end-user benefits are quite cool.

    As systems get older, and as features get added, systems grow more complex.  The operating system (or database, or whatever) that started out as a 100,000 line of code paragon of elegant design slowly turns into fifty million lines of code that have a distinct resemblance to a really big plate of spaghetti.

    This isn't something specific to Windows or Microsoft; it's a fundamental principle of software engineering.  The only way to avoid it is extreme diligence - you have to be 100% committed to ensuring that your architecture remains pure forever.

    It's no secret that, regardless of how architecturally pure the Windows codebase was originally, lots of spaghetti-like issues have crept into the product over time.

    One of the major initiatives that was ramped up with the Longhorn Windows Vista reset was the architectural layering initiative.  The project had existed for quite some time, but with the reset, the layering team got serious.

    What they've done is really quite remarkable.  They wrote tools that perform static analysis of the windows binaries and they work out the architectural and engineering dependencies between various system components.

    These can be as simple as DLL dependencies (program A references DLLs B and C, DLL B references DLL D, DLL D in turn references DLL C), they can be as complicated as RPC dependencies (DLL A has a dependency on process B because DLL A contacts an RPC server that is hosted in process B).

    The architectural layering team then went out and assigned a number to every single part of the system starting at ntoskrnl.exe (which is the bottom, at layer 0).

    Everything that depended only on ntoskrnl.exe (things like win32k.sys or kernel32.dll) was assigned layer 1, the pieces that depend on those (for example, user32.dll) got layer 2, and so forth (btw, I'm making these numbers up - the actual layering is somewhat more complicated, but this is enough to show what's going on).
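
    The layer assignment itself is easy to sketch: a component's layer is one more than the highest layer of anything it depends on.  A toy version might look like the following (my illustration only - the real tools are far more sophisticated and work from binary analysis, not a hand-built map):

    #include <algorithm>
    #include <map>
    #include <string>
    #include <vector>

    typedef std::map<std::string, std::vector<std::string>> DependencyMap;

    // layer(component) = 1 + max(layer of each dependency), with
    // ntoskrnl.exe pinned at layer 0.  Assumes every component appears in
    // the map and that the graph is acyclic - a dependency cycle would
    // itself be a layering problem.
    int Layer(const std::string &name, const DependencyMap &deps,
              std::map<std::string, int> &cache)
    {
        if (name == "ntoskrnl.exe")
        {
            return 0;
        }
        auto found = cache.find(name);
        if (found != cache.end())
        {
            return found->second;
        }
        int layer = 0;
        for (const auto &dependency : deps.at(name))
        {
            layer = std::max(layer, Layer(dependency, deps, cache) + 1);
        }
        cache[name] = layer;
        return layer;
    }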

    As long as the layering is simple, this is pretty straightforward.  But then the spaghetti problem starts to show up.  Raymond may get mad, but I'm going to pick on the shell team as an example of how a layering violation can appear.  Consider a DLL like SHELL32.DLL.  SHELL32 contains a host of really useful low level functions that are used by lots of applications (like PathIsExe, for example).  These functions do nothing but string manipulation of their inputs, so they have virtually no lower level dependencies.   But other functions in SHELL32 (like DefScreenSaverProc or DragAcceptFiles) manipulate windows and interact with a large number of lower level components.  As a result of these high level functions, SHELL32 sits relatively high in the architectural layering map (since some of its functions require high level functionality).

    So if a relatively low level component (say the Windows Audio service) calls into SHELL32, that's what is called a layering violation - the low level component has taken an architectural dependency on a high level component, even if it's only using the low level functions (like PathIsExe). 

    They also looked for engineering dependencies - cases where low level component A gets code that's delivered from high level component B.  The DLLs and other interfaces might be just fine, but if a low level component gets code from a higher level component, it still has a dependency on that higher level component - it's a build-time dependency instead of a runtime dependency, but it's STILL a dependency.

    Now there are times when low level components have to call into higher level components - it happens all the time (windows media player calls into skins which in turn depend on functionality hosted within windows media player).  Part of the layering work was to ensure that when this type of violation occurred that it fit into one of a series of recognized "plug-in" patterns - the layering team defined what were "recognized" plug-in design patterns and factored this into their analysis.

    The architectural layering team went through the entire Windows product and identified every single instance of a layering violation.  They then went to each of the teams in turn and asked them to resolve their dependencies (either by changing their code (good), or by explaining why their code matches the plugin pattern (also good), or by explaining the process by which their component will change to remove the dependency (not good, because it means that the dependency is still present)).   For this release, they weren't able to deal with all the existing problems, but at least they are preventing new ones from being introduced.  And, since there's a roadmap for the future, we can rely on the fact that things will get better in the future.

    This was an extraordinarily painful process for most of the teams involved, but it was totally worth the effort.  We now have a clear map of which Windows components call into which other Windows components.  So if a low level component changes, we can now clearly identify which higher level components might be affected by that change.  We finally have the ability to understand how changes ripple throughout the system, and more importantly, we now have mechanisms in place to ensure that no lower level components ever take new dependencies on higher level components (which is how spaghetti software gets introduced).

    In order to ensure that we never introduce a layering violation that isn't understood, the architectural layering team has defined a "quality gate" that ensures that no new layering violations are introduced into the system (there are a finite set of known layering violations that are allowed for a number of reasons).  Chris Jones mentioned "quality gates" in his Channel9 video, essentially they are a series of hurdles that are placed in front of a development team - the team is not allowed to check code into the main Windows branches unless they have met all the quality gates.  So by adding the architectural layering quality gate, the architectural layering team is drawing a line in the sand to ensure that no new layering violations ever get added to the system.

    So what's this "layer court" thingy I talked about in the title?  Well, most of the layering issues can be resolved via email, but for some set of issues, email just doesn't work - you need to get in front of people with a whiteboard so you can draw pretty pictures and explain what's going on.  And that's where we were two weeks ago - one of the features I added for Beta2 restored some functionality that was removed in Beta1, but restoring the functionality was flagged as a layering violation.  We tried, but were unable to resolve it via email, so we had to go to explain what we were doing and to discuss how we were going to resolve the dependency.

    The "good" news (from our point of view) is that we were able to successfully resolve the issue - while we are still in violation, we have a roadmap to ensure that our layering violation will be fixed in the next release of Windows.  And we will be fixing it :)

     

  • Larry Osterman's WebLog

    Useful service tricks - Debugging service startup

    • 22 Comments
    For the better part of the past 15 years, I've been working on one service or another for the Windows platform (not always services for Windows, but always services ON Windows).

    Over that time, I've developed a bag of tricks for working with services, I mentioned one of them here.  Here's another.

    One of the most annoying things to have to debug is a problem that occurs during service startup.  The problem is that you can't attach a debugger to the service until it's started - but if the service is failing during startup, by then it's too late.

    It's possible to put in a Sleep(10000) to cause your service startup to delay for 10 seconds (which gives you time to attach the debugger during start).  That usually works, but sometimes service startup failures only happen on boot (for autostart services).

    First off, before you start, you need to have a kernel debugger attached to your computer, and you need the Debugging Tools for Windows (this gets you the command line debuggers).  I'm going to assume the debuggers are installed into "C:\Debuggers"; obviously you need to adjust this for your local machine.

    One thing to keep in mind: As far as I know, you need to have the kernel debugger hooked up to debug service startup issues (you might be able to use ntsd.exe hooked up for remote debugging but I'm not sure if that will work). 

    This of course begs the next question: "The kernel debugger?  Why on earth do I need a kernel debugger when I'm debugging user mode code?".  You're completely right.  But in this case, you're not actually using the kernel debugger.  Instead, you're running using a user mode debugger (ntsd.exe in my examples) that's running over the serial port using facilities that are enabled by the kernel debugger.  It's not quite the same thing.

    There are multiple reasons for using a debugger that's redirected to a kernel debugger.  First off, if your service is an autostart service, it's highly likely that it starts long before a user logs on.  So an interactive debugger won't really be able to debug the application.  Secondly, services by default can't interact with the desktop (heck, they often run in a different TS session from the user (this is especially true in Vista, but it's also true on XP with Fast User Switching)).  That means that when the debugger attempts to interact with the user, it flat-out can't, because the user's desktop is sitting in a different TS session.

    There are a couple of variants of this trick, all of which should work.

    Lets start with the simplest:

    If your service runs with a specific binary name, you can use the Image File Execution Options registry key (documented here) to launch your executable under the debugger.  The article linked shows how to launch using Visual Studio, for a service, you want to use the kernel debugger, so instead of using "devenv /debugexe" for the value, use "C:\Debuggers\NTSD.EXE -D", that will redirect the output to the kernel debugger.
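
    Concretely, assuming your service runs in its own executable (I'll call it myservice.exe here purely for illustration), the setting would look something like:

    reg add "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\myservice.exe" /v Debugger /d "C:\Debuggers\ntsd.exe -d"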

     

    Now for a somewhat more complicated version - You can ask the service controller to launch the debugger for you.  This is useful if your service is a shared service, or if it lives in an executable that's used for other purposes (if you use a specific -service command line switch to launch your exe as a service, for example).

    This one's almost easier than the first.

    From the command line, simply type:

    sc config <your service short name> binpath= "c:\debuggers\ntsd.exe -d <path to your service executable> <your service executable options>"

     

    Now restart your service and it should pick up the change.

     

    I suspect it's possible to use ntsd.exe as a host process for remote debugging, but I've never done that (I prefer assembly language debugging when I'm using the kernel debugger), so I don't feel comfortable describing how to set it up :(

    Edit: Answered Purplet's question in the comments (answered it in the post because it was something important that I left out of the article).

    Edit2: Thanks Ryan.  s/audiosrv/<your service>/

     

  • Larry Osterman's WebLog

    A Tree Grows... How?

    • 100 Comments
    Valorie's currently attending a seminar about teaching science at the middle-school level.

    Yesterday, her instructor asked the following question:

    "I have in my hand a Douglass Fir tree seed that masses 1 gram [I'm making this number up, it's not important].  I plant it in my yard, water it regularly, and wait for 20 years.

    At the end of that time, I have a 50 foot tree that masses 1,000 kilograms [again, I'm making this exact number up, it's not important].

     

    My question is: Where did the 999,999 grams of mass come from?"

     

    I'm going to put the answer out to the group.  Where DID the 999,999 grams of mass come from in the tree?

    The answer surprises a lot of people.  And it brings into question how much we actually know about science.

     

     

    I'm going to change my comment moderation policy for this one.  I'm NOT going to approve comments that have the right answer until I post my follow-up post tomorrow, because once the right answer's been given, it's pretty clear.  But I'll be interested in knowing the percentage of comments that have the right answer vs. the wrong answer.

     

  • Larry Osterman's WebLog

    Threat Modeling, once again

    • 23 Comments

    About 2.5 years ago, I wrote a series of articles about how we threat model at Microsoft.  About 18 months ago, I made a couple of updates to it, including a post about why we threat model at Microsoft, and a review of how the process has changed over the years.

    It's threat modeling time again in my group (it seems to happen about once every 18 months or so, as you can see from my post history :)), and as the designated security nutcase in my group, I've been spending a LOT of time thinking about the threat modeling process as we're practicing it nowadays.  It's been interesting looking at my old posts to see how my own opinions on threat modeling have changed, and how Microsoft's processes have changed (we've gotten a LOT better at the process).

    One thing we realized very early on is that our early efforts at threat modeling were quite ad-hoc.  We sat in a room and said "Hmm, what might the bad guys do to attack our product?"  It turns out that this isn't actually a BAD way of going about threat modeling, and if that's all you do, you're way better off than if you'd done nothing.

    Why doesn't it work?  There are a couple of reasons:

    1. It takes a special mindset to think like a bad guy.  Not everyone can switch into that mindset.  For instance, I can't count the number of times I had to tell developers on my team "It doesn't matter that you've checked the value on the client, you still need to check it on the server because the client that's talking to your server might not be your code.".
    2. Developers tend to think in terms of what a customer needs.  But many times, the things that make things really cool for a customer provide a superhighway for the bad guy to attack your code. 
    3. It's ad-hoc.  Microsoft asks every single developer and program manager to threat model (because they're the ones who know what the code is doing).  Unfortunately that means that they're not experts on threat modeling. Providing structure helps avoid mistakes.

    So how do we go about threat modeling?

    Well, as the fictional Maria Von Trapp said in her famous introductory lesson to solfege, "Let's start at the very beginning, A very good place to start"...

     

    One of the key things we've learned during the process is that having a good diagram is key to a good threat model.  If you don't have a good diagram, you probably don't have a good threat model.

    So how do you go about writing a good diagram?

    The first step is to draw a whiteboard diagram of the flow of data in your component.  Please note: it's the DATA flow you care about, NOT the code flow.  Your threats come via data, NOT code.  This is the single most common mistake that people make when they start threat modeling (it's not surprising, because as developers, we tend to think about code flow).

    When you're drawing the whiteboard diagram, I use the following elements (you can choose different elements - the actual image doesn't matter; what matters is that you define a common set of elements for each type):

    External Interactor - An external interactor is an element that is outside your area of control.  It could be a user calling into an API, or it could be another component that isn't being threat modeled.  For example, if you're threat modeling an API, then the application which invokes the API is an external interactor.  On the other hand, if you're threat modeling an application that calls into an API, the API is an external interactor.

    Process - A "process" is simply some code.  It does NOT mean a "process" as OS's define processes; it's just a collection of code.

    Multiple Process - A "multiple process" is used when your threat model is complex enough to require multiple DFDs (this is rarely the case, but does happen).  In that case, the "multiple process" is expanded in the other DFD.  I'm not sure I've ever seen a threat model that used a "multiple process" element - you can usually break out everything that you want to break down, so they're very rarely seen.

    Data Store - A datastore is something that holds data.  It could be a file, a registry key, or even a shared memory region.

    Data Flow - A dataflow represents the flow of data through the system.  Please note that it does NOT represent the flow of code, but that of data.

    Trust Boundary - A trust boundary occurs when one component doesn't trust the component on the other side of the boundary.  There is always a trust boundary between elements running at different privilege levels, but there sometimes are trust boundaries between different components running at the same privilege level.

    Machine Boundary - A machine boundary occurs when data moves from one machine to another.

    Process Boundary - A process boundary occurs when data moves from one OS process to another. 

     

    You build a data flow diagram by connecting the various elements by data flows, inserting boundaries where it makes sense between the elements.

     

    Now that we have a common language, we can use it to build up a threat model.

    Tomorrow: Drawing the DFD.

  • Larry Osterman's WebLog

    Another pet peeve. Nounifying the word "ask"

    • 63 Comments

    Sorry about not blogging; my days are filled with meetings trying to finish up our LH beta2 features - I can't wait until people see this stuff, it's that cool.

    But because I'm in meetings back-to-back (my calendar looks like a PM's these days), I get subjected to a bunch of stuff that I just hate.

    In particular, one "meme" that seems to have taken off here at Microsoft is nounifying the word "ask".

    I can't tell you how many times I've been in a meeting and had someone say: "So what are your team's asks for this feature?" or "Our only ask is that we have the source process ID added to this message".

    For the life of me, I can't see where this came from, but it seems like everyone's using it.

    What's wrong with the word "request"?  It's a perfectly good noun and it means the exact same thing that a nounified "ask" means.

     

  • Larry Osterman's WebLog

    Structured Exception Handling Considered Harmful

    • 33 Comments

    I could have sworn that I wrote this up before, but apparently I’ve never posted it, even though it’s been one of my favorite rants for years.

    In my “What’s wrong with this code, Part 6” post, several of the commenters indicated that I should be using structured exception handling to prevent the function from crashing.  I couldn’t disagree more.  In my opinion, SEH, if used for this purpose, takes simple, reproducible and easy to diagnose failures and turns them into hard-to-debug subtle corruptions.

    By the way, I’m far from being alone on this.  Joel Spolsky has a rather famous piece “Joel on Exceptions” where he describes his take on exceptions (C++ exceptions).  Raymond has also written about exception handling (on CLR exceptions).

    Structured exception handling is in many ways far worse than C++ exceptions.  There are multiple ways that structured exception handling can truly mess up an application.  I’ve already mentioned the guard page exception issue.  But the problem goes further than that.  Consider what happens if you’re using SEH to ensure that your application doesn’t crash.  What happens when you have a double free?  If you don’t wrap the function in SEH, then it’s highly likely that your application will crash in the heap manager.  If, on the other hand, you’ve wrapped your functions with try/except, then the crash will be handled.  But the problem is that the exception caused the heap code to blow past the release of the heap critical section – the thread that raised the exception still holds the heap critical section. The next attempt to allocate memory on another thread will deadlock your application, and you have no way of knowing what caused it.

    The example above is NOT hypothetical.  I once spent several days trying to track down a hang in Exchange that was caused by exactly this problem – Because a component in the store didn’t want to crash the store, they installed a high level exception handler.  That handler caught the exception in the heap code, and swallowed it.  And the next time we came in to do an allocation, we hung.  In this case, the offending thread had exited, so the heap critical section was marked as being owned by a thread that no longer existed.
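
    The failure mode is easy to demonstrate in miniature.  This sketch (contrived, obviously – the real case was buried deep inside the heap manager) hangs forever:

    #include <windows.h>
    #include <stdio.h>

    CRITICAL_SECTION g_lock;

    void DoWork(void)
    {
        EnterCriticalSection(&g_lock);
        // Simulate the heap manager detecting corruption while the lock
        // is held.
        RaiseException(EXCEPTION_ACCESS_VIOLATION, 0, 0, NULL);
        LeaveCriticalSection(&g_lock);  // Never runs - the exception skips it.
    }

    DWORD WINAPI OtherThread(LPVOID parameter)
    {
        // Deadlocks: g_lock is still owned by the thread that "handled"
        // its own crash.
        EnterCriticalSection(&g_lock);
        LeaveCriticalSection(&g_lock);
        return 0;
    }

    int main(void)
    {
        InitializeCriticalSection(&g_lock);
        __try
        {
            DoWork();
        }
        __except (EXCEPTION_EXECUTE_HANDLER)
        {
            printf("Swallowed the exception; the app looks healthy...\n");
        }

        HANDLE thread = CreateThread(NULL, 0, OtherThread, NULL, 0, NULL);
        WaitForSingleObject(thread, INFINITE);  // Hangs, with no crash to debug.
        return 0;
    }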

    Structured exception handling also has performance implications.  Structured exceptions are considered “asynchronous” by the compiler – any instruction might cause an exception.  As a result of this, the compiler can’t perform flow analysis in code protected by SEH.  So the compiler disables many of its optimizations in routines protected by try/except (or try/finally).  This does not happen with C++ exceptions, by the way, since C++ exceptions are “synchronous” – the compiler knows if a method can throw (or rather, the compiler can know that a method will not throw).

    One other issue with SEH was discussed by Dave LeBlanc in Writing Secure Code, and reposted in this article on the web.  SEH can be used as a vector for security bugs – don’t assume that because you wrapped your function in SEH that your code will not suffer from security holes.  Googling for “structured exception handling security hole” leads to some interesting hits.

    The bottom line is that once you’ve caught an exception, you can make NO assumptions about the state of your process.  Your exception handler really should just pop up a fatal error and terminate the process, because you have no idea what’s been corrupted during the execution of the code.

    At this point, people start screaming: “But wait!  My application runs 3rd party code whose quality I don’t control.  How can I ensure 5 9’s reliability if the 3rd party code can crash?”  Well, the simple answer is to run that untrusted code out-of-proc.  That way, if the 3rd party code does crash, it doesn’t kill YOUR process.  If the 3rd party code crashes while processing a request, then the individual request fails, but at least your service didn’t go down in the process.  Remember – if you catch the exception, you can’t guarantee ANYTHING about the state of your application – it might take days for your application to crash, thus giving you a false sense of robustness, but…

     

    PS: To make things clear: I’m not completely opposed to structured exception handling.  Structured exception handling has its uses, and it CAN be used effectively.  For example, all NT system calls (as opposed to Win32 APIs) capture their arguments in a try/except handler.  This is to guarantee that the version of the arguments to the system call that is referenced in the kernel is always valid – there’s no way for an application to free the memory on another thread, for example.

    RPC also uses exceptions to differentiate between RPC initiated errors and function return calls – the exception is essentially used as a back-channel to provide additional error information that could not be provided by the remoted function.

    Historically (I don’t know if they do this currently) the NT file-systems have also used structured exception handling extensively.  Every function in the file-systems is protected by a try/finally wrapper, and errors are propagated by throwing exceptions.  This way, if any code DOES throw an exception, every routine in the call stack has an opportunity to clean up its critical sections and release allocated resources.  And IMHO, this is the ONLY way to use SEH effectively – if you want to catch exceptions, you need to ensure that every function in your call stack also uses try/finally to guarantee that cleanup occurs.
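
    That discipline looks like this in miniature – every frame that acquires a resource wraps the work in try/finally, so an unwind can’t strand a lock (a sketch of the pattern; DoSomethingThatMightRaise is a placeholder, not real file-system code):

    #include <windows.h>

    CRITICAL_SECTION g_lock;

    void DoSomethingThatMightRaise(void);  // Placeholder for the real work.

    void ProtectedOperation(void)
    {
        EnterCriticalSection(&g_lock);
        __try
        {
            // Anything this calls can raise an exception without
            // stranding g_lock.
            DoSomethingThatMightRaise();
        }
        __finally
        {
            // Runs on both normal return and exception unwind.
            LeaveCriticalSection(&g_lock);
        }
    }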

    Also, to make it COMPLETELY clear.  This post is a criticism of using C/C++ structured exception handling as a way of adding robustness to applications.  It is NOT intended as a criticism of exception handling in general.  In particular, the exception handling primitives in the CLR are quite nice, and mitigate most (if not all) of the architectural criticisms that I’ve mentioned above – exceptions in the CLR are synchronous (so code wrapped in try/catch/finally can be optimized), the CLR synchronization primitives build exception unwinding into the semantics of the exception handler (so critical sections can’t dangle, and memory can’t be leaked), etc.  I do have the same issues with using exceptions as a mechanism for error propagation as Raymond and Joel do, but that’s unrelated to the affirmative harm that SEH can cause if misused.

  • Larry Osterman's WebLog

    What's this untitled slider doing on the Vista volume mixer?

    • 80 Comments

    Someone sent the following screen shot to one of our internal troubleshooting aliases.  They wanted to know what the "Name Not Available" slider meant.

    [Screen shot: the Vista volume mixer, showing a slider labeled "Name Not Available"]

     

    The audio system on Vista keeps track of the apps that are playing sounds (it has to, to be able to display the information on what apps are playing sounds :)).  It keeps this information around for a period of time after the application has made the sound to enable the scenario where your computer makes a weird sound and you want to find out which application made the noise.

    The system only keeps track of the PID for each application, it's the responsibility of the volume mixer to convert the PID to a reasonable name (the audio service can't track this information because of session 0 isolation).

    This works great, but there's one possible problem: If an application exits between the time it made a noise and the time the system times out the fact that it played the noise, then the volume mixer has no way of knowing what the name of the application that made the noise was.  In that case, it uses the "Name Not Available" text to give the user some information.
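
    The PID-to-name step the mixer performs is roughly the following (a sketch using the public APIs – I'm not claiming this is the mixer's actual code):

    #include <windows.h>

    // Try to get a displayable image name for a PID.  If the process has
    // already exited, OpenProcess fails and the caller falls back to the
    // "Name Not Available" placeholder.
    BOOL GetAppNameFromPid(DWORD pid, wchar_t *name, DWORD cchName)
    {
        HANDLE process = OpenProcess(PROCESS_QUERY_LIMITED_INFORMATION,
                                     FALSE, pid);
        if (process == NULL)
        {
            return FALSE;   // Process is gone - nothing to query.
        }

        BOOL ok = QueryFullProcessImageNameW(process, 0, name, &cchName);
        CloseHandle(process);
        return ok;
    }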

  • Larry Osterman's WebLog

    Private Destructors

    • 17 Comments
    Yesterday, I mentioned that Michael Ruck had complained that I'd made the destructor on my CFooBase class private, and he wondered why on earth I had done it.

    Skywing answered in the comments but it bears a bit more discussion.

    The simple answer to why I made the destructor private was that I didn't want anyone to be able to destroy the object.

    ????

    That's right.  You see, CFooBase is a reference counted object.  And you NEVER, EVER want to allow someone to delete a reference counted object.

    This is because you don't own the lifetime of the reference counted object.  Really, you don't.  You own the lifetime of your reference to the object, but you have no way of controlling who else is taking a reference to the object.  So the object may live well after you're done with it.

    For example, consider the following case:

    void MyFunction(void)
    {
        CFooBase *myFoo;

        myFoo = new CFooBase();
        <Do Some Stuff>
        delete myFoo;
    }

    Seems pretty straightforward, right?

    Well, no.  The reason is that you have no idea what happened in the <Do Some Stuff> section.  For example, consider:

    void MyFunction(void)
    {
        CFooBase *myFoo;
        HRESULT hr;

        myFoo = new CFooBase();
        hr = RegistrationFunction->RegisterForNotifications(myFoo);  // RegistrationFunction takes a reference.
        <Do Some Stuff>
        hr = RegistrationFunction->UnregisterForNotifications(myFoo); // Releases the reference taken earlier
        delete myFoo;
    }

    What's wrong here?  Well, what happens if a notification was being processed during the call to UnregisterForNotifications?  In that case, the notification logic would take ANOTHER reference to the myFoo object (to ensure that the object remains alive for the duration of the callback).  But by deleting myFoo directly, you're deleting the object out from under the registration function.

    If, on the other hand, you make the destructor for myFoo private, then the call to delete myFoo fails to compile, which forces you to rewrite the code to look like:

    void MyFunction(void)
    {
        CFooBase *myFoo;
        HRESULT hr;

        myFoo = new CFooBase();
        hr = RegistrationFunction->RegisterForNotifications(myFoo);  // RegistrationFunction takes a reference.
        <Do Some Stuff>
        hr = RegistrationFunction->UnregisterForNotifications(myFoo); // Releases the reference taken earlier
        myFoo->Release();    // Remove my reference to the myFoo
    }

    In other words, making the destructor private forces you to use the correct release pattern for refcounted objects.
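    Here's a minimal sketch of what a refcounted class with a private destructor might look like (the real CFooBase undoubtedly differs in its details):

    // Minimal sketch of a refcounted object with a private destructor.
    #include <windows.h>

    class CFooBase
    {
    public:
        CFooBase() : _refCount(1) {}
        ULONG AddRef()
        {
            return InterlockedIncrement(&_refCount);
        }
        ULONG Release()
        {
            ULONG refCount = InterlockedDecrement(&_refCount);
            if (refCount == 0)
            {
                delete this;    // legal here: the destructor is accessible within the class
            }
            return refCount;
        }
    private:
        ~CFooBase() {}          // private: "delete myFoo" outside the class won't compile
        LONG _refCount;
    };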

    Of course, the next problem that comes up is the question of deterministic finalization – if the object in question is holding some external resource open, you need a way to ensure that the resource gets closed when the caller is done with it.

    Well, the CLR IDisposable pattern comes in quite handy here.  That allows the caller to notify the object that it's done with the object.  Of course, it's also responsible for dealing with the consequences...

    The bottom line is that once you decide to use reference counted objects, you don't control the lifetime of the object; all you control is the lifetime of a reference to the object.  And declaring the destructor private forces you to recognize this.

  • Larry Osterman's WebLog

    What’s the difference between GetTickCount and timeGetTime?

    • 26 Comments

    I’ve always believed that the most frequently used multimedia API in winmm.dll was the PlaySound API.  However, I was recently working with the results of some static analysis tools that were run on the Windows 7 codebase, and I realized that the most commonly used multimedia API (in terms of code breadth) was actually the timeGetTime API – almost all the multimedia APIs use it, which surprised me at the time.

    The MSDN article for timeGetTime says that timeGetTime “retrieves the system time, in milliseconds. The system time is the time elapsed since the system started.”.

    But that’s almost exactly what the GetTickCount API returns: “the number of milliseconds that have elapsed since the system was started, up to 49.7 days.”  (Obviously timeGetTime has the same 49.7 day limit, since both APIs return 32bit counts of milliseconds.)

    So why are all these multimedia APIs using timeGetTime and not GetTickCount since the two APIs apparently return the same value?  I wasn’t sure so I dug in a bit deeper.

    The answer is that they don’t.  You can see this with a tiny program:

    #include <windows.h>
    #include <mmsystem.h>
    #include <stdio.h>
    #include <tchar.h>
    #pragma comment(lib, "winmm.lib")   // timeGetTime lives in winmm.dll

    int _tmain(int argc, _TCHAR* argv[])
    {
        int i = 100;
        DWORD lastTick = 0;
        DWORD lastTime = 0;
        while (--i)
        {
            DWORD tick = GetTickCount();
            DWORD time = timeGetTime();
            printf("Tick: %u, Time: %u, dTick: %3u, dTime: %3u\n", tick, time, tick-lastTick, time-lastTime);
            lastTick = tick;
            lastTime = time;
            Sleep(53);
        }
        return 0;
    }

    If you run this program, you’ll notice that the difference between the timeGetTime results is MUCH more stable than the difference between the GetTickCount results (note that the program sleeps for 53ms which usually doesn’t match the native system timer resolution):

    Tick: 175650292, Time: 175650296, dTick:  46, dTime:  54
    Tick: 175650355, Time: 175650351, dTick:  63, dTime:  55
    Tick: 175650417, Time: 175650407, dTick:  62, dTime:  56
    Tick: 175650464, Time: 175650462, dTick:  47, dTime:  55
    Tick: 175650526, Time: 175650517, dTick:  62, dTime:  55
    Tick: 175650573, Time: 175650573, dTick:  47, dTime:  56
    Tick: 175650636, Time: 175650628, dTick:  63, dTime:  55
    Tick: 175650682, Time: 175650683, dTick:  46, dTime:  55
    Tick: 175650745, Time: 175650739, dTick:  63, dTime:  56
    Tick: 175650792, Time: 175650794, dTick:  47, dTime:  55
    Tick: 175650854, Time: 175650850, dTick:  62, dTime:  56

    That’s because GetTickCount is updated only once per clock tick, so its deltas waver around the actual elapsed time (note that the deltas average to 55ms, so on average GetTickCount returns an accurate result – just not for spot measurements), while timeGetTime’s delta is highly predictable.

    It turns out that for isochronous applications (those that depend on consistent timing), it’s often important to be able to retrieve the current time in a fashion that doesn’t waver; that’s why those applications use timeGetTime to achieve their desired results.
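    As an aside (this is related background, not something the post above covers): winmm also lets an application request a finer timer resolution via timeBeginPeriod, which affects the granularity of timeGetTime.  A sketch:

    // Sketch: request 1ms timer resolution around timing-sensitive work.
    #include <windows.h>
    #include <mmsystem.h>
    #pragma comment(lib, "winmm.lib")

    void DoTimingSensitiveWork(void)
    {
        timeBeginPeriod(1);     // ask for 1ms resolution
        // ... timing-sensitive work that calls timeGetTime() ...
        timeEndPeriod(1);       // always pair with timeBeginPeriod
    }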

  • Larry Osterman's WebLog

    Where do "checked" and "free" come from?

    • 26 Comments

    People who have MSDN or the DDK know that Windows is typically built in two different flavors, "Checked" and "Free".  The primary difference between the two is that the "checked" build has traces and asserts, but the free build doesn't.

    Where did those names "checked" and "free" come from?  It's certainly not traditional, the traditional words are "Debug" and "Retail" (or "Release").

    When we were doing the initial development of Windows NT, we started by using the same "Debug" and "Retail" names that most people use.

    The thing is, it turns out that there are actually four different sets of options that make up the "Debug" and "Retail" split.

    You have:

    1. Compiler Optimization: On/Off
    2. Debug Traces: On/Off
    3. Assertions: Enabled/Disabled
    4. Sanity checks: Enabled/Disabled

    Traditionally, "Debug" is "Optimization:off, Traces:on, Assertions: on" and "Retail" is "Optimization:on, Traces:off, Assertions: off".  Sanity checks was something the NT team added.  The idea was that there would be additional sanity checks built in for our internal development that would be removed before we shipped.

    So the NT build team wanted to build "Optimization:on, Traces:on, Assertions: on, sanity checks:on" and "Optimizations:on, traces:off, assertions: off, sanity checks: on" and "optimizations:on, traces:off, assertions:off, sanity checks: off".

    The last was what was traditionally called "Retail" - no debugging whatsoever.  However, the question still remained: what to call the "O:on, T:on, A:on, S:on" and "O:on, T:off, A:off, S:on" builds - the first wasn't "Debug", because the optimizer was enabled, and the latter wasn't "Retail", since the sanity checks were enabled.

    So clearly there needed to be some other name to differentiate these cases.  After some internal debate, we settled on "Checked" for the "O:on, T:on, A:on, S:on" and "Free" for the "O:on, T:off, A:off, S:on" build.  Checked because it had all the checks enabled, and free because it was "check free".

    And as the NT 3.1 project progressed, the team eventually realized that (a) since they'd never actually tested the "retail" build, they had no idea what might break when they started making such builds, and (b) since the free build had been perf tested and met the perf criteria, they decided to ship the free build as the final version.

     

     

  • Larry Osterman's WebLog

    Windows Vista Sound causes Network Throughput slowdowns.

    • 62 Comments

    AKA: How I spent last week :).

    On Tuesday Morning last week, I got an email from "reader@slashdot.org":

    You've probably already seen this article, but just in case I'd love to hear your response.

    http://it.slashdot.org/article.pl?sid=07/08/21/1441240

    Playing Music Slows Vista Network Performance?

    In fact, I'd not seen this until it was pointed out to me.  It seemed surprising, so I went to talk to our perf people, and I ran some experiments on my own.

    They didn't know what was up, and I was unable to reproduce the failure on any of my systems, so I figured it was a false alarm (we get them regularly).  It turns out that at the same time, the networking team had heard about the same problem, and they WERE able to reproduce it.  I kept on digging, and by lunchtime I'd also generated a clean reproduction of the problem in my office.

    At the same time, Adrian Kingsley-Hughes over at ZDNet Blogs picked up the issue and started writing about it.

    By Friday, we'd pretty much figured out what was going on and why different groups were seeing different results.  It turns out that the issue was highly dependent on your network topology and the amount of data you were pumping through your network adapter - the reason I hadn't been able to reproduce it is that I only have a 100mbit Ethernet adapter in my office.  You can get the problem to reproduce on 100mbit networks, but you've really got to work at it to make it visible.  Some of the people working on the problem sent a private email to Adrian Kingsley-Hughes on Friday evening reporting the results of our investigation, and Mark Russinovich (a Technical Fellow, and all around insanely smart guy) wrote up a detailed post explaining what's going on in insane detail, which he posted this morning.

    Essentially, the root of the problem is that in Vista, when you're playing multimedia content, the system throttles incoming network packets to prevent them from overwhelming the multimedia rendering path - the system will only process 10,000 network frames per second (this is a hideously simplistic explanation; see Mark's post for the details).

    For 100mbit networks, this isn't a problem - it's pretty hard to get a 100mbit network to generate 10,000 frames in a second (you need to have a hefty CPU and send LOTS of tiny packets), but on a gigabit network, it's really easy to hit the limit.

     

    One of the comments that came up on Adrian's blog was from George Ou (another ZDNet blogger):

    ""The connection between media playback and networking is not immediately obvious. But as you know, the drivers involved in both activities run at extremely high priority. As a result, the network driver can cause media playback to degrade."


    I can't believe we have to put up with this in the era of dual core and quad core computers. Slap the network driver on one CPU core and put the audio playback on another core and problem solved. But even single core CPUs are so fast that this shouldn't ever be a problem even if audio playback gets priority over network-related CPU usage. It's not like network-related CPU consumption uses more than 50% CPU on a modern dual-core processor even when throughput hits 500 mbps. There’s just no excuse for this."

    At some level, George is right - machines these days are really fast and they can do a lot.  But George is missing one of the critical differences between multimedia processing and other processing.

    Multimedia playback is fundamentally different from most of the day-to-day operations that occur on your computer. The core of the problem is that multimedia playback is inherently isochronous. For instance, in Vista, the audio engine runs with a periodicity of 10 milliseconds. That means that every 10 milliseconds, it MUST wake up and process the next set of audio samples, or the user will hear a "pop" or “stutter” in their audio playback. It doesn’t matter how fast your processor is, or how many CPU cores it has, the engine MUST wake up every 10 milliseconds, or you get a “glitch”.

    For almost everything else in the system, if the system locked up for even as long as 50 milliseconds, you’d never notice it.  But for multimedia content (especially audio content), you absolutely will notice the problem.  The core reason has to do with the physics of sound: whenever there’s a discontinuity in the audio stream, a high frequency transient is generated.  The human ear is quite sensitive to these high frequency transients (they sound like "clicks" or "pops").

    Anything that stops the audio engine from getting to run every 10 milliseconds (like a flurry of high priority network interrupts) will be clearly perceptible.  So it doesn’t matter how much horsepower your machine has; what matters is how many interrupts have to be processed.
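    To illustrate the shape of the constraint, here's a sketch of an isochronous loop.  This is NOT the actual audio engine code - just an illustration, and FillNextAudioBuffer is a hypothetical function:

    // Illustrative sketch: wake every 10ms and render the next buffer.
    #include <windows.h>

    static void FillNextAudioBuffer(void)
    {
        // Hypothetical: render the next 10 milliseconds of samples here.
    }

    void AudioPump(HANDLE shutdownEvent)
    {
        HANDLE handles[2];
        LARGE_INTEGER dueTime;
        HANDLE timer = CreateWaitableTimer(NULL, FALSE, NULL);  // auto-reset

        dueTime.QuadPart = -100000;     // first fire in 10ms (relative, 100ns units)
        SetWaitableTimer(timer, &dueTime, 10, NULL, NULL, FALSE);   // 10ms period

        handles[0] = shutdownEvent;
        handles[1] = timer;
        for (;;)
        {
            if (WaitForMultipleObjects(2, handles, FALSE, INFINITE) != WAIT_OBJECT_0 + 1)
            {
                break;      // shutdown requested (or an error)
            }
            // If ANYTHING delays this call past the 10ms deadline - a flurry
            // of high priority interrupts, say - the hardware buffer runs dry
            // and the user hears the glitch.
            FillNextAudioBuffer();
        }
        CloseHandle(timer);
    }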

    We had a meeting the other day with the networking people where we demonstrated the magnitude of the problem - it was pretty dramatic, even on the top-of-the-line laptop.  On a lower-end machine it's even more dramatic.  On some machines, heavy networking can turn video rendering to a slideshow.

     

    Any car buffs will immediately want to shoot me for this analogy, because I’m sure it’s highly inaccurate (I am NOT a car person), but I think it works: You could almost think of this as an engine with a slip in the timing belt – you’re fine when you’re running the engine at low revs, because the slip doesn’t affect things enough to notice. But when you run the engine at high RPM, the slip becomes catastrophic – the engine requires that the timing be totally accurate, but because it isn’t, valves don’t open when they have to and the engine melts down.

     

    Anyway, that's a long-winded discussion.  The good news is that the right people are actively engaged in making sure that a fix is made available for the problem.

  • Larry Osterman's WebLog

    FPO

    • 22 Comments

    I was chatting with one of the perf guys last week and he mentioned something that surprised me greatly.  Apparently he's having perf issues that appear to be associated with a 3rd party driver.  Unfortunately, he's having problems figuring out what's going wrong, because the vendor built the driver with FPO enabled (and hasn't provided symbols), so the perf guy can't track down the root cause of the problem.

    The reason I was surprised was that I didn't realize that ANYONE was using FPO any more.

    What's FPO?

    To know the answer, you have to go way back into prehistory.

    Intel's 8088 processor had an extremely limited set of registers (I'm ignoring the segment registers), they were:

    AX BX CX DX IP
    SI DI BP SP FLAGS

    With such a limited set of registers, the registers were all assigned specific purposes.  AX, BX, CX, and DX were the "General Purpose" registers, SI and DI were "Index" registers, SP was the "Stack Pointer", BP was the "Frame Pointer", IP was the "Instruction Pointer", and FLAGS was a register containing several bits that indicated information about the processor's current state (whether the result of the previous arithmetic or logical instruction was 0, for instance).

    The BX, SI, DI and BP registers were special because they could be used as "Index" registers.  Index registers are critically important to a compiler, because they are used to access memory through a pointer.  In other words, if you have a structure that's located at offset 0x1234 in memory, you can set an index register to the value 0x1234 and access values relative to that location.  For example:

    MOV    BX, [Structure]
    MOV    AX, [BX]+4

    Will set the BX register to the value of the memory pointed to by [Structure] and set the value of AX to the WORD located at the 4th byte relative to the start of that structure.

    One thing to note is that the SP register wasn't an index register.  That meant that to access variables on the stack, you needed to use a different register, that's where the BP register came from - the BP register was dedicated to accessing values on the stack.

    When the 386 came out, Intel stretched the various registers to 32 bits and removed the restriction that only BX, SI, DI and BP could be used as index registers:

    EAX EBX ECX EDX EIP
    ESI EDI EBP ESP FLAGS

    This was a good thing: all of a sudden, instead of being constrained to 3 index registers, the compiler could use 6 of them.

    Since index registers are used for structure access, to a compiler they're like gold - more of them is a good thing, and it's worth almost any amount of effort to gain more of them.

    Some extraordinarily clever person realized that since ESP was now an index register, the EBP register no longer had to be dedicated to accessing variables on the stack.  In other words, instead of:

    MyFunction:
        PUSH    EBP                             ; save the caller's frame pointer
        MOV     EBP, ESP                        ; establish this function's frame
        SUB     ESP, <LocalVariableStorage>     ; reserve space for locals
        MOV     EAX, [EBP+8]                    ; load the first parameter
          :
          :
        MOV     ESP, EBP
        POP     EBP                             ; restore the caller's frame pointer
        RET

    to access the 1st parameter on the stack (EBP+0 is the old value of EBP, EBP+4 is the return address), you can instead do:

    MyFunction:
        SUB     ESP, <LocalVariableStorage>     ; reserve space for locals
        MOV     EAX, [ESP+4+<LocalVariableStorage>]  ; load the first parameter
          :
          :
        ADD     ESP, <LocalVariableStorage>
        RET

    This works GREAT - all of a sudden, EBP can be repurposed and used as another general purpose register!  The compiler folks called this optimization "Frame Pointer Omission", and it went by the acronym FPO.
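    As a point of reference (this detail isn't from the post itself): with the Microsoft compiler, the optimization is controlled by the /Oy switch - /Oy enables frame pointer omission, and /Oy- forces frame pointers back on even when other optimizations are enabled:

    cl /O2 /Oy- mymodule.c      (optimize, but keep frame pointers for debuggability)
    cl /O2 /Oy  mymodule.c      (optimize and omit frame pointers)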

    But there's one small problem with FPO.

    If you look at the pre-FPO example for MyFunction, you'll notice that the first instructions in the routine were PUSH EBP followed by MOV EBP, ESP.  Those two instructions had an interesting and extremely useful side effect: they essentially created a singly linked list that linked the frame pointer of each function to that of its caller.  From the EBP for a routine, you could recover the entire call stack.  This was unbelievably useful for debuggers - it meant that call stacks were quite reliable, even if you didn't have symbols for all the modules being debugged.  Unfortunately, when FPO was enabled, that list of stack frames was lost - the information simply wasn't being tracked.

    To solve this problem, the compiler guys put the information that was lost when FPO was enabled into the PDB file for the binary.  Thus, when you had symbols for the modules, you could recover all the stack information.

    FPO was enabled for all Windows binaries in NT 3.51, but was turned off for Windows binaries in Vista because it was no longer necessary - machines have gotten sufficiently faster since 1995 that the performance improvements achieved by FPO weren't enough to outweigh the pain in debugging and analysis that FPO caused.

     

    Edit: Clarified what I meant by "FPO was enabled in NT 3.51" and "was turned off in Vista", thanks Steve for pointing this out.

  • Larry Osterman's WebLog

    Interacting with Services

    • 15 Comments
    In the comments for my first services post, someone asked about the SERVICE_INTERACTIVE_PROCESS flag that can be specified for the CreateService API.

    This flag allows the user to specify that the service should be allowed to interact with the logged on user.  The idea of an interactive service was added back in NT 3.1 for components like printer drivers that want to pop up UI.

    IMHO this was a spectacularly bad idea that should never have been added to the system.

    MSDN has an entire page on interactive services, unfortunately IMHO, it doesn't go into enough detail as to why it's a bad idea to ever specify the SERVICE_INTERACTIVE_PROCESS flag on a service.

    The primary reason for this being a bad idea is that interactive services enable a class of threats known as "Shatter" attacks (because they "shatter windows", I believe). 

    If you do a search for "shatter attack", you can see some details of how these security threats work.  Microsoft also published KB article 327618, which extends the documentation about interactive services, and Michael Howard wrote an article about interactive services for the MSDN Library.  Initially the shatter attacks went after Windows components that had background window message pumps (which have long since been fixed), but they've also been used to attack 3rd party services that pop up UI.

    The second reason it's a bad idea is that the SERVICE_INTERACTIVE_PROCESS flag simply doesn't work correctly.  The service's UI pops up in the system session (normally session 0).  If, however, the user is logged on in another session, the user never sees the UI.  There are two main scenarios that have a user connecting in another session - Terminal Services and Fast User Switching.  TS isn't that common, but in home scenarios where multiple people share a single computer, FUS is often enabled (we have 4 people logged in pretty much all the time on the computer in our kitchen, for example).

    The third reason interactive services are a bad idea is that they aren't guaranteed to work with Windows Vista :)  As a part of the security hardening process that went into Windows Vista, interactive users log onto sessions other than the system session - the first interactive user runs in session 1, not session 0.  This has the effect of totally cutting shatter attacks off at the knees - user apps can't interact with high privilege windows running in services.

     

    On the other hand, sometimes it's important to interact with the logged on user.  How do you deal with this problem?  There are a couple of suggestions for resolving the issue.  The first is to use the CreateProcessAsUser API to create a process on the user's desktop.  Since the new process is running in the context of the user, privilege elevation attacks don't apply.  Another variant of this solution is to use an existing systray process to communicate with the service.
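    Here's a rough sketch of that first suggestion.  It's illustrative only - it assumes the service runs as LocalSystem (which holds the SE_TCB privilege that WTSQueryUserToken requires), and it elides most error handling:

    // Rough sketch: launch a process on the interactive user's desktop from
    // a service.  Assumes the service runs as LocalSystem.
    #include <windows.h>
    #include <wtsapi32.h>
    #include <wchar.h>
    #pragma comment(lib, "wtsapi32.lib")
    #pragma comment(lib, "advapi32.lib")

    BOOL LaunchInUserSession(const wchar_t *commandLine)
    {
        BOOL ok = FALSE;
        HANDLE userToken = NULL;
        DWORD sessionId = WTSGetActiveConsoleSessionId();

        if (WTSQueryUserToken(sessionId, &userToken))
        {
            STARTUPINFOW si = { sizeof(si) };
            PROCESS_INFORMATION pi = { 0 };
            wchar_t cmd[MAX_PATH];

            si.lpDesktop = L"winsta0\\default";     // the interactive desktop
            wcscpy_s(cmd, MAX_PATH, commandLine);   // CreateProcess may modify the buffer

            ok = CreateProcessAsUserW(userToken, NULL, cmd, NULL, NULL, FALSE,
                                      CREATE_UNICODE_ENVIRONMENT, NULL, NULL, &si, &pi);
            if (ok)
            {
                CloseHandle(pi.hProcess);
                CloseHandle(pi.hThread);
            }
            CloseHandle(userToken);
        }
        return ok;
    }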

    In addition, if a COM object is marked as running in the security context of the interactive user, it will be activated in the interactive user's session.  You can use a session moniker to start a COM object in a particular session.  There's an example of how to do this here.

     

  • Larry Osterman's WebLog

    What IS audio on a PC anyway?

    • 39 Comments

    This may be well known, but maybe not (I didn’t understand it until I joined the Windows Audio team).

    Just what is digital audio, anyway?  Well, at its core, all of digital audio is a “pop” sound made on the speaker.  When you get right down to it, that’s all it is.  A “sound” in digital audio is a voltage spike applied to a speaker jack, with a specific amplitude.  The amplitude determines how much the speaker diaphragm moves when the signal is received by the speaker.

    That’s it, that’s all that digital audio is – it’s a “pop” noise.  The trick that makes it sound like Sondheim is that you make a LOT of pops every second – thousands and thousands of pops per second.  When you make the pops quickly enough, your ear smears them together into a continuous sound.  You can hear a simple example of this effect when you walk near a high voltage power transformer.  AC power in the US runs at 60 cycles per second, and as the transformer works, it emits a noise on each cycle.  The brain smears that 60 Hz sound together and turns it into the “hum” that you hear near power equipment.

    Another way of thinking about this (thanks Frank) is to consider the speaker on your home stereo.  As you’re listening to music, if you pull the cover off the speaker, you can see the cone move in and out with the music.  Well, if you were to take a ruler and measure the displacement of the cone from 0, the distance that it moves from the origin is the volume of the pop.  Now start measuring really fast – thousands of times a second.  Your collected measurements make up an encoded representation of the sound you just heard.

    To play back the audio, take your measurements, and move the cone the same amount, and it will reproduce the original sound.

    Since a picture is worth a thousand words, Simon Cooke was gracious enough to draw the following...

    Take an audio signal, say a sine wave:

    [Figure: a sine wave]

    Then, you sample the sine wave (in this case, 16 samples per cycle):

    [Figure: the same sine wave with 16 evenly spaced sample bars drawn beneath it]

    Each of the bars under the sine wave is a sample.  When you play back the samples, the speaker will reproduce the original sound.  One thing to keep in mind (as Simon commented) is that the output waveform doesn't look quite like the stepped function that the samples would generate.  Instead, after the Digital-to-Analog Converter (DAC) in the sound card, there's a low pass filter that smooths the output of the signal.

    When you take an analog audio signal and encode it in this format, it’s known as “Pulse Code Modulation”, or “PCM”.  Ultimately, all PC audio comes out in PCM – that’s typically what’s sent to the sound card when you’re playing back audio.

    When an analog signal is captured (in a recording studio, for example), the volume of the signal is sampled at some frequency (typically 44.1 kHz for CD audio).  Each of the samples is captured with a particular range of amplitudes (the quantization).  For CD audio, the quantization is 16 bits per channel.  This means that each sample has one of at most 65,536 values, which is typically enough for most audio applications.  And since CD audio is stereo, there are two 16 bit values for each sample.
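    To make that concrete, here's a tiny sketch of what CD-style PCM looks like in code - one second of a 440Hz tone.  The tone and the layout are my own example, not anything from a real audio pipeline:

    // Tiny sketch: one second of 44.1 kHz, 16 bit stereo PCM - a 440Hz sine.
    #include <math.h>
    #include <stdlib.h>

    #define SAMPLE_RATE 44100
    #define PI 3.14159265358979

    short *GenerateSine(double frequency)
    {
        // Two 16-bit values (left and right) per sample.
        short *samples = (short *)malloc(SAMPLE_RATE * 2 * sizeof(short));
        if (samples != NULL)
        {
            for (int i = 0; i < SAMPLE_RATE; i++)
            {
                // The amplitude, scaled to the 16 bit range, is the
                // "displacement of the speaker cone" for this instant.
                short value = (short)(32767.0 * sin(2 * PI * frequency * i / SAMPLE_RATE));
                samples[i * 2] = value;         // left channel
                samples[i * 2 + 1] = value;     // right channel
            }
        }
        return samples;
    }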

    Other devices, like telephones, typically use 8 bit samples acquired at 8kHz – that’s why the sound quality on telephone communications is so poor (btw, telephones don’t actually use direct 8 bit samples; instead their data stream is compressed using a format called mu-law (or a-law in Europe), standardized as G.711).  On the other hand, the bandwidth used by typical telephone communication is significantly lower than CD audio – CD audio’s bandwidth is 44,100*16*2=1.35Mb/second, or 176KB/second.  The bandwidth of a telephone conversation is 64Kb/second, or 8KB/second (reduced to between 3.2Kb/s and 11Kb/s with compression) – an order of magnitude lower.  When you’re dealing with low bandwidth networks like the analog phone network or wireless networks, this reduction in bandwidth is critical.

    It’s also possible to sample at higher frequencies and larger sample sizes.  Some common sample sizes are 20 bits/sample and 24 bits/sample, and sample frequencies of 96 kHz and sometimes even higher show up as well.

    When you’re ripping your CDs, on the other hand, it’s pointless to rip them at anything other than 44.1 kHz, 16 bit stereo – there’s nothing you can do to improve the resolution.  There ARE other forms of audio that have a higher bit rate; for example, DVD-Audio allows samples at 44.1, 48, 88.2, 96, 176.4 or 192 kHz, and sample sizes of 16, 20, or 24 bits/sample, with up to six audio channels at 96 kHz or two channels at 192 kHz.

    One thing to realize about PCM audio is that it’s extraordinarily sparse – there is a huge amount of compression that can be applied to reduce the size of the audio data.  But in most cases, when the data finally hits your sound card, it’s represented as PCM data (this isn’t always the case; for example, if you’re using the SPDIF connector on your sound card, the data sent to the card isn’t PCM).

    Edit: Corrected math slightly.

    Edit: Added a couple of pictures (Thanks Simon!)

    Edit3: Not high pass, low pass filter, thanks Stefan.

  • Larry Osterman's WebLog

    Microsoft Anti-Spyware

    • 23 Comments

    I don't normally do "Me Too" posts, and I know that this one will get a lot of coverage on the Microsoft blogs, but the Seattle PI blog just mentioned that the beta of Microsoft's new anti-spyware solution was just released to the web here.

    I installed it on my machines at work yesterday, and it seems pretty nice so far.  Of course I didn't have any spyware for it to find (because I'm pretty darned careful, and run as a limited user), but...  It'll be interesting to run it at home, especially since Valorie (and I) like playing some of the online games (like Popcap's) that get singled out as being spyware by some tools.

     

    I have no knowledge of their final product plans, so it's pointless to ask.  All I know about this is what I've read in the press.

     

  • Larry Osterman's WebLog

    So what's wrong with DRM in the platform anyway?

    • 53 Comments

    As I said yesterday, it's going to take a bit of time to get the next article in the "cdrom playback" series working, so I thought I'd turn the blog around and ask the people who read it a question.

    I was reading Channel9 the other day, and someone turned a discussion of Longhorn into a rant against the fact that Longhorn's going to be all about DRM (it's not; there will be DRM support in Longhorn, just like there has been DRM support in just about every version of Windows that's distributed Windows Media format).

    But I was curious.  Why is it so evil that a platform contain DRM support?

    My personal opinion is that DRM is a tool for content producers.  Content Producers are customers, just like everyone else that uses our product is a customer.  They want a platform that provides content protection.  You can debate whether or not that is a reasonable decision, but it's moot - the content producers today want it.

    So Microsoft, as a platform vendor provides DRM for the content producers.  If we didn't, they wouldn't use our media formats, they'd find some other media format that DOES have DRM support for their content.

    The decision to use (or not use) DRM is up to the content producer.  It's their content; they can decide how to distribute it.  You can author and distribute WMA/WMV files without content protection - all my ripped CDs are ripped without content protection (because I don't share them).  I have a bunch of WMV files shot on the camcorder that aren't DRM'ed - they're home movies, there's no point in using rights management.

    There are professional content producers out there that aren't using DRM for their content (Thermal and a Quarter is an easy example I have on the tip of my tongue (as I write this, they've run out of bandwidth :( but...)).  And there are content producers that are using DRM.

    But why is it evil to put the ability to use DRM into the product?

  • Larry Osterman's WebLog

    Threat Modeling Again, Threat Modeling Rules of Thumb

    • 12 Comments

    I wrote this piece up for our group as we entered the most recent round of threat models.  I've cleaned it up a bit (removing some Microsoft-specific stuff); some of this has been talked about before, but the rest of the document is pretty relevant.

     

    ---------------------------------------

    As you go about filling in the threat model threat list, it’s important to consider the consequences of entering threats and mitigations.  While it can be easy to find threats, it is important to realize that all threats have real-world consequences for the development team.

    At the end of the day, this process is about ensuring that our customers' machines aren't compromised.  When we're deciding which threats need mitigation, we concentrate our efforts on those where the attacker can cause real damage.

     

    When we’re threat modeling, we should ensure that we’ve identified as many of the potential threats as possible (even if you think they’re trivial). At a minimum, the threats we list that we chose to ignore will remain in the document to provide guidance for the future. 

     

    Remember that the feature team can always decide that we’re ok with accepting the risk of a particular threat (subject to the SDL security review process). But we want to make sure that we mitigate the right issues.

    To help you guide your thinking about what kinds of threats deserve mitigation, here are some rules of thumb that you can use while performing your threat modeling.

    1. If the data hasn’t crossed a trust boundary, you don’t really care about it.

    2. If the threat requires that the attacker is ALREADY running code on the client at your privilege level, you don’t really care about it.

    3. If your code runs with any elevated privileges (even if your code runs in a restricted svchost instance) you need to be concerned.

    4. If your code invalidates assumptions made by other entities, you need to be concerned.

    5. If your code listens on the network, you need to be concerned.

    6. If your code retrieves information from the internet, you need to be concerned.

    7. If your code deals with data that came from a file, you need to be concerned (these last two are the inverses of rule #1).

    8. If your code is marked as safe for scripting or safe for initialization, you need to be REALLY concerned.

     

    Let’s take each of these in turn, because there are some subtle distinctions that need to be called out.

    If the data hasn’t crossed a trust boundary, you don’t really care about it.

    For example, consider the case where a hostile application passes bogus parameters into our API.  In that case, the hostile application lives within the same trust boundary as the application, so you can simply certify the threat.  The same thing applies to window messages that you receive.  In general, it's not useful to enumerate threats within a trust boundary.  [Editor's Note: Yesterday, David LeBlanc wrote an article about this very issue - I agree 100% with what he says there.]

    But there’s a caveat (of course there’s a caveat, there’s ALWAYS a caveat).  Just because your threat model diagram doesn't have a trust boundary on it doesn't mean that the data being validated hasn't crossed a trust boundary on the way to your code.

    Consider the case of an application that takes a file name from the network and passes that filename into your API.  And further consider the case where your API has an input validation bug that causes a buffer overflow.  In that case, it’s YOUR responsibility to fix the buffer overflow – an attacker can use the innocent application to exploit your code.  Before you dismiss this issue as being unlikely, consider CVE-2007-3670.  The Firefox web browser allows the user to execute scripts passed in on the command line, and it registered a URI handler named “firefoxurl” with the OS, with the start action being “firefox.exe %1” (this is a simplification).  An attacker simply included “firefoxurl:<javascript>” in a URL and was able to successfully take ownership of the client machine.  Firefox assumed that there was no trust boundary between firefox.exe and its invoker, but it didn’t realize that it introduced such a trust boundary when it created the “firefoxurl” URI handler.
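    The registration that creates such a trust boundary is tiny.  Here's a hypothetical sketch - the "myapp" protocol name and executable path are invented, and this shows the general URL-protocol mechanism rather than Firefox's actual code:

    // Hypothetical sketch: registering a URL protocol handler.  Once these
    // keys exist, any web page can cause myapp.exe to be launched with
    // attacker-chosen text in %1 - a trust boundary, whether or not the
    // author realized they were creating one.
    #include <windows.h>
    #include <string.h>
    #pragma comment(lib, "advapi32.lib")

    void RegisterProtocolHandler(void)
    {
        HKEY key;
        const char *command = "\"C:\\Program Files\\MyApp\\myapp.exe\" \"%1\"";

        RegCreateKeyExA(HKEY_CLASSES_ROOT, "myapp", 0, NULL, 0, KEY_WRITE,
                        NULL, &key, NULL);
        RegSetValueExA(key, "URL Protocol", 0, REG_SZ, (const BYTE *)"", 1);
        RegCloseKey(key);

        RegCreateKeyExA(HKEY_CLASSES_ROOT, "myapp\\shell\\open\\command", 0,
                        NULL, 0, KEY_WRITE, NULL, &key, NULL);
        RegSetValueExA(key, NULL, 0, REG_SZ, (const BYTE *)command,
                       (DWORD)strlen(command) + 1);
        RegCloseKey(key);
    }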

    If the threat requires that the attacker is ALREADY running code on the client at your privilege level, you don’t really care about it.

    For example, consider the case where a hostile application writes values into a registry key that’s read by your component.  Writing those keys requires that some application already be running code on the client, which requires that the bad guy first be able to get code to run on the client box.

    While the threats associated with this are real, it’s not that big a problem and you can probably state that you aren’t concerned by those threats because they require that the bad guy run code on the box (see Immutable Law #1: “If a bad guy can persuade you to run his program on your computer, it’s not your computer anymore”).

    Please note that this item has a HUGE caveat: it ONLY applies if the attacker’s code is running at the same privilege level as your code. If that’s not the case, you have the next rule of thumb:

    If your code runs with any elevated privileges, you need to be concerned.

    We DO care about threats that cross privilege boundaries. That means that any data communication between an application and a service (which could be an RPC, it could be a registry value, it could be a shared memory region) must be included in the threat model.

    Even if you’re running in a low privilege service account, you still may be attacked – one of the privileges that all services get is the SE_IMPERSONATE_NAME privilege.  This is actually one of the more dangerous privileges on the system, because it can allow a patient attacker to take over the entire box.  Ken “Skywing” Johnson wrote about this in a couple of posts (1 and 2) on his excellent blog Nynaeve.  David LeBlanc has a subtly different take on this issue (see here), but the reality is that David and Ken agree more than they disagree on this issue.  If your code runs as a service, you MUST assume that you’re running with elevated privileges.  This applies to all data read – rule #2 (requiring an attacker to run code) does not apply when you cross privilege levels, because the attacker could be writing code under a low privilege account to enable an elevation of privilege attack.

    In addition, if your component has a use scenario that involves running the component elevated, you also need to consider that in your threat modeling.

    If your code invalidates assumptions made by other entities, you need to be concerned

    The reason that the firefoxurl problem listed above was such a big deal was that the firefoxurl handler invalidated assumptions made by the other components of Firefox.  When the Firefox team threat modeled Firefox, they assumed that Firefox would only be invoked in the context of the user.  As such, it was totally reasonable to add support for executing scripts passed in on the command line (see rule of thumb #1).  However, when they threat modeled the firefoxurl: URI handler implementation, they didn’t consider that they had now introduced a trust boundary between the invoker of Firefox and the Firefox executable.

    So you need to be aware of the assumptions of all of your related components and ensure that you’re not changing those assumptions. If you are, you need to ensure that your change doesn’t introduce issues.

    If your code retrieves information from the internet, you need to be concerned

    The internet is a totally untrusted resource (no duh). But this has profound consequences when threat modeling. All data received from the Internet MUST be treated as totally untrusted and must be subject to strict validation.

    If your code deals with data that came from a file, then you need to be concerned.

    In the previous section, I talked about data received over the internet.  Microsoft has issued several bulletins this year for vulnerabilities that required an attacker to trick a user into downloading a specially crafted file; as a consequence, ANY file data must be treated as potentially malicious.  For example, MS07-047 (a vulnerability in WMP) required that the attacker force the user to view a specially crafted WMP skin.  The consequence of this is that ANY file parsed by our code MUST be treated as coming from a lower level of trust.

    Every single file parser MUST treat its input as totally untrusted – MS07-047 is only one example of an MSRC vulnerability; there have been others.  Any code that reads data from a file MUST validate the contents.  It also means that we need to have fuzzing in place to validate our mitigations.

    And the problem goes beyond file parsers directly.  Any data that can possibly be read from a file cannot be trusted.  <A senior developer in our division> offers a codec as a perfect example.  The file parser parses the container and determines that the container isn't corrupted.  It then extracts the format information and finds the appropriate codec for that format.  The parser then loads the codec and hands the format information and file data to the codec.

    The only thing that the codec knows is that the format information that’s been passed in is valid. That’s it. Beyond the fact that the format information is of an appropriate size and has a verifiable type, the codec can make no assumptions about the contents of the format information, and it can make no assumptions about the file data. Even though the codec doesn’t explicitly parse the file, it’s still dealing with untrusted data read from the file.
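    Here's a generic sketch of the kind of defensive validation this implies.  The record format is invented; the point is that every field is hostile until proven otherwise - including the fields that describe other fields:

    // Generic sketch: validating a length-prefixed record read from a file.
    #include <windows.h>

    typedef struct
    {
        DWORD magic;
        DWORD payloadLength;    // attacker controlled!
        BYTE  payload[1];       // variable length
    } FILE_RECORD;

    #define RECORD_MAGIC 0x4C4F4246
    #define MAX_PAYLOAD  0x10000

    BOOL ValidateRecord(const BYTE *buffer, SIZE_T bufferLength)
    {
        const FILE_RECORD *record = (const FILE_RECORD *)buffer;

        // Never read the header before proving the buffer contains one.
        if (bufferLength < (SIZE_T)FIELD_OFFSET(FILE_RECORD, payload))
        {
            return FALSE;
        }
        if (record->magic != RECORD_MAGIC)
        {
            return FALSE;
        }
        // Bound the declared length BEFORE doing arithmetic with it, so the
        // addition below can't overflow.
        if (record->payloadLength > MAX_PAYLOAD)
        {
            return FALSE;
        }
        if ((SIZE_T)FIELD_OFFSET(FILE_RECORD, payload) + record->payloadLength > bufferLength)
        {
            return FALSE;
        }
        return TRUE;
    }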

    If your code is marked as “Safe For Scripting” or “Safe for Initialization”, you need to be REALLY concerned.

    If your code is marked as “Safe For Scripting” (or if your code can be invoked from a control that is marked as Safe For Scripting), it means that your code can be executed in the context of a web browser, and that in turn means that the bad guys are going to go after your code. There have been way too many MSRC bulletins about issues with ActiveX controls.

    Please note that some of the issues with ActiveX controls can be quite subtle.  For instance, in MS02-032 we had to issue an MSRC fix because one of the APIs exposed by the WMP OCX returned a different error code depending on whether a path passed into the API was a file or a directory – that constituted an information disclosure vulnerability, and an attacker could use it to map out the contents of the user's hard disk.

    In conclusion

    Vista raised the security bar for attackers significantly.  As Vista adoption spreads, attackers will be forced to find new ways to exploit our code.  That means it's more and more important that we give them as few opportunities as possible to make life difficult for our customers.  The threat modeling process helps us understand the risks associated with our features and where we need to look for potential issues.

  • Larry Osterman's WebLog

    Laptops and Kittens....

    • 38 Comments
    I mentioned the other day that we have four cats currently.  Three of them are 18 month old kittens (ok, at 18 months, they're not kittens anymore, but we still refer to them as "the kittens").

    A while ago, one of them (Aphus, we believe) discovered that if they batted at Valorie's laptop, they could remove the keys from the laptop, and the laptop keys made great "chase" toys.  Valorie has taken to locking her laptop up in a tiny computer nook upstairs as a result, but even with that, they somehow made off with her "L" key.  We've not been able to find it even after six months of looking.  To get her computer up and running, we replaced the "L" key with the "windows" key. Fortunately she's a touch typist, and thus never looks at her keyboard - when she does, she freaks out.

    Last night, I left a build running on my laptop when I went to bed.  Valorie mentioned that it would probably be a bad idea to do this, since the kittens were on the loose.

    Since I couldn't close the laptop without shutting down the build, I hit on what I thought was a great solution.  I put the laptop in two plastic bags, one on each side of the laptop (sorry about the mess on the table :)):

    I went to bed confident that I'd outsmarted the kittens.  My laptop would remain safe.

    Well, this morning, I got up, and went downstairs (you can see Sharron's breakfast cereal on the table to the top right).  I asked the kids if there had been any problems, and Daniel, with his almost-teenager attitude said "Yeah, the kittens scattered the keys on your laptop all over the kitchen".

    I figured he was just twitting me, until I went to check on the computer...

    Oh crud...

    There were the keys, sitting in a pile where Sharron had collected them...

    I love my cats, I really do...

    The good news is that I managed to find all the keys, although I was worried about the F8 key for a while.

  • Larry Osterman's WebLog

    Why I removed the MSN desktop search bar from IE

    • 16 Comments

    I was really quite excited to see that the MSN Desktop Search Team had finally released the final version of their MSN Desktop Search toolbar.

    I've been using it for quite a while, and I've been really happy with it (except for the minor issue that the index takes up 220M of virtual memory, but that's just VA - the working set of the index is quite reasonable).

    So I immediately downloaded it and enabled the toolbar on IE.

    As often happens with toolbars, the toolbar was in the wrong place.  No big deal - I unlocked the toolbar and repositioned it to where I want it (immediately to the right of the button bar, where it takes up less real-estate).

    Then I locked the toolbar.  And watched as the MSN desktop search toolbar repositioned itself back where it was originally.

    I spent about 10 minutes trying to figure out a way of moving the desktop search bar next to the button bar, to no success.  By positioning it in the menu bar, I was able to get it to move into the button bar when I locked the toolbar, but it insisted on being positioned to the left of the button bar, not the right.

    Eventually I gave up.  I'm not willing to give up 1/4 inch of screen real-estate to an IE toolbar - it doesn't give me enough value to justify the real-estate hit.

    Sorry guys.  I'm still using the desktop search stuff (it's very, very cool), including the taskbar toolbar, but not the IE toolbar.  I hate it when my toolbars have a mind of their own.

    Update: Someone on the CLR team passed on a tip: The problem I was having is because I run as a limited user.  But it turns out that if you exit IE and restart it, the toolbar sticks where you put it!

    So the toolbar's back on my browser.

  • Larry Osterman's WebLog

    Still more misinformation about virtual memory and paging files

    • 26 Comments

    The wired network in my building's being unusually flaky, so I'm posting this from my laptop; sorry for the brevity.

    Slashdot had a front page story today about an article by Adrian Wong posted in his Rojak Pot: "Virtual Memory Optimization Guide".

    I've not finished reading it (the site's heavily slashdotted), but his first paragraph got me worried:

    Back in the 'good old days' of command prompts and 1.2MB floppy disks, programs needed very little RAM to run because the main (and almost universal) operating system was Microsoft DOS and its memory footprint was small. That was truly fortunate because RAM at that time was horrendously expensive. Although it may seem ludicrous, 4MB of RAM was considered then to be an incredible amount of memory.

    4MB of RAM?  Back in the "good old days" of 1.2MB floppy disks (those were the 5 1/4" floppy drives in the PC/AT), the most RAM that could be addressed by a DOS based computer was 1MB.  If you got to run Xenix-286, you got a whopping 16MB of physical address space.

    I was fuming by the time I'd gotten to the first paragraph of the first section:

    Whenever the operating system has enough memory, it doesn't usually use virtual memory. But if it runs out of memory, the operating system will page out the least recently used data in the memory to the swapfile in the hard disk. This frees up some memory for your applications. The operating system will continuously do this as more and more data is loaded into the RAM.

    This is SO wrong on so many levels.  It might have been true for an old (OS 8-ish) Mac, but it's not been true for any version of Windows since Windows 95.  And even Windows 1.0's memory manager didn't operate in that manner - it was always active, swapping data in and out of memory, but it didn't use virtual memory, since there wasn't any hardware memory management available to Windows 1.0.

    It REALLY disturbs me when articles like this get distributed, because it shows that the author fundamentally didn't understand what he was writing about (sort of like what happens when I write about open source :) - at least nobody's ever quoted me as an authority on that particular subject).

    Edit: I'm finally at home, and I've had a chance to read the full article.  I've not changed my overall opinion of the article: as a primer on memory management, it's utterly pathetic (and dangerously incorrect).  Having said that, the recommendations for improving the performance of your paging file are roughly the same as I'd have come up with if I were writing the article.  Most importantly, he explains the difference between having a paging file on a separate partition and on a separate drive, and he adds some important information on P-ATA and RAID drive performance characteristics that I wouldn't have included if I were writing the article.  So if you can make it past the first 10 or so pages, the article's not that bad.

     
