April, 2004

Larry Osterman's WebLog

Confessions of an Old Fogey
  • Larry Osterman's WebLog

    Units of measurement

    • 17 Comments

     Whenever people find out that I work at Microsoft, invariably the next question they ask is “Have you met Bill?” (The next question is: “So what’s with the stock?” – as if I had a magic 8-ball to tell them).

I’ve met Bill socially a couple of times (at various company functions); he doesn’t know who I am though :).  But there was one memorable meeting I attended with him.

    It was back in 1986ish; we were presenting the plans for Lan Manager 1.0 to him.  One portion of the meeting was about my component, DOS Lan Manager (basically an enhanced version of the MS-NET redirector, with support for a fair number of the Lan Manager APIs on the client).  My boss and I were given the job of presenting the data for that portion.

    One of the slides we presented (not PowerPoint – it didn’t exist at the time; these were Lucite slides on an overhead projector) covered the memory footprint of the DOS Lan Manager redirector.

    For DOS LM 1.0, the redirector took up 64K of RAM.

    And Bill went ballistic.

    “What do you mean 64K?  When we wrote BASIC, it only took up 8K of RAM.  What the f*k do you idiots think you’re doing?  Is this thing REALLY 8 F*ing BASIC’s?”

    The only answer we could give him was “Yes” :).

    To this day, I sometimes wonder if he complains that Windows XP is “16,000 F*ing BASIC’s”.

    Edit: To add what we finally did with DOS Lan Manager's memory footprint. 

    We didn't ignore Bill's comment, btw.  We worked on reducing the footprint of the DOS redirector, first by moving the data into LIM expanded memory, then by moving the code into expanded memory.  For LAN Manager 2.1, we finally managed to reduce the below-640K footprint of the DOS redirector to 128 bytes.  It took a lot of work, and some truly clever programming, but it did work.

     

  • Larry Osterman's WebLog

    How much is too much?

    • 8 Comments

    There was a fascinating discussion on one of our internal mailing lists earlier today.  Towards the tail end of it, someone asked what the memory management guys would do when machines started coming with 8TB of memory.

    5 minutes later came the following response:

    There are already TPC-C benchmarks being done with 1TB of RAM.

    I bet we will run out of 8TB for giant machines sometime in 2008.5, if Moore’s Law applies to memory capacity (double once every 18 months).

    Gleep!

    I find the concept of a machine with more than 8G of RAM to be mind boggling.  And this guy's saying that big machines will be running with 8TB of RAM in four years.  That's 8,000,000,000,000 bytes of RAM (give or take a couple of gigabytes).  8,000 GIGABYTES of RAM.
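    (If you’re checking the math on “2008.5”: going from 1TB to 8TB is three doublings, at 18 months per doubling that’s 4.5 years, and 2004 + 4.5 = 2008.5.)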

    We truly do live in remarkable times.

     

  • Larry Osterman's WebLog

    Choosing a C runtime library

    • 28 Comments

    Yesterday a developer in my group came by asking about a failure he saw when running the application verifier on his component.  The app verifier was reporting that he was using a HEAP_NO_SERIALIZE heap from a thread other than the one that created the heap.

    I looked a bit deeper and realized that he was running with the single threaded statically linked C runtime library.  An honest mistake, given that it’s the default version of the C runtime library.

    You see, there are 3 different versions of the C runtime library shipped (and 3 different versions of the ATL and MFC libraries too). 

    The first is the statically linked single-threaded library.  This one can be used only in single-threaded applications, and all the object code for the C runtime library functions used is included in the application binary.  You get this with the /ML compiler switch.

    The second is the statically linked, multi-threaded library.  This one’s the same as the first, but you can use it in a multithreaded application.  You get this one with the /MT compiler switch.

    The third is the dynamically linked library.  This one keeps all the C runtime library code in a separate DLL (MSVCRTxx.DLL).  Since the runtime library code’s in a DLL, it also handles multi-threaded issues.   The DLL library is enabled with the /MD switch.
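    For concreteness, here’s what the three choices look like on the compiler command line (a sketch – these are the VC6/VC7-era switches, and the file name is made up):

        cl /ML myapp.c      (statically linked, single-threaded – the default)
        cl /MT myapp.c      (statically linked, multi-threaded)
        cl /MD myapp.c      (dynamically linked against MSVCRTxx.DLL, multi-threaded)

    (There are also /MLd, /MTd and /MDd variants that link against the debug versions of the same libraries.)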

    But I’ve been wondering: why on earth would anyone ever choose any option OTHER than the multi-threaded DLL version of the runtime library?

    There are LOTS of reasons for always using the multithreaded DLL:

    1)      Your application is smaller because it doesn’t have the C runtime library loaded into it.

    2)      Because of #1, your application will load faster.  The C runtime library is almost certainly in memory, so the pages containing the library don’t have to be read from disk.

    3)      Using the multithreaded DLL future-proofs your application.  If you ever add a second thread to your application (or call into an API that creates multiple threads), you don’t have to remember to change your C runtime library.  And unless you’re running the app verifier regularly, the only way you’ll find out about the problem is if you get a heap corruption (if you’re lucky).

    4)      If your application has multiple DLLs, then you need to be VERY careful about allocation – with the static library, each DLL will have its own C runtime library heap, as will the application.  If you allocate a block in one DLL, you must free it in the same DLL (there’s a sketch of this trap after the next paragraph).

    5)      If a security bug is ever found in the C runtime library, you don’t have to release an update to your app.

    The last one’s probably the most important, IMHO.  Just to be clear: there haven’t been any security holes found in the C runtime library.  But it could happen.  And when it happens, it’s pretty ugly.  A really good example of this can be seen with the security vulnerability that was found in the zlib compression library.  This library was shipped in dozens of products, and every single one of them had to be updated.  If you do a Google search for “zlib library security vulnerability” you can see some of the chaos that resulted from this disclosure.  If your app used the DLL C runtime library, then you’d get the security fix for free from Windows Update when Microsoft posted the update.
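    And here’s the promised sketch of the multiple-DLL trap from point #4 – a minimal illustration assuming both modules were built with the static multithreaded library (/MT); the module and function names are hypothetical:

        /* In helper.dll, built with /MT: */
        #include <stdlib.h>

        __declspec(dllexport) char *AllocateBuffer(void)
        {
            return malloc(100);   /* comes from helper.dll's private CRT heap */
        }

        /* In the application, also built with /MT: */
        char *buffer = AllocateBuffer();
        free(buffer);             /* WRONG: frees into the app's own CRT heap,
                                     which never allocated this block.  With /MD,
                                     both modules share the single heap in
                                     MSVCRTxx.DLL and this code is fine. */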

    The only arguments I’ve been able to come up with for using the static C runtime libraries are:

    1)      I don’t have to distribute two binaries with my application – If I use the DLL, I need to redistribute the DLL.  This makes my application setup more complicated.

    Yes, but not significantly (IMHO).  This page lists the redistribution info for the C runtime library and other components.

    2)      If I statically link to the C runtime library, I avoid DLL hell.

    This is a red herring IMHO.  Ever since VC6, the C runtime library has been tightly versioned, as long as your installer follows the rules for version checking of redistributable files (found here) you should be ok.

    3)      My code is faster since the C runtime library doesn’t have to do all that heap synchronization stuff.

    Is it really?  How much checking is involved in the multithreaded library?  Let’s see.  The multithreaded library puts some stuff that was kept in global variables into thread local storage.  So there’s an extra memory indirection involved on routines like strtok etc.  Also, the single threaded library creates its heap with HEAP_NO_SERIALIZE (that’s what led to this entire post :)).  But serialization just wraps the heap access with an EnterCriticalSection/LeaveCriticalSection pair.  Which is very, very fast if there’s no contention.  And since this is a single threaded application, by definition there’s no contention for the critical section.
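    The difference boils down to one flag on HeapCreate.  A sketch of what the two flavors of the library effectively do when they create their heaps:

        /* Single-threaded library: no locking at all. */
        HANDLE stHeap = HeapCreate(HEAP_NO_SERIALIZE, 0, 0);

        /* Multithreaded library: the heap takes its (cheap, uncontended)
           critical section on each allocation. */
        HANDLE mtHeap = HeapCreate(0, 0, 0);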

    Using the multithreaded DLL C runtime library is especially important for systems programmers.  First off, if your system component is a DLL, it’s pretty safe to assume that you’ll be called from multiple threads, so at an absolute minimum, you’re going to want to use the multithreaded static C runtime library.  And if you’re using the multithreaded static C runtime library, why NOT use the DLL version?

    If you’re not writing a DLL, then it’s highly likely that your app does (or will) use multiple threads.  Which brings me back to the previous comment – why NOT use the DLL version? 

    Your app will be smaller, more secure, future-proof, and no slower than if you don’t.

     

  • Larry Osterman's WebLog

    What are these "Threading Models" and why do I care?

    • 28 Comments

    Somehow it seems like it’s been “Threading Models” week, another example of “Blogger synergy”.  I wrote this up for internal distribution to my group about a year ago, and I’ve been waiting for a good time to post it.  Since we just hit another instance of the problem in my group yesterday, it seemed like a good time.

     

    So what is this thing called a threading model anyway?

    Ok.  So the COM guys had this problem.  NT supports multiple threads, but most developers, especially the VB developers at whom COM/ActiveX were targeted, are totally terrified by the concept of threading.  In fact, it’s very difficult to make thread-safe VB (or JS) applications, since those languages don’t support any kind of threading concepts.  So the COM guys needed to design an architecture that would allow for supporting these single-threaded objects and hosting them in a multi-threaded application.

    The solution they came up with was the concept of apartments.  Essentially each application that hosts COM objects holds one or more apartments.  There are two types of apartments, Single Threaded Apartments (STAs) and Multi Threaded Apartments (MTAs).  Within a given process there can be multiple STAs but there is only one MTA.

    When a thread calls CoInitializeEx (or CoInitialize), the thread tells COM which of the two apartment types it’s prepared to host.  To indicate that the thread should live in the MTA, you pass the COINIT_MULTITHREADED flag to CoInitializeEx.  To indicate that the thread should host an STA, either call CoInitialize or pass the COINIT_APARTMENTTHREADED flag to CoInitializeEx.
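    In code, the two choices look like this (a minimal sketch):

        HRESULT hr;

        // To join (or create) the process's one MTA:
        hr = CoInitializeEx(NULL, COINIT_MULTITHREADED);

        // To host a new STA on the current thread
        // (plain CoInitialize(NULL) is equivalent):
        hr = CoInitializeEx(NULL, COINIT_APARTMENTTHREADED);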

    A COM object’s lifetime is limited to the lifetime of the apartment that creates the object.  So if you create an object in an STA, then destroy the apartment (by calling CoUninitialize), all objects created in this apartment will be destroyed.

    Single Threaded Apartment Model Threads

    When a thread indicates that it’s going to be in a single threaded apartment, the thread indicates to COM that it will host single threaded COM objects.  Part of the contract of being an STA is that the STA thread cannot block without running a windows message pump (at a minimum, if it blocks it must use MsgWaitForMultipleObjects – internally, COM uses windows messages to do inter-thread marshalling).

    The reason for this requirement is that COM guarantees that objects will be executed on the thread in which they were created regardless of the thread in which they’re called (thus the objects don’t have to worry about multi-threading issues, since they can only ever be called from a single thread).  Eric mentions “rental threaded objects”, but I’m not aware of any explicit support in COM for this.

     

    Multi Threaded Apartment Model Threads

    Threads in the multi threaded apartment don’t have any restrictions – they can block using whatever mechanism they want.  If COM needs to execute a method on an object and no thread is blocked, then COM will simply spin up a new thread to execute the code (this is particularly important for out-of-proc server objects – COM will simply create new RPC threads to service the object as more clients call into the server).

    How do COM objects indicate which thread they work with?

    When an in-proc COM object is registered with OLE, the COM object creates the following registry key:

                HKCR\CLSID\{<Object class ID>}\InprocServer32

    The InprocServer32 key tells COM which DLL hosts the object (in the default value for the key) and, via the ThreadingModel value, tells COM the threading model for the COM object.
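    Concretely, a registration might look something like this (the CLSID and path here are made up for illustration):

                HKCR\CLSID\{12345678-1234-1234-1234-123456789012}\InprocServer32
                    (Default)      = c:\myapp\myobject.dll
                    ThreadingModel = Apartment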

     

    There are essentially four legal values for the ThreadingModel value.  They are:

    Apartment

    Free

    Both

    Neutral

    Apartment Model objects.

    When a COM object is marked as being an “Apartment” threading model object, it means that the object will only run in an STA thread.  All calls into the object will be serialized by the apartment model, and thus it will not have to worry about synchronization.

    Free Model objects.

    When a COM object is marked as being a “Free” threading model object, it means that the object will run in the MTA.  There is no synchronization of the object.  When a thread in an STA wants to call into a free model object, then the STA will marshal the parameters from the STA into the MTA to perform the call. 

    Both Model objects.

    The “Both” threading model is an attempt at providing the best of both worlds.  An object that is marked with a threading model of “Both” takes on the threading model of the thread that created the object. 

    Neutral Model objects.

    With COM+, COM introduced the concept of a “Neutral” threading model.  A “Neutral” threading model object is one that totally ignores the threading model of its caller.

    COM objects declared as out-of-proc (with a LocalServer32=xxx key under the class ID) are automatically considered to be in the multi-threaded apartment (more about that below).

    It turns out that COM’s enforcement of the threading model is not consistent.  In particular, when a thread that’s located in an STA calls into an object that was created in the MTA, COM does not enforce the requirement that the parameters be marshaled through a proxy object.   This can be a big deal, because it means that the author of COM objects can be lazy and ignore the threading rules – it’s possible to create a COM object that uses the “Both” threading model and, as long as the object is in-proc, there’s nothing that’ll check to ensure you didn’t violate the threading model.  However the instant you interact with an out-of-proc object (or call into a COM method that enforces apartment model checking), you’ll get the dreaded RPC_E_WRONG_THREAD error return.  The table here describes this in some detail.

    What about Proxy/Stub objects?

    Proxy/Stub objects are objects that are created by COM to handle automatically marshaling the parameters of the various COM methods to other apartments/processes.  The normal mechanism for registering Proxy/Stub objects is to let COM handle the registration by letting MIDL generate a dlldata.c file that is referenced during the proxy DLL’s initialization.

    When COM registers these proxy/stub objects, it registers the proxy/stub objects with a threading model of “Both”.  This threading model is hard-coded and cannot be changed by the application.

    What limitations are there that I need to worry about?

    The problem that we most often see occurs because of the Proxy/Stub objects.  Since the proxy/stub objects are registered with a threading model of “Both”, they take on the threading model of the thread that created the object.  So if a proxy/stub object is created in a single threaded apartment, it can only be executed in the apartment that created it.  The proxy/stub marshaling routines DO enforce the threading restriction I mentioned above, so applications learn about this when they unexpectedly get an RPC_E_WRONG_THREAD error return from one of their calls.  On the server side, the threading model of the object is set by the threading model of the caller of CoRegisterClassObject.  The good news is that the default ATL 7.1 behavior is to specify multi-threaded initialization unless otherwise specified (in other words, the ATL header files define _ATL_FREE_THREADED by default).

    How do I work around these limitations?

    Fortunately, this problem is a common problem, and to solve it COM provides a facility called the “Global Interface Table”.  The GIT is basically a singleton object that allows you to register an object with the GIT and it will then return an object that can be used to perform the call from the current thread.  This object will either be the original object (if you’re in the apartment that created the object) or it will be a proxy object that simply marshals the calls into the thread that created the object.

    If you have a COM proxy/stub object (or you use COM proxy/stub objects in your code), you need to be aware of when you’ll need to use the GIT to hold your object.

    To use the GIT: after you’ve called CoCreateInstance to create your COM object, call IGlobalInterfaceTable::RegisterInterfaceInGlobal to add the object to the global interface table.  This will return a “cookie” to you.  When you want to access the COM object, you first call IGlobalInterfaceTable::GetInterfaceFromGlobal to retrieve the interface.  When you’re done with the object, you call IGlobalInterfaceTable::RevokeInterfaceFromGlobal.
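    Here’s roughly what that dance looks like in code (a sketch with error handling elided; pUnk and IID_IMyInterface stand in for your own object and interface):

        // Once, from the apartment that created the object:
        IGlobalInterfaceTable *pGIT;
        CoCreateInstance(CLSID_StdGlobalInterfaceTable, NULL, CLSCTX_INPROC_SERVER,
                         IID_IGlobalInterfaceTable, (void **)&pGIT);
        DWORD dwCookie;
        pGIT->RegisterInterfaceInGlobal(pUnk, IID_IMyInterface, &dwCookie);

        // Later, from whichever apartment needs to make the call:
        IMyInterface *pMyInterface;
        pGIT->GetInterfaceFromGlobal(dwCookie, IID_IMyInterface, (void **)&pMyInterface);
        pMyInterface->DoSomething();   // runs in (or is marshaled to) the right apartment
        pMyInterface->Release();

        // When you're completely done with the object:
        pGIT->RevokeInterfaceFromGlobal(dwCookie);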

    In our case, we didn’t feel that pushing the implementation details of interacting with the global interface table to the user was acceptable, so we actually wrote an in-proc object that wraps our out-of-proc object. 

    Are there other problems I need to worry about?

    Unfortunately, yes.  Since the lifetime of a COM object is scoped to the lifetime of the apartment that created the object, this means that when the apartment goes away, the object will go away.  This will happen even if the object is referenced from another thread.  If the object in question is a local object, this really isn’t that big a deal, since the memory backing the object won’t go away.  If, however, the object is a proxy/stub object, then the object will be torn down post-haste.  The global interface table will not help with this problem, since it will remove all the entries in the table that were created in the apartment that’s going away.

    Additional resources:

    The MSDN article Geek Speak Decoded #7 (http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dngeek/html/geekthread.asp) also has some detail on how this stuff works (although it’s somewhat out-of-date).

     

  • Larry Osterman's WebLog

    Any MVP's out there?

    • 8 Comments

    I was just on the first floor of my building and I noticed that they’ve put up laminated posters with the names and pictures of all the multimedia and MCE MVP’s on the wall – pretty cool actually.  I don’t have a camera on me; otherwise I’d post a picture of the pictures :(

     Very neat guys!

    Edit: Got the camera, posted the shots:

    MCE MVP Wall of Fame:

    Windows Media MVP Wall of Fame:

  • Larry Osterman's WebLog

    What's wrong with this code?

    • 29 Comments

    This one comes courtesy of a co-worker who once upon a time was the lead developer for the Manx (http://www.itee.uq.edu.au/~csmweb/decompilation/hist-c-pc.html) compiler.

    He ran into the following problem that was reported by the customer.  They tried and tried to get the following code to compile and it didn't seem to work:

    Main.c:

    #include <stdio.h>
    float ReadFileAndCalculateAverage(FILE *InputFile, int Values[], int LengthOfValues, int *NumberOfValuesRead)
    {
       float average;
       int i;

       //
       // Read the Values array from InputFile.
       //

       <Code eliminated for brevity>

       //
       // Now calculate the average.
       //

       for (i = 1 ; i <= *NumberOfValuesRead ; i += 1)
       {
          average += Values[i];
       }

       average = average/*NumberOfValuesRead;
    }

    int main(int argc, char *argv[])
    {
        int Values[20];
        float average;
        int numberOfValues;
        FILE *inputFile;

        inputFile = fopen("MyFile.Txt", "r");
        if (inputFile != NULL)
        {
            average = ReadFileAndCalculateAverage(inputFile, Values, sizeof(Values) / sizeof(Values[0]), &numberOfValues);
        }
        fclose(inputFile);
     }

     Now imagine this problem in a 10,000 line source file :) Have fun trying to track it down.

    Edit: Removed bogus <p>'s.  Thank you Newsgator's .Text plugin...

  • Larry Osterman's WebLog

    Viral Schmiral

    • 1 Comments

    So Daniel's off on a boat for the next couple of days, and Sharron asked if we could do “Dinner and a Movie”.  Our choice for movie last night was “The Horse in the Grey Flannel Suit”, a Disney film starring Dean Jones and a horse named “Aspercel”.  Not high drama, but still fun - especially if you're a horse owner.

    What I found fascinating is the basic premise of the film.  The daughter of a single dad is totally horse-crazy, so her father (who is an advertising executive on the verge of losing his biggest account, Aspercel antacid) convinces the head of the pharmaceutical company that makes Aspercel that they should embark on a “subliminal” advertising campaign starring the horse Aspercel.  As his daughter rides Aspercel in more and more horse events, the horse will provide built-in advertising of the product.  One of the requirements of the CEO was that all ownership of the horse had to be hidden - it couldn't be associated with the company at all - that's what made the ad campaign subliminal - it wasn't a “product sponsorship” because there was theoretically no connection between the company and the horse.

    Anyway, I was watching this movie and it struck me - I thought that viral marketing was a relatively new phenomenon.  But is there any real difference between the idea of Aspercel the horse selling antacids by appearing in horse shows and the Subservient Chicken?

     

  • Larry Osterman's WebLog

    What if Microsoft behaved like the Coalition Provisional Authority?

    • 12 Comments

     So I’m listening to NPR this morning and I ran into this short article on Morning Edition:

    The Coalition Provisional Authority in Iraq provides information on electricity production and reconstruction projects, but not on security. The coalition Web site declares, "For security reasons, there are no security reports."

    The actual web page can be found here.

    Could you imagine if Microsoft (or Suse, or Debian, or any other operating system vendor) attempted to do the same thing with security bugs?

    “For Security reasons, we can’t provide any information about security bugs in our products.”

    The industry wouldn’t stand for it (heck, I wouldn’t stand for it (as if my opinion counts :))).  They’d rightly want to know what we were covering up.

    This is not to say that Microsoft or the others couldn’t be justified in making such a claim – since most (if not all) of the security exploits that are found in the wild are released after the vendor announces the security hole (18 months for ms-blaster, 1 week for the last couple of security holes).  This isn’t done because the hackers want to be nice and let the vendors involved get a patch out.  Instead, a fairly strong claim could be made that the hackers figured out the exploit from information in the vendors’ security release.  So if the vendor didn’t release information about the security holes, the hackers couldn’t/wouldn’t reverse engineer the holes, and thus there would be fewer exploits in the wild.

    There have been very few examples of a zero-day exploit actually discovered – in a quick Google search, I found only one or two legitimate 0-day exploits out there (no, I’m not posting them), most of the exploits found in the wild are 7-day or 14-day exploits, which tends to justify the argument above – if software vendors didn’t disclose their vulnerabilities, then the hackers would have less to work with.

    Fortunately, the various powers that be have decided that full disclosure’s the way to go – at least for computer security.  Now, if the CPA would only consider doing the same…

     

    Btw, in case it’s not obvious: This posting is provided "AS IS" with no warranties, and confers no rights.  All opinions enclosed are the opinions of the poster and are not those of his employer.

     

     

  • Larry Osterman's WebLog

    What's Larry doing this weekend?

    • 2 Comments

    This weekend, my family and I are participating in Seattle Children’s Theater’s  “Play in a Day” program. 

    Basically we’ll arrive at SCT at 1:00PM, and we’ll start working with director Don Fleming  on creating a new play.

    At 4:30PM we’ll go on stage live in the Eve Alvord Theatre in our newly created play!

    We last did this two years ago, performing in a Fairy Tale, and it was an absolutely huge amount of fun for everyone involved (I played the insecure son clutching his teddy bear).

    This year’s production will be a Pirate Tale; I can’t wait to see what we come up with.  Whatever it is, it’ll be fun.

    If anyone wants to come see Larry make a fool of himself, as I mentioned, the show’s going to be on at 4:30PM at the Eve Alvord theater.  Directions can be found here.

     

  • Larry Osterman's WebLog

    Things you shouldn't do, part 1 - DllMain is special

    • 5 Comments

    A lot of people have written about things not to do in your DllMain.  Like here, and here and here.

    One other thing not to do in your DllMain is to call LoadLibraryEx.  As others have written, DllMain’s a really special place to be.  If you do anything more complicated than initializing critical sections, or allocating thread local storage blocks, or calling DisableThreadLibraryCalls, you’re potentially asking for trouble.

    Sometimes, however the interaction is much more subtle.  For example, if your DLL uses COM, you might be tempted to call CoInitializeEx in your DllMain.  The problem is that that under certain circumstances, CoInitializeEx can call LoadLibraryEx.  And calling LoadLibraryEx is one of the things that EXPLICITLY is forbidden during DllMain (You must not call LoadLibrary in the entry-point function).
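    Here’s a sketch of the trap (the DLL and the global are hypothetical – this illustrates the rule, it isn’t code from any real component):

        #include <windows.h>
        #include <objbase.h>

        CRITICAL_SECTION g_lock;

        BOOL WINAPI DllMain(HINSTANCE hinst, DWORD dwReason, LPVOID lpReserved)
        {
            if (dwReason == DLL_PROCESS_ATTACH)
            {
                DisableThreadLibraryCalls(hinst);    // Fine in DllMain.
                InitializeCriticalSection(&g_lock);  // Also fine.

                // NOT fine: under certain circumstances this ends up calling
                // LoadLibraryEx, which is explicitly forbidden in DllMain.
                CoInitializeEx(NULL, COINIT_MULTITHREADED);
            }
            return TRUE;
        }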

     

  • Larry Osterman's WebLog

    It's about bloody time...

    • 14 Comments

    I recently ran across these two posts (ok, one’s from a month ago).

    Justice department announces international internet piracy sweep: http://www.usdoj.gov/opa/pr/2004/April/04_crm_263.htm

    Switzerland's Judicial Inquiry Department has taken down the web site ShareReactor.com due to copyright infringement and breach of trademark law. “ShareReactor served as a link platform for filesharing offerings”. http://www.multireg.com/article382.html

    This is something I feel really strongly about, and have since I graduated from college (and started working at a company that makes its living from intellectual property).  I’m glad to see that governments are starting to crack down on the warez people.

     

  • Larry Osterman's WebLog

    Another Exchange blog posted

    • 0 Comments

    KC just posted another of my Exchange blog entries.  This one’s on Exchange 2000 access rights and how Exchange 5.5 access rights were represented in Exchange 2000.

    Enjoy!

     

  • Larry Osterman's WebLog

    When global destructors bite

    • 10 Comments

    In my work, I use a lot of ATL.  And in general, I'm pretty impressed with it.  I recently ran into a cool bug that I figured would be worth posting about.

    First, what's wrong with the following code?

     

    main.cpp:

    #include "stdafx.h"

    CComPtr<IUnknown> g_pUnknown;

    void __cdecl main(int argc, char *argv[])
    {
        HRESULT hr;
        hr = CoInitializeEx(NULL, COINIT_MULTITHREADED);
        if (hr == S_OK)
        {
            hr = g_pUnknown.CoCreateInstance(CLSID_DOMDocument30);
                    :
                    :
        }
        CoUninitialize();
    }

    Assume that the program uses ATL and the ATL definitions are included in the stdafx.h file.

    Looks pretty simple, right?  Well, if you run the app under the debugger, you're likely to find that it silently access violates when it exits.

    The problem occurs because CComPtr's are auto-pointers.  This means that when they’re destroyed, they release the contained pointer.  Normally, that's a good thing - the reference count is decremented and the object is released.  This works AWESOMELY if the CComPtr is scoped to a function or is a member variable in a class.

    But when the CComPtr is a global variable, when is the destructor called?

    The answer's that the destructor is called when the C runtime library runs down all the global variables.  And that happens when the C runtime library DLL is unloaded.

    So why is that a problem?  Well, when the last thread in the process calls CoUninitialize(), COM says "Hey, he's done using COM.  I can now unload all the DLL's that I loaded into the process".  This includes the MSXML3.DLL that contains the XML DOM.  So when the C runtime library runs the destructor for the CComPtr, it tries to release the reference to the embedded IUnknown pointer.  And that tries to access code in the MSXML3.DLL file which is no longer loaded.  And blam!
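    One way to avoid the crash (a sketch): explicitly release the global pointer while COM – and therefore MSXML3.DLL – is still around:

            hr = g_pUnknown.CoCreateInstance(CLSID_DOMDocument30);
                    :
                    :
            g_pUnknown.Release();   // last reference goes away while MSXML3.DLL
                                    // is still loaded
            CoUninitialize();       // now COM can safely unload its DLLs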

     

     

  • Larry Osterman's WebLog

    Spywares

    • 2 Comments

    I just noticed that Microsoft Monitor has a post on this article that Microsoft published last week on spyware.  It's an interesting read with some good suggestions.

     

  • Larry Osterman's WebLog

    Page 23 Meme

    • 5 Comments

    Me too!

    So the meme works like this:

         Grab the nearest book. Open the book to page 23. Find the fifth sentence. Post the text of the sentence in your journal along with these instructions.

    Ok, here goes:

    Kirk realized that his M-80 was running low on ammunition, but the plague survivors kept on coming - if he didn't find more ammunition, his post would be overrun.

    From: “Survivors: Warriors in the post-apocalyptic future“ by Rick Redman.

     

     

  • Larry Osterman's WebLog

    Larry's rules of software engineering #2: Measuring testers by test metrics doesn't.

    • 30 Comments

    This one’s likely to get a bit controversial :).

    There is an unfortunate tendency among test leads to measure the performance of their testers by the number of bugs they report.

    As best as I’ve been able to figure out, the logic works like this:

    Test Manager 1: “Hey, we want to have concrete metrics to help in the performance reviews of our testers.  How can we go about doing that?”
    Test Manager 2: “Well, the best testers are the ones that file the most bugs, right?”
    Test Manager 1: “Hey that makes sense.  We’ll measure the testers by the number of bugs they submit!”
    Test Manager 2: “Hmm.  But the testers could game the system if we do that – they could file dozens of bogus bugs to increase their bug count…”
    Test Manager 1: “You’re right.  How do we prevent that then? – I know, let’s just measure them by the bugs that are resolved “fixed” – the bugs marked “won’t fix”, “by design” or “not reproducible” won’t count against the metric.”
    Test Manager 2: “That sounds like it’ll work, I’ll send the email out to the test team right away.”

    Sounds good, right?  After all, the testers are going to be rated by an absolute value based on the number of real bugs they find – not the bogus ones, but real bugs that require fixes to the product.

    The problem is that this idea falls apart in reality.

    Testers are given a huge incentive to find nit-picking bugs – instead of finding significant bugs in the product, they try to find the bugs that increase their number of outstanding bugs.  And they get very combative with the developers if the developers dare to resolve their bugs as anything other than “fixed”.

    So let’s see how one scenario plays out using a straightforward example:

    My app pops up a dialog box with the following:

     

                Plsae enter you password:  _______________ 

     

    Where the edit control is misaligned with the text.

    Without a review metric, most testers would file a bug with a title of “Multiple errors in password dialog box” which then would call out the spelling error and the alignment error on the edit control.

    They might also file a separate localization bug because there’s not enough room between the prompt and the edit control (separate because it falls under a different bug category).

    But if the tester has their performance review based on the number of bugs they file, they now have an incentive to file as many bugs as possible.  So the one bug morphs into two bugs – one for the spelling error, the other for the misaligned edit control. 

    This version of the problem is a total and complete nit – it’s not significantly more work for me to resolve one bug than it is to resolve two, so it’s not a big deal.

    But what happens when the problem isn’t a real bug – remember – bugs that are resolved “won’t fix” or “by design” don’t count against the metric so that the tester doesn’t flood the bug database with bogus bugs artificially inflating their bug counts. 

    Tester: “When you create a file when logged on as an administrator, the owner field of the security descriptor on the file’s set to BUILTIN\Administrators, not the current user”.
    Me: “Yup, that’s the way it’s supposed to work, so I’m resolving the bug as by design.  This is because NT considers all administrators interchangeable, so when a member of BUILTIN\Administrators creates a file, the owner is set to the group to allow any administrator to change the DACL on the file.”

    Normally the discussion ends here.  But when the tester’s going to have their performance review score based on the number of bugs they submit, they have an incentive to challenge every bug resolution that isn’t “Fixed”.  So the interchange continues:

    Tester: “It’s not by design.  Show me where the specification for your feature says that the owner of a file is set to the BUILTIN\Administrators account”.
    Me: “My spec doesn’t.  This is the way that NT works; it’s a feature of the underlying system.”
    Tester: “Well then I’ll file a bug against your spec since it doesn’t document this.”
    Me: “Hold on – my spec shouldn’t be required to explain all of the intricacies of the security infrastructure of the operating system – if you have a problem, take it up with the NT documentation people”.
    Tester: “No, it’s YOUR problem – your spec is inadequate, fix your specification.  I’ll only accept the “by design” resolution if you can show me the NT specification that describes this behavior.”
    Me: “Sigh.  Ok, file the spec bug and I’ll see what I can do.”

    So I have two choices – either I document all these subtle internal behaviors (and security has a bunch of really subtle internal behaviors, especially relating to ACL inheritance) or I chase down the NT program manager responsible and file bugs against that program manager.  Neither of which gets us closer to shipping the product.  It may make the NT documentation better, but that’s not one of MY review goals.

    In addition, it turns out that the “most bugs filed” metric is often flawed in the first place.  The tester who files the most bugs isn’t necessarily the best tester on the project.  Often the tester who is the most valuable to the team is the one who goes the extra mile, spends time investigating the underlying causes of bugs, and files bugs with detailed information about possible causes.  But they’re not the most prolific testers, because they spend the time to verify that they have a clean reproduction and good information about what is going wrong.  They spent the time that they would have spent finding nit bugs making sure that the bugs they found were high quality – they found the bugs that would have stopped us from shipping, and not the “the florblybloop isn’t set when I twiddle the frobjet” bugs.

    I’m not saying that metrics are bad.  They’re not.  But basing people’s annual performance reviews on those metrics is a recipe for disaster.

    Somewhat later:  After I wrote the original version of this, a couple of other developers and I discussed it a bit at lunch.  One of them, Alan Ludwig, pointed out that one of the things I missed in my discussion above is that there should be two halves of a performance review:

                MEASUREMENT:  Give me a number that represents the quality of the work that the employee is doing.
    And      EVALUATION:  Given the measurement, is the employee doing a good job or a bad job?  In other words, you need to assign a value to the metric – how relevant is the metric to your performance.

    He went on to discuss the fact that any metric is worthless unless it is periodically reevaluated to determine how relevant it still is – a metric is only as good as its validity.

    One other comment that was made was that absolute bug count metrics cannot be a measure of the worth of a tester.  The tester that spends two weeks and comes up with four buffer overflow errors in my code is likely to be more valuable to my team than the tester that spends the same two weeks and comes up with 20 trivial bugs.  Using the severity field of the bug report was suggested as a metric, but Alan pointed out that this only worked if the severity field actually had significant meaning, and it often doesn’t (it’s often very difficult to determine the relative severity of a bug, and often the setting of the severity field is left to the tester, which has the potential for abuse unless all bugs are externally triaged, which doesn’t always happen).

    By the end of the discussion, we had all agreed that bug counts were an interesting metric, but they couldn’t be the only metric.

    Edit: To remove extra <p> tags :(

  • Larry Osterman's WebLog

    It's only temporary

    • 9 Comments

    NT has a whole lot of really cool features that aren’t always obvious without REALLY looking closely at the documentation.

    One of my favorite is what I call “temporary” temporary files.

    A “temporary” temporary file is one whose storage is never written to disk (in the absence of memory pressure).  It behaves just like a file does (because it’s a file) but the cache manager disables the lazy writer on the file, and the filesystem knows to never commit the pages containing the file’s metadata to disk.

    To create a “temporary” temporary file, you call CreateFile specifying FILE_ATTRIBUTE_TEMPORARY | FILE_FLAG_DELETE_ON_CLOSE in the dwFlagsAndAttributes parameter.  This combination of bits acts as a hint to the filesystem that the file data should never be flushed to disk.  In other words, such a file can be created, written to, and read from without the system ever touching the disk.
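    A minimal sketch (the path and file name here are made up, and error handling is elided):

        #include <windows.h>

        HANDLE hFile = CreateFileW(L"C:\\Temp\\render.tmp",
                                   GENERIC_READ | GENERIC_WRITE,
                                   0,                // no sharing
                                   NULL,             // default security
                                   CREATE_ALWAYS,
                                   FILE_ATTRIBUTE_TEMPORARY | FILE_FLAG_DELETE_ON_CLOSE,
                                   NULL);
        // Write to and read from hFile like any other file.  Barring memory
        // pressure, the data never touches the disk, and the file disappears
        // when the last handle is closed.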

    Why would you ever want such a file?

    Well, consider the case of a web server that’s creating content for client applications.  It’s possible that halfway through rendering the content to be transmitted to the client, you encounter an error.  The problem is that you’ve already sent the 200 OK response to the client, so the client thinks there are no errors.  To fix this, you can render the content to a “temporary” temporary file and respond with an appropriate error code if the rendering fails.  If the rendering succeeds, you can use TransmitFile (if your server is written to raw sockets) or HttpSendHttpResponse (if your server is written to the HTTP API) to send the response data directly from the file.

    There are lots of other reasons for using this mechanism – for example, Exchange 5.5’s POP, IMAP and NNTP servers used this technique to render RFC822 message content.  The Exchange server would render the email message to a temporary file, and then use TransmitFile to send the response to the client.  We used this for two reasons – we wanted to be able to respond to render failures as described above, but also because we needed to deal with potentially huge email messages.

    Which is the other huge advantage of “temporary” temporary files over in-memory buffers - their size is bounded by the available disk space, NOT by available RAM.  So while an in-memory render might fail on a 1G file (because you couldn’t allocate a gigabyte of contiguous memory), it will work just fine to a “temporary” temporary file (assuming you have a gigabyte of disk space free on the rendering drive).  If you exceed available memory, the memory manager will flush the file data to disk. This causes a performance hit, but your operation will succeed instead of failing.

     

  • Larry Osterman's WebLog

    I get all my news on /.

    • 1 Comments

    Slashdot has a post on the front page indicating that Microsoft's made the VC 2K3 compiler available for free download here.

    I gotta say, I'm impressed at the VC team for releasing it, it's pretty amazing IMHO that they'd let this out free of charge.

  • Larry Osterman's WebLog

    Ok, I lied a bit...

    • 1 Comments

    I’m not yet on vacation, but I noticed this article (picked up by /.) about Microsoft starting a new program that enables refurbishers of old PCs to get new licenses for Win98 and Win2K Pro.

    And now I’m wondering what creative reasons the /. crowd is going to come up with to explain why this is a bad thing.

    Edit: to add what the article is about.

     

  • Larry Osterman's WebLog

    I've never seen anything that big.

    • 7 Comments

    Ok, now put your dirty little minds back in the gutter.

    Back in the day (1985ish), I was at work and heard a commotion from outside in the hall.  I saw one of the Xenix developers holding a box that was about the size of a shoebox.

    “Look at this!  Do you know what it is?!”

    “I’ll bite, no, what is it?”

    “It’s a 70 MEGABYTE HARD DISK!!  Isn’t it amazing?!”

    We had never seen anything with that kind of capacity.  This disk was actually big enough that you could put the source to Xenix AND a running copy of the operating system ON THE SAME HARD DISK!

    DOS in those days was limited to 32M disks (DOS sector sizes were 512 bytes, and the disk drivers received their requests as 16 bit sector numbers, which meant that the drivers could only address 32M of disk).  This wasn’t fixed until DOS 3.31 in 1987.
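    (The arithmetic: 2^16 sectors × 512 bytes per sector = 33,554,432 bytes, or 32M.)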

    My, times have changed.  I saw the other day that LaCie is now selling a disk drive with a terabyte of capacity.

     

  • Larry Osterman's WebLog

    I'm OOF for a week.

    • 2 Comments

    Valorie and I are off to Lost Wages for the next week, so it's highly unlikely that I'll be writing anything (we may take the laptops since our hotel is advertising internet access, so you never know, but)...

    Catch y'all later.

     

  • Larry Osterman's WebLog

    Dave Ross has a weblog!

    • 0 Comments

    My favorite talk radio host has a ‘blog!

    http://www.daveross.com/baghdad/baghdad.htm

     Now I need to convince him to add an RSS feed.

     

  • Larry Osterman's WebLog

    More Exchange stuff posted

    • 0 Comments

    KC just told me that she posted one of my Exchange articles, this one about Bedlam DL3.  Enjoy!

     

  • Larry Osterman's WebLog

    It's on the web, it must be true, right?

    • 12 Comments

    This topic shamelessly stolen from my wife :)

    I was listening to NPR the other day and ran into this discussion of Dihydrogen Monoxide.  And that reminded me of an experience my wife had in the classroom where she works.

    Valorie’s a teacher's aide in a split 5/6 classroom, and they started discussing how you need to be skeptical about the things you find on the web.  She (and the teacher) pointed out a bunch of historical revisionism sites, sites that deny that the moon landing ever happened, etc.

    And then they hit: Buy Dehydrated Water.  The kids looked at it and were fascinated.  They wanted some.  They thought it was a remarkable product.  From their FAQ:

    "What kind of reaction can I expect from my cat and plant if I only feed them dehydrated water?"

    Response: Dead cat.  If you only consume one type of product and nothing else, you will ruin your body.  You must have a well balanced diet; even with dehydrated water.  For plants, GrowYourOwnFlorist.com confirms our water promotes healthy growth.

    No amount of persuading could convince them that it wasn’t real.  It was on the web, wasn’t it, and they were selling it.  How could it be a fake?

    So Valorie decided to buy some for the class. 

    The people at BuyDehydratedWater.Com are absolutely amazing.  They sent the class 24 packages of dehydrated water, complete with instructions, and their thanks.  They also threw in a free BuyDehydratedWater.Com sweatshirt – it’s my son’s favorite attire currently.

    All the students in the class absolutely loved their packages of dehydrated water.

     

  • Larry Osterman's WebLog

    Larry's rules of software engineering, part 1: Every software engineer should know roughly what assembly language their code generates.

    • 23 Comments

    The first in an ongoing series (in other words, as soon as I figure out what the other rules are, I’ll write more articles in the series).

    This post was inspired by a comment in Raymond’s blog where a person asked “You mean you think I’m expected to know assembly language to do my job?  Yech”.

    My answer to that poster was basically “Well, yes, I do expect that everyone know assembly language”.  If you don’t, you don’t really understand what your code is doing.

    Here’s a simple quiz:  How many string objects are created in the following code?

    #include <string>

    int __cdecl main(int argc, char *argv[])
    {
          std::string foo, bar, baz;

          foo = bar + baz + "abc";
    }

    The answer?  5.  Three of the strings are obvious – foo, bar, and baz.  The other two are hidden in the expression: foo = bar + baz + "abc".

    The first of the hidden two is the temporary string object that’s created to encapsulate the “abc” string.  The second is one that’s used to hold the intermediate result of baz + “abc” which is then added to bar to get the resulting foo.  That one line of code generated 188 bytes of code.  Now that’s not a whole lot of code today, but it can add up.
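    By the way, you don’t have to guess at this – the compiler will happily show you what it generated.  With the Microsoft compiler, the /FAs switch writes out an assembly listing with your source interleaved (gcc’s rough equivalent is -S):

        cl /FAs /c main.cpp     (produces main.asm alongside main.obj)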

    I ran into this rule a long, long time ago, back in the DOS 4 days.  I was working on the DOS 4 BIOS, and one of the developers who was working on the BIOS before me had defined a couple of REALLY useful macros to manage critical sections.  You could say ENTER_CRITICAL_SECTION(criticalsectionvariable) and LEAVE_CRITICAL_SECTION(criticalsectionvariable) and it would do just what you wanted.

    At one point, Gordon Letwin became concerned about the size of the BIOS; it was something like 20K and he didn’t understand why it would be so large.  So he started looking.  And he noticed these two macros.  What wasn’t obvious from the macro usage was that each of those macros generated about 20 or 30 bytes of code.  He changed the macros from inline code to out-of-line functions and saved something like 4K of code.  When you’re running on DOS, this was a HUGE savings (full disclosure – the DOS 4 BIOS was written in assembly language, so clearly I knew the assembly language that I wrote.  But I didn’t know the assembly language the macros generated).

    Nowadays, memory pressure isn’t as severe, but it’s STILL critical that you know what your code is going to generate.  This is especially true if you’re using C++, since it’s entirely possible to hide huge amounts of object code in a very small amount of source.  For instance, if you have:

    CComPtr<IXMLDOMDocument> document;
    CComPtr<IXMLDOMNode> node;
    CComPtr<IXMLDOMElement> element;
    CComPtr<IXMLDOMValue> value;

     

    How many discrete implementations of CComPtr do you have in your application?  Well, the answer is that you’ve got 4 different implementations – and all the code associated with CComPtr gets duplicated FOUR times in your application.  Now it turns out that the linker has some tricks that it can use to collapse identical implementations of methods (and it uses them starting with VC.Net), but if your code is targeting VC6, or if it’s using some other C++ compiler, you can’t guarantee that you won’t be staring at <n> different implementations of CComPtr in your object code.  CComPtr is especially horrible in this respect, since you typically need to use a LOT of interfaces in your application.  As I said, with VC.Net onwards this isn’t a problem – the compiler/linker collapses all those implementations into a single instance in your binary – but for many templates, this doesn’t work.  Consider, for example, std::vector:

    std::vector<short> document;
    std::vector<int> node;
    std::vector<float> element;
    std::vector<bool> value;

    This requires that there be four separate implementations of std::vector compiled into your application – there’s no way of sharing the implementation between them, since the sizes of the types are all different, and thus the generated code for the different implementations is different.  If you don’t know this is going to happen, you’re going to be really upset when your boss starts complaining about the working set of your application.

    The other time that not knowing what’s going on under the covers hits you is when a class author accidentally hides performance problems in their class. 

    This kind of problem happens a LOT.  I recently inherited a class that used operator overloading extensively.  I started using the code, and as I usually do, I started stepping through the code (to make sure that my code worked) and realized that the class implementation was calling the copy constructor for the class extensively.  Basically it wasn’t possible to use the class at all without a half a dozen trips through the heap allocator.  But I (as the consumer of the class) didn’t realize that – I didn’t realize that a simple assignment statement involved two trips through the heap manager, several calls to printf, and a string parse.  The author of the class didn’t know this either; it was a total surprise to him when I pointed it out, since the calls were side effects of other calls he made.  But if that class had been used in a performance critical situation, we’d have been sunk.  In this case, the class worked as designed; it was just much less efficient than it had to be.

    As it is, because I stepped through the assembly and looked at ALL the code that was generated, we were able to fix the class ahead of time to make it much more implementation friendly.  But if we’d blindly assumed that the class was fine because the code functioned correctly (and it did), we’d never have noticed this potential performance problem.

    If the developer involved had realized what was happening with his class, he’d have never written it that way, but because he didn’t follow Larry’s rule #1, he got burned.

     
