• PFE - Developer Notes for the Field

    Pivoting ASP.NET event log error messages

    Unless you’ve been hiding under the proverbial rock, you’ve probably seen the recent Pivot hoopla .  If you’re not familiar with it, it’s a way to visualize a large amount of data in a nice filterable format.  The nice thing about it is that...
  • PFE - Developer Notes for the Field

    Visualizing Azure performance counters…

    I’ve blogged before about how some things are different in the cloud, namely Azure .  That post dealt with finding out why your Azure Role crashed by using some of the logging facilities and storage available in Azure.  That’s kind of a worst...
  • PFE - Developer Notes for the Field

    Don’t guess when it comes to performance…a RegEx story.

    Many times, when I work with a customer, it’s because they’ve tried to accomplish something and need a little extra help.  Often, this falls into the application optimization area.  For example, a few years ago, I had a customer that was...
  • PFE - Developer Notes for the Field

    Optimizing your life: E-Mail Edition

    As with most people, I hate to do the same thing more than once.  In-fact, I had a job in college where I audited expense reports and then had to enter them into an application so they could be paid.  The auditing of the expense report was...
  • PFE - Developer Notes for the Field

    Exposing any class as a WCF endpoint…

    Often, I’ll have a customer that wants to utilize a new feature in the .NET framework but they don’t necessarily know where to start or they have a large investment in another area and they’re afraid of the amount of rework required to get something...
  • PFE - Developer Notes for the Field

    Parsing ASP.NET event log error messages for fun and profit

    Sometimes a customer will ask me to look at their site and make some recommendations on what can be improved.  One of the many things I’ll look at is their event logs.  One of the nice things about ASP.NET is that when you encounter an unhandled...
  • PFE - Developer Notes for the Field

    Whodunit: Who threw the message box, and why?


    My name is Brad and I’ve been on the PFE team here at Microsoft for many years. Suffice to say, I’m overdue for contributing to the team blog. I’ve seen lots of interesting (and not-so-interesting) issues with customers all over the world in my time at PFE. What follows is an issue I worked earlier this year. For me, the most interesting part of this issue was not so much in finding root cause as it was the process of discovering who was behind root cause.


    The Problem

    It all started with a customer who reported their ASP.NET application had an OutOfMemory issue. These kinds of issues are not at all uncommon in the .NET world, and the trick usually comes down to finding what object(s) are rooted so that the .NET Garbage Collector can’t reclaim the memory associated with said object(s).


     Getting data from the problem

    They sent me a dump of the problematic application pool, mentioning that they dumped it two hours after they received notice about the OutOfMemoryException (OOM). My initial thought on this was that they had recycled the process and then obtained a dump of the fresh new process. This would obviously be no good since the w3wp instance exhibiting the OOM was gone, and the dump instead represented a new process instance with no memory pressure. Unfortunately, this wouldn’t be the first time I’d had this problem.

    However, when I received the dump, I was pleasantly surprised to see that they had dumped the correct process instance. It was over 1GB in size, representing a 32-bit process with significant memory pressure. And when I looked at the length of time the process had been alive, that verified this was the same instance that threw the OOM.

    Windows Server 2003 Version 3790 (Service Pack 2) MP (4 procs) Free x86 compatible
    Product: Server, suite: TerminalServer SingleUserTS
    Machine Name:
    Debug session time: Thu Feb 25 15:07:20.000 2010 (UTC - 5:00)
    System Uptime: 194 days 18:11:06.875 Process Uptime: 0 days 15:03:28.000


    Analyzing the data, Part 1

    After searching through the dump I found the problematic object that led to the OOM. And after talking with the right folks at the customer, we pieced back together how the problem arose in addition to a resolution to this problem.

    So one question was answered, but another one remained: How did this process manage to stay alive for a full two hours after getting the OOM? I’ve been debugging issues like this since .NET 1.0 was released, and I can tell you that this wasn’t a normal set of circumstances.

    The thread that threw the OOM turned out to be the same thread that answered the question.

    # ChildEBP RetAddr
    00 0c3aedb8 7c82775b ntdll!KiFastSystemCallRet
    01 0c3aedbc 773d7a4b ntdll!NtRaiseHardError+0xc
    02 0c3aee18 773b8377 user32!ServiceMessageBox+0x145
    03 0c3aef74 7739eec9 user32!MessageBoxWorker+0x13e
    04 0c3aefcc 773d7d0d user32!MessageBoxTimeoutW+0x7a
    05 0c3af000 773c42c8 user32!MessageBoxTimeoutA+0x9c
    06 0c3af020 773c42a4 user32!MessageBoxExA+0x1b
    07 0c3af03c 7c34c224 user32!MessageBoxA+0x45
    08 0c3af070 7c348e6c msvcr71!__crtMessageBoxA+0xf4
    09 0c3af294 7c34cf83 msvcr71!_NMSG_WRITE+0x12e
    0a 0c3af2cc 7a09bea7 msvcr71!abort+0x7
    0b 0c3af2d8 77e761b7 mscorwks!InternalUnhandledExceptionFilter+0x16
    0c 0c3af530 77e792a3 kernel32!UnhandledExceptionFilter+0x12a
    0d 0c3af538 77e61ac1 kernel32!BaseThreadStart+0x4a
    0e 0c3af560 7c828752 kernel32!_except_handler3+0x61
    0f 0c3af584 7c828723 ntdll!ExecuteHandler2+0x26
    10 0c3af62c 7c82863c ntdll!ExecuteHandler+0x24
    11 0c3af90c 77e4bee7 ntdll!RtlRaiseException+0x3d
    12 0c3af96c 78158e89 kernel32!RaiseException+0x53
    13 0c3af9a4 7a14fa70 msvcr80!_CxxThrowException+0x46
    14 0c3af9b8 7a108013 mscorwks!ThrowOutOfMemory+0x24
    15 0c3afae4 7a109f7d mscorwks!Thread::RaiseCrossContextException+0x408
    16 0c3afb98 79fd878b mscorwks!Thread::DoADCallBack+0x2a2
    17 0c3afbb4 79e983fb mscorwks!ManagedThreadBase_DispatchInner+0x35
    18 0c3afc48 79e98321 mscorwks!ManagedThreadBase_DispatchMiddle+0xb1
    19 0c3afc84 79e984ad mscorwks!ManagedThreadBase_DispatchOuter+0x6d

    <clipped for brevity>

    As seen in the stack above, after the OOM was thrown in frame 0x14, a message box was thrown in frame 0x8. A message box will keep the process alive until someone clicks OK, Cancel, etc. and the message box goes away. In short, message boxes in server-side processes are never a good thing since they will hang your application!

    Once again, we have answered one question, but another question remains: the process stayed alive for two hours after it threw the OOM because a message box was thrown, but who threw the message box, and why?


    Analyzing the data, Part 2

    As is common when finding a message box that’s popped up in a process, I wanted to see what the message box said. From MSDN, we learn that the second parameter to user32!MessageBoxExA is the text of the message, and the third parameter is the text of the caption in the message box. Using the kb command, we can retrieve these parameters, and then dump them to find the values:

    0:023> kbn7
    # ChildEBP RetAddr Args to Child
    00 0c3aedb8 7c82775b 773d7a4b 50000018 00000004 ntdll!KiFastSystemCallRet
    01 0c3aedbc 773d7a4b 50000018 00000004 00000003 ntdll!NtRaiseHardError+0xc
    02 0c3aee18 773b8377 0dbec7a8 0bcda228 00012010 user32!ServiceMessageBox+0x145
    03 0c3aef74 7739eec9 0c3aef80 00000028 00000000 user32!MessageBoxWorker+0x13e
    04 0c3aefcc 773d7d0d 00000000 0dbec7a8 0bcda228 user32!MessageBoxTimeoutW+0x7a
    05 0c3af000 773c42c8 00000000 0c3af0a4 7c37f480 user32!MessageBoxTimeoutA+0x9c
    06 0c3af020 773c42a4 00000000 0c3af0a4 7c37f480 user32!MessageBoxExA+0x1b

    0:023> da 7c37f480
    7c37f480 "Microsoft Visual C++ Runtime Lib"
    7c37f4a0 "rary"

    0:023> da 0c3af0a4
    0c3af0a4 "Runtime Error!..Program: c:\wind"
    0c3af0c4 "ows\system32\inetsrv\w3wp.exe..."
    0c3af0e4 ".This application has requested "
    0c3af104 "the Runtime to terminate it in a"
    0c3af124 "n unusual way..Please contact th"
    0c3af144 "e application's support team for"
    0c3af164 " more information..."

    I’ve seen this message before, and it can have different underlying causes. The bottom line – at this point in the troubleshooting phase – is that it isn’t a custom message box thrown carelessly by application code. The trick now is to determine how the message box was thrown.

    I made a few unusual observations about the call stack in question.

    - First, there are two versions of the C++ runtime on the stack (7.1 and 8.0). This isn’t common.

    - Secondly, the sequence of events as told by the stack seems very unusual. It appears the .NET Framework throws an OOM (frame 0x14). Then eventually when the underlying OS handles the exception in frame 0xe, it somehow goes back to BaseThreadStart in frame 0xd where the default unhandled exception filter (UEF) is called in frame 0xc. From there we wind up back in the .NET Framework’s UEF in frame 0xb that appears to call the 7.1 CRunTime’s abort() function in frame 0xa. Finally, a message box results.
    What a wild ride!!

    - Finally… wait, the .NET Framework appears to throw a message box (if you observe frames 0xb through 0x8 of the stack)??? This is something I never would have predicted!

    Fortunately, I’d been fooled by such an appearance before, so I knew to take a look at the raw stack first, before I went reading through the .NET source to see if mscorwks!InternalUnhandledExceptionFilter really does call for a message box to be thrown. When I looked at the raw stack around the frames 0xb through 0x8, this is what I found in between all the frames displayed from a kb command:

    0:023> dds 0c3af070 L90
    0c3af070 0c3af208
    0c3af074 7c348e6c msvcr71!_NMSG_WRITE+0x12e
    0c3af078 0c3af0a4
    0c3af07c 7c37f480 msvcr71!`string'
    0c3af26c 77e66ebd kernel32!VirtualQueryEx+0x1d
    0c3af270 000d0f48
    0c3af274 0000003a
    0c3af278 0008b788
    0c3af27c 02403a0a
    0c3af280 0000003a
    0c3af284 013af29c*** WARNING: Unable to verify checksum for ThirdParty.dll
    *** ERROR: Symbol file could not be found. Defaulted to export symbols for ThirdParty.dll -
    0c3af288 0c3af2c4
    0c3af28c 7c349600 msvcr71!_beginthreadex+0x7
    0c3af290 90a99269
    0c3af294 0c3af2c4
    0c3af298 7c34cf83 msvcr71!abort+0x7 <—frame 0xa of the kn callstack above
    0c3af29c 0000000a
    0c3af2a0 00000000
    0c3af2a4 0c3af558
    0c3af2a8 7a3b2000 mscorwks!COMUnhandledExceptionFilter
    0c3af2ac 0c3af2a0
    0c3af2b0 00000103
    0c3af2b4 0c3af520
    0c3af2b8 7c34240d msvcr71!_except_handler3
    0c3af2bc 7c380ea0 msvcr71!type_info::`vftable'+0x94
    0c3af2c0 ffffffff
    0c3af2c4 0c3af2d8
    0c3af2c8 7c35f0a8 msvcr71!__CxxUnhandledExceptionFilter+0x2b
    0c3af2cc 77ecb7c0 kernel32!gcsUEFInfo
    0c3af2d0 7a09bea7 mscorwks!InternalUnhandledExceptionFilter+0x16
    <—frame 0xb of the kn callstack above

    Aha! So there is, in fact, someone else in between the .NET Framework and the call to throw the message box.

    Now, let’s say you still want some kind of proof that the .NET Framework doesn’t make the call to msvcr71!abort(), which results in a call to show the message box. I had my doubts that mscorwks.dll had the 7.1 CRT as a dependency. On a tip from fellow PFE Zach Kramer, you can prove this by running dumpbin /imports on the binary (Dumpbin is an old – but very handy - utility that still ships with Visual Studio). I could always obtain the binary by asking my customer for their mscorwks.dll, but it’s much easier to just use psscor2!savemodule.

    0:023> lmvmmscorwks
    start end module name
    79e70000 7a400000 mscorwks
    Loaded symbol image file: mscorwks.dll

    0:023> !savemodule 79e70000 F:\modules\mscorwks.dll
    5 sections in file at 79e70000
    Successfully saved file: F:\Modules\mscorwks.dll

    Then from a command window:

    f:\Modules>dumpbin /imports mscorwks.dll >Imports.txt


     Searching Imports.txt for the string ‘msvcr7’ comes up empty. In fact, the build of mscorwks used by my customer’s application was msvcr80.dll. This makes sense when you look at frames 0x14-0x13 in our call stack.

    What if we employ the same technique on this ThirdParty.dll – will it show the assembly depends on the 7.1 CRT? Scanning the output for dumpbin /imports on ThirdParty dll shows the following:

    Microsoft (R) COFF/PE Dumper Version 9.00.30729.01
    Copyright (C) Microsoft Corporation. All rights reserved.

    Dump of file ThirdParty.dll
    File Type: DLL

      Section contains the following imports:
            1000E154 Import Address Table
            10010554 Import Name Table 
               0 time date stamp
               0 Index of first forwarder reference


    So we know this ThirdParty.dll has the 7.1 CRT as a dependency, but that alone doesn’t prove this component throws the message box. And according to my customer, the vendor of ThirdParty.dll had been approached many times in the past regarding these “phantom message boxes” hanging production applications. But the vendor had denied any involvement without explicit proof. So the presence of their dll on the stack next to a call to msvcr71!abort() might not suffice when I confronted them with this issue. I felt I still had some work ahead of me.


    Deeper dive

    Before continuing, let’s do a quick review of what we know and don’t know:

    1. The process hung because a message box was thrown. Ironically, this kept the process alive so that a dump could be taken, and from this dump we learned root cause of the OOM.

    2. The message box text and caption indicate it was thrown as a result of some systematic process – not due to some “rogue code” in the customer’s application.

    3. Contrary to the appearance of the kb command, the message box was not thrown by the .NET Framework.

    4. Based on the placement of ThirdParty.dll in the raw stack, the fact that ThirdParty.dll has the 7.1 CRT as a dependency, and the fact that the stack shows the message box was thrown directly by the CRT, and we’re likely to make the most progress by trying to rule out ThirdParty.dll as the culprit (or alternatively, prove that it is the culprit).

    5. From the call stack, it appears the .NET Framework had already instructed the OS to handle the exception. Why does it appear that the .NET Framework got a second go-around at handling the exception, and how do we connect the dots from the .NET Framework, to this ThirdParty.dll, to the 7.1 CRT’s call to abort()?

    6. Why does msvcr71!abort() throw a message box – was it explicitly instructed to do this by someone?

    First, let’s tackle the question about connecting the dots seen in the call stack frames. There’s a pretty clear and concise explanation for this in an MSDN Magazine article from 2008.

    Unhandled Exception Processing In The CLR

    When an exception goes unhandled and the OS invokes the topmost [Unhandled Exception Filter], it will end up invoking the CLR's UEF callback. When this happens, the CLR will behave like a good citizen and will first chain back to the UEF callback that was registered prior to it. Again, if the original UEF callback returns indicating that it has handled the exception, then the CLR won't trigger its unhandled exception processing

    In other words, ThirdParty.dll had registered its UEF before the .NET Framework. And its UEF took the default road of calling abort()  and throwing a message box. After ThirdParty.dll registered its UEF, the .NET Framework then registered its UEF callback. But it didn’t want to be rude and step over the UEF that ThirdParty.dll had registered first, so it chains back to it. Therefore, the result of msvcr71!abort() being called is due to the UEF registered by ThirdParty.dll.

    Next, let’s tackle the question about msvcr71!abort() throwing a message box. Since this ThirdParty.dll was using the 7.1 build of the CRT, let’s look in MSDN for the information on msvcr71!abort():

    abort determines the destination of the message based on the type of application that called the routine. Console applications always receive the message through stderr. In a single or multithreaded Windows application, abort calls the Windows MessageBox API to create a message box to display the message with an OK button. When the user clicks OK, the program aborts immediately.

    To influence the behavior of abort(), simply call _set_error_mode in the DllMain so that it doesn’t exercise the default behavior of throwing a message box. MSDN’s documentation on _set_error_mode states you can use _OUT_TO_STDERR for the lone parameter, and this will avoid the message box when abort() is called.

    A colleague of mine, Senior Escalation Engineer Bret Bentzinger, offered to write some sample code that would load ThirdParty.dll and test this proposed resolution of passing _OUT_TO_STDERR to _set_error_mode. Doing this confirmed that no message box was thrown, and the thread exits without hanging the process.


    In the end, this problem of throwing a message box from the CRT’s call to abort() isn’t a new one. This issue has been around for ages. But it was my first opportunity to drive such an issue and see it through to a resolution. My customer got a fix from the vendor, and we were able to help the vendor write better, more stable code. It was a win-win situation!

  • PFE - Developer Notes for the Field

    PowerShell: Restoring a whole heap of SQL Databases

    PowerShell is one of those things that falls into my “other duties as assigned” repertoire.  It’s something that I’ve used for years to get things done but it’s not often I encounter a Dev at a customer that has worked with it much.  In my...
  • PFE - Developer Notes for the Field

    Micro optimization or just good coding practice?

    This is a common topic and I thought I’d write up some thoughts I have on it.  In-fact, I was just working with a customer on improving their code reviews and what they should be checking for and the question arose - “Should performance be targeted...
  • PFE - Developer Notes for the Field

    Converting the Silverlight Business Application Template to use Windows Authentication

    For my first “real” attempt at creating a business application using Silverlight I decided to use WCF RIA Services, Silverlight 4.0, and Visual Studio.NET 2010.  What easier way to get started than through a template provided by VS.NET 2010, right?  Well...
  • PFE - Developer Notes for the Field

    Announcing the Perfmon Friendly Azure Log Viewer Plug-In

    The Background About 3 months ago as some colleagues and I were working on the "Advanced Debugging Hands On Lab for Windows Azure" (for more info contact me via this blog) we identified an interesting opportunity within the Azure MMC . If you've worked...
  • PFE - Developer Notes for the Field

    Commerce Server 2009 Order Server Deserialization Error

    During the deployment testing of an integration component with Commerce Server 2009’s Order Service I encountered the following error at a client site " Column requires a valid DataType ."  It was occuring right at the point when deserializing the dataset...
  • PFE - Developer Notes for the Field

    Getting a List of a User’s SharePoint Groups

    I found it very odd that this was not available when doing some search engine queries.  My requirement was to take in an AD user name and query SharePoint 2010 to determine the SharePoint groups in which the account belongs.  The code was to run from...
  • PFE - Developer Notes for the Field

    Fixing the DynamicControlsPlaceholder control – Making the community better

    In my job as a PFE for Microsoft , I read, review and fix a lot of code.  A lot of code.  It’s a large part of what I love about my job.  The code is generally written by large corporations or for public websites .  Every now and...
  • PFE - Developer Notes for the Field

    OpenXML: How to refresh a field when the document is opened

    I was working on an internal project a bit ago and one of the requirements was to implement a fancy Word document.  The idea was that all of the editing of the text/code samples/etc. would be done in the application and then the user could just...
  • PFE - Developer Notes for the Field

    Working with the Bing Translator API

    An online translator really isn’t all that new.  They’ve been around for at least 8 years or so.  I remember the days when I would use Babelfish for all of my fun translations.  It was a great way to get an immediate translation for...
  • PFE - Developer Notes for the Field

    IE9: DOM Traversal

    Really interesting blog post by the IE team on some of the new DOM traversal features in IE9 (and other browsers).  Often times, you need to traverse the DOM to find a particular element or series of elements.  In the past, you might need...
  • PFE - Developer Notes for the Field

    Azure: Why did my role crash?

    One thing you might encounter when you start your development on Windows Azure> is that there is an insane number of options available for number of options for logging.   You can view a quick primer here .  One of the things that I...
  • PFE - Developer Notes for the Field

    IIS: When to enable web gardening?

    I seem to get this question a lot and come across many customer environments where they have enabled web gardening thinking that it will automagically improve the performance for their site/application. Most time, that is not the case...
  • PFE - Developer Notes for the Field

    Are you ready for your next challenge? (We’re hiring!)

    First, a few questions: Do you enjoy helping developers write better code? Do you enjoy solving complex problems that span multiple technologies? Do you enjoy optimizing and improving code? Are you passionate...
  • PFE - Developer Notes for the Field

    Site update July 2010 (aka - Where are all of your posts?)

    If you have visited my blog anytime in the last 2 weeks – you may have noticed an error page.  This was due to my hosting provider “accidentally” deleting my site’s database.  This was actually a perfect storm of sorts.  All three of...
  • PFE - Developer Notes for the Field

    How to reset your Flip UltraHD Video camcorder

    I recently ran into a problem with my Flip UltraHD Video camcorder where it would not turn on.  Unlike other camcorders in the Flip family, there is no microscopic reset button anywhere on the device. After e-mailing support, I received...
  • PFE - Developer Notes for the Field

    Creating an instructor guide from a PowerPoint deck.

    I recently came across the situation where I had several PowerPoint decks that were VERY well documented.  Essentially, each slide had reams of notes in the Notes panel of the deck.  This is both good and bad.  It was good because for...
  • PFE - Developer Notes for the Field

    SharePoint Planning/Design Worksheet Links

    I ran across a blog entry with a consolidated list of links to the SharePoint 2007 planning worksheets.  These are good starting points for your discovery, analysis, and design and are provided by Microsoft.  I would suggest tweaking them to meet your...
  • PFE - Developer Notes for the Field

    SharePoint Excel Services 2007 is Exciting, HUH?

    Over the past three plus years that I have been working with SharePoint, I have never had the pleasure of giving an Excel Services presentation to a client.  Well, thanks to our awesome sales team this past week I was able to do that very thing.  I have...
Page 2 of 4 (82 items) 1234