If broken it is, fix it you should

Using the powers of the debugger to solve the problems of the world - and a bag of chips    by Tess Ferrandez, ASP.NET Escalation Engineer (Microsoft)

ASP.NET Case Study: Hang on WaitOne, WaitAny or WaitMultiple


One of the synchronization primitives in .NET is the reset event.  It comes in two flavors: the AutoResetEvent, which resets itself automatically after releasing a single waiting thread, and the ManualResetEvent, which, as the name suggests, you have to reset manually.

Let's say you have a team of developers who can implement different parts of an application simultaneously without interaction. The work order might then look something like this:

  • Ask Bob to implement X
  • Ask Belinda to implement Y
  • Ask Ben to implement Z
  • Integrate X, Y and Z when you get a notification that they are done with their work

In code (using reset events) this would look something like this:

ImplementApp(){
    ImplementX();                      //spawns off implementation of X on another thread, signals when ready
    ImplementY();                      //spawns off implementation of Y on another thread, signals when ready
    ImplementZ();                      //spawns off implementation of Z on another thread, signals when ready
    WaitHandle.WaitAll(autoEvents);    //blocks until X, Y and Z have all signaled
    IntegrateXYandZ();                 //uses the results of the Implement methods
}

The ImplementX, Y and Z methods would then use QueueUserWorkItem to get the work scheduled to other threads and when done they would do autoEvents[i].Set() to signal that they are ready.
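To make the pattern concrete, here is a minimal sketch of how it could be written. The class name AppBuilder, the Implement helper and the parts array are made up for illustration and are not from the original sample; the "work" is just storing a string.

```csharp
using System;
using System.Threading;

// Hypothetical illustration of the "Bob, Belinda and Ben" pattern.
public class AppBuilder
{
    // One event per work item; each starts out unsignaled.
    private readonly AutoResetEvent[] autoEvents =
    {
        new AutoResetEvent(false),
        new AutoResetEvent(false),
        new AutoResetEvent(false)
    };

    private readonly string[] parts = new string[3];

    public string ImplementApp()
    {
        Implement(0, "X");
        Implement(1, "Y");
        Implement(2, "Z");

        // Blocks until all three events have been signaled.
        WaitHandle.WaitAll(autoEvents);

        // Integrate the results of the three work items.
        return string.Join("+", parts);
    }

    private void Implement(int i, string name)
    {
        // Schedule the work on a thread pool thread.
        ThreadPool.QueueUserWorkItem(_ =>
        {
            parts[i] = name;      // stands in for the real work
            autoEvents[i].Set();  // signal that this part is ready
        });
    }
}
```

Note that if any of the three work items never calls Set(), ImplementApp never returns, which is exactly the hang discussed in the rest of this post.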

When you call a web service, for example, a thread is spawned internally that sits and waits for the results of the web service call; when it is done, the original thread is signaled and can continue with its work.  You can see an example of how this looks here

Another common use for AutoResetEvents and ManualResetEvents is to spawn a thread that just sits around for the lifetime of the process, waiting for certain events to happen and acting on them when they occur.  If you look at a dump you will often see threads sitting in WaitOne waiting for some event to happen, like this one:

ESP EIP 
0x0109fb74 0x7c82ed54 [FRAME: ECallMethodFrame] [DEFAULT] Boolean System.Threading.WaitHandle.WaitOneNative(I,UI4,Boolean)
0x0109fb88 0x799e4bb1 [DEFAULT] [hasThis] Boolean System.Threading.WaitHandle.WaitOne(I4,Boolean)
0x0109fbbc 0x01040fcf [DEFAULT] Void System.EnterpriseServices.ServicedComponentProxy.QueueCleaner()
0x0109fdc4 0x791b3208 [FRAME: GCFrame] 

This is perfectly normal.  The QueueCleaner here just sits there waiting for someone to signal that the queue needs cleaning, so it isn't hanging by any means; it is just waiting on an event.

 

Going back to the initial example, what would happen if Ben quit work without telling anyone, before he was done with his implementation of Z?  In real life you would probably get worried if he didn't come to work for a few days and assign the work to someone else, but in an application no one would be the wiser and the app would hang, waiting indefinitely in WaitHandle.WaitAll(autoEvents).

Debugging the issue:

For demo purposes I have implemented the Calculate example shown in the MSDN help files for AutoResetEvent, but added a little bit of a twist to it (as you'll see later) so that my application hung.  I then grabbed a hang dump with adplus -hang -pn w3wp.exe, loaded up sos and ran ~* e !clrstack.

Most threads were sitting in this stack:

OS Thread Id: 0x1e58 (26)
ESP       EIP     
0f2cefb0 7d61d051 [HelperMethodFrame_1OBJ: 0f2cefb0] System.Threading.WaitHandle.WaitMultiple(System.Threading.WaitHandle[], Int32, Boolean, Boolean)
0f2cf07c 7940332b System.Threading.WaitHandle.WaitAll(System.Threading.WaitHandle[], Int32, Boolean)
0f2cf098 0f1005a1 Calculate.Result(Int32)
0f2cf0a8 0f10034d _Default.Page_Load(System.Object, System.EventArgs)
0f2cf0d8 66f12980 System.Web.Util.CalliHelper.EventArgFunctionCaller(IntPtr, System.Object, System.Object, System.EventArgs)
0f2cf0e8 6628efd2 System.Web.Util.CalliEventHandlerDelegateProxy.Callback(System.Object, System.EventArgs)
0f2cf0f8 6613cb04 System.Web.UI.Control.OnLoad(System.EventArgs)
0f2cf108 6613cb50 System.Web.UI.Control.LoadRecursive()
0f2cf11c 6614e12d System.Web.UI.Page.ProcessRequestMain(Boolean, Boolean)
0f2cf318 6614d8c3 System.Web.UI.Page.ProcessRequest(Boolean, Boolean)
0f2cf350 6614d80f System.Web.UI.Page.ProcessRequest()
0f2cf388 6614d72f System.Web.UI.Page.ProcessRequestWithNoAssert(System.Web.HttpContext)
0f2cf390 6614d6c2 System.Web.UI.Page.ProcessRequest(System.Web.HttpContext)
0f2cf3a4 0f100125 ASP.default_aspx.ProcessRequest(System.Web.HttpContext)
0f2cf3a8 65fe6bfb System.Web.HttpApplication+CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
0f2cf3dc 65fe3f51 System.Web.HttpApplication.ExecuteStep(IExecutionStep, Boolean ByRef)
0f2cf41c 65fe7733 System.Web.HttpApplication+ApplicationStepManager.ResumeSteps(System.Exception)
0f2cf46c 65fccbfe System.Web.HttpApplication.System.Web.IHttpAsyncHandler.BeginProcessRequest(System.Web.HttpContext, System.AsyncCallback, System.Object)
0f2cf488 65fd19c5 System.Web.HttpRuntime.ProcessRequestInternal(System.Web.HttpWorkerRequest)
0f2cf4bc 65fd16b2 System.Web.HttpRuntime.ProcessRequestNoDemand(System.Web.HttpWorkerRequest)
0f2cf4c8 65fcfa6d System.Web.Hosting.ISAPIRuntime.ProcessRequest(IntPtr, Int32)
0f2cf6d8 79f047fd [ContextTransitionFrame: 0f2cf6d8] 
0f2cf70c 79f047fd [GCFrame: 0f2cf70c] 
0f2cf868 79f047fd [ComMethodFrame: 0f2cf868]

So here we can see that we are in _Default.Page_Load, calling Calculate.Result, and this is sitting in a WaitAll waiting for someone to signal a reset event.

Here is an excerpt from the code; we are stuck on the WaitHandle.WaitAll(autoEvents) line in Result:

    public double Result(int seed)
    {
        randomGenerator = new Random(seed);
        ThreadPool.QueueUserWorkItem(new WaitCallback(CalculateBase));
        ThreadPool.QueueUserWorkItem(new WaitCallback(CalculateFirstTerm));
        ThreadPool.QueueUserWorkItem(new WaitCallback(CalculateSecondTerm));
        ThreadPool.QueueUserWorkItem(new WaitCallback(CalculateThirdTerm));
        WaitHandle.WaitAll(autoEvents);
        manualEvent.Reset();

        return firstTerm + secondTerm + thirdTerm;
    }

...

    void CalculateThirdTerm(object stateInfo)
    {
        double preCalc = randomGenerator.NextDouble();
        manualEvent.WaitOne();
        try
        {
            thirdTerm = GetTerm(preCalc);
            autoEvents[2].Set();
        }
        catch { }
    }

For some reason one of the autoEvents has not been signaled.  If we were still working on the calculation in CalculateThirdTerm, for example, then we would have seen a thread in the ~* e !clrstack output that was stuck somewhere in CalculateThirdTerm.  This was not the case here, which means that the thread must have exited without setting the event, much like our teammate Ben.

From the code we can see that one way this could happen would be if some exception occurred in GetTerm such that we exit the try block without setting the event.

Knowing this I dump out all the recent exceptions in the dump using this command

.foreach (ex {!dumpheap -type Exception -short}){!pe ${ex}}

This goes through all objects on the heap named *Exception* and runs !pe (print exception) on them. 

Note:  If you do this, don't be alarmed if you see an OutOfMemoryException, a StackOverflowException and an ExecutionEngineException. They will always be there, because the exception objects for these exceptions are created at startup; you can't create them at the time you need to throw them.

With the command above I find a number of these exceptions

Exception object: 06f392b4
Exception type: System.ArgumentException
Message: Value can't be less than 1.0
InnerException: 
StackTrace (generated):
    SP       IP       Function
    0F34F19C 0F1006DE App_Code_klmxs0si!Calculate.GetTerm(Double)+0x6e
    0F34F1B4 0F10079B App_Code_klmxs0si!Calculate.CalculateThirdTerm(System.Object)+0x33

So this validates the theory that an exception occurred in GetTerm, which in turn caused us to not signal the event and finally block on the WaitAll.
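One way to avoid this kind of hang would be to signal the event from a finally block, so that the waiter is released whether or not the work succeeds. The sketch below is not the original Calculate class: SafeWorker, TryCompute and ComputeTerm are hypothetical names, and the body of GetTerm is invented (the post does not show it) purely so that it throws the same ArgumentException for values below 1.0.

```csharp
using System;
using System.Threading;

// Hypothetical sketch: always signal, even when the work throws.
public class SafeWorker
{
    private readonly AutoResetEvent done = new AutoResetEvent(false);
    private Exception failure;   // recorded so the waiter can see what went wrong
    private double term;

    public bool TryCompute(double input, out double result)
    {
        failure = null;
        ThreadPool.QueueUserWorkItem(_ => ComputeTerm(input));

        // Returns even on failure, because Set() is called in a finally block.
        done.WaitOne();

        result = term;
        return failure == null;
    }

    private void ComputeTerm(double input)
    {
        try
        {
            term = GetTerm(input);
        }
        catch (Exception ex)
        {
            failure = ex;        // keep the exception instead of swallowing it
        }
        finally
        {
            done.Set();          // runs on success and on failure
        }
    }

    // Assumed stand-in for the real GetTerm, which is not shown in the post.
    private static double GetTerm(double value)
    {
        if (value < 1.0)
            throw new ArgumentException("Value can't be less than 1.0");
        return Math.Sqrt(value);
    }
}
```

Recording the exception rather than using an empty catch block also means the failure is visible to the caller instead of only being discoverable in a dump.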

 

Final thoughts

If you use a synchronization mechanism, whether it be a Monitor, a ReaderWriterLock or a reset event, you need to make sure that no matter what happens you will release the lock or signal the event, as the case may be.  With WaitOne, WaitAny and WaitAll there is an option to provide a timeout, in which case the wait will finish when the timeout is reached. WaitOne and WaitAll then return false, and WaitAny returns WaitHandle.WaitTimeout, so that you can check whether the wait timed out.

With a lock(){} statement you will never orphan the lock; the monitor that is used internally will exit even if an exception occurs inside the lock statement. This is similar to using the using(){} statement instead of disposing manually.
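As a small illustration of the timeout overloads, the sketch below waits on events that are never signaled and reports what each call returns. TimeoutDemo and Describe are made-up names, and the 100 ms timeout is arbitrary.

```csharp
using System;
using System.Threading;

// Hypothetical sketch of detecting a timed-out wait.
public static class TimeoutDemo
{
    public static string Describe()
    {
        // Two events that nobody will ever signal.
        var events = new WaitHandle[]
        {
            new AutoResetEvent(false),
            new AutoResetEvent(false)
        };

        bool one = events[0].WaitOne(100);           // false when the wait times out
        bool all = WaitHandle.WaitAll(events, 100);  // false when the wait times out
        int any = WaitHandle.WaitAny(events, 100);   // WaitHandle.WaitTimeout when the wait times out

        bool anyTimedOut = (any == WaitHandle.WaitTimeout);
        return one + "," + all + "," + anyTimedOut;
    }
}
```

In a real application, the timeout branch is where you would log the problem and fail the request, rather than leaving the request thread blocked forever.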

If you are manually using Monitor.Enter and Monitor.Exit, or if you use AcquireReaderLock or AcquireWriterLock you should release it in a finally block to avoid orphaning it if you throw exceptions.
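A minimal sketch of the manual Monitor pattern could look like this. The Counter class and its members are hypothetical; IsLockFree exists only to show, from a second thread, that the monitor was released after an exception.

```csharp
using System;
using System.Threading;

// Hypothetical sketch: release the monitor in a finally block.
public class Counter
{
    private readonly object sync = new object();
    private int value;

    public int Value
    {
        get { lock (sync) { return value; } }
    }

    public void Increment(bool fail)
    {
        Monitor.Enter(sync);
        try
        {
            if (fail)
                throw new InvalidOperationException("simulated failure");
            value++;
        }
        finally
        {
            Monitor.Exit(sync);  // runs even when the body throws
        }
    }

    // Monitors are reentrant, so the check has to run on another thread.
    public bool IsLockFree()
    {
        bool free = false;
        var probe = new Thread(() =>
        {
            if (Monitor.TryEnter(sync, 0))
            {
                free = true;
                Monitor.Exit(sync);
            }
        });
        probe.Start();
        probe.Join();
        return free;
    }
}
```

The same shape applies to ReaderWriterLock: call AcquireReaderLock or AcquireWriterLock before the try, and ReleaseReaderLock or ReleaseWriterLock in the finally.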

 

Laters,

Tess   


  • Hello!

    I would have two questions about .NET Debugging. Maybe you have a minute to answer them.

    1. On a production machine I needed to debug an issue (I had to check whether a web request had the proper client certificates attached to it). I came to the conclusion that generating a dump on a breakpoint would be the best way to check this, because generating a dump on every first-chance exception would be overkill and would leave the application in a nearly unusable state.

    First I wanted to pass the name of the managed method (e.g. System.Collections.ArrayList.Add) directly to the "bp" command. I tried "System.Collections.ArrayList.Add" and "System::Collections::ArrayList.Add", but I was not able to get the breakpoint bound to the actual method. Is there another syntax for this? Is there also a special syntax for generics?

    Then I used Name2EE to get the "JITTED Code Address" and set a breakpoint on this address:

    bp XXXXXXXX ".dump -u /ma C:\\DUMPFILE.dmp;g;"

    Everything worked so far, but I wanted to automate the complete procedure, because the time the application is down (waiting for debugger-commands) shall be as low as possible. So I wanted to extract the "JITTED Code Address" and use it as an argument for the Breakpoint Action.

    This is what I've now:

    .foreach /pS 0C (v {!Name2EE mscorlib.dll System.Collections.ArrayList.Add}){bp v ".dump -u /ma C:\\DUMPFILE.dmp;g;";.break;}

    Just skipping the first 13 tokens within the Name2EE Output seems to be not a very stable solution. Isn't there a method to search for a specific pattern (e.g. "Code Address")? This would be much better in my case.

    2. The second question is whether attaching to the process can also be scripted. I've seen the .attach and .tlist commands. The following snippet should do the job:

    .foreach(pid {.tlist *\Test.exe}){.attach pid;.break;}

    I've tried putting everything into one file:

    .loadby sos mscorwks

    .foreach(pid {.tlist *\Test.exe}){.attach pid;.break;}

    .foreach /pS 0C (v {!Name2EE mscorlib.dll System.Collections.ArrayList.Add}){bp v ".dump -u /ma C:\\DUMPFILE.dmp;g;";.break;}

    g

    and tried to call it from the command line with cdb:

    cdb -cf test.script

    but it doesn't work (I just get the syntax help). Do you know what is wrong with it?

    Thank you!

  • To set a breakpoint on a managed function you can use !bpmd from sos.dll for example !bpmd myapp.exe MyApp.Main

    Attach to a process, load up sos.dll and run !help bpmd to see what you need to do (you need to make sure that mscorlib is loaded first etc.)

    The easiest way to create a configuration file to set breakpoints on certain things is probably to create an adplus configuration file.  There you can also set up precommands like .load sos etc. that happen directly on attach.... have a look at the windbg help section for adplus to see how this is done...

    Tess

  • also, running commands will only work if you attach to a process first, and i think that might be what is missing in your script...   you can run the command-line tlist.exe, parse the output in a batch file and then pass the appropriate pid etc. to cdb to attach to it, if that is what you are looking for

  • Hello!

    Thank you for your answer!

    I know the !bpmd command. But unlike with the bp command, an action (like .dump and g) cannot be specified, or am I missing something?

    If I understand you correctly, a script only gets executed when a process is attached? I already thought about that, but what makes me think that I can do it from the script is the fact that there is a .attach command in windbg. Whatever.... The idea with tlist.exe and a .cmd file sounds very interesting. I'll give it a try!

  • So would the Hang on WaitOne show a High CPU usage?

  • no, waitone, waitany etc. are low cpu waits, they just sit idle until an event wakes them up
