Welcome to MSDN Blogs Sign in | Join | Help

This week I attended and spoke at the Oredev conference in Malmö Sweden, and it was great fun as usual. It was a pretty productive few days with a lot of good talks and conversations with some really cool people.

I’ll put up a link to the talk as it becomes available

Channel9 video and chat with Scott Hanselman for Hanselminutes

imageI got to meet “The man, the myth, the legend” Scott Hanselman and he recorded a 10 minute demo for Channel9 where I showed him how to debug a .Net performance issue using Memory dumps in Visual Studio .NET 2010 Beta 2.

http://www.hanselman.com/blog/HanselminutesOn9DebuggingCrashDumpsWithTessFerrandezAndVS2010.aspx

We also had a really nice chat about debugging (kind of a 101 primer) for hanselminutes that should be out on http://www.hanselminutes.com sometime in the near future.

 

 

.NET Rocks and dnr TV with Carl Franklin and Richard Campbell

Earlier this year I had a chance to speak to my pod-casting idols Carl and Richard on .NET Rocks about debugging .net apps.  They were also hanging out in the hallways of Oredev and it was a treat to meet them in person.  image

To follow up on that talk I showed Carl some debugging demos with Windbg and sos.dll, as well as a Visual Studio 2010 demo for an episode of dnrTV.  I’ll post a link to it when it comes out but if you haven’t done so before, you should definitely check out .net rocks podcasts and dnrTV episodes.  Very entertaining and educational.

carlpub2
Carl is not only a good podcaster/interviewer, it turns out that he is also really good at playing the guitar and singing. 

On Wednesday night he and Shawn Wildermuth pulled out their guitars and had an impromptu jam session at the Bishops Arms pub in Malmö.

Scott recorded it for the world to view on youtube but the recording really didn’t do it the justice it deserved.

Pure awesome!!!

image

 

 

Surface demo in the Microsoft booth

The Swedish Microsoft booth featured a Surface table where you could try it out.  We have one at work so I’ve seen it before but still, they were showing some really amazing stuff on it.

You can see a demo here, where Johan Lindfors who is now blogging on http://programmeramera.se/ shows it off.

 

All in all, a very good few days… can’t wait til next time…

Now it’s off to TechEd Berlin for a week of fun (Shawn promised to bring his guitar again) before conference season is over for me for this year.

/Tess

4 Comments
Filed under:

For all of you who have been asking me if there is a 64bit version of Debug Diag (used to debug 64 bit processes), I am happy to say that yes, Debug Diag 1.1 64bit is now released.

You can download it here:

http://www.microsoft.com/DOWNLOADS/details.aspx?FamilyID=28bd5941-c458-46f1-b24d-f60151d875a3&displaylang=en#filelist

And an updated whitepaper talking about how to use it is available here:

http://www.microsoft.com/downloads/details.aspx?familyid=4A2FBD0D-0635-440C-A08B-ED81BDBB5960&displaylang=en

Neither this version (nor the 32bit one) officially support debugging/analyzing on Vista/2008/Win7 because of some known compatibility issues (being looked at for future versions), however for a lot of tasks it works quite well on those operating systems as well.

I had an interesting case today where an asp.net app was stuck in a true deadlock.  In other words two threads were both waiting for resources that the other thread owned.

The scenario in these cases usually goes something like this:

FunctionA (locks on ResourceA) and calls FunctionB where it needs ResourceB
FunctionB (locks on ResourceB) and calls FunctionA where it needs ResourceA

And is typically pretty easy to spot, understand and fix…

However, in this particular case things were a bit different.

We had gotten memory dumps of the process and after loading them up in windbg, loding sos.dll, and running ~* e !clrstack to see what the threads were working on, we found that a lot of the threads were sitting in System.Threading.Monitor.Enter like below, indicating that they were waiting on a lock…

0:015> !clrstack
OS Thread Id: 0x19c0 (15)
ESP       EIP    
0240f1a0 7d61d06f [GCFrame: 0240f1a0]
0240f2dc 7d61d06f [HelperMethodFrame: 0240f2dc] System.Threading.Monitor.Enter(System.Object)
0240f330 10250ad1 ClassA.ThisFunctionLocksOnStringA()
0240f35c 10250a8e _Default.Page_Load(System.Object, System.EventArgs)
0240f364 66f2a7ff System.Web.Util.CalliHelper.EventArgFunctionCaller(IntPtr, System.Object, System.Object, System.EventArgs)
0240f374 660b2fb4 System.Web.Util.CalliEventHandlerDelegateProxy.Callback(System.Object, System.EventArgs)
0240f388 660ac4b4 System.Web.UI.Control.OnLoad(System.EventArgs)
0240f39c 660ac4f3 System.Web.UI.Control.LoadRecursive()
0240f3b4 660a85a4 System.Web.UI.Page.ProcessRequestMain(Boolean, Boolean)
0240f50c 660a81d4 System.Web.UI.Page.ProcessRequest(Boolean, Boolean)
0240f544 660a8101 System.Web.UI.Page.ProcessRequest()
0240f57c 660a8096 System.Web.UI.Page.ProcessRequestWithNoAssert(System.Web.HttpContext)
0240f588 660a8072 System.Web.UI.Page.ProcessRequest(System.Web.HttpContext)
0240f59c 102508a5 ASP.default_aspx.ProcessRequest(System.Web.HttpContext)
0240f5a0 660ae546 System.Web.HttpApplication+CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
0240f5d4 66081a7c System.Web.HttpApplication.ExecuteStep(IExecutionStep, Boolean ByRef)
0240f614 6608cd13 System.Web.HttpApplication+ApplicationStepManager.ResumeSteps(System.Exception)
0240f664 66080ffc System.Web.HttpApplication.System.Web.IHttpAsyncHandler.BeginProcessRequest(System.Web.HttpContext, System.AsyncCallback, System.Object)
0240f680 6608456c System.Web.HttpRuntime.ProcessRequestInternal(System.Web.HttpWorkerRequest)
0240f6b4 66084213 System.Web.HttpRuntime.ProcessRequestNoDemand(System.Web.HttpWorkerRequest)
0240f6c4 660833ac System.Web.Hosting.ISAPIRuntime.ProcessRequest(IntPtr, Int32)
0240f8d8 79f68c4e [ContextTransitionFrame: 0240f8d8]
0240f90c 79f68c4e [GCFrame: 0240f90c]
0240fa68 79f68c4e [ComMethodFrame: 0240fa68]

Note:  I have given the methods in ClassA (and later on ClassB and ClassC) names that tell us what they are locking on so that it will be easier to follow along.  If this were a real scenairo you would obviously have to look at the code for ClassA.ThisFunctionLocksOnStringA() to see what it was locking on:)

Finding the deadlock

I loaded up SOSEX.dll from http://www.stevestechspot.com/SOSEXUpdatedV11Available.aspx#ac51f1a46-0016-4f24-9ca7-61125105c6e1 and ran !dlk which looks for deadlocks.

0:026> !dlk
Deadlock detected:
CLR thread 26 holds sync block 001e10e4 OBJ:06e401d0[System.String]
             waits sync block 001e12c4 OBJ:06ec6618[System.String] STRVAL=LOCKA
CLR thread 55 holds sync block 001e12c4 OBJ:06ec6618[System.String] STRVAL=LOCKA
             waits sync block 001e10e4 OBJ:06e401d0[System.String]
CLR Thread 26 is waiting at ClassA.ThisFunctionAlsoLocksOnStringA()+0x6(IL)
CLR Thread 55 is waiting at ClassB.ThisFunctionLocksOnStringB()+0x6(IL)

 

1 deadlock detected.

This tells us that we have a deadlock between CLR Thread 26 and CLR Thread 55, where they both own locks on resources that the other thread is waiting for, so now we know that we have a deadlock and which threads are involved.

!dlk is really great for looking at deadlocks but it has one little snag and that is that it reports the CLR thread numbers which is different than the logical thread IDs that we are used to.   However, we can “easily” map them to the logical thread IDs if we use the !threads output.  The 2nd column in !threads shows the CLR thread ID in hex.

CLR Thread 26 (0x1a) = logical thread 44
CLR Thread 55 (0x37) = logical thread 73

0:026> !threads
ThreadCount: 65
UnstartedThread: 0
BackgroundThread: 65
PendingThread: 0
DeadThread: 0
Hosted Runtime: no
                                      PreEmptive   GC Alloc           Lock
       ID OSID ThreadOBJ    State     GC       Context       Domain   Count APT Exception
...
  44   1a 1b58 0ee85eb0   380b220 Enabled  00000000:00000000 001e6608     2 MTA (Threadpool Worker)
...
  73   37 1610 0eea4bc0   380b220 Enabled  00000000:00000000 001e6608     2 MTA (Threadpool Worker)
...

We could also have avoided doing the math here by dumping out the active locks with !sos.syncblk

0:026> !sos.syncblk
Index SyncBlock MonitorHeld Recursion Owning Thread Info  SyncBlock Owner
   22 001e10e4           41         1 0ee85eb0  1b58  44   06e401d0 System.String
   32 001e12c4           41         1 0eea4bc0  1610  73   06ec6618 System.String
-----------------------------
Total           35
CCW             2
RCW             0
ComClassFactory 0
Free            0

Syncblk here tells us that thread 44 owns the lock to syncblock 06e401d0 (so t44 is CLR Thread 26) and thread 73 owns the lock to syncblock 06ec6618 (so t73 is CLR Thread 55 in the !dlk output).

Ok, getting back off that tangent again:), now we know that 44 is waiting on 73 and 73 is waiting on 44, but why?

The stacks for thread 44 and thread 73 look like this

0:044> !clrstack
OS Thread Id: 0x1b58 (44)
ESP       EIP    
116ee5f0 7d61d06f [GCFrame: 116ee5f0]
116ee72c 7d61d06f [HelperMethodFrame: 116ee72c] System.Threading.Monitor.Enter(System.Object)
116ee780 10250757 ClassA.ThisFunctionAlsoLocksOnStringA()
116ee7b0 102506bf ClassC.ThisFunctionLocksOnStringC()
116ee7dc 1025064e Default2.Page_Load(System.Object, System.EventArgs)


0:073> !clrstack
OS Thread Id: 0x1610 (73)
ESP       EIP    
1287e770 7d61d06f [GCFrame: 1287e770]
1287e8ac 7d61d06f [HelperMethodFrame: 1287e8ac] System.Threading.Monitor.Enter(System.Object)
1287e900 10250b6f ClassB.ThisFunctionLocksOnStringB()
1287e930 10250ad7 ClassA.ThisFunctionLocksOnStringA()
1287e95c 10250a8e _Default.Page_Load(System.Object, System.EventArgs)

44 is waiting for the lock on StringA (where StringA="LockA") which 73 took in ClassA.ThisFunctionLocksOnStringA() before calling into ClassB.ThisFunctionLocksOnStringB(), so that is all well and good.

The question here is why does Thread 44 hold a lock on StringB?, there is no method on the stack that locked on StringB…

Why are we locking?

ClassB and ClassC look like this (with their own individual lock objects LockB and LockC):

public class ClassB
{
    public static string LockB = "";

 

    public static void ThisFunctionLocksOnStringB()
    {
        lock (LockB)
        {
        //Do some stuff
        }
    }
...
}

public class ClassC
{
    public static string LockC = "";

    public static void ThisFunctionLocksOnStringC()
    {
        lock (LockC)
        {
            ClassA.ThisFunctionAlsoLocksOnStringA();
        }
    }

...
}

At first sight, LockB and LockC look like different objects, but in reality they are just pointers to the same string constant “”, so in reality this would be the same as saying lock(“”) in both cases.  In fact if you did have a 3rd function saying lock(“”)  this would also have to wait until the lock for LockB or LockC was released.

Summary

When using locks, best practice is to use a private static object so that you can control who/where code locks on this object.

There are some good guidelines for locks in general at:
http://msdn.microsoft.com/en-us/library/c5kehkcz.aspx
and
http://www.guidanceshare.com/wiki/.NET_2.0_Performance_Guidelines_-_Locking_and_Synchronization


Laters,
Tess

I got an email today on an issue that I think is pretty common when you try to memory leak analysis on a dump with debug diag.

They had set up a leak rule in debug diag to monitor for leaks and then ran the memory analysis on it to see what was leaking and the results looked something like this:

  Warning mscorwks.dll is responsible for 372.46 MBytes worth of outstanding allocations. The following are the top 2 memory consuming functions:
mscorwks!EEVirtualAlloc+119: 312.00 MBytes worth of outstanding allocations.
mscorwks!EEHeapAlloc+142: 60.46 MBytes worth of outstanding allocations.
If this is unexpected, please contact the vendor of this module, Microsoft Corporation, for further assistance with this issue.

So what does this mean?  Is mscorwks a really leaky component?

Turns out that debug diag leak monitoring feature is really good when it comes to native leaks (non .net), and not so good with high .net memory usage.

The reason for this is pretty simple.  If you have a native component (like a COM+ component) making allocations etc. debug diag will pick up the stacks that did virtualalloc or heapalloc and point out the ones that haven’t de-allocated their memory so you can see stacks that still have outstanding allocations.

As most of you know, memory management in .NET is centralized through the garbage collector, so this means that any and all memory used for .net objects is actually allocated by the GC (through functions in  mscorwks.dll).   This means that when you use something like debug diag, that focuses on native allocations, it will seem like mscorwks.dll is leaking, while what is really happening is that you have growing .net memory usage.

The reason it says for you to contact Microsoft is because mscorwks.dll is part of the Microsoft .NET Framework, and while you are welcome to call our support to troubleshoot these kind of things, it will likely come down to high memory usage caused by the application in question.

To troubleshoot these types of memory issues, a good start might be to go through the memory leak labs or to look at any posts tagged with memory.

Laters,

Tess

I was out looking for some sample scripts for tinyget and ran across this ANTS Memory Profiler review where Scott Seely has tested it out on some of my debugging labs.

I haven’t had the time and/or opportunity to test it out myself but since many people have asked me if I had used it or what I thought of it I thought I’d just point you to someone who has instead:)

Have fun,

Tess

The other day I came across an issue where an ASP.NET site stopped responding and didn’t start serving requests again until the W3WP.exe process was restarted.

We grabbed some memory dumps with debug diag before restarting the process to see what was going on.

Debugging the dumps:

I opened the dump in windbg, and loaded up sos (.loadby sos mscorwks).

As with all hang dumps I usually check what the threads are doing (~* kb) and I found that one thread (10) was initiating a garbage collection.

0:010> kb
ChildEBP RetAddr  Args to Child             
01cceeb4 7c827d0b 77e61d1e 00000280 00000000 ntdll!KiFastSystemCallRet
01cceeb8 77e61d1e 00000280 00000000 01cceefc ntdll!NtWaitForSingleObject+0xc
01ccef28 79e718fd 00000280 000000fa 00000000 kernel32!WaitForSingleObjectEx+0xac
01ccef6c 79e718c6 00000280 000000fa 00000000 mscorwks!PEImage::LoadImage+0x199
01ccefbc 79e7187c 000000fa 00000000 00000000 mscorwks!CLREvent::WaitEx+0x117
01ccefcc 7a0e288e 000000fa 00000000 00000000 mscorwks!CLREvent::Wait+0x17
01ccf020 79f25226 00000000 9985f121 00032ca8 mscorwks!Thread::SysSuspendForGC+0x52a
01ccf10c 79f25011 00000001 00000024 00000000 mscorwks!WKS::GCHeap::SuspendEE+0x16c
01ccf130 79f24ef2 00000000 00000000 00000024 mscorwks!WKS::GCHeap::GarbageCollectGeneration+0x25b
01ccf1b8 79e74f4f 7a3867e8 00000024 00000000 mscorwks!WKS::gc_heap::try_allocate_more_space+0x1a0
01ccf1d4 79e74d29 7a3867e8 00000024 00000000 mscorwks!WKS::gc_heap::allocate_more_space+0x18
01ccf1f4 79e74cc2 00000022 00000000 00000009 mscorwks!WKS::GCHeap::Alloc+0x6c
01ccf208 79ea3646 00000022 00000000 00000000 mscorwks!Alloc+0x72
01ccf244 79f13708 00000009 9985f2c5 00000000 mscorwks!SlowAllocateString+0x29
*** WARNING: Unable to verify checksum for mscorlib.ni.dll
01ccf2e8 79342fb3 00000048 00000000 00000000 mscorwks!FramedAllocateString+0xa0
01ccf308 79342f5a 00000000 00000008 00000006 mscorlib_ni+0x282fb3
00000000 00000000 00000000 00000000 00000000 mscorlib_ni+0x282f5a

This is really not all that uncommon since GCs happen quite frequently. What was interesting about this one though was that it was sitting in mscorwks!Thread::SysSuspendForGC which basically means that it is trying to suspend all the .net threads in order to get started with the garbage collection.

Whenever you see a thread stuck in this, you are almost always running into some GC deadlock condition.

To find out why we can threads can’t be suspended we can look at the !threads output to find out if any threads (other than the GC initiator) has preemptive GC disabled.  (This means that it can’t be suspended)

0:000> !threads
ThreadCount: 12
UnstartedThread: 0
BackgroundThread: 11
PendingThread: 0
DeadThread: 1
Hosted Runtime: yes
                                      PreEmptive   GC Alloc           Lock
       ID OSID ThreadOBJ    State     GC       Context       Domain   Count APT Exception
  10    1 5e5c 0010e5a0   1808220 Disabled 00000000:00000000 00136610     3 MTA (GC) (Threadpool Worker)
  12    2 52c0 00117b20      b220 Enabled  00000000:00000000 000dc7d0     0 MTA (Finalizer)
  13    3 4ca0 0011c638      1220 Enabled  00000000:00000000 000dc7d0     0 Ukn
   8    4 4034 0012f6b8   880a220 Enabled  00000000:00000000 000dc7d0     0 MTA (Threadpool Completion Port)
  14    5 5424 00133818    80a220 Enabled  00000000:00000000 000dc7d0     0 MTA (Threadpool Completion Port)
XXXX    8    0 0226e710      9820 Enabled  00000000:00000000 000dc7d0     0 Ukn
  19    7 42ac 0226e350   880b220 Enabled  00000000:00000000 000dc7d0     0 MTA (Threadpool Completion Port)
  20    6 5dbc 09d756e8   180b220 Enabled  00000000:00000000 00136610     2 MTA (Threadpool Worker)
   5    9 4f74 0014f238       220 Enabled  00000000:00000000 000dc7d0     0 Ukn
  17    a 4378 09d90880       220 Enabled  00000000:00000000 000dc7d0     0 Ukn
  23    c 5a4c 09ead790   880b220 Enabled  00000000:00000000 000dc7d0     0 MTA (Threadpool Completion Port)
  26    b 5230 09ece8f0   180b222 Disabled 00000000:00000000 00136610     1 MTA (Threadpool Worker)

From the !threads output we can see that in fact thread 26 has preemptive GC disabled.  I’ve talked about this in earlier posts, but in short, some technologies like fusion (used to load assemblies) will disable GC preemption to avoid getting interrupted during some critical operations.

So what is this thread doing…  the MyTools.MyAssemblyLoader constructor is trying to load up an assembly using Assembly.LoadFile, so that is why preemptive GC is disabled.  This is usually a very quick operation but here it is stuck waiting on some event before it can finish.

0:026> kb
ChildEBP RetAddr  Args to Child             
0df0cd90 7c827d0b 77e61d1e 00000c5c 00000000 ntdll!KiFastSystemCallRet
0df0cd94 77e61d1e 00000c5c 00000000 00000000 ntdll!NtWaitForSingleObject+0xc
0df0ce04 79e718fd 00000c5c ffffffff 00000000 kernel32!WaitForSingleObjectEx+0xac
0df0ce48 79e718c6 00000c5c ffffffff 00000000 mscorwks!PEImage::LoadImage+0x199
0df0ce98 79e7187c ffffffff 00000000 00000000 mscorwks!CLREvent::WaitEx+0x117
0df0cea8 79f90f81 ffffffff 00000000 00000000 mscorwks!CLREvent::Wait+0x17
0df0ceb8 79f917ba 7a390970 0230cc18 ffffffff mscorwks!CExecutionEngine::WaitForEvent+0x37
0df0cecc 7a15cfce 0230cc18 ffffffff 00000000 mscorwks!ClrWaitEvent+0x17
0df0cedc 7a153339 0df0d2ac 00000000 00000000 mscorwks!CSimpleFusionBindSink::Wait+0xe
0df0cef8 7a1545d2 09ed0180 0013a868 00000200 mscorwks!BindHelper+0x7d
0df0d194 7a2d0c7c 0df0d2ac 0013a868 00000000 mscorwks!ExplicitBind+0x1a6
0df0d4b0 793f583f 00000000 17b75570 17b75498 mscorwks!AssemblyNative::LoadFile+0x1de
0df0d4d0 7934f2ed 00000000 00000000 17b7546c mscorlib_ni+0x33583f
0df0d4f4 085f6ec4 7a3177c0 17b3df74 00000000 mscorlib_ni+0x28f2ed
WARNING: Frame IP not in any known module. Following frames may be wrong.
0df0d530 085f6646 17b70088 17b6f1b4 17b70088 0x85f6ec4
0df0d7c8 085f48a2 790d57b4 17b6e648 17b40284 0x85f6646
*** WARNING: Unable to verify checksum for System.Data.ni.dll
0df0dc10 651a82d5 17b5275c 79103ae8 00000000 0x85f48a2
0df0dff0 79f23aa2 95b9e085 026d1f54 17893ecc System_Data_ni!_load_config_used+0x56c45
0df0e0a0 00000000 00000000 00000000 00000000 mscorwks!COMNlsInfo::nativeChangeCaseString+0x18f

 

0:026> !sos.clrstack
OS Thread Id: 0x5230 (26)
ESP       EIP    
0df0d200 7c8285ec [HelperMethodFrame_PROTECTOBJ: 0df0d200] System.Reflection.Assembly.nLoadFile(System.String, System.Security.Policy.Evidence)
0df0d4b8 793f583f System.Reflection.Assembly.LoadFile(System.String)
0df0d4cc 085f704d MyTools.MyAssemblyLoader..ctor(System.String)
...
0df0ebb0 083e85b0 MyAPP.Default.Page_Load(System.Object, System.EventArgs)
0df0ed10 698a1928 System.Web.Util.CalliHelper.EventArgFunctionCaller(IntPtr, System.Object, System.Object, System.EventArgs)
0df0ed20 6629769f System.Web.Util.CalliEventHandlerDelegateProxy.Callback(System.Object, System.EventArgs)
0df0ed34 66143a84 System.Web.UI.Control.OnLoad(System.EventArgs)
0df0ed44 66143ad0 System.Web.UI.Control.LoadRecursive()
0df0ed58 66155106 System.Web.UI.Page.ProcessRequestMain(Boolean, Boolean)
0df0ef10 66154a1b System.Web.UI.Page.ProcessRequest(Boolean, Boolean)
0df0ef48 66154967 System.Web.UI.Page.ProcessRequest()
0df0ef80 66154887 System.Web.UI.Page.ProcessRequestWithNoAssert(System.Web.HttpContext)
0df0ef88 6615481a System.Web.UI.Page.ProcessRequest(System.Web.HttpContext)
0df0ef9c 083e826e ASP.default_aspx.ProcessRequest(System.Web.HttpContext)
0df0efa8 65ff27d4 System.Web.HttpApplication+CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
0df0efdc 65fc15b5 System.Web.HttpApplication.ExecuteStep(IExecutionStep, Boolean ByRef)
0df0f01c 65fd32e0 System.Web.HttpApplication+ApplicationStepManager.ResumeSteps(System.Exception)
0df0f06c 65fc0225 System.Web.HttpApplication.System.Web.IHttpAsyncHandler.BeginProcessRequest(System.Web.HttpContext, System.AsyncCallback, System.Object)
0df0f088 65fc550b System.Web.HttpRuntime.ProcessRequestInternal(System.Web.HttpWorkerRequest)
0df0f0bc 65fc5212 System.Web.HttpRuntime.ProcessRequestNoDemand(System.Web.HttpWorkerRequest)
0df0f0c8 65fc3587 System.Web.Hosting.ISAPIRuntime.ProcessRequest(IntPtr, Int32)
0df0f278 79f35ee8 [ContextTransitionFrame: 0df0f278]
0df0f2c8 79f35ee8 [GCFrame: 0df0f2c8]
0df0f420 79f35ee8 [ComMethodFrame: 0df0f420]

Looking around at other threads, we find another thread that is also loading up an assembly from the same location so this is probably the thread that will set the event so that thread 26 can continue.

0:020> kL 2000
ChildEBP RetAddr 
0993cec0 7c827d0b ntdll!KiFastSystemCallRet
0993cec4 77e61d1e ntdll!NtWaitForSingleObject+0xc
0993cf34 79e718fd kernel32!WaitForSingleObjectEx+0xac
0993cf78 79e718c6 mscorwks!PEImage::LoadImage+0x199
0993cfc8 79e7187c mscorwks!CLREvent::WaitEx+0x117
0993cfd8 79f40d79 mscorwks!CLREvent::Wait+0x17
0993cfe8 79f40e96 mscorwks!WKS::GCHeap::WaitUntilGCComplete+0x32
0993d024 79e7ee77 mscorwks!Thread::RareDisablePreemptiveGC+0x1a1
0993d05c 79e75c65 mscorwks!CrstBase::Enter+0x1e6
0993d060 79e75c59 mscorwks!EEEnterCriticalSection+0x9
0993d068 79e75c3b mscorwks!CExecutionEngine::AcquireLock+0x9
0993d074 79eb2f2d mscorwks!ClrEnterCriticalSection+0xf
0993d080 7a1481d3 mscorwks!CCriticalSection::Lock+0x17
0993dd2c 7a14893a mscorwks!CAssemblyCacheItem::MoveAssemblyToFinalLocation+0x31f
0993dd74 7a146e2d mscorwks!CAssemblyCacheItem::LegacyCommit+0x245
0993ddb0 7a16198e mscorwks!CAssemblyCacheItem::Commit+0x5d
0993e45c 7a161e12 mscorwks!CAsmDownloadMgr::CreateAssembly+0x85b
0993e498 7a162550 mscorwks!CAsmDownloadMgr::DoSetupPushToCache+0x50
0993e720 79f90e65 mscorwks!CAsmDownloadMgr::DoSetup+0x26a
0993e76c 79f8f2cc mscorwks!CAssemblyDownload::DoSetup+0x7b
0993e7a0 79f8f1e4 mscorwks!CAssemblyDownload::DownloadComplete+0xb6
0993e9fc 79f8f0ca mscorwks!CAssemblyDownload::KickOffDownload+0x19c
0993ea64 79f2fd56 mscorwks!CAssemblyName::BindToObject+0x568
0993eaac 7a1545d2 mscorwks!BindHelper+0x6b
0993ed48 7a2d0c7c mscorwks!ExplicitBind+0x1a6
0993f064 793f583f mscorwks!AssemblyNative::LoadFile+0x1de
0993f0bc 79e7ae3e mscorlib_ni+0x33583f
0993f0d0 00000000 mscorwks!MetaSig::MetaSig+0x125

 

0:020> !sos.clrstack
OS Thread Id: 0x5dbc (20)
ESP       EIP    
0993edb4 7c8285ec [HelperMethodFrame_PROTECTOBJ: 0993edb4] System.Reflection.Assembly.nLoadFile(System.String, System.Security.Policy.Evidence)
0993f06c 793f583f System.Reflection.Assembly.LoadFile(System.String)
0993f080 085f704d MyTools.MyAssemblyLoader..ctor(System.String)
0993f0b0 083e8612 MyAPP.Default.Page_Load(System.Object, System.EventArgs)
0993f210 698a1928 System.Web.Util.CalliHelper.EventArgFunctionCaller(IntPtr, System.Object, System.Object, System.EventArgs)
0993f220 6629769f System.Web.Util.CalliEventHandlerDelegateProxy.Callback(System.Object, System.EventArgs)
0993f234 66143a84 System.Web.UI.Control.OnLoad(System.EventArgs)
0993f244 66143ad0 System.Web.UI.Control.LoadRecursive()
0993f258 66155106 System.Web.UI.Page.ProcessRequestMain(Boolean, Boolean)
0993f410 66154a1b System.Web.UI.Page.ProcessRequest(Boolean, Boolean)
0993f448 66154967 System.Web.UI.Page.ProcessRequest()
0993f480 66154887 System.Web.UI.Page.ProcessRequestWithNoAssert(System.Web.HttpContext)
0993f488 6615481a System.Web.UI.Page.ProcessRequest(System.Web.HttpContext)
0993f49c 083e826e ASP.default_aspx.ProcessRequest(System.Web.HttpContext)
0993f4a8 65ff27d4 System.Web.HttpApplication+CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
0993f4dc 65fc15b5 System.Web.HttpApplication.ExecuteStep(IExecutionStep, Boolean ByRef)
0993f51c 65fd32e0 System.Web.HttpApplication+ApplicationStepManager.ResumeSteps(System.Exception)
0993f56c 65fc0225 System.Web.HttpApplication.System.Web.IHttpAsyncHandler.BeginProcessRequest(System.Web.HttpContext, System.AsyncCallback, System.Object)
0993f588 65fc550b System.Web.HttpRuntime.ProcessRequestInternal(System.Web.HttpWorkerRequest)
0993f5bc 65fc5212 System.Web.HttpRuntime.ProcessRequestNoDemand(System.Web.HttpWorkerRequest)
0993f5c8 65fc3587 System.Web.Hosting.ISAPIRuntime.ProcessRequest(IntPtr, Int32)
0993f778 79f35ee8 [ContextTransitionFrame: 0993f778]
0993f7c8 79f35ee8 [GCFrame: 0993f7c8]
0993f920 79f35ee8 [ComMethodFrame: 0993f920]

The problem here is that this thread has enabled preemptive GC again and the GC has suspended it so that it is now waiting for the GC to finish before it can finish.

In other words we have a deadlock situation caused by a race condition with two threads loading the same dll at the same time:

Thread 10 – initiating GC – suspending all threads
Thread 26 – can’t be suspended until it has finished loading the dll but is waiting for thread 20 to do so
Thread 20 – waiting for the GC to finish to continue loading its dll

Solutions:

This issue could be resolved by putting a lock around the Assembly.Load in the MyTools.MyAssemblyLoader constructor.

However, this issue is actually fixed in this hotfix which is included in .NET 2.0 SP1 so if you are running into this you can install SP1 for .NET Framework 2.0 to fix the problem.

Have a good one,

Tess

TechEd_Europe_Blog_SM_STD
TechED Europe – Berlin -  9-13th of November


This will be the 2nd time I present at TechED. Last time I had an absolute blast and heard some really  good speakers.

I am still working with the TechED team on exactly what type of sessions I will get, and exactly what I’ll be talking about but I’m sure it won’t surprise anybody reading this blog that there will be a lot of debugging involved:) 

There will be lot’s of windbg and maybe some Visual Studio 2010 stuff, and of course since I’ll be presenting in the web track there will be plenty of asp.net stuff.  Having said that, most things I say apply to any .net application.

On the speakers list, the people I am looking forward to seeing the most are Mark Russinovich of course, Roy Osherove and my friend Magnus Mårtensson.

a


OreDev – Malmö, Sweden – 2-6th of November

Coincidentally Magnus happens to be working for the company (dotway) that organizes Oredev.

I’m really excited about Oredev this year.  The lineup is sooooooo good that I will have a hard time choosing what sessions to go to.

The Scotts (Hanselman, Allen and Bellware) for sure, and it is always fun to hear what my colleague Johan Lindfors has to say.

Oredev is a pretty cool conference because they pull in sessions from many different areas so this is a perfect place to go and watch what people outside of the .NET Development field are doing.

I’ll of course be talking about what I know best:)  Debugging, troubleshooting, Asp.Net, you know the drill.

 

Hope to see you all there… I’ll be on twitter @TessFerrandez (on and off) during the events so don’t be shy.

Tess

2 Comments
Filed under:

I recently had a customer that just moved from Windows 2008 to Windows 2008 R2 (changing from IIS 7 to IIS 7.5) and after the move they got some weird forms authentication errors in the event log

The problem

They have two applications (MyWebApp and MyLoginWebApp) where forms authentication is dealt with in MyLoginWebApp

<authentication mode="Forms">
  <forms name="MyAppAuth" loginUrl="/MyLoginWebApp/Default.aspx" protection="All" timeout="720" path="/" slidingExpiration="true" />
</authentication>

Once the user is authenticated they are redirected back to MyWebApp but on the redirect, forms authentication fails (and logs events like the below), and redirects back to the MyLoginWebApp application.

Event ID: 1315
Event message: Forms authentication failed for the request
Reason: The ticket supplied was invalid

The two applications are setup to run in different application pools in order to avoid problems occurring in one of the apps to affect the other.

Autogenerated machine keys

Forms authentication requires a machine key to encrypt and decrypt the ticket and this has to be the same on the sending and receiving end.  You can use to specify a manually generated key or use an autogenerated one.

Now, by default when using AutoGenerate machinekeys, the autogenerated machinekey is isolated per application. 

To be able to use the same autogenerated machinekey over multiple applications you need to go in to IIS Manager (on IIS7+) and in the Machine Key settings, uncheck the box for “Generate a unique key for each application”.   If you are running IIS6, you can remove the IsolateApps keyword from the <MachineKey…> tag in Machine.Config to allow for the key to be the same across applications.

If you are crossing applications, and especially if you are in a web farm scenario where you may be crossing machine boundaries, you should consider using a pre-generated Machine Key rather than an autogenerated one.  

Using a manually generated key actually worked for them, but for various reasons they wanted to use an AutoGenerated one and as this worked in IIS 7 they were curious why it didn’t work now.

Troubleshooting

To get to the bottom of the problem I started looking at how these autogenerated keys were stored and what could be causing the two applications to not use the same one.

I found this blog post by Kev talking about exactly that and describing how the autogenkey was stored in the registry.  The blog post actually talks about the exact same issue we are seeing. In his case the reason for the failure was that if the process doesn’t have permissions to write to the registry it has to regenerate the key each time a process starts up.

So just for house keeping, the registry key where the autogenerated key is stored is:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\ASP.NET\2.0.50727.0\AutoGenKeys\S-1-5-21-3520846933-1734744541-1252751799-1013

This seemed to fit our problem pretty well.  When you are separating the apps out into different application pools they run in different processes, and if they didn’t have enough permissions to write the key it would be autogenerated at startup (different ones for the different application pools).

The problem was that I could reproduce this, and I could see the autogen keys in my registry so I knew that we didn’t have trouble writing it. 

While looking at this though, I discovered that my two application pools were running under different users (for example the user for the DefaultAppPool was DefaultAppPool and the user for my TestAppPool was TestAppPool). 

Looking back at the registry key, it contains a SID, so the autogenkey is stored on a per user bases, i.e. in our case, since the two processes are running under different users they would have different autogen keys, which explains why we are seeing issues here.

It turns out that there was a change between 7 and 7.5 that Dave wrote about here where they changed the default identity of the application pool from Network Service to an automatically generated AppPool identity.  So this is why there was a difference between 7 and 7.5 when using autogen keys.

The solution

The solution here is rather obvious once you know the background, either you manually generate a key, rather than using an autogen one, or you change the identity of the application pool to the same user for each app pools. 

Have fun,

Tess

3 Comments
Filed under:

Is it a plane, is it a bird, is it a UFO?

Before you can delve into any kind of troubleshooting of an issue you need to thoroughly define it.  If you don’t you’ll probably end up spending a lot of time randomly gathering and looking at data that is probably not even relevant to the issue at hand.  More importantly, how do you even know that your problem is fixed if you don’t have a good definition of the problem?

This might sound like common sense, but having worked with troubleshooting the better part of the last 10 years I can tell you that it is extremely common that people start looking at data before really understanding the problem.  More often than not this is because people are stressed and want to fix the problem fast, or maybe because the problem space is unknown to them so they have difficulties defining what information or details are important.  Sometimes symptoms are even contradictory which makes it even more difficult to troubleshoot something, as you don’t know if you can trust the data.

I don’t have a silver bullet recipe for success but I wanted to share how I approach an issue (in a pretty generic way) and I welcome any and all input that you might have on the topic as good scoping/problem definitions is something that is often a topic of discussion at my workplace.

What I do know for a fact is that a lot of times, the issues where we don’t have a proper problem definition that is agreed between all parties, tend to take a lot longer to resolve than the ones they do.  Similar to if you have a software project where you haven’t spent proper time on working with the customer on their requirements.  And in the end, if you had a good agreement between all parties, you don’t end up resolving the wrong problem.

My 9 questions for a pretty thorough problem description

When I call up a customer to start working on an issue I am generally looking for them to answer 9 questions in their own words.

1. What is happening?
2. What did you expect to happen?
3. When is it happening?
4. When did it start happening?
5. How does this problem affect you?
6. What do you think the problem is? and what data are you basing this on?
7. What have you tried so far?
8. What is the expected resolution?
9. Is there anything that would prohibit certain troubleshooting steps or solutions?

All of these questions are pretty open ended and a lot of the questions can’t be answered on the first pass before gathering more data, but at least then we know what data we need to gather to continue.

Throughout the process of defining the issue, I usually also try to assess the reliability of the information I get so that I know what is a known fact and what is hearsay to avoid spending too much time on things that may not be relevant.

If you’re troubleshooting an issue by yourself, it might be worthwhile pairing up with someone to explain the problem since a lot of things tend to get clearer if you say them out loud and have to explain them.

What am I really looking for when asking these questions?

1. What is happening

Let’s say we have a problem where we are experiencing a “hang” in an ASP.NET application on a production server, this would be what I am looking for or not looking for in my definition.

Bad answer: Occasionally our IIS server just hangs.

This is a pretty common answer, but in reality it says nothing about what is going on.  The words hang, crash or memory leak for example mean a lot of different things to different people. 

A crash for example might mean anything from “we just get blank pages”, user gets a “service unavailable”, there is a “stopped unexpectedly” log in the eventlog, user sees a “NullReferenceException” page in the browser, or really anything where the application is not doing what it is supposed to.

A hang, could be anything from a “crash”, to a particular page responding in 5 seconds rather than 2 seconds, or something where the pages stop responding completely, leaving the user with a spinning cursor, or a timeout, or a complete system freeze.

Memory leaks are my favorite ones, people tend to use “memory leak” as the bucket for cases where “we don’t know what is going on so we must have a memory leak”.  Even in the cases where there are actually memory problems, there is a big variation in the symptoms, like anything from “the process grows from 200MB private bytes to 1 GB in 20 minutes” to outofmemoryexceptions to “watching the process in taskmanager i see some memory growth”.

In either case, the more details you can get, the better off you are, and if it is not detailed enough you need to set up an action to get more data before starting to troubleshoot.

Better answer:  Once or twice a day over the last two weeks we have gotten reports from customers that the login page is not responding.  Eventually if they wait long enough the page will time out.  We have confirmed this behavior by logging in at the time of the failure, but all other pages seem to be working well during the problem period.   The problem persists until we restart IIS.  We have not seen any events in the event log matching the time of the failure.

Note:  There are a couple of things worth mentioning here.  The timing of once or twice a day is important, as once we resolve the issue, we can be fairly sure it is resolved if it doesn’t happen for two days for example.   The fact that it is verified by someone in-house makes the symptom pretty reliable. The fact that the problem persists until an IISReset is issued tells us a couple of things.  1. we have a window for getting memory dumps before recycling, 2. there is a good chance the contention exists in the app since it is not persistent after a recycle.  And finally, the fact that no other pages are affected give us a very defined area to look at. 

Of course this is just a random issue, but you get the idea of how much better we are off with this answer than the first answer.

2. What did you expect to happen

This is a question that is often overlooked because the answer is often assumed, but the reality is not as simple as that.

If we take a memory issue for example, to understand what is considered bad memory patterns, we need to understand what the memory usage is in testing for x number of users, what the baseline memory usage is and how much memory we are expecting to store in session/cache etc. per user.

For a performance issue we need to have a baseline response time.  Even if the problem we are trying to resolve is that pages are timing out, it is important to know if the expected response time for the page is 5 seconds or a few milliseconds, and also if the application has been stress tested and shown those results during testing.

In a crash scenario where you are looking at error events in the event log it is also nice to note down that you are expecting for the process to recycle every 24 hrs for example based on recycling reasons, so that you know that eventlogs relating to that can be discarded.

In the best of cases we would have a solid set of performance logs, event logs and iis logs from normal operation to compare to once we gather for the faulty state.

3. When is it happening?

A solid repro scenario is of course optimal, but with production type issues this is seldom the case.   Things that I am looking for here are.

  • How often the problem reproduces (eg. once or twice a day in the above example)
  • Do we know what actions people take, i.e. it happens when they try to log in?
  • Do we know if it is confined to one server, one application, one page, one action?
  • Do we know if it is reproducible in test?
  • Is it happening only for certain users?
  • Is it happening only under load?
  • Does it only happen when memory usage is high, or when CPU usage is over 80%?
  • Does it always happen at 8 am when the first people in the office start logging in?
  • etc. etc.  any and all conditions relating to the issue are interesting.

4. When did it start happening?

If the application has been working well for a while and suddenly starts behaving weird, it stands to reason that perhaps something that happened around that time could have affected it.

Often though there is a difference between the time when it started happening and when it was first discovered, so make sure to verify the time that it started happening against any available data (eventlogs, iislogs, performance logs etc.) that you might have.

All in all, knowing when it started happening is often one of the most important clues to finding root cause.

5. How does this problem affect you?

When I work on an issue, I like to know what kind of impact the problem has on the customer and the users of the system.  

The reason I ask about this is because if a problem causes users not to be able to log in for example, and this is a critical application for the business, obviously we should be starting by finding a temporary fix, rather than going through root cause analysis, or maybe starting with a temporary fix first (like recycling when memory reaches 600 MB if the problem is a memory leak, to avoid an unexpected crash), and then preparing for root cause analysis.

It also tells me a little bit about the priority that troubleshooting is going to get.  If the issue is very severe, maybe it is ok to perform some troubleshooting actions that have a lot of impact on the system, if it is going to help us find the problem faster.

6. What do you think the problem is? and what data are you basing this on?

This is especially interesting if you come in as an external troubleshooter, as you get a lot of insight into known problem areas of the application by asking what they think the problem might be.

There is a temptation to follow down the path of what is already guessed because it often sounds very plausible, but when I look at a problem I try to keep a very open mind to avoid getting stuck in a tunnel.

If someone believes that the problem is “x” then oftentimes, because of how our minds work, we tend to look at the data from that direction, even if it doesn’t fit. Part #2 of this question is what data you are basing your theory on?  I try to look at that data with fresh eyes, to see if it really collaborates the theory or if there are some holes in it.   To be honest, if a person comes to me with an issue and already has a theory, the first thing i try to do is to disprove it.

7. What have you tried so far?

This question serves two purposes.

1. Finding out what has been done so we don’t need to re-invent the wheel.   
2. Finding out the results of those actions as it gives us more data about the problem.

Keep in mind here as well that reliability is important.  If something was presumably done but there is no documentation of it or the results, it is probably worthwhile re-doing it, depending on how complex the task was.

8. What is the expected resolution?

This might sound like a repeat of question 2 (What did you expect to happen), but it is very different.

An expected resolution might be to “avoid the crash” which you can do by preemptive recycling, separating the app into a different application pool, moving to 64-bit, reverting back to a previous build etc.

Another expected resolution might be to “find the root cause to avoid that it happens again”.

And a third may be, “if we get the pages to consistently respond in less than 5 seconds under load we are good”.

Defining the expected resolution is crucial as this is what you will measure and verify the solution against to determine when you are done troubleshooting.  If you don’t have this, how will you know when you are done?  

This expected resolution, just like requirements in a software project, should also be agreed upon so that all involved parties work against the same goal.

9. Is there anything that would prohibit certain troubleshooting steps or solutions?

Knowing the limitations of data gathering or certain solutions both determine what actions you can take and change the expectations of what can be achieved. 

On this question I would expect answers like, we can not install any tools on the server without going through a process of 10 change requests, or we can reproduce this on a test machine so you can live debug if you need to.

 

Summarizing everything

Once I have all the facts from the questions above I usually sit down and summarize everything in a format that looks something like this:

Problem description:
===========================

Expected resolution:
===========================

Troubleshooting done:
============================

Next steps:
===========================

Timeline for next steps:
===========================

This is a format I use to document problems throughout the course of the troubleshooting, updating it as the troubleshooting progresses and this is a format that we use both in our support cases and our bug reports in different variations.  

It has proven to be a pretty effective way to keep tabs on issues especially when we have to collaborate on a case or hand over an issue to someone else for some reason or another.

 

Happy troubleshooting,

Tess

Summer is almost over and a new season of heavy work is beginning.  I wanted to take this opportunity to start on a new page and make sure that I’m spending my cycles doing the right things. 

I’ve been blogging for almost 3 years now and throughout the blogs life I’ve been blogging pretty much the same way, trying to show examples of where me and fellow developers have gotten into situations that have caused crashes, memory leaks, exceptions, hangs etc., so that we can all learn from the situations and avoid them.

Throughout the 3 years I have gotten a lot of comments saying that posts have helped developers, but I have also gotten more private comments saying that what I am talking about lies so far out of their realms that they don’t understand half of what I am saying.  Some of these comments have come from people that I know and respect and that I know are extremely proficient developers, which makes me think that maybe I am not explaining things in a very understandable or approachable way.  

I have also noticed that people tend to send me private emails rather than comment on the blog, maybe because people feel a bit too intimidated by post-mortem debugging to comment directly on the blog since the topic is a bit low-level. Maybe I am all wrong and that is not at all why people email me rather than comment:)

Either way, since this has been bugging me for a while, I wanted to let you know how I look at my blog and what I want it to be, and I also wanted to solicit your ideas on how to make it more that way.

My philosophy with the blog

I work as an ASP.NET Support Escalation Engineer which means that my job is to help developers solve these types of issues.  When I can help them resolve their issues that makes both me and them happy.   In reality though they would probably be a lot happier if they could resolve their own issues and didn’t have to call, or even more happy if they didn’t have these issues in the first place.

With all that, my goal has always been to try to avoid that people need to call support, so effectively my goal is to make my job obsolete:).  I realize that people will always run into pitfalls so I set up this blog to teach people debugging and make people more aware about the internals of asp.net and the CLR, and to learn from each other.

I want people to be less intimidated by debugging and see that it is in fact something that we can all do, something that is useful to all developers and something that is not as difficult or far out there as it may seem at first look.

I want people to ask lots of questions, or comment on things that they want to know more about, don’t understand or don’t agree with.  The more such comments and questions there are, the more I and others can learn from them.  In a perfect world, this would be a nice debugging community where we all learn from each other.

My ask from you

I would like to learn from you what ideas you have around how I can accomplish this goal better. 

How many of you just glance at the posts and think “hmm, probably interesting for someone who is deep down into debugging but I’ll never use this”?  How many of you really read them and learn something from them?

Are the posts too long, too short, too deep, not deep enough, too boring, too many stupid jokes?

What makes you shy away from a post and what makes you want to learn more? 

How can I make things more applicable to the world you live in?

What do you want to see more or less of?

How can I help you so that you can apply more of what I am talking about to your day to day job?

Do you want me to mix up the debugging posts more with some lighter posts about other topics?  If so, what would you be interested in? 

Feel free to be as candid as you can:)

 

Have fun,
Tess

58 Comments
Filed under: ,

I came back from a long vacation today and I’ve finally caught up with enough emails to do something useful with my day.

As I scanned through my blog email bucket I noticed that I had a lot of emails relating to my last post on dumping .net exceptions in windbg, and most of them asked why they see OutOfMemoryExceptions on the heap, and how to troubleshoot this.

I think I have mentioned this before, but it seems like it is worth mentioning again as it is probably buried deep in some post from way back when…

When you dump out all the exceptions on your .net GC heap (using !dumpheap -type Exception for example) these 3 types of exceptions will always show up, independently of what type of .net process you are debugging.

79330c30        1           72 System.ExecutionEngineException
79330ba0        1           72 System.StackOverflowException
79330b10        1           72 System.OutOfMemoryException

Contrary to what you may believe, this does not necessarily mean that the process has suffered from a stack overflow for example. 

When you run !dumpheap –type Exception, what you are really doing is dumping out any type of object on the GC heap whose name contains the word Exception, not just the ones that are thrown.

These 3 exceptions (ExecutionEngineException, StackOverflowException and OutOfMemoryException) are very special in the sense that it would be impossible to create an object of type OutOfMemoryException when you are in an OOM situation, as that would fail with yet another OOM.  In the same way you can’t create a StackOverflowException object when you are running into a stack overflow as this would require a call to its constructor which is impossible when the stack is overflowing.

To mitigate this, the CLR will generate these 3 exception objects at startup so that it can just throw them whenever the exception occurs.

In short, if you see these on the heap, don’t panic, unless you have some other evidence that they are occurring, they probably aren’t.   

To see if you are really suffering from one, run !threads and you should see if any threads have recently thrown one.   All of these will make the process crash/exit so you shouldn’t really see them unless you are also seeing a crash.  The only one that you might see without a crash is the OOM, but if you do see one when you run !threads or in the event viewer, you are pretty close to a crash so it should be investigated.

If you do see any of these running !threads, check out the post index to find articles related to troubleshooting each of these types of issues.

Have a good one,

Tess

I found a set of really nice, short videos by Arturo Toledo on the Silverlight blog that I thought were worth sharing.

He goes through some of the design principles using the Microsoft Expression tools and talks about what you should think about when designing a web page, logo or whatever it is you are designing.

Principles of design:

Balance  - how you can achieve a sense of balance, and avoid the feeling of being pulled too much to any part of the design.
Rhythm – using repeating visual patterns to create appealing visual effects.
Symmetry and asymmetry – how to achieve balance both in symmetrical and asymmetrical designs.
Emphasis – using different factors like form, color, shape etc. to make some elements stand out
Unity – how to make the viewer feel like everything in design belongs there
Points, Lines and Form – using points, lines and forms as markers for other visual elements
Colors – how to choose a color palette where the colors jive well together

Hands-on videos:

Experimenting with User Interface Colors with Expression Design
Creating Mosaic and Tiled Patterns using Expression Design
Learning to create Icons using Expression Design  
Creating a Sample Icon using Expression Design

I’m hoping there is more to come because so far these have been very interesting for someone like me who has always been intrigued by design but has no sense of what works and what doesn’t.

Tess

Recently I had a case where the customer had an issue with session variables.  The claim was that if they use in-proc session state their session variables would work just fine, but if they changed the session mode to stateserver in web.config their sessions were lost.

<sessionState mode="StateServer" cookieless="UseCookies" stateConnectionString="tcpip=127.0.0.1:42424"/>

When looking at bit closer at the repro, it wasn’t really all session variables that were lost, it was only ones set in Page_Error, and when they were read in an error page they redirected to (with the customErrors setting in web.config) would be null.

Why is there a difference in this case between in-proc and out of process session state?

When you use in-proc session state, the session objects are stored in the memory of the process, and thus when you run code like Session[“mySessionVar”] = “some value”;  it is immediately written to that session variable.

With out-of-proc session state on the other hand (stateserver or sqlsessionstate), all session variables for the current session are read from the store at the beginning of each request, and then any changes are committed back to the session store at the end of the request.

In a case where an exception occurs, the request is effectively aborted so none of the changes that are made to the session during that request are written back to the store.  

Old session variables and values will still be intact though…

So how do you handle this situation in a case where you use state server or SQL session state?

If you want to use session state to store values relating to errors in pages that should be available in an error page you can clear the error in page_error and manually redirect to the error page.

The code could look something like this:

protected void Page_Error(object sender, EventArgs e)
{
    Session["ErrorPage"] = "Default2.aspx";
    Exception ex = Server.GetLastError();
    Session["ErrorMessage"] = ex.Message.ToString();
    Server.ClearError();
    Response.Redirect("~/ErrorPages/ErrorPage.aspx");
}

Clearing the error allows the request to finish processing and the session variables to be committed to the store.

Laters,
Tess

If my post on dump debugging in VS 2010 piqued your interest, here is a list of a number of other new debugging features coming up in 2010

http://blogs.msdn.com/ms_joc/archive/2009/05/27/dev10-beta1-free-at-last.aspx

I for one will be following the blog to get the scoop…

On a related note, John Robbins also wrote a nice post about historical debugging in VS2010,  I haven’t tried it yet but I can already see a number of ASP.NET scenarios where it can be applied.

http://www.wintellect.com/CS/blogs/jrobbins/archive/2009/06/16/how-does-vs2010-historical-debugging-work.aspx

Laters,

Tess

3 Comments
Filed under: ,

I know that for a lot of you this is a bit far into the future, but I thought it would be nice to run through a few of my lab scenarios to show off some of the new features in Visual Studio 2010. 

I am sure that I will miss a bunch of cool features here as this post is just based on my first experience with this, but I will likely post more about these features as we get closer to RTM.  As we are only in Beta, anything and everything may change before RTM and the below is just based on my own impressions so there may be other better ways of doing things.

Why debug dumps with Visual Studio .NET

I love WinDbg and will probably continue using it for many years to come since it is a very simple and clean and powerful debugger.  I do realize though that not everyone debugs production issues (hangs/crashes etc.) every day (at least i hope not:)), and that most .net developers are very comfortable in Visual Studio.NET, so it is nice to be able to stay within that tool where you recognize the IDE and structure.

If you have compiled the app “debug” and set debug=true in web.config if you are working with an asp.net app, debugging a dump is very similar to being stopped at a breakpoint.  Of course, since it is not a live process you can’t hit go, but you can match up source code, see locals etc.  without having to learn a bunch of new commands.

There are a couple of caveats here:

1. .NET debugging only works on 4.0 dumps – with 3.5 and earlier you only get native debugging

2. Visual Studio as opposed to windbg doesn’t seem to use deferred symbol loading (I may be missing a setting here), but this means that when you start debugging a dump you load up all symbols which might take a while (at least the first time).  With windbgs deffered symbol loading, it only loads symbols when it needs them which makes it significantly faster to load up a dump in windbg if you only want to check some quick things.

3. Mileage varies depending on if your app is compiled debug, you have source lined up etc.   If the app is not compiled debug you will not get things like locals or source synchronization.

How do you gather the memory dumps?

You can get dumps with windbg (adplus) as shown in this post or debug diag, or in visual studio.net for that matter (debug/save dump as… when you are debugging)

Typically in a web scenario or in a scenario where you are reproing a problem in a winforms app or windows service on a customer machine you wouldn’t have visual studio installed there, and then adplus or debug diag are good options for gathering dumps.  

Setting up symbols

For Visual Studio to be able to resolve native (non-dotnet) function names/parameters/types etc., it uses symbols, like any other debugger.  For all Microsoft components there are public symbols located at http://msdl.microsoft.com/download/symbols.   If you have your own native components you would have to compile them with symbols in order to resolve function names.

You can set up the symbol paths for your symbols under Tools/options/debugging/symbols and a typical setup would look like this:

image

The first time it loads up the symbols it will find them in http://msdl.microsoft.com/download/symbols and as it downloads them it will put them in your cache location (c:\webcache).  The next time you look at a dump, it will check the c:\webcache location first, and only download the symbols it can’t find there.   This means that the first time you look at a dump (any dump) it may take a long time depending on your internet speed as these files can be pretty big, but the next time it should be significantly faster.

In here you can also add the locations of any of your own symbols and this setting is persisted, so the next time you open VS, the symbol paths are there for you.

Notice that there is a button for “Load symbols from Microsoft symbol servers”.  This is available once you have started debugging a dump or any process, and this will do the above for you, setting up a local cache if you haven’t done this already.  I like setting it up explicitly as pushing the “Load symbols from Microsoft symbol servers” leaves the UI “hanging” without visual feedback so you don’t know how far you have gotten in the process, while setting up a symbol path allows you to see the modules/symbols as they load.

Debugging a simple hang scenario (Debugging labs - Lab 1)

I recompiled Lab 1 with 4.0 (just switching some settings in web.config), ran a stress and grabbed a dump with adplus.  Look at the Lab 1 instructions on how to repro and get the dump if you are interested in following along.

1. Open the dump in Visual Studio

File/Open/File – select the dump

This will present you with the dump file summary which contains info about the dump, any exceptions that might have occurred (in this case just a wake debugger (0x80000007) from cdb when getting the dump), along with some info about the OS version and the CLR version.

image

If you move to the right (not shown here) there are also some options to set up symbol paths and debug either mixed or native.  The default is mixed.

2. Click on the super obvious green go button:)  to start debugging

This will start loading symbols and get you in debugging mode where you can see the stacks for all the threads etc. so grab a coffee, chat with your colleagues about the weekend or watch the symbols as they load:)

At the end it may pop up a message box talking about an unknown exception (this is the wake debugger exception so don’t be alarmed, just click ok or break)

Once you are done loading all the symbols you are presented with the current call stack.

image

If this would have been a dump taken at a crash or exception this stack is what you would have started investigating.  In the case of a hang dump this is rather meaningless since the active thread is thread 0 – just the main thread, which often has little or nothing to do with what you are debugging.

For hangs, we want to see all the stacks, and in particular those running .net code or those that appear blocked for any reason.

3. Open the “Parallel Stacks” window (Debug/Windows/Parallel Stacks)

This is a feature I really like in Visual Studio (and you can use it when you debug normally as well).  What it does is it pulls out all the stacks and looks for commonalities.   If you have 5 threads doing the exact same thing it will group them together so you get a very nice overview of what is going on.

In this case (image below) for example it shows us that we have 16 threads sitting in DataLayer.GetFeaturedProducts.  15 of these are waiting in a Monitor.Enter (note that it shows both .net and native stacks here), and one is sitting in a Thread.Sleep.

image

From this, and knowing the code we can deduce that all these requests for FeaturedProducts are waiting in a lock (Monitor.Enter) and the lock is probably owned by the thread sitting in the Thread.Sleep.

Note:  So far we haven’t done anything that requires debugging to be turned on, so even if debug=false, you can still get this type of information, and in most cases this might be all you need.

3.  Look at source code and locals.

If you want to go into more detail about what is going on, you can right click on the GetFeaturedProcudts frame for example and choose “Switch to frame”, which brings up the source code (if the source is lined up) and shows us with a green arrow what statement we are on (the lock).

If you click on the “Show threads in source” button on the debugging toolbar…

image

…it also highlights in gray the location that the other thread is on and you can also switch to that thread if you want to by right clicking the warning sign in the left sidebar and choosing “switch to frame”

image

In this case we don’t have a whole lot of locals set, but if you did you could get to them in the same way you would if debugging normally.

image

In summary:

I am really interested to see where this will be going in the future and what additional features if any we’ll see in RTM.  

Based on my first experience with this here are my impressions:

+ Why I would use Visual Studio to debug dumps

+ You don’t have to learn a new environment (windbg) or a lot of new commands (sos) for simple scenarios
+ Really nice visualization of the threads, it makes it very easy to spot the important ones
+ Matching up source code is very useful as with windbg you have to jump through hoops and match IL with source to figure out where you’re at
+ Looking at locals the Visual Studio way is a lot easier than using !do in sos to navigate through it.   As VS can make use of debug info it can also name the locals which is not possible in windbg.
+ The native and managed stacks are intermingled as Visual Studio is a true mixed mode debugger while windbg for example is really native only (using sos as a helper to look at .net stacks)

- Why I would stick with windbg

- If you don’t have debug info, you can’t get locals or items on the stacks.  You can get this with windbg/sos, even if you don’t have variable names it is still very useful information.
- Since Windbg uses deferred symbol loading it is a bit faster to load up a dump in windbg.
- You can debug .net 2.0/3.5/1.1/1.0 dumps etc. in windbg and 4.0 dumps
- In Visual Studio (beta 1) I have not found a way to look at the managed heaps so debugging memory issues is still a task for windbg, or by loading sos in visual studio.
- SOS contains a lot of nice commands that allow you to look at .net data in various ways (looking at exceptions, gc heap info etc.), again you can load sos in visual studio to get this info, but windbg feels more natural for this. 

In reality I will probably use a combination of both + debug diag depending on scenario, but I hope this post has given you some insight into the topic and what’s to come.

Have fun,

Tess

More Posts Next page »
 
Page view tracker