If broken it is, fix it you should

Using the powers of the debugger to solve the problems of the world - and a bag of chips    by Tess Ferrandez, ASP.NET Escalation Engineer (Microsoft)

ASP.NET Case Study: Hang with mixed-mode dlls

If you use mixed mode dlls (assemblies that contain both .net and c++ code) you need to take care not to have any .net entry points, so that you don't end up with a GC/loaderlock deadlock.

What is a managed/.net entry point you might ask... it basically means that the assembly may call some .net methods while it is being loaded.  For example, if you have a dllmain that calls into managed code, or if you have managed constructors for static value types.  In essence, anything that would allow you to call into managed code whilst holding the loaderlock.

Problem explanation

The loaderlock is a native critical section that is used when loading a dll using CreateObject, LoadLibrary, GetProcAddress, FreeLibrary, GetModuleHandle or on the first load when invoking a method using pinvoke. If you have a .net assembly referencing a mixed mode assembly you will also enter the loaderlock the first time you access something that requires you to load up that mixed mode assembly.  

If our mixed mode assembly is called MyPDFWriter.dll then the scenario where you would see this deadlock would look like this.

Thread 1:  Loads MyPDFWriter.dll and gets the loaderlock. While loading MyPDFWriter.dll it executes some .net code and makes an allocation that triggers the GC so it is waiting for the GC (thread 2)

Thread 2 (GC Thread): Is performing a GC and in doing so it needs to get the loaderlock that thread 1 owns.

There are also similar scenarios where the deadlock chain is a little bit longer, but that is the basic story.  In short what you want to avoid is a chance to trigger or wait for a GC while holding the loaderlock.   The resolution to this issue is usually to compile the dlls with /NOENTRY.
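To make the two-thread scenario above a bit more concrete, here is a minimal sketch in plain C++. It is only a toy model, not CLR or Windows code: `gc_blocks_on_loader_lock`, `loader_lock` and the promise names are all made up for the illustration, and the 200 ms timeout exists only so that the demo terminates instead of hanging the way the real process does.

```cpp
#include <chrono>
#include <future>
#include <mutex>
#include <thread>

// Toy model only: none of these names are CLR or Windows APIs.
// Returns true when the simulated GC thread gives up waiting for the lock.
bool gc_blocks_on_loader_lock(bool managed_code_runs_under_lock) {
    std::timed_mutex loader_lock;      // stands in for the loaderlock
    std::promise<void> gc_requested;   // thread 1 tells thread 2 that a GC is needed
    std::promise<bool> gc_result;      // thread 2 reports whether the GC could finish
    std::future<void> requested = gc_requested.get_future();
    std::future<bool> result = gc_result.get_future();

    // Thread 2: the GC thread, which needs the loaderlock to do its work.
    std::thread gc_thread([&] {
        requested.wait();
        // A real GC would wait forever; the timeout just keeps the demo finite.
        bool got_lock = loader_lock.try_lock_for(std::chrono::milliseconds(200));
        if (got_lock) loader_lock.unlock();
        gc_result.set_value(got_lock);
    });

    // Thread 1: loads the dll and "allocates", which triggers the GC.
    std::unique_lock<std::timed_mutex> hold(loader_lock, std::defer_lock);
    if (managed_code_runs_under_lock) hold.lock();
    gc_requested.set_value();
    bool gc_completed = result.get();  // waits for the GC, possibly while holding the lock

    gc_thread.join();
    return !gc_completed;
}
```

Calling `gc_blocks_on_loader_lock(true)` models a dll with a managed entry point: the GC thread cannot get the lock because thread 1 holds it while waiting for the GC. Calling it with `false` models the case where no managed code runs under the lock, so the GC gets the lock and finishes.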

 

There is an MDA (Managed Debugging Assistant) that can help identify attempts to execute managed code while holding the loaderlock, and this can be very effective to use if you suspect that you are running into this issue.

 

Another variation

Today I am going to talk about a variation of this issue where the mixed mode dlls don't have a managed entry point, or at least they don't have either .net code in dllmain or static constructors that can get them in trouble.  

Before I go into the technical discussion, I just want to mention that this will only occur with mixed mode dlls, if the dlls are not loaded with Assembly.Load, if you are running .net framework 1.0 or 1.1, and if you are running the server version of the GC.  I will explain why later, but I wanted to mention it up front so that you know whether you fit the bill or not.

In this case we have 18 threads waiting for critical sections in stacks similar to this one:

  56  Id: 35a0.dbc Suspend: 0 Teb: 7ff82000 Unfrozen
ChildEBP RetAddr  Args to Child              
05effbc8 7c827d0b 7c83d236 000001a0 00000000 ntdll!KiFastSystemCallRet 
05effbcc 7c83d236 000001a0 00000000 00000000 ntdll!NtWaitForSingleObject+0xc 
05effc08 7c83d281 000001a0 00000004 00000000 ntdll!RtlpWaitOnCriticalSection+0x1a3 
05effc28 7c82ee3b 7c8877a0 00000000 7ffdf000 ntdll!RtlEnterCriticalSection+0xa8 
05effcb8 7c82ec9f 05effd28 05effd28 00000000 ntdll!LdrpInitializeThread+0x68 
05effd14 7c8284c5 05effd28 7c800000 00000000 ntdll!_LdrpInitialize+0x16f 
00000000 00000000 00000000 00000000 00000000 ntdll!KiUserApcDispatcher+0x25 

The critical section we are waiting for is the loaderlock

0:032> !locks 7c8877a0 

CritSec ntdll!LdrpLoaderLock+0 at 7c8877a0
WaiterWoken        No
LockCount          18
RecursionCount     2
OwningThread       228c
EntryCount         0
ContentionCount    30
*** Locked

and this is owned by the thread with the OS ID 228c... if we move to this thread we can see that it has triggered a GC and is waiting for the GC to finish (whilst holding the loaderlock) so we definitely fit the scenario 

0:032> ~~[228c]s
eax=00000000 ebx=00000000 ecx=00000027 edx=0000010a esi=00000548 edi=00000000
eip=7c8285ec esp=02dba2dc ebp=02dba34c iopl=0         nv up ei pl zr na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000246
ntdll!KiFastSystemCallRet:
7c8285ec c3              ret

0:023> kb
ChildEBP RetAddr  Args to Child              
02dba2d8 7c827d0b 77e61d1e 00000548 00000000 ntdll!KiFastSystemCallRet 
02dba2dc 77e61d1e 00000548 00000000 00000000 ntdll!NtWaitForSingleObject+0xc 
02dba34c 77e61c8d 00000548 ffffffff 00000000 kernel32!WaitForSingleObjectEx+0xac 
02dba360 792085fb 00000548 ffffffff 00000000 kernel32!WaitForSingleObject+0x12 
02dba380 79203fac 00000000 00000000 00000218 mscorsvr!GCHeap::GarbageCollectGeneration+0x1a9 
02dba3b0 791b7cb4 03f48718 00000218 00000000 mscorsvr!gc_heap::allocate_more_space+0x181 
02dba5d8 791bb333 03f48718 00000216 00000000 mscorsvr!GCHeap::Alloc+0x7b 
02dba5ec 791c121e 00000216 00000000 00000000 mscorsvr!Alloc+0x3a 
02dba608 791b2de3 00000103 1612bd1c 791b2e14 mscorsvr!SlowAllocateString+0x26 
02dba614 791b2e14 7fffffff 1612bd1c 02dba638 mscorsvr!UnframedAllocateString+0xc 
02dba67c 79996e54 00000101 7fffffff 79996daf mscorsvr!FramedAllocateString+0x2c 
02dba688 79996daf 1612bcd4 1611bdbc 7999fcd5 mscorlib_79990000+0x6e54
02dba694 7999fcd5 00000100 1612bb2c 00000001 mscorlib_79990000+0x6daf
02dba6b0 79aaf348 00000080 00000000 00000001 mscorlib_79990000+0xfcd5
02dba6c8 79aaf4b9 1612bb44 00000004 00000001 mscorlib_79990000+0x11f348
02dba6d8 79aaeb47 1611bf70 1611bf88 1611bd8c mscorlib_79990000+0x11f4b9
02dba700 79aaec38 00000001 02dba76c 1611bd8c mscorlib_79990000+0x11eb47
02dba72c 79aaee7c 00000000 1611bd7c 79a84c4f mscorlib_79990000+0x11ec38
02dba738 79a84c4f 00000000 00000000 057ccdb8 mscorlib_79990000+0x11ee7c
02dba754 79998b7a 1611b0c4 791b202e 00000000 mscorlib_79990000+0xf4c4f

Looking at the GC threads (the ones starting with "mscorsvr!gc_heap::gc_thread_stub" ) we can see that one of them is waiting for a critical section (the loaderlock)

0:036> kL
ChildEBP RetAddr  
0442f878 7c827d0b ntdll!KiFastSystemCallRet
0442f87c 7c83d236 ntdll!NtWaitForSingleObject+0xc
0442f8b8 7c83d281 ntdll!RtlpWaitOnCriticalSection+0x1a3
0442f8d8 7c82f20c ntdll!RtlEnterCriticalSection+0xa8
0442f90c 7c82f336 ntdll!LdrLockLoaderLock+0x133
0442f988 7c82f2a3 ntdll!LdrGetDllHandleEx+0x94
0442f9a4 77e65185 ntdll!LdrGetDllHandle+0x18
0442f9f0 77e6528f kernel32!GetModuleHandleForUnicodeString+0x20
0442fe68 77e65155 kernel32!BasepGetModuleHandleExW+0x17f
0442fe80 792094a5 kernel32!GetModuleHandleW+0x29
0442feac 792094f2 mscorsvr!GetProcessMemoryLoad+0x1a
0442ff1c 7920810d mscorsvr!gc_heap::generation_to_condemn+0x22d
0442ff88 792036b0 mscorsvr!gc_heap::garbage_collect+0x110
0442ffac 79227e06 mscorsvr!gc_heap::gc_thread_function+0x42
0442ffb8 77e64829 mscorsvr!gc_heap::gc_thread_stub+0x1e
0442ffec 00000000 kernel32!BaseThreadStart+0x34

So with this we have identified our loaderlock/GC deadlock.  Now the question is why do we run into this and what can we do about it...
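The wait chain we just read off the dump can be written down as a small "waits for" graph, and the deadlock is simply a cycle in that graph. The sketch below is only an illustration of that reasoning: `wait_chain_has_cycle` and `dump_wait_chain` are hypothetical helpers, and the node names are labels I picked for the threads and objects seen in the stacks above, not anything the debugger produces.

```cpp
#include <map>
#include <set>
#include <string>

// Follows "waits for" edges from `start`; returns true if the chain loops back on itself.
bool wait_chain_has_cycle(const std::map<std::string, std::string>& waits_for,
                          const std::string& start) {
    std::set<std::string> seen;
    std::string current = start;
    while (seen.insert(current).second) {   // stops as soon as a node repeats
        auto next = waits_for.find(current);
        if (next == waits_for.end()) return false;  // chain ends at something that is not waiting
        current = next->second;
    }
    return true;
}

// Wait chain read off the dump (labels chosen for this sketch).
std::map<std::string, std::string> dump_wait_chain() {
    return {
        {"thread 56", "LdrpLoaderLock"},       // RtlEnterCriticalSection in LdrpInitializeThread
        {"LdrpLoaderLock", "thread 228c"},     // OwningThread reported by !locks
        {"thread 228c", "GC completion"},      // waiting in GarbageCollectGeneration
        {"GC completion", "GC thread"},        // the gc_thread_stub thread does the collection
        {"GC thread", "LdrpLoaderLock"},       // LdrLockLoaderLock via GetModuleHandleW
    };
}
```

Starting from `"thread 56"` the walk ends up back at `LdrpLoaderLock`, so `wait_chain_has_cycle(dump_wait_chain(), "thread 56")` reports a cycle. If the GC thread did not need the loaderlock (drop the last edge), the chain would simply end and there would be no deadlock.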

If we look at the .net stack for thread 23 (228c) we can see that it is doing policy resolution

0:023> !clrstack
Thread 23
ESP         EIP       
0x02dba658  0x7c8285ec [FRAME: HelperMethodFrame] 0x793e67b0 is not a MethodDesc
0x02dba684  0x79996e54 [DEFAULT] String System.String.GetStringForStringBuilder(String,I4)
0x02dba690  0x79996daf [DEFAULT] [hasThis] String System.Text.StringBuilder.GetNewString(String,I4)
0x02dba6a0  0x7999fcd5 [DEFAULT] [hasThis] Class System.Text.StringBuilder System.Text.StringBuilder.Append(SZArray Char,I4,I4)
0x02dba6c0  0x79aaf348 [DEFAULT] [hasThis] Void System.Security.Util.Tokenizer.SBArrayAppend(Char)
0x02dba6d0  0x79aaf4b9 [DEFAULT] [hasThis] I4 System.Security.Util.Tokenizer.NextTokenType()
0x02dba6e0  0x79aaeb47 [DEFAULT] [hasThis] Void System.Security.Util.Parser.ParseContents(Class System.Security.SecurityElement,Boolean)
0x02dba70c  0x79aaec38 [DEFAULT] [hasThis] Void System.Security.Util.Parser.ParseContents(Class System.Security.SecurityElement,Boolean)
0x02dba738  0x79aaee7c [DEFAULT] [hasThis] Void System.Security.Util.Parser..ctor(Class System.Security.Util.Tokenizer)
0x02dba740  0x79a84c4f [DEFAULT] [hasThis] Void System.Security.Policy.PolicyLevel.Load(Boolean)
0x02dba774  0x79a84abb [DEFAULT] [hasThis] Void System.Security.Policy.PolicyLevel.IndividualCheckLoaded(Boolean)
0x02dba7a4  0x79a849e2 [DEFAULT] [hasThis] Void System.Security.Policy.PolicyLevel.CheckLoaded(Boolean)
0x02dba7e0  0x79a88caf [DEFAULT] [hasThis] Class System.Security.Policy.PolicyStatement System.Security.Policy.PolicyLevel.Resolve(Class System.Security.Policy.Evidence,I4,SZArray Char)
0x02dba80c  0x79abcb04 [DEFAULT] [hasThis] Class System.Security.PermissionSet System.Security.PolicyManager.Resolve(Class System.Security.Policy.Evidence,Class System.Security.PermissionSet)
0x02dba864  0x79abe8fd [DEFAULT] Class System.Security.PermissionSet System.Security.SecurityManager.ResolvePolicy(Class System.Security.Policy.Evidence,Class System.Security.PermissionSet,Class System.Security.PermissionSet,Class System.Security.PermissionSet,ByRef Class System.Security.PermissionSet,Boolean)
0x02dba8a4  0x79abe781 [DEFAULT] Class System.Security.PermissionSet System.Security.SecurityManager.ResolvePolicy(Class System.Security.Policy.Evidence,Class System.Security.PermissionSet,Class System.Security.PermissionSet,Class System.Security.PermissionSet,ByRef Class System.Security.PermissionSet,ByRef I4,Boolean)
0x02dbab6c  0x791b7f92 [FRAME: GCFrame] 
0x02dbb098  0x791b7f92 [FRAME: DebuggerClassInitMarkFrame] 
0x02dbb5c0  0x791b7f92 [FRAME: GCFrame] 
0x02dbc72c  0x791b7f92 [FRAME: GCFrame] 
0x02dbd724  0x791b7f92 [FRAME: GCFrame]

This means that our managed entry point here was neither custom .net calls in dllmain nor some initialization of static variables.  The reason we are doing policy resolution is that we are loading up a strong named assembly, and loading a strong named assembly requires policy resolution.

I was able to figure out which dll we were trying to load, but the steps I took to find it are less scientific than I would have wished for, so don't worry about them too much (I won't be able to explain why I found it there:)), I just want to show how I found it.

I dumped out the stack objects using !dso, and poking around I found a char[] that happened to contain the name of the mixed mode assembly (random.mixed.modedll.dll).  It was actually called something else, but since the 3rd party mixed mode dll is not at fault here I chose not to name it.

0:023> !dso
Thread 23
ESP/REG    Object     Name
...
0x2dba7fc 0x1611a288 System.Security.Permissions.StrongNamePublicKeyBlob
0x2dba804 0x161199b4 System.Char[]
0x2dba810 0x161170a4 System.Security.Policy.Evidence
0x2dba81c 0x161199b4 System.Char[]
0x2dba820 0x16117354 System.Collections.ArrayList/ArrayListEnumeratorSimple
0x2dba824 0x16116e8c System.Security.Policy.PolicyLevel
0x2dba828 0x16117210 System.Security.Policy.PermissionRequestEvidence
...

0:023> du 0x161199b4 
161199b4  ".Е."
0:023> du
161199bc  "."
0:023> du
161199c0  ""
0:023> du
161199c2  ".file://C:/windows/assembly/gac/"
16119a02  "random.mixed.modedll/3.1.103.0__"
16119a42  "f4bbbf243f314012/random.mixed.mod"
16119a82  "edll.dll.."

You can see if an assembly is mixed mode or not by opening it up in Reflector and checking whether it references Microsoft.VisualC.  If it does, it is mixed mode.

  

I should add also that in this case we weren't directly loading this dll, it was loaded because the dll that we were loading had a reference to it.

Possible solutions

As I mentioned earlier this issue only happens on 1.1 or 1.0, in an application that loads strong named mixed mode dlls in a way that uses the loaderlock. 

The reason it does not happen in 2.0 is that the compilation model and policy resolution are completely different. They are so different that it is not feasible to back port the change to 1.1, since it would mean a complete change in architecture.

The reason it only happens when using the server GC (which you do on multiproc boxes in services like asp.net) is that with the server GC, garbage collection is done on separate threads.  If you use the workstation GC you would GC on the thread that holds the loaderlock, and in that case you could not run into this scenario.
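The difference between the two GC modes comes down to which thread asks for the lock. A Windows critical section can be re-entered by the thread that already owns it (that is what RecursionCount in the !locks output tracks), but any other thread has to wait. The sketch below models that with a standard C++ recursive mutex; `gc_can_take_loader_lock` is a made-up name, and the 100 ms timeout only keeps the demo from hanging.

```cpp
#include <chrono>
#include <future>
#include <mutex>

// Toy model: the calling thread owns the "loaderlock" and a GC needs it too.
// Returns true if the GC manages to take the lock.
bool gc_can_take_loader_lock(bool gc_on_owning_thread) {
    std::recursive_timed_mutex loader_lock;  // re-entrant for its owner, like a critical section
    std::lock_guard<std::recursive_timed_mutex> owner(loader_lock);

    auto try_gc = [&] {
        bool ok = loader_lock.try_lock_for(std::chrono::milliseconds(100));
        if (ok) loader_lock.unlock();
        return ok;
    };

    // Workstation GC model: the collection runs on the thread that owns the lock.
    if (gc_on_owning_thread) return try_gc();

    // Server GC model: the collection runs on a separate thread.
    return std::async(std::launch::async, try_gc).get();
}
```

With `gc_on_owning_thread` set to `true` the lock is simply re-entered and the GC proceeds. With `false` the separate thread times out, which is the stand-in here for the dedicated GC thread being stuck behind the loaderlock.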

Finally, if you use Assembly.Load you don't take the loaderlock, so in this case there is no chance of a loaderlock/GC deadlock.

 

With this in mind there are a couple of different resolutions to the issue.

1. Move to 2.0.  This is probably the best solution if it is feasible.

2. Stop using the strong named mixed mode dll.  This one is self explanatory but of course you are probably using the assembly for a reason:)

3. Change the GC version to non-concurrent workstation (see this post for more info on the GC and GC modes).  You can do this temporarily to get a quick fix while resolving the issue, but in the long run I would not recommend running the workstation version on a multiproc asp.net app because of the potential performance degradation and higher memory usage that this may incur.  The server GC is optimized for this scenario.

4. Manually load up the mixed mode assemblies with Assembly.Load in Application_Start, or anywhere else prior to the location where you would normally load them.  This will perform the policy check up front so that you don't have to perform it while holding the loaderlock.

Note: adding the strong named assembly to the bin directory is not a solution as it is not supported and can cause other blocking issues or exceptions. See this post for more details.

 Until next time,

Tess

  • Does 3.5 have the same compilation model and policy resolution as 2.0?

  • Iturea,

    yes, 3.5 is not its own CLR core, it is extras built on the 2.0 core so you are still using the 2.0 compilation model and garbage collector when using 3.5

  • Well this is kind of deep. I have used dlls in various ways but I guess this is a special case and worth studying.

    Thanks

    Josh

    http://riverasp.net
