Usually I write blog articles on topics that people request via email or comments on other blogs. Well, nobody has ever asked me to write anything about shutdown.
But then I look at all the problems that occur during process shutdown in the unmanaged world. These problems occur because many people don’t understand the rules, or they don’t follow the rules, or the rules couldn’t possibly work anyway.
We’ve taken a somewhat different approach for managed applications. But I don’t think we’ve ever explained in detail what that approach is, or how we expect well-written applications to survive an orderly shutdown. Furthermore, managed applications still execute within an unmanaged OS process, so they are still subject to the OS rules. And in V1 and V1.1 of the CLR we’ve horribly violated some of those OS rules related to startup and shutdown. We’re trying to improve our behavior here, and I’ll discuss that too.
Questionable APIs
Unfortunately, I can’t discuss the model for shutting down managed applications without first discussing how unmanaged applications terminate. And, as usual, I’ll go off on a bunch of wild tangents.
Ultimately, every OS process shuts down via a call to ExitProcess or TerminateProcess. ExitProcess is the nice orderly shutdown, which notifies each DLL of the termination. TerminateProcess is ruder, in that the DLLs are not informed.
The relationship between ExitProcess and TerminateProcess has a parallel in the thread routines ExitThread and TerminateThread. ExitThread is the nice orderly thread termination, whereas if you ever call TerminateThread you may as well kill the process. It’s almost guaranteed to be in a corrupt state. For example, you may have terminated the thread while it holds the lock for the OS heap. Any thread attempting to allocate or release memory from that same heap will now block forever.
Realistically, Win32 shouldn’t contain a TerminateThread service. To a first approximation, anyone who has ever used this service has injected a giant bug into his application. But it’s too late to remove it now.
In that sense, TerminateThread is like System.Threading.Thread.Suspend and Resume. I cannot justify why I added those services. The OS SuspendThread and ResumeThread are extremely valuable to a tiny subset of applications. The CLR itself uses these routines to take control of threads for purposes like Garbage Collection and – as we’ll see later – for process shutdown. As with TerminateThread, there’s a significant risk of leaving a thread suspended at a “bad” spot. If you call SuspendThread while a thread is inside the OS heap lock, you better not try to allocate or free from that same heap. In a similar fashion, if you call SuspendThread while a thread holds the OS loader lock (e.g. while the thread is executing inside DllMain) then you better not call LoadLibrary, GetProcAddress, GetModuleHandle, or any of the other OS services that require that same lock.
Even worse, if you call SuspendThread on a thread that is in the middle of exception dispatching inside the kernel, a subsequent GetThreadContext or SetThreadContext can actually produce a blend of the register state at the point of the suspension and the register state that was captured when the exception was triggered. If we attempt to modify a thread’s context (perhaps bashing the EIP – on X86 – to redirect the thread’s execution to somewhere it will synchronize with the GC or other managed suspension), our update to EIP might quietly get lost. Fortunately it’s possible to coordinate our user-mode exception dispatching with our suspension attempts in order to tolerate this race condition.
And probably the biggest gotcha with using the OS SuspendThread & ResumeThread services is on Win9X. If a Win9X box contains real-mode device drivers (and yes, some of them still do), then it’s possible for the hardware interrupt associated with the device to interact poorly with the thread suspension. Calls to GetThreadContext can deliver a register state that is perturbed by the real-mode exception processing. The CLR installs a VxD on those operating systems to detect this case and retry the suspension.
Anyway, with sufficient care and discipline it’s possible to use the OS SuspendThread & ResumeThread to achieve some wonderful things.
But the managed Thread.Suspend & Resume are harder to justify. They differ from the unmanaged equivalents in that they only ever suspend a thread at a spot inside managed code that is “safe for a garbage collection.” In other words, we can report all the GC references at that spot and we can unwind the stack and register state to reveal our caller’s execution state.
Because we are at a place that’s safe for garbage collection, we can be sure that Thread.Suspend won’t leave a thread suspended while it holds an OS heap lock. But it may be suspended while it holds a managed Monitor (‘lock’ in C# or ‘SyncLock’ in VB.NET). Or it may be suspended while it is executing the class constructor (.cctor) of an important class like System.String. And over time we intend to write more of the CLR in managed code, so we can enjoy all the benefits. When that happens, a thread might be suspended while loading a class or resolving security policy for a shared assembly or generating shared VTables for COM Interop.
The real problem is that developers sometimes confuse Thread.Suspend with a synchronization primitive. It is not. If you want to synchronize two threads, you should use appropriate primitives like Monitor.Enter, Monitor.Wait, or WaitHandle.WaitOne. Of course, it’s harder to use these primitives because you actually have to write code that’s executed by both threads so that they cooperate nicely. And you have to eliminate the race conditions.
I’m already wandering miles away from Shutdown, and I need to get back. But I can’t resist first mentioning that TerminateThread is distinctly different from the managed Thread.Abort service, both in terms of our aspirations and in terms of our current implementation.
Nobody should ever call TerminateThread. Ever.
Today you can safely call Thread.Abort in two scenarios.
Longer term, we’re committed to building enough reliability infrastructure around Thread.Abort that you can reasonably expect to use it to control threads that remain completely inside managed code. Aborting threads that interleave managed and unmanaged execution in a rich way will always remain problematic, because we are limited in how much we can control the unmanaged portion of that execution.
ExitProcess in a nutshell
So what does the OS ExitProcess service actually do? I’ve never read the source code. But based on many hours of stress investigations, it seems to do the following:
1) Kill all the threads except one, whatever they are doing in user mode. On NT-based operating systems, the surviving thread is the thread that called ExitProcess. This becomes the shutdown thread. On Win9X-based operating systems, the surviving thread is somewhat random. I suspect that it’s the last thread to get around to committing suicide.
2) Once only one thread survives, no further threads can enter the process… almost. On NT-based systems, I only see superfluous threads during shutdown if a debugger attaches to the process during this window. On Win9X-based systems, any threads that were created during this early phase of shutdown are permitted to start up. The DLL_THREAD_ATTACH notifications to DllMain for the starting threads will be arbitrarily interspersed with the DLL_PROCESS_DETACH notifications to DllMain for the ensuing shutdown. As you might expect, this can cause crashes.
3) Since only one thread has survived (on the more robust NT-based operating systems), the OS now weakens all the CRITICAL_SECTIONs. This is mixed blessing. It means that the shutdown thread can allocate and free objects from the system heap without deadlocking. And it means that application data structures protected by application CRITICAL_SECTIONs are accessible. But it also means that the shutdown thread can see corrupt application state. If one thread was wacked in step #1 above while it held a CRITICAL_SECTION and left shared data in an inconsistent state, the shutdown thread will see this inconsistency and must somehow tolerate it. Also, data structures that are protected by synchronization primitives other than CRITICAL_SECTION are still prone to deadlock.
4) The OS calls the DllMain of each loaded DLL, giving it a DLL_PROCESS_DETACH notification. The ‘lpReserved’ argument to DllMain indicates whether the DLL is being unloaded from a running process or whether the DLL is being unloaded as part of a process shutdown. (In the case of the CLR’s DllMain, we only ever receive the latter style of notification. Once we’re loaded into a process, we won’t be unloaded until the process goes away).
5) The process actually terminates, and the OS reclaims all the resources associated with the process.
Well, that sounds orderly enough. But try running a multi-threaded process that calls ExitProcess from one thread and calling HeapAlloc / HeapFree in a loop from a second thread. If you have a debugger attached, eventually you will trap with an ‘INT 3’ instruction in the OS heap code. The OutputDebugString message will indicate that a block has been freed, but has not been added to the free list… It has been leaked. That’s because the ExitProcess wacked your 2nd thread while it was in the middle of a HeapFree operation.
This is symptomatic of a larger problem. If you wack threads while they are performing arbitrary processing, your application will be left in an arbitrary state. When the DLL_PROCESS_DETACH notifications reach your DllMain, you must tolerate that arbitrary state.
I’ve been told by several OS developers that it is the application’s responsibility to take control of all the threads before calling ExitProcess. That way, the application will be in a consistent state when DLL_PROCESS_DETACH notifications occur. If you work in the operating system, it’s reasonable to consider the “application” to be a monolithic homogenous piece of code written by a single author. So of course that author should put his house in order and know what all the threads are doing before calling ExitProcess.
But if you work on an application, you know that there are always multiple components written by multiple authors from different vendors. These components are only loosely aware of each other’s implementations – which is how it should be. And some of these components have extra threads on the side, or they are performing background processing via IOCompletion ports, threadpools, or other techniques.
Under those conditions, nobody can have the global knowledge and global control necessary to call ExitProcess “safely”. So, regardless of the official rules, ExitProcess will be called while various threads are performing arbitrary processing.
The OS Loader Lock
It’s impossible to discuss the Win32 model for shutting down a process without considering the OS loader lock. This is a lock that is present on all Windows operating systems. It provides mutual exclusion during loading and unloading.
Unfortunately, this lock is held while application code executes. This fact alone is sufficient to guarantee disaster.
If you can avoid it, you must never hold one of your own locks while calling into someone else’s code. They will screw you every time.
Like all good rules, this one is made to be broken. The CLR violates this rule in a few places. For example, we hold a ‘class constructor’ lock for your class when we call your .cctor method. However, the CLR recognizes that this fact can lead to deadlocks and other problems. So we have rules for weakening this lock when we discover cycles of .cctor locks in the application, even if these cycles are distributed over multiple threads in multi-threaded scenarios. And we can see through various other locks, like the locks that coordinate JITting, so that larger cycles can be detected. However, we deliberately don’t look through user locks (though we could see through many of these, like Monitors, if we chose). Once we discover a visible, breakable lock, we allow one thread in the cycle to see uninitialized state of one of the classes. This allows forward progress and the application continues. See my earlier blog on “Initializing code” for more details.
Incidentally, I find it disturbing that there’s often little discipline in how managed locks like Monitors are used. These locks are so convenient, particularly when exposed with language constructs like C# lock and VB.NET SyncLock (which handle backing out of the lock during exceptions), that many developers ignore good hygiene when using them. For example, if code uses multiple locks then these locks should typically be ranked so that they are always acquired in a predictable order. This is one common technique for avoiding deadlocks.
Anyway, back to the loader lock. The OS takes this lock implicitly when it is executing inside APIs like GetProcAddress, GetModuleHandle and GetModuleFileName. By holding this lock inside these APIs, the OS ensures that DLLs are not loading and unloading while it is groveling through whatever tables it uses to record the state of the process.
So if you call those APIs, you are implicitly acquiring a lock.
That same lock is also acquired during a LoadLibrary, FreeLibrary, or CreateThread call. And – while it is held – the operating system will call your DllMain routine with a notification. The notifications you might see are:
DLL_THREAD_ATTACH
The thread that calls your DllMain has just been injected into the process. If you need to eagerly allocate any TLS state, this is your opportunity to do so. In the managed world, it is preferable to allocate TLS state lazily on the first TLS access on a given thread.
DLL_THREAD_DETACH
The thread that calls your DllMain has finished executing the thread procedure that it was started up with. After it finishes notifying all the DLLs of its death in this manner, it will terminate. Many unmanaged applications use this notification to de-allocate their TLS data. In the managed world, managed TLS is automatically cleaned up without your intervention. This happens as a natural consequence of garbage collection.
DLL_PROCESS_ATTACH
The thread that calls your DllMain is loading your DLL via an explicit LoadLibraryEx call or similar technique, like a static bind. The lpReserved argument indicates whether a dynamic or static bind is in progress. This is your opportunity to initialize any global state that could not be burned into the image. For example, C++ static initializers execute at this time. The managed equivalent has traditionally been a class constructor method, which executes once per AppDomain. In a future version of the CLR, we hope to provde a more convenient module constructor concept.
DLL_PROCESS_DETACH
If the process is terminating in an orderly fashion (ExitProcess), your DllMain will receive a DLL_PROCESS_DETACH notification where the lpReserved argument is non-null. If the process is terminating in a rude fashion (TerminateProcess), your DllMain will receive no notification. If someone unloads your DLL via a call to FreeLibrary or equivalent, the process will continue executing after you unload. This case is indicated by a null value for lpReserved. In the managed world, de-initialization happens through notifications of AppDomain unload or process exit, or through finalization activity.
The DLL_THREAD_ATTACH and DLL_THREAD_DETACH calls have a performance implication. If you have loaded 100 DLLs into your process and you start a new thread, that thread must call 100 different DllMain routines. Let’s say that these routines touch a page or two of code each, and a page of data. That might be 250 pages (1 MB) in your working set, for no good reason.
The CLR calls DisableThreadLibraryCalls on all managed assemblies other than certain MC++ IJW assemblies (more on this later) to avoid this overhead for you. And it’s a good idea to do the same on your unmanaged DLLs if they don’t need these notifications to manage their TLS.
Writing code inside DllMain is one of the most dangerous places to write code. This is because you are executing inside a callback from the OS loader, inside the OS loader lock.
Here are some of the rules related to code inside DllMain:
1) You must never call LoadLibrary or otherwise perform a dynamic bind.
2) You must never attempt to acquire a lock, if that lock might be held by a thread that needs the OS loader lock. (Acquiring a heap lock by calling HeapAlloc or HeapFree is probably okay).
3) You should never call into another DLL. The danger is that the other DLL may not have initialized yet, or it may have already uninitialized. (Calling into kernel32.dll is probably okay).
4) You should never start up a thread or terminate a thread, and then rendezvous with that other thread’s start or termination.
As we shall see, the CLR violates some of these rules. And these violations have resulted in serious consequences for managed applications – particularly managed applications written in MC++.
And if you’ve ever written code inside DllMain – including code that’s implicitly inside DllMain like C++ static initializers or ‘atexit’ routines – then you’ve probably violated some of these rules. Rule #3 is especially harsh.
The fact is, programs violate these rules all the time and get away with it. Knowing this, the MC++ and CLR teams made a bet that they could violate some of these rules when executing IJW assemblies. It turns out that we bet wrong.
I’m going to explain exactly how we screwed this up with IJW assemblies, but first I need to explain what IJW assemblies are.
IJW
IJW is how we internally refer to mixed managed / unmanaged images. If you compile a MC++ assembly with ‘/clr’ in V1 or V1.1, it almost certainly contains a mixture of managed and unmanaged constructs.
In future versions, I expect there will be ways to compile MC++ assemblies with compiler-enforced guarantees that the image is guaranteed pure managed, or guaranteed pure verifiable managed, or – ultimately – perhaps even pure verifiable 32-bit / 64-bit neutral managed. In each case, the compiler will necessarily have to restrict you to smaller and smaller subsets of the C++ language. For example, verifiable C++ cannot use arbitrary unmanaged pointers. Instead, it must restrict itself to managed pointers and references, which are reported to the garbage collector and which follow certain strict rules. Furthermore, 32-bit / 64-bit neutral code cannot consume the declarations strewn through the windows.h headers, because these pick a word size during compilation.
IJW is an acronym for “It Just Works” and it reflects the shared goal of the C++ and CLR teams to transparently compile existing arbitrary C++ programs into IL. I think we did an amazing job of approaching that goal, but of course not everything “just works.” First, there are a number of constructs like inline assembly language that cannot be converted to managed execution. The C++ compiler, linker and CLR ensure that these methods are left as unmanaged and that managed callers transparently switch back to unmanaged before calling them.
So inline X86 assembly language must necessarily remain in unmanaged code. Some other constructs are currently left in unmanaged code, though with sufficient effort we could provide managed equivalents. These other constructs include setjmp / longjmp, member pointers (like pointer to virtual method), and a reasonable startup / shutdown story (which is what this blog article is supposed to be about).
I’m not sure if we ever documented the constructs that are legal in a pure managed assembly, vs. those constructs which indicate that the assembly is IJW. Certainly we have a strict definition of this distinction embedded in our code, because the managed loader considers it when loading. Some of the things we consider are:
Of course, IJW assemblies can have many imports, exports, fixups, and other constructs. As with pure managed assemblies, the entrypoint is constrained to be a JMP to mscoree.dll’s _CorExeMain or _CorDllMain function. This is the “outer entrypoint”. However, the COM+ header of the PE file has an optional “inner entrypoint”. Once the CLR has proceeded far enough into the loading process on a DLL, it will dispatch to this inner entrypoint which is… your normal DllMain. In V1 and V1.1, this inner entrypoint is expressed as a token to a managed function. Even if your DllMain is written as an unmanaged function, we dispatch to a managed function which is defined as a PInvoke out to the unmanaged function.
Now we can look at the set of rules for what you can do in a DllMain, and compare it to what the CLR does when it sees an IJW assembly. The results aren’t pretty. Remember that inside DllMain:
You must never call LoadLibrary or otherwise perform a dynamic bind
With normal managed assemblies, this isn’t a concern. For example, most pure managed assemblies are loaded through Assembly.Load or resolution of an AssemblyRef – outside of the OS loader lock. Even activation of a managed COM object through OLE32’s CoCreateInstance will sidestep this issue. The registry entries for the CLSID always mention mscoree.dll as the server. A subkey is consulted by mscoree.dll – inside DllGetClassObject and outside of the OS loader lock – to determine which version of the runtime to spin up and which assembly to load.
But IJW assemblies have arbitrary DLL exports. Therefore other DLLs, whether unmanaged or themselves IJW, can have static or dynamic (GetProcAddress) dependencies on an IJW assembly. When the OS loads the IJW assembly inside the loader lock, the OS further resolves the static dependency from the IJW assembly to mscoree.dll’s _CorDllMain. Inside _CorDllMain, we must select an appropriate version of the CLR to initialize in the process. This involves calling LoadLibrary on a particular version of mscorwks.dll, violating our first rule for DllMain.
So what goes wrong when this rule is violated? Well, the OS loader has already processed all the DLLs and their imports, walking the tree of static dependencies and forming a loading plan. It is now executing on this plan. Let’s say that the loader’s plan is to first initialize an IJW assembly, then initialize its dependent mscoree.dll reference, and then initialize advapi32.dll. (By ‘initialize’, I mean give that DLL its DLL_PROCESS_ATTACH notification). When mscoree.dll decides to LoadLibrary mscorwks.dll, a new loader plan must be created. If mscorwks.dll depends on advapi32.dll (and of course it does), we have a problem. The OS loader already has advapi32.dll on its pending list. It will initialize that DLL when it gets far enough into its original loading plan, but not before.
If mscorwks.dll needs to call some APIs inside advapi32.dll, it will now be making those calls before advapi32.dll’s DllMain has been called. This can and does lead to arbitrary failures. I personally hear about problems with this every 6 months or so. That’s a pretty low rate of failure. But one of those failures was triggered when a healthy application running on V1 of the CLR was moved to V1.1 of the CLR. Ouch.
You must never attempt to acquire a lock, if that lock might be held by a thread that needs the OS loader lock
It’s not possible to execute managed code without potentially acquiring locks on your thread. For example, we may need to initialize a class that you need access to. If that class isn’t already initialized in your AppDomain, we will use a .cctor lock to coordinate initialization. Along the same lines, if a method requires JIT compilation we will use a lock to coordinate this. And if your thread allocates a managed object, it may have to take a lock. (We don’t take a lock on each allocation if we are executing on a multi-processor machine, for obvious reasons. But eventually your thread must coordinate with the garbage collector via a lock before it can proceed with more allocations).
So if you execute managed code inside the OS loader lock, you are going to contend for a CLR lock. Now consider what happens if the CLR ever calls GetModuleHandle or GetProcAddress or GetModuleFileName while it holds one of those other locks. This includes implicit calls to LoadLibrary / GetProcAddress as we fault in any lazy DLL imports from the CLR.
Unfortunately, the sequence of lock acquisition is inverted on the two threads. This yields a classic deadlock.
Once again, this isn’t a concern for pure managed assemblies. The only way a pure managed assembly can execute managed code inside the OS loader lock is if some unmanaged code explicitly calls into it via a marshaled out delegate or via a COM call from its own DllMain. That’s a bug in the unmanaged code! But with an IJW assembly, some methods are managed and some are unmanaged. The compiler, linker and CLR conspire to make this fact as transparent as possible. But any call from your DllMain (i.e. from your inner entrypoint) to a method that happened to be emitted as IL will set you up for this deadlock.
You should never call into another DLL
It’s really not possible to execute managed code without making cross-DLL calls. The JIT compiler is in a different DLL from the ExecutionEngine. The ExecutionEngine is in a different DLL from your IJW assembly.
Once again, pure managed assemblies don’t usually have a problem here. I did run into one case where one of the Microsoft language compilers was doing a LoadLibrary of mscorlib.dll. This had the side effect of spinning up the CLR inside the OS loader lock and inflicting all the usual IJW problems onto the compilation process. Since managed assemblies have no DLL exports, it’s rare for applications to load them in this manner. In the case of this language compiler, it was doing so for the obscure purpose of printing a banner to the console at the start of compilation, telling the user what version of the CLR it was bound to. There are much better ways of doing this sort of thing, and none of those other ways would interfere with the loader lock. This has been corrected.
You should never start up a thread or terminate a thread, and then rendezvous
This probably doesn’t sound like something you would do. And yet it’s one of the most common deadlocks I see with IJW assemblies on V1 and V1.1 of the CLR. The typical stack trace contains a load of an IJW assembly, usually via a DLL import. This causes mscoree.dll’s _CorDllMain to get control. Eventually, we notice that the IJW assembly has been strong name signed, so we call into WinVerifyTrust in WinTrust.dll. That API has a perfectly reasonable expectation that it is not inside the OS loader lock. It calls into the OS threadpool (not the managed CLR threadpool), which causes the OS threadpool to lazily initialize itself. Lazy initialization involves spinning up a waiter thread, and then blocking until that waiter thread starts executing.
Of course, the new waiter thread must first deliver DLL_THREAD_ATTACH notifications to any DLLs that expect such notifications. And it must obviously obtain the OS loader lock before it can deliver the first notification. The result is a deadlock.
So I’ve painted a pretty bleak picture of all the things that can go wrong with IJW assemblies in V1 and V1.1 of the CLR. If we had seen a disturbing rate of failures prior to shipping V1, we would have reconsidered our position here. But it wasn’t until later that we had enough external customers running into these difficulties. With the benefits of perfect hindsight, it is now clear that we screwed up.
Fortunately, much of this is fixable in our next release. Until then, there are some painful workarounds that might bring you some relief. Let’s look at the ultimate solution first, and then you can see how the workarounds compare. We think that the ultimate solution would consist of several parts:
If you don’t know what Customer Debug Probes are, please hunt them down on MSDN. They are a life-saver for debugging certain difficult problems in managed applications. I would recommend starting with http://www.gotdotnet.com/Community/UserSamples/Details.aspx?SampleGuid=c7b955c7-231a-406c-9fa5-ad09ef3bb37f, and then reading most of Adam Nathan’s excellent blogs at http://blogs.gotdotnet.com/anathan.
Of the above 4 changes, we’re relatively confident that the first 3 will happen in the next release. We also experimented with the 4th change, but it’s unlikely that we will make much further progress. A key obstacle is that there is no OS-approved way that can efficiently detect execution inside the loader lock. Our hope is that a future version of the OS would provide such a mechanism.
This is all great. But you have an application that must run on V1 or V1.1. What options do you have? Fortunately, Scott Currie has written an excellent article on this very subject. If you build IJW assemblies, please read it at http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dv_vstechart/html/vcconmixeddllloadingproblem.asp.
The Pure Managed Story
If you code in a language other than MC++, you’re saying “Enough about IJW and the OS loader lock already.”
Let’s look at what the CLR does during process shutdown. I’ll try not to mention IJW, but I’ll have to keep talking about that darn loader lock.
From the point of view of a managed application, there are three types of shutdown:
1) A shutdown initiated by a call to TerminateProcess doesn’t involve any further execution of the CLR or managed code. From our perspective, the process simply disappears. This is the rudest of all shutdowns, and neither the CLR developer nor the managed developer has any obligations related to it.
2) A shutdown initiated by a direct call to ExitProcess is an unorderly shutdown from the point of view of the managed application. Our first notification of the shutdown is via a DLL_PROCESS_DETACH notification. This notification could first be delivered to the DllMain of mscorwks.dll, mscoree.dll, or any of the managed assemblies that are currently loaded. Regardless of which module gets the notification first, it is always delivered inside the OS loader lock. It is not safe to execute any managed code at this time. So the CLR performs a few house-keeping activities and then returns from its DllMain as quickly as possible. Since no managed code runs, the managed developer still has no obligations for this type of shutdown.
3) An orderly managed shutdown gives managed code an opportunity to execute outside of the OS loader lock, prior to calling ExitProcess. There are several ways we can encounter an orderly shutdown. Because we will execute managed code, including Finalize methods, the managed developer must consider this case.
Examples of an orderly managed shutdown include:
1) Call System.Environment.Exit(). I already mentioned that some Windows developers have noted that you must not call ExitProcess unless you first coordinate all your threads… and then they work like mad to make the uncoordinated case work. For Environment.Exit we are under no illusions. We expect you to call it in races from multiple threads at arbitrary times. It’s our job to somehow make this work.
2) If a process is launched with a managed EXE, then the CLR tracks the number of foreground vs. background managed threads. (See Thread.IsBackground). When the number of foreground threads drops to zero, the CLR performs an orderly shutdown of the process. Note that the distinction between foreground and background threads serves exactly this purpose and no other purpose.
3) Starting with MSVCRT 7.0, an explicit call to ‘exit()’ or an implicit call to ‘exit()’ due to a return from ‘main()’ can turn into an orderly managed shutdown. The CRT checks to see if mscorwks.dll or mscoree.dll is in the process (I forget which). If it is resident, then it calls CorExitProcess to perform an orderly shutdown. Prior to 7.0, the CRT is of course unaware of the CLR.
4) Some unmanaged applications are aware of the CLR’s requirements for an orderly shutdown. An example is devenv.exe, which is the EXE for Microsoft Visual Studio. Starting with version 7, devenv calls CoEEShutDownCOM to force all the CLR’s references on COM objects to be Release()’d. This at least handles part of the managed shutdown in an orderly fashion. It’s been a while since I’ve looked at that code, but I think that ultimately devenv triggers an orderly managed shutdown through a 2nd API.
If you are following along with the Rotor sources, this all leads to an interesting quirk of EEShutDown in ceemain.cpp. That method can be called:
Of course, our passage through EEShutDown is quite different when we are outside the OS loader lock, compared to when we are inside it. When we are outside, we do something like this:
From the perspective of managed code, at this point we are finished with the shutdown, though of course we perform many more steps for the unmanaged part of the shutdown.
There are a couple of points to note with the above steps.
Finally, a lot more happens inside EEShutDown than the orderly managed steps listed above. We have some unmanaged shutdown that doesn’t directly impact managed execution. Even here we try hard to limit how much we do, particularly if we’re inside the OS loader lock. If we must shutdown inside the OS loader lock, we mostly just flush any logs we are writing and detach from trusted services like the profiler or debugger.
One thing we do not do during shutdown is any form of leak detection. This is somewhat controversial. There are a number of project teams at Microsoft which require a clean leak detection run whenever they shutdown. And that sort of approach to leak detection has been formalized in services like MSVCRT’s _CrtDumpMemoryLeaks, for external use. The basic idea is that if you can find what you have allocated and release it, then you never really leaked it. Conversely, if you cannot release it by the time you return from your DllMain then it’s a leak.
I’m not a big fan of that approach to finding memory leaks, for a number of reasons:
There are at least two other ways to track leaks.
One way is to identify scenarios that can be repeated, and then monitor for leaks during the steady-state of repeating those scenarios. For example, we have a test harness which can create an AppDomain, load an application into it, run it, unload the AppDomain, then rinse and repeat. The first few times that we cycle through this operation, memory consumption increases. That’s because we actually JIT code and allocate data structures to support creating a 2nd AppDomain, or support making remote calls into the 2nd AppDomain, or support unloading that AppDomain. More subtly, the ThreadPool might create – and retain – a waiter thread or an IO thread. Or the application may trigger the creation of a new segment in the GC heap which the GC decides to retain even after the incremental contents have become garbage. This might happen because the GC decides it is not productive to perform a compacting collection at this time. Even the OS heap can make decisions about thread-relative look-aside lists or lazy VirtualFree calls.
But if you ignore the first 5 cycles of the application, and take a broad enough view over the next 20 cycles of the application, a trend becomes clear. And if you measure over a long enough period, paltry leaks of 8 or 12 bytes per cycle can be discovered. Indeed, V1 of the CLR shipped with a leak for a simple application in this test harness that was either 8 or 12 bytes (I can never remember which). Of that, 4 bytes was a known leak in our design. It was the data structure that recorded the IDs of all the AppDomains that had been unloaded. I don’t know if we’ve subsequently addressed that leak. But in the larger scheme of things, 8 or 12 bytes is pretty impressive.
Recently, one of our test developers has started experimenting with leak detection based on tracing of our unmanaged data structures. Fortunately, many of these internal data structures are already described to remote processes, to support out-of-process debugging of the CLR. The idea is that we can walk out from the list of AppDomains, to the list of assemblies in each one, to the list of types, to their method tables, method bodies, field descriptors, etc. If we cannot reach all the allocated memory blocks through such a walk, then the unreachable blocks are probably leaks.
Of course, it’s going to be much harder than it sounds. We twiddle bits of pointers to save extra state. We point to the interiors of heap blocks. We burn the addresses of some heap blocks, like dynamically generated native code snippets, into JITted code and then otherwise forget about the heap address. So it’s too early to say whether this approach will give us a sound mechanism for discovering leaks. But it’s certainly a promising idea and worth pursuing.
Rambling Security Addendum
Finally, an off-topic note as I close down:
I haven’t blogged in about a month. That’s because I spent over 2 weeks (including weekends) on loan from the CLR team to the DCOM team. If you’ve watched the tech news at all during the last month, you can guess why. It’s security.
From outside the company, it’s easy to see all these public mistakes and take a very frustrated attitude. “When will Microsoft take security seriously and clean up their act?” I certainly understand that frustration. And none of you want to hear me whine about how it’s unfair.
The company performed a much publicized and hugely expensive security push. Tons of bugs were filed and fixed. More importantly, the attitude of developers, PMs, testers and management was fundamentally changed. Nobody on our team discusses new features without considering security issues, like building threat models. Security penetration testing is a fundamental part of a test plan.
Microsoft has made some pretty strong claims about the improved security of our products as a result of these changes. And then the DCOM issues come to light.
Unfortunately, it’s still going to be a long time before all our code is as clean as it needs to be.
Some of the code we reviewed in the DCOM stack had comments about DGROUP consolidation (remember that precious 64KB segment prior to 32-bit flat mode?) and OS/2 2.0 changes. Some of these source files contain comments from the ‘80s. I thought that Win95 was ancient!
I’ve only been at Microsoft for 6 years. But I’ve been watching this company closely for a lot longer, first as a customer at Xerox and then for over a decade as a competitor at Borland and Oracle. For the greatest part of Microsoft’s history, the development teams have been focused on enabling as many scenarios as possible for their customers. It’s only been for the last few years that we’ve all realized that many scenarios should never be enabled. And many of the remainder should be disabled by default and require an explicit action to opt in.
One way you can see this change in the company’s attitude is how we ship products. The default installation is increasingly impoverished. It takes an explicit act to enable fundamental goodies, like IIS.
Another hard piece of evidence that shows the company’s change is the level of resource that it is throwing at the problem. Microsoft has been aggressively hiring security experts. Many are in a new Security Business Unit, and the rest are sprinkled through the product groups. Not surprisingly, the CLR has its own security development, PM, test and penetration teams.
I certainly wasn’t the only senior resource sucked away from his normal duties because of the DCOM alerts. Various folks from the Developer Division and Windows were handed over for an extended period. One of the other CLR architects was called back from vacation for this purpose.
We all know that Microsoft will remain a prime target for hacking. There’s a reason that everyone attacks Microsoft rather than Apple or Novell. This just means that we have to do a lot better.
Unfortunately, this stuff is still way too difficult. It’s a simple fact that only a small percentage of developers can write thread-safe free-threaded code. And they can only do it part of the time. The state of the art for writing 100% secure code requires that same sort of super-human attention to detail. And a hacker only needs to find a single exploitable vulnerability.
I do think that managed code can avoid many of the security pitfalls waiting in unmanaged code. Buffer overruns are far less likely. Our strong-name binding can guarantee that you call who you think you are calling. Verifiable type safety and automatic lifetime management eliminate a large number of vulnerabilities that can often be used to mount security attacks. Consideration of the entire managed stack makes simple luring attacks less likely. Automatic flow of stack evidence prevents simple asynchronous luring attacks from succeeding. And so on.
But it’s still way too hard. Looking forwards, a couple of points are clear:
1) We need to focus harder on the goal that managed applications are secure, right out of the box. This means aggressively chasing the weaknesses of our present system, like the fact that locally installed assemblies by default run with FullTrust throughout their execution. It also means static and dynamic tools to check for security holes.
2) No matter what we do, hackers will find weak spots and attack them. The very best we can hope for is that we can make those attacks rarer and less effective.
I’ll add managed security to my list for future articles.