Preface

OS loader has always intrigued me - probably because it works behind the scenes and no-one normally bothers to understand what it is that it does exactly, until strange or funny things start happening. And they do. And then we read through the documentation and we are forced to remember that there's more to loading a binary than just slapping it into process address space. In fact there's a wonderful article by Matt Pietrek that discusses those matters. I strongly encourage every person who deals with native code to go and read it - it may be quite enlightening for you - I know it was for me. When you know how things get loaded, you are less likely to forget to re-base your binary, consider early binary binding etc.

Every now and then another piece of information or a great summary on the subject comes up and I find myself mystified with the whole loader topic all over again. This time it was a very lengthy post in Chris Brumme's blog. As many people have mentioned, the post in question is very long and very dense with technical information - well, what else did you expect from Chris's blog? :) Anyway, in order to absorb the topic better, and in hopes of getting the whole thing out of my system, I decided to write things down.

 

DllMain and OS loader

As we are all well aware now, things are not as easy as they seem. In fact, are they ever? DllMain, which used to be briefly discussed in most books on Win32 as a reasonably innocent initialization routine, may now look like a vicious monster which obeys no rules and causes nasty side-effects. But let's get to the source - the MSDN reference.

It all starts innocently enough. The article defines DllMain as an optional entry point into a DLL, called by the system when the DLL gets attached to a process or a thread; outlines the somewhat tricky but reasonable rules that govern the calls (for instance, calls may be unmatched for a thread if it's a main thread of the process or if it was already running when LoadLibrary was called), discusses abnormal termination and then

...whoa...

Without missing a heart-beat, it carries on describing what you can do there. That is pretty startling in and of itself - since when should you be limited in that regard? - but as you keep reading, things just get worse. It turns out you can do pretty much nothing at all. Calls to LoadLibrary/LoadLibraryEx are explicitly prohibited. Other calls into kernel32 are OK. But you can't call into User32. And don't use CRT memory management (unless you are linked statically) - use HeapAlloc instead. Oh, and of course don't call anything that would do any such nasty things: that would be bad. One last thing - don't read the registry either. Have a nice day.

The fact that none of this is written in big, bold, maybe even red print is truly unfortunate - it really ought to be, because most people simply miss that part. So let's say you have read it all; now the question is: why?

The thing is, as far as your binary is concerned, DllMain gets called at a truly unique moment. By that time OS loader has found, mapped and bound the file from disk, but - depending on the circumstances - in some sense your binary may not have been "fully born". Things can be tricky.

In a nutshell, when DllMain is called, OS loader is in a rather fragile state. First off, it has applied a lock on its structures to prevent internal corruption while inside that call, and secondly, some of your dependencies may not be in a fully loaded state. Before a binary gets loaded, OS Loader looks at its static dependencies. If those require additional dependencies, it looks at them as well. As a result of this analysis, it comes up with a sequence in which DllMains of those binaries need to be called. It's pretty smart about things and in most cases you can even get away with not following most of the rules described in MSDN - but not always.

The thing is, the loading order is unknown to you, but more importantly, it's built based on the static import information. If some dynamic loading occurs in your DllMain during DLL_PROCESS_ATTACH and you're making an outbound call, all bets are off. There is no guarantee that DllMain of that binary has been called by then, and therefore if you then attempt to GetProcAddress into a function inside that binary, results are completely unpredictable as global variables may not have been initialized. Most likely you will get an access violation (AV).

Another scenario is when you start spinning a new thread on DLL_THREAD_ATTACH and wait for it to finish initialization via some synchronization technique. This blocks your thread in DllMain while it is still holding the OS loader lock. The new thread needs that same lock before it can get going, so this can lead to deadlocks.

Overall, if anything - anything - goes wrong in DllMain of one of the binaries, the whole process may be doomed.

The trouble is, the definition of "wrong" is very, very vague in this case. For instance, developers using MC++ know that you shouldn't even dream of having DllMain in your library. And if you do, you may be very, very sorry. I think CLR folks want to fix this for the "Whidbey" release.

Chris Brumme lists the following things that should never, ever be done in DllMain.

·         Dynamic binds. That includes LoadLibrary/FreeLibrary calls or anything that may implicitly call them.

·         Locking of any kind. If you are trying to acquire a lock that is currently held by a thread that needs the OS loader lock (which you may be holding), you'll deadlock.

·         Cross-binary calls. As discussed, the binary you're calling into may not have been initialized yet, or may have already been uninitialized.

·         Starting new threads and then waiting for completion. As discussed, the thread in question may need to acquire the OS loader lock that you are holding.

 

So, what does this tell us?

 

DllMain is that gun you can easily shoot yourself with

How many people do you know that did stupid things like calling CoInitialize() in DllMain? I know of cases when that was done on DLL_THREAD_ATTACH, which not only means that we were risking a deadlock, but also that every thread in that process will have COM initialized. What's worse, it may be initialized with the wrong threading model. And then people will be wondering how the heck they ended up with STA threads in thread pools. Or something much more subtle, like calling a system function that starts a worker thread as part of its execution? How many times did you do all those things?

Another problem with this is that all these horrors can present themselves under very limited circumstances. In most cases things do work fine, but a race condition, a slightly modified DLL load order or other factors may change everything. Which means you may not even know it until you ship. This may be fine for a user application (well, things like that are never fine, it's just that the damage may not be substantial), but this is always bad for servers - especially if you are talking enterprise availability. I don't think this can ever become a security threat - one you can fight anyway - but random crashes are just not nice.

So let's get back to what we can do in DllMain. According to MSDN, "The entry-point function should perform only simple initialization or termination tasks."

These tasks can only include calls to Kernel32 (excluding LoadLibrary/LoadLibraryEx). If you look at what this means for you, you will find that this is extremely limiting. Further, CRT functions, including memory allocations, are not safe unless you are statically linked. This means that seemingly innocent things like g_pMyGlobalObject = new CMyGlobalObject() can theoretically cause all kinds of nasty stuff - because they will use malloc that is dynamically linked from msvcr*.dll.

This leaves us with primitive types, synchronization objects initialization ... that's about it. And definitely - definitely - no managed code.

So what am I saying? There aren't too many things that are legal there; it's extremely easy to do illegal stuff - you always have to know what the code you're calling really does, which is extremely difficult if you use something defined elsewhere - a C/C++ LIB for instance; the compiler won't tell you that you are doing the wrong thing; and the code is likely to run fine in most cases... but not all of them.

What options does this leave us with?

  • Just say no. Avoid the darn thing altogether and link with /noentry. Reconsider the way you deal with globals. Do lazy TLS initialization. 
  • Be very careful. Sometimes you simply have to use it. It's just too ugly not to. Have a full code review. See what's being done and what OS does. Make sure that everyone understands that DllMain is just different. Read and memorize horror stories about people who didn't know better.
    One thing you can do here to minimize the damage is disabling calls to your DllMain when new threads join/leave the process - this can be done with DisableThreadLibraryCalls. This is generally a good idea in all cases where you don't need thread-level initialization, because the OS loader then doesn't need to call into your binary every time a new thread is born.
  • Be afraid. Be very afraid. Well, just leave things where they are. Things don't crash right now and you have other things to do. Good plan.
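
The "just say no" option mostly comes down to moving initialization out of DllMain and into the first real use. Here is a minimal sketch of that idea, assuming a C++11 or later compiler (where initialization of function-local statics is thread-safe). CMyGlobalObject is a hypothetical stand-in for the class mentioned earlier, and GetGlobalObject and g_constructionCount are names made up for this illustration:

```cpp
#include <cassert>
#include <string>

// Counts constructions so the "created exactly once" behaviour is visible.
static int g_constructionCount = 0;

// Hypothetical stand-in for the CMyGlobalObject mentioned in the text.
class CMyGlobalObject {
public:
    CMyGlobalObject() : name_("initialized on first use") {
        ++g_constructionCount;
    }
    const std::string& Name() const { return name_; }
private:
    std::string name_;
};

// Returns the shared instance, creating it on the first call rather than
// during DLL_PROCESS_ATTACH. The pointer is deliberately never deleted, so
// no destructor has to run while the library is being detached.
CMyGlobalObject& GetGlobalObject() {
    static CMyGlobalObject* instance = new CMyGlobalObject();
    assert(instance != nullptr);
    return *instance;
}
```

Callers use GetGlobalObject() wherever they used to touch the global directly. The allocation then happens on an ordinary call path, with no loader lock held, at the price of an intentional one-off leak at process exit.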

 

 

Silver lining: DllMain and resource leak diagnostics

There's one piece of information that gets provided through DllMain which you can't possibly get any other way. If you review the signature of DllMain, you'll notice that the last argument passed in, despite being called lpvReserved, actually has some meaning:

 

If fdwReason is DLL_PROCESS_ATTACH, lpvReserved is NULL for dynamic loads and non-NULL for static loads.

If fdwReason is DLL_PROCESS_DETACH, lpvReserved is NULL if DllMain has been called by using FreeLibrary and non-NULL if DllMain has been called during process termination.

As you see, lpvReserved does tell you something. Although I can't see why you would be interested in knowing whether your DLL has been statically or dynamically loaded - there may be uses there, I just don't see them - but knowing how you are being unloaded could be interesting.
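
The two rules quoted above fit in a small lookup. The following is just a sketch: DescribeLoaderEvent, kProcessAttach and kProcessDetach are hypothetical names, with the constants mirroring the Windows values of DLL_PROCESS_ATTACH (1) and DLL_PROCESS_DETACH (0) so that the snippet compiles without windows.h:

```cpp
#include <string>

// Mirror the Windows values DLL_PROCESS_DETACH (0) and DLL_PROCESS_ATTACH (1);
// redefined under hypothetical names so no Windows header is needed.
constexpr unsigned long kProcessDetach = 0;
constexpr unsigned long kProcessAttach = 1;

// Hypothetical helper: turns the (fdwReason, lpvReserved) pair that DllMain
// receives into a human-readable description, following the quoted rules.
std::string DescribeLoaderEvent(unsigned long fdwReason, const void* lpvReserved) {
    const bool reservedIsNull = (lpvReserved == nullptr);
    switch (fdwReason) {
    case kProcessAttach:
        return reservedIsNull ? "attach: dynamic load" : "attach: static load";
    case kProcessDetach:
        return reservedIsNull ? "detach: FreeLibrary" : "detach: process termination";
    default:
        // Thread notifications are not covered by the rules quoted above.
        return "other notification";
    }
}
```

A function like this only compares its arguments and builds a string, so it is best used to decide what to report later rather than to do any real work inside DllMain itself.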

For one, if you're managing some kind of resource in DllMain, which only lives within process context, you can possibly skip some clean-up if you knew that the process is dying as it is. This is not too valuable because the very nature of DllMain does not make it a very good entry point for resource management.

There are cases, however, when you expect your DLL to be unloaded in a specific way and you can use DllMain to verify that it is indeed being unloaded as you expect. For instance, if:

·         your DLL is in fact a COM server (and has no other uses), and

·         the COM host is well-behaved and

·         all of your COM objects have been properly released,

then you should expect that you will get lpvReserved=NULL - that is, your DLL is unloaded via FreeLibrary.

Here's what seems to be happening. Every well-behaved COM process should call CoUninitialize() on each thread when it gets shut down. Internally that calls DllCanUnloadNow on your binary, which returns S_OK if all outstanding references are closed. If that's the case, COM will call FreeLibrary, which - unless there are other LoadLibrary references outstanding - will unload your DLL. That will pass lpvReserved=NULL. If any of these conditions is not satisfied, your DLL will reside in the process until it terminates and you'll get lpvReserved!=NULL. (I'd like to thank Michael Entin - who really ought to start blogging - for helping me to get all the pieces together.)

So if - and that's a big if - your application is well-behaved, and no-one ever messed up loading your DLL with LoadLibrary and forgetting to unload it, then lpvReserved!=NULL means that some of your COM objects have not been released. There's nothing your code can do about that - except maybe asserting - and you will then have to look into that further.
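
As a rough sketch of what such a check might look like, assume the DLL keeps a module-wide count of live objects (the same kind of count a DllCanUnloadNow implementation typically consults). All names here (g_liveObjects, CountedObject, DetachLooksLikeLeak) are hypothetical and only illustrate the idea:

```cpp
#include <atomic>

// Hypothetical module-wide count of live COM objects.
static std::atomic<long> g_liveObjects{0};

// Stand-in for a COM object implementation: it bumps the count while alive.
struct CountedObject {
    CountedObject()  { ++g_liveObjects; }
    ~CountedObject() { --g_liveObjects; }
};

// Meant to be called with the lpvReserved value from DLL_PROCESS_DETACH.
// Returns true when the unload looks suspicious: either the process is
// terminating with the DLL still loaded (lpvReserved != NULL), or objects
// are still alive. It only reads an atomic, so it stays clear of the
// loader-sensitive operations listed earlier.
bool DetachLooksLikeLeak(const void* lpvReserved) {
    const bool processIsTerminating = (lpvReserved != nullptr);
    return processIsTerminating || g_liveObjects.load() != 0;
}
```

In a debug build the result could feed an assert in the DLL_PROCESS_DETACH branch. Whether a non-NULL lpvReserved really means a leak still depends on the big "if" above.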

This approach is not limited to only COM leaks - theoretically you should expect that when your binary is leaving this world, it's not taking anything with it. You can look through the list of globally-managed resources and see if they have been disposed of. Be very, very careful there - you shouldn't be doing any stuff that may compromise the OS loader: see the four bullets above.