Larry Osterman's WebLog

Confessions of an Old Fogey
Blog - Title

What's wrong with this code, part 16, the answers

What's wrong with this code, part 16, the answers

  • Comments 13
I intentionally made the last "What's wrong with this code" simple, because it was just intended to exhibit a broken design pattern.

The bug appears here:

        cWaveDevices = waveOutGetNumDevs();
 

The problem is that waveOutGetNumDevs is in winmm.dll, and MSDN clearly states:

The entry-point function should perform only simple initialization or termination tasks. It must not call the LoadLibrary or LoadLibraryEx function (or a function that calls these functions), because this may create dependency loops in the DLL load order. This can result in a DLL being used before the system has executed its initialization code. Similarly, the entry-point function must not call the FreeLibrary function (or a function that calls FreeLibrary) during process termination, because this can result in a DLL being used after the system has executed its termination code..

Because Kernel32.dll is guaranteed to be loaded in the process address space when the entry-point function is called, calling functions in Kernel32.dll does not result in the DLL being used before its initialization code has been executed. Therefore, the entry-point function can call functions in Kernel32.dll that do not load other DLLs. For example, DllMain can create synchronization objects such as critical sections and mutexes, and use TLS.

Calling functions that require DLLs other than Kernel32.dll may result in problems that are difficult to diagnose. For example, calling User, Shell, and COM functions can cause access violation errors, because some functions load other system components. Conversely, calling functions such as these during termination can cause access violation errors because the corresponding component may already have been unloaded or uninitialized.

The problem with waveOutGetNumDevs() (and all the other MME APIs) is that under the covers they call LoadLibrary (to load wdmaud.drv).  They also read from the registry, RPC into the audiosrv service, and lots of other stuff.  In Windows XP, it appears that a fair number of applications got lucky and didn't hit the deadlocks inherent in these functions, but for Vista, one of the consequences of the new audio engine architecture is that it is now 100% guaranteed that if you call the MME APIs from your DllMain, you are absolutely going to deadlock your application.

 

Kudos:

The first poster: Mike asked exactly the right question - waveOutGetNumDevs will trigger the loading of another DLL, as did most of the other commenters.

Seth McCarus thought it was "interesting" that the last parameter was of type PCONTEXT. Since a PCONTEXT is a PVOID, it's not really that surprising.

Skywing made the most complete description of the errors, I'm including it entirely because it's well written (I edited out the APC stuff because (to be frank) I have no idea if that's true or not):

As for what's wrong with this code:

- It's dangerous to try and call functions from DllMain that reside in DLLs that your DLL is dependant on. The reason is that in some cases that DLL may not have run its DllMain yet, and if those functions rely on some state initialized during DllMain they may not operate as expected. This situation doesn't occur always, and as far as I know, with the current implementation, you'll only really see it in the case of either circular dependencies or some situations where you are loading a DLL dynamically (i.e. not at process-time initialization). The exceptions to this are DLLs that have only depedencies that you are logically guaranteed to have already initialized by the time your DllMain occurs. These include NTDLL (always initializes first) and KERNEL32 for Win32 programs (only depends on NTDLL and the main process image or one of its dependencies will always have linked to it).

- Depending on what dprintf does, that this might be bad. Particularly if it does something like make a CRT call and you are using the DLL version of the CRT. I would expect that it probably uses _vsnprintf or something of that sort, so I would flag it as dangerous.

- Masking off the top 16-bits for waveOutGetNumDevs is curious; there should not be any reason to do this according to the documentation for that function. Perhaps this is an artifact of some code ported from Win16, or maybe there is some implementation detail of waveOutGetNumDevs that this DLL is relying on.

Mea Culpas:

My code was wrong - the internal message that's used to get the number of devices returns an error code in the high 16 bits of the returned count of devices, for some reason, I assumed that error propagation mechanism applied to waveOutGetNumDevs, it doesn't.

  • Have you ever managed to post a What's Wrong without introducing two or three errors in the course of editing the code? If you have, it must have been a long time ago, before I started reading this blog.

    I'm not deriding in the least: it acts as a standing warning to me (and to all of us?) that there is no such thing as A Change So Small And Trivial That It Doesn't Need Testing. In fact, the smaller the change, the likelier I am not to notice obvious things wrong with it.
  • If you want to see more about how the context record works during DLL initialization for static loads, it's easy to experiment a bit to understand just what happens:

    I created two projects, a dll and an exe that links to the dll. The dll stops in the debugger so you can inspect the context record that is passed in as an argument and then alters it so that some registers have a signature value that is easily recognized later. The exe displays the modified registers (and is static linked to the dll).

    Some of the registers are modified before the main program captures them by functions setting up stack frames or similar things, but the output should make it clear how the context is the one restored when DLL initialization completes.

    By convention, Win32 stores the real program entrypoint in eax for kernel32!BaseProcessStart to call.

    Dll source:
    -----------

    #include <windows.h>
    #include <strsafe.h>

    BOOLEAN WINAPI DllMain(IN HANDLE Dll, IN ULONG Reason, IN PCONTEXT Context OPTIONAL)
    {
    switch(Reason)
    {

    case DLL_PROCESS_ATTACH:
    DisableThreadLibraryCalls((HMODULE)Dll);

    if(Context)
    {
    CHAR Msg[1024];

    //
    // Using non-DLL CRT
    //

    StringCchPrintfA(Msg, sizeof(Msg)/sizeof(Msg[0]), "Context record at: %p\n", Context);
    OutputDebugStringA(Msg);
    __debugbreak();

    if(Context)
    {
    Context->Ebp = 0xAAAAAAAA;
    Context->Ecx = 0xBBBBBBBB;
    Context->Edx = 0xCCCCCCCC;
    Context->Esi = 0xDDDDDDDD;
    Context->Edi = 0xEEEEEEEE;
    Context->Ebx = 0xFFFFFFFF;
    }
    }
    break;

    case DLL_PROCESS_DETACH:
    break;

    }

    return TRUE;
    }

    VOID WINAPI TestDllDummyFunction(VOID)
    {
    OutputDebugStringA("TestDllDummyFunction()\n");
    }


    Exe source:
    -----------

    #include <windows.h>
    #include <stdio.h>
    #include <strsafe.h>

    extern VOID WINAPI TestDllDummyFunction(VOID);

    #pragma comment(linker, "/ENTRY:Main")

    VOID WINAPI Main(VOID)
    {
    struct Regs
    {
    ULONG Ebp, Ecx, Edx, Esi, Edi, Ebx;
    };

    Regs SavedRegs;

    __asm
    {
    mov dword ptr [SavedRegs+0x00], Ebp
    mov dword ptr [SavedRegs+0x04], Ecx
    mov dword ptr [SavedRegs+0x08], Edx
    mov dword ptr [SavedRegs+0x0C], Esi
    mov dword ptr [SavedRegs+0x10], Edi
    mov dword ptr [SavedRegs+0x14], Ebx
    }

    TestDllDummyFunction();

    CHAR Msg[1024];

    StringCchPrintfA(Msg, sizeof(Msg)/sizeof(Msg[0]), "Saved registers:\nEbp: %08X\nEcx: %08X\nEdx: %08X\nEsi: %08X\nEdi: %08X\nEbx: %08X\n",
    SavedRegs.Ebp, SavedRegs.Ecx, SavedRegs.Edx, SavedRegs.Esi, SavedRegs.Edi, SavedRegs.Ebx);
    OutputDebugStringA(Msg);
    __debugbreak();
    }


    Output from debugger:
    ---------------------

    (c0c.5ac): Break instruction exception - code 80000003 (first chance)
    eax=00251eb4 ebx=7ffd5000 ecx=00000004 edx=00000010 esi=00251f48 edi=00251eb4
    eip=7c901230 esp=0012fb20 ebp=0012fc94 iopl=0 nv up ei pl nz na pe nc
    cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000202
    ntdll!DbgBreakPoint:
    7c901230 cc int 3
    0:000> g
    Context record at: 0012FD30
    (c0c.5ac): Break instruction exception - code 80000003 (first chance)
    eax=0012f2dc ebx=10000000 ecx=7c859e18 edx=0012f500 esi=0012fd30 edi=0012f5a8
    eip=100010bb esp=0012f5a0 ebp=0012f9e8 iopl=0 nv up ei pl zr na po nc
    cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246
    TestDll!DllMain+0x5b:
    100010bb cc int 3
    0:000> dt nt!_CONTEXT 0012FD30
    +0x000 ContextFlags : 0x10017
    +0x004 Dr0 : 0
    +0x008 Dr1 : 0
    +0x00c Dr2 : 0
    +0x010 Dr3 : 0
    +0x014 Dr6 : 0
    +0x018 Dr7 : 0
    +0x01c FloatSave : _FLOATING_SAVE_AREA
    +0x08c SegGs : 0
    +0x090 SegFs : 0x38
    +0x094 SegEs : 0x23
    +0x098 SegDs : 0x23
    +0x09c Edi : 0
    +0x0a0 Esi : 0
    +0x0a4 Ebx : 0x7ffd5000
    +0x0a8 Edx : 0
    +0x0ac Ecx : 0
    +0x0b0 Eax : 0x401060
    +0x0b4 Ebp : 0
    +0x0b8 Eip : 0x7c810867
    +0x0bc SegCs : 0x1b
    +0x0c0 EFlags : 0x200
    +0x0c4 Esp : 0x12fffc
    +0x0c8 SegSs : 0x23
    +0x0cc ExtendedRegisters : [512] "???"
    0:000> u 0x7c810867
    kernel32!BaseProcessStartThunk:
    7c810867 33ed xor ebp,ebp
    7c810869 50 push eax
    7c81086a 6a00 push 0x0
    7c81086c e9bb640000 jmp kernel32!BaseProcessStart (7c816d2c)
    7c810871 90 nop
    7c810872 8bff mov edi,edi
    kernel32!SwitchToFiber:
    7c810874 90 nop
    7c810875 90 nop
    0:000> g
    TestDllDummyFunction()
    Saved registers:
    Ebp: 0012FFC0
    Ecx: 0012FFB0
    Edx: 7C90EB94
    Esi: DDDDDDDD
    Edi: EEEEEEEE
    Ebx: FFFFFFFF
    (c0c.5ac): Break instruction exception - code 80000003 (first chance)
    eax=0012f8dc ebx=ffffffff ecx=7c859e18 edx=0012fc00 esi=dddddddd edi=0012fbbc
    eip=004010e8 esp=0012fba0 ebp=0012ffc0 iopl=0 nv up ei pl zr na po nc
    cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246
    TestDllInit!Main+0x88:
    004010e8 cc int 3
    0:000> u 0x401060
    TestDllInit!Main [testdllinit\testdllinit\testdllinit.cpp @ 10]:
    00401060 55 push ebp
    00401061 8bec mov ebp,esp
    00401063 81ec1c040000 sub esp,0x41c
    00401069 a100304000 mov eax,[TestDllInit!__security_cookie (00403000)]
    0040106e 33c5 xor eax,ebp
    00401070 8945fc mov [ebp-0x4],eax
    00401073 57 push edi
    00401074 89ade4fbffff mov [ebp-0x41c],ebp
  • Whenever I post a "What's wrong" without actually testing the code in isolation, I make mistakes (I'm human).

    Many of the claimed errors are typically intentional (differences in coding styles), and I call them out.

    This one was just sloppyness on my part - the code as written demonstrates the mistake perfectly, I'm just checking for an error condition that will never occur.
  • DisableThreadLibraryCalls() takes a parameter.
  • And let's not forget...

    http://blogs.msdn.com/mgrier/archive/2005/06/24/432455.aspx

    means that the fact that the error in the call to waveOutGetNumDevs is /not/ propagated is likely to hose the overall initialization sequence.
  • Larry,

    MSDN also states about DllMain:

    "While it is acceptable to create synchronization objects in DllMain, you should not perform synchronization in DllMain (or a function called by DllMain) because all calls to DllMain are serialized. Waiting on synchronization objects in DllMain can cause a deadlock."

    From this I deduce that I also should never call malloc/new from DllMain, because access to the heap in the runtime is serialized between threads with synchronization objects and therefore might cause deadlocks. Is it then safe to allocate memory in DllMain using allocators from kernel32, such as LocalAlloc or the heap APIs or is memory allocation in DllMain something that should never ever be done, regardless of the implementation or the origin of the allocators?
    Or am I just paranoid if I even count memory allocation as one of the things that shouldn't be done in DllMain?
  • Hi Larry,

    Supposed I had an application that was deadlocking because of this sort of error. If I attached a debugger to it while it was deadlocked, would it be obvious what the problem was? I mean, what would the callstacks look like? Would one of them include the call to (e.g.) waveOutGetNumDevs?
  • Oh, ick. That's nasty. Not very intuitive that it can cause problems either for non Win32 gurus.

    So the question now is, what are you guys gonna do about it? And why exactly does it deadlock - I assume there's some COM apartment goop under the covers here.
  • Jonas, it's blindingly obvious - you'd find a call stack with a DLL entrypoint calling into the MME APIs.

    Mike, it's not COM apartment goop per se. It's because the new audio stack uses COM and you can't call COM if your in a DLL (because you can't initialize COM).

    Stefan, as far as I know, it's safe to call malloc and new from DllMain, but that's about it.
  • Skywing, I came THIS close to removing your comment, not because it violates any of my posts, but because your code makes a bunch of HUGE assumptions about how the internals of Windows works.
  • Wednesday, December 07, 2005 7:10 AM by Stefan Kuhr
    > From this I deduce that I also should never
    > call malloc/new from DllMain, because access
    > to the heap in the runtime is serialized
    > between threads with synchronization objects
    > and therefore might cause deadlocks.

    Wednesday, December 07, 2005 6:23 PM by LarryOsterman
    > as far as I know, it's safe to call malloc
    > and new from DllMain, but that's about it.

    I'm 99% certain that MSDN agrees with Mr. Kuhr.

    Back to Mr. Kuhr's question:
    > Is it then safe to allocate memory in
    > DllMain using allocators from kernel32, such
    > as LocalAlloc or the heap APIs

    I'm 99% certain that MSDN says to call LocalAlloc even in DllMain for TLS purposes, and MSDN ALSO says to use heap APIs instead of LocalAlloc (not both on the same page), and MSDN has lots of bugs (not sure whether either or both of these pages are included). Maybe Mr. Grier could be persuaded to add a page to his blog? And if we're really really lucky, some year MSDN's own statements will be clarified?
  • >> Skywing, I came THIS close to removing your comment, not because it violates any of my posts, but because your code
    >> makes a bunch of HUGE assumptions about how the internals of Windows works.

    Of course it does; it's an experiment intended to shed light on those, and was clearly labeled such. Knowing how the actual implementation works is invaluable when you are trying to debug problems. For instance, if you are trying to debug a problem where your program is crashing on startup before any of your code runs thanks to some dll failing during initialization, knowing how early process startup works is very helpful indeed.

    I've lost count of how many times knowing how the actual implementation works for Windows for various things has helped me immensely when troubleshooting problems. You would be doing a dis-service to your audience by going out of your way to delete such information, I think...

    Certainly, hiding implementation details is good from a design standpoint, but there are times when it can be handy to know how things work "under the hood" when you are analyzing problems.
  • > I'm 99% certain that MSDN agrees with Mr. Kuhr

    I second Norman Diamond's comment. I don't see how it can be legal to call new/malloc - it seems to be exactly the sort of thing this WWWTC is designed to highlight. new/malloc (if it's in a linked dll) might not have been loaded yet, so you can't call it. That's in addition to the fact that there are probably synchronization objects that are grabbed (and if there aren't, you're relying on an implementation detail).

    Or am I missing something?
Page 1 of 1 (13 items)