Fixing an ICorDebugUnmanagedCallback induced hang


Hi debuggers, Andrew Richards here with an NTDebugging post that is a little different from what is usually posted.  Instead of talking about debugging, I’m going to talk about an issue I just faced while writing a debugger.

 

This debugger work is an extension of an upcoming article that I’ve written for MSDN Magazine (scheduled for the December 2011 issue). The MSDN Magazine article goes over how to write a native debugger using the DbgHelp API. It also explains how you can use this code to then make a plugin for Sysinternals ProcDump.

 

When debugging a managed application, you can take debugging one step further by being both a managed and unmanaged (native) debugger. To do this, you use the CLR Debugger API instead of the DbgHelp API.

 

What prompted this post was an issue that I hit while implementing the ICorDebugUnmanagedCallback::DebugEvent function of my unmanaged interface implementation. I was finding that the target process was hung after I processed in-band debug events but not out-of-band debug events. This was despite calling ICorDebugController::Continue, with or without calling ICorDebugProcess::ClearCurrentException first.

 

ICorDebug Interface:

Firstly, let’s take a step back and look at what it takes to get to the point of my issue. The goal in the initialization code is to get an instance of an ICorDebug based object.

 

Below is an abridged version of the code to do this using .NET 4.0; I have omitted the error handling and some of the cleanup (IUnknown::Release) to keep the code brief.

 

// Start COM

CoInitialize(NULL);

 

// Get an ICLRMetaHost instance (from .NET 4.0)

ICLRMetaHost* pCLRMetaHost = NULL;

CLRCreateInstance(CLSID_CLRMetaHost, IID_ICLRMetaHost, (LPVOID*)&pCLRMetaHost);

 

// Get an enumeration of the loaded runtimes in the target process (opened prior with OpenProcess)

IEnumUnknown* pEnumUnknown = NULL;

pCLRMetaHost->EnumerateLoadedRuntimes(hProcess, &pEnumUnknown);

 

// Use the first runtime found (Note, you can only debug one runtime at once)

IUnknown* pUnknown = NULL;

ULONG ulFetched = 0;

pEnumUnknown->Next(1, &pUnknown, &ulFetched);

 

// QueryInterface for the ICLRRuntimeInfo interface

ICLRRuntimeInfo* pCLRRuntimeInfo = NULL;

pUnknown->QueryInterface(__uuidof(ICLRRuntimeInfo), (void **)&pCLRRuntimeInfo);

 

// Get the ICorDebug interface (this allows you to debug .NET 2.0 targets with the .NET 4.0 API)

pCLRRuntimeInfo->GetInterface(CLSID_CLRDebuggingLegacy, IID_ICorDebug, (void **)&pCorDebug);

 

// Initialize the .NET 2.0 debugging interface

pCorDebug->Initialize();

 

// Allocate our ICorDebugManagedCallback2 implementation and apply it to ICorDebug

CCorDebugManagedCallback2* pCorDebugManagedCallback2 = new CCorDebugManagedCallback2();

pCorDebug->SetManagedHandler((ICorDebugManagedCallback*)pCorDebugManagedCallback2);

 

// Allocate our ICorDebugUnmanagedCallback implementation and apply it to ICorDebug

CCorDebugUnmanagedCallback* pCorDebugUnmanagedCallback = new CCorDebugUnmanagedCallback();

pCorDebug->SetUnmanagedHandler((ICorDebugUnmanagedCallback*)pCorDebugUnmanagedCallback);

 

// Start debugging the process; returns the ICorDebugProcess we’ll need in the callbacks

pCorDebug->DebugActiveProcess(nProcessId, TRUE, &pCorDebugProcess);

 

This code is pretty linear; if any call fails you are out of luck.  By the end, you have associated your own managed and unmanaged callback classes with the ICorDebug object and are attached as a debugger. The code supports a target process using any of the .NET versions (v1.0, v1.1, v2.0, v4.0). Note that .NET v3.0 and v3.5 applications are actually v2.0 applications from a debugger's point of view, as these .NET releases just contain additional class libraries.
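Because the listing above omits error handling, here is a minimal sketch of one way to stop at the first failed call. The HrChain class and its method names are hypothetical (they are not part of any SDK), and on non-Windows builds the sketch supplies stand-in definitions of HRESULT, S_OK, E_FAIL and FAILED purely so that it compiles anywhere; real code gets these from windows.h.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <functional>

#ifdef _WIN32
#include <windows.h>
#else
// Stand-ins (assumption) so the sketch builds off Windows.
typedef int32_t HRESULT;
#define S_OK       ((HRESULT)0)
#define E_FAIL     ((HRESULT)0x80004005)
#define FAILED(hr) (((HRESULT)(hr)) < 0)
#endif

// Hypothetical helper: runs steps in order and skips everything
// after the first step that returns a failing HRESULT.
class HrChain {
public:
    // Runs 'step' only if every earlier step succeeded.
    HrChain& Then(const char* name, const std::function<HRESULT()>& step) {
        assert(step && "each step must be callable");
        if (!FAILED(m_hr)) {
            m_hr = step();
            if (FAILED(m_hr)) {
                m_failedStep = name;
                std::fprintf(stderr, "%s failed: 0x%08lX\n", name,
                             (unsigned long)(uint32_t)m_hr);
            }
        }
        return *this;
    }

    // HRESULT of the last step that actually ran.
    HRESULT Result() const { return m_hr; }

    // Name of the step that failed, or nullptr if none did.
    const char* FailedStep() const { return m_failedStep; }

private:
    HRESULT m_hr = S_OK;
    const char* m_failedStep = nullptr;
};
```

In the attach sequence, each call (CLRCreateInstance, EnumerateLoadedRuntimes, and so on) would be wrapped in a lambda passed to Then, with Result() checked once at the end before using pCorDebugProcess.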

 

My managed callback implementation supports the IUnknown, ICorDebugManagedCallback and ICorDebugManagedCallback2 interfaces. (I’m not going to discuss this code here).

 

My unmanaged callback implementation supports the IUnknown and ICorDebugUnmanagedCallback interfaces. It is in this class that I had the issue.

 

ICorDebugUnmanagedCallback Interface:

The ICorDebugUnmanagedCallback interface has just one function:

 

HRESULT DebugEvent (

    [in] LPDEBUG_EVENT  pDebugEvent,

    [in] BOOL           fOutOfBand

);

 

The function provides a DEBUG_EVENT structure in the same way that WaitForDebugEvent does. This is not surprising as under the covers, that is what the .NET 4.0 API is using – it is just passing it to us. As such, the rules for handling a DEBUG_EVENT structure apply here too.  Namely, close the handle passed with the CREATE_PROCESS_DEBUG_EVENT and LOAD_DLL_DEBUG_EVENT events.

 

Following the DebugEvent documentation, I ended up with (roughly) the code below – which hangs the target process.

 

STDMETHODIMP CCorDebugUnmanagedCallback::DebugEvent(LPDEBUG_EVENT pDebugEvent, BOOL fOutOfBand)

{

      BOOL bClear = TRUE;

      switch (pDebugEvent->dwDebugEventCode)

      {

      case EXCEPTION_DEBUG_EVENT:

            if (pDebugEvent->u.Exception.dwFirstChance != 0)

                  bClear = FALSE;

            break;

      case CREATE_PROCESS_DEBUG_EVENT:

            if (pDebugEvent->u.CreateProcessInfo.hFile)

                  CloseHandle(pDebugEvent->u.CreateProcessInfo.hFile);

            break;

      case LOAD_DLL_DEBUG_EVENT:

            if (pDebugEvent->u.LoadDll.hFile)

                  CloseHandle(pDebugEvent->u.LoadDll.hFile);

            break;

      }

      if (bClear)

            pCorDebugProcess->ClearCurrentException(pDebugEvent->dwThreadId);

 

      pCorDebugProcess->Continue(fOutOfBand);

      return S_OK;

}

 

If you know what to look for, the answer to the ‘hang’ issue is on the MSDN page:

 

You can call ICorDebugController::Continue only on a Win32 thread and only when continuing past an out-of-band event.

 

So what does this really mean?

 

What it means is that if the debug event is in-band (fOutOfBand == FALSE), you must call ICorDebugController::Continue from a thread other than the one servicing the callback. The reason for this is to stop a race condition. In-band debug events can be interrupted by out-of-band debug events – that is, the DebugEvent function can be firing multiple times concurrently. By forcing the continuation on an alternate thread, the race condition is averted.

 

I’m being brief here (on purpose) as I don’t want to incorrectly dissect for you the extremely complex internals of the CLR. You just need to know that you must use another thread for the hang to be averted.

 

So what does the code look like now?  It’s something like this:

 

STDMETHODIMP CCorDebugUnmanagedCallback::DebugEvent(LPDEBUG_EVENT pDebugEvent, BOOL fOutOfBand)

{

      BOOL bClear = TRUE;

      switch (pDebugEvent->dwDebugEventCode)

      {

      case EXCEPTION_DEBUG_EVENT:

            if (pDebugEvent->u.Exception.dwFirstChance != 0)

                  bClear = FALSE;

            break;

      case CREATE_PROCESS_DEBUG_EVENT:

            if (pDebugEvent->u.CreateProcessInfo.hFile)

                  CloseHandle(pDebugEvent->u.CreateProcessInfo.hFile);

            break;

      case LOAD_DLL_DEBUG_EVENT:

            if (pDebugEvent->u.LoadDll.hFile)

                  CloseHandle(pDebugEvent->u.LoadDll.hFile);

            break;

      }

     

      if (bClear)

            pCorDebugProcess->ClearCurrentException(pDebugEvent->dwThreadId);

 

      if (fOutOfBand)

      {

            pCorDebugProcess->Continue(TRUE);

      }

      else

      {

            SetEvent(hEventContinueBegin);

            WaitForSingleObject(hEventContinueDone, INFINITE);

      }

      return S_OK;

}

 

DWORD WINAPI CCorDebugUnmanagedCallbackThreadProc(LPVOID lpParameter)

{

      while (!bQuit)

      {

            switch (WaitForSingleObject(hEventContinueBegin, 1000))

            {

            case WAIT_OBJECT_0:

                  pCorDebugProcess->Continue(FALSE);

                  SetEvent(hEventContinueDone);

                  break;

            }

      }

      return 0;

}

 

For out-of-band debug events, nothing has changed; the ICorDebugProcess::Continue call is made locally.

For in-band debug events, an event is set to trigger the ICorDebugProcess::Continue on a dedicated thread. The dedicated thread sets an event to tell the callback thread that the Continue has been done.

 

Note that the above code is a massive simplification of what is actually required – there is a ton of code missing that passes all the interface pointers and handles around, and that creates and shuts down the thread at the correct time.
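The thread and event plumbing that the listing leaves out can be modeled in portable C++. This is a minimal sketch of the two-event handshake rather than the actual debugger code: AutoResetEvent and ContinueWorker are hypothetical names, the events are emulated with a mutex and condition variable instead of CreateEvent/SetEvent, and the callable passed in stands in for the ICorDebugProcess::Continue(FALSE) call. Unlike the loop above, shutdown here wakes the worker through the begin event instead of polling bQuit on a timeout.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Portable stand-in (assumption) for a Win32 auto-reset event.
class AutoResetEvent {
public:
    void Set() {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_signaled = true;
        }
        m_cv.notify_one();
    }
    void Wait() {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cv.wait(lock, [this] { return m_signaled; });
        m_signaled = false;  // auto-reset: consume the signal
    }
private:
    std::mutex m_mutex;
    std::condition_variable m_cv;
    bool m_signaled = false;
};

// Hypothetical owner of the dedicated "continue" thread.
class ContinueWorker {
public:
    explicit ContinueWorker(std::function<void()> continueFn)
        : m_continueFn(std::move(continueFn)) {
        assert(m_continueFn && "a continue callable is required");
        m_thread = std::thread([this] { Run(); });
    }
    ~ContinueWorker() {
        m_quit = true;
        m_begin.Set();   // wake the worker so it can observe m_quit
        m_thread.join();
    }
    ContinueWorker(const ContinueWorker&) = delete;
    ContinueWorker& operator=(const ContinueWorker&) = delete;

    // Called on the callback thread for in-band events. Blocks until
    // the worker thread has performed the continue.
    void ContinueOnWorker() {
        m_begin.Set();
        m_done.Wait();
    }
private:
    void Run() {
        for (;;) {
            m_begin.Wait();
            if (m_quit) return;
            m_continueFn();  // stands in for pCorDebugProcess->Continue(FALSE)
            m_done.Set();
        }
    }
    std::function<void()> m_continueFn;
    AutoResetEvent m_begin;
    AutoResetEvent m_done;
    std::atomic<bool> m_quit{false};
    std::thread m_thread;
};
```

The worker is created once, after attaching, and destroyed before the ICorDebug object is terminated, so the callback never signals a thread that no longer exists.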

 

In-band vs. Out-of-band:

So what is the difference between in-band and out-of-band debug events?

 

An out-of-band debug event causes all threads in the target process to suspend (it’s exactly the same as a suspend induced by a native debugger). As such, it is not possible to use the managed debugging interfaces to gather information from the target – as the managed debugging thread is suspended.

 

An in-band debug event only causes the managed threads in the target process to suspend – the managed debugging thread is still running. As such, it is possible to use the managed debugging interfaces to gather information from the target.

 

The act of using the managed debugging thread from within an in-band debug event can cause an out-of-band debug event (the common examples being first chance exceptions).
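The rules described in this post can be summarized as a small lookup that a callback could consult before doing any work. This is only a sketch: EventPlan, ContinueFrom and PlanForDebugEvent are hypothetical names, and the values simply restate the behavior described above rather than anything the CLR reports.

```cpp
// Which thread is allowed to call ICorDebugController::Continue.
enum class ContinueFrom { CallbackThread, OtherThread };

// Hypothetical summary of how a DebugEvent callback should behave.
struct EventPlan {
    bool managedApiUsable;      // may the managed debugging interfaces be used?
    ContinueFrom continueFrom;  // where the Continue call must be made
    bool continueArg;           // value to pass as Continue(fIsOutOfBand)
};

// Maps the fOutOfBand flag passed to DebugEvent onto the rules above.
constexpr EventPlan PlanForDebugEvent(bool fOutOfBand) {
    return fOutOfBand
        // Out-of-band: every thread is suspended, including the managed
        // debugging thread, so only Continue(TRUE) is made, locally.
        ? EventPlan{false, ContinueFrom::CallbackThread, true}
        // In-band: the managed debugging thread is still running, and
        // Continue(FALSE) has to come from a different thread.
        : EventPlan{true, ContinueFrom::OtherThread, false};
}

// Compile-time check that the two cases differ as described.
static_assert(!PlanForDebugEvent(true).managedApiUsable, "out-of-band");
static_assert(PlanForDebugEvent(false).managedApiUsable, "in-band");
```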

 

Cleanup/Detach:

Just to be complete, below is the code to clean up and (optionally) detach from the ICorDebug session. In .NET 4.0, ICorDebugController::Detach will terminate the process if interop debugging (passing TRUE to ICorDebug::DebugActiveProcess) is used. Interop debugging is not supported in .NET 2.0 on x64, so this is less of an issue there.

 

// If the target process is still running, we need to detach.

if (bDetachNeeded)

{

      ICorDebugController* pCorDebugController = NULL;

      pCorDebugProcess->QueryInterface(__uuidof(ICorDebugController), (void**)&pCorDebugController);

      pCorDebugController->Stop(INFINITE /* Note: Value is ignored – always INFINITE */);

      pCorDebugController->Detach();

      pCorDebugController->Release();

}

pCorDebug->SetUnmanagedHandler(NULL);

pCorDebugUnmanagedCallback->Release();

 

pCorDebug->SetManagedHandler(NULL);

pCorDebugManagedCallback2->Release();

 

pCorDebug->Terminate();

 

pCorDebug->Release();

 

CoUninitialize();

 

There is still quite a bit of code required to implement the debugger completely.

 

You’ll need an ICorDebugManagedCallback implementation that handles process exiting, attaching to an application domain (ICorDebugAppDomain::Attach), handling name changes, and continuation.

 

Plus, if you want to support .NET 2.0 debugging without .NET 4.0 installed, you’ll need to use LoadLibrary/GetProcAddress to call .NET 4.0 (optionally), and fall back to the .NET 2.0 GetVersionFromProcess and CreateDebuggingInterfaceFromVersion functions.

 

Conclusion:

The CLR Debugging API is not for the faint of heart.  There are numerous pitfalls when using the ICorDebug interface against different versions of the CLR, different versions of Windows, different architectures, and with or without interop debugging.

 

If you have any questions about the API, post a comment here and I’ll do my best to answer them for you.
