.NET Crash Dump and Live Process Inspection

Analyzing crash dumps can be complicated. Although Visual Studio supports viewing managed crash dumps, you often have to resort to more specialized tools like the SOS debugging extensions or WinDbg. In today’s post, Lee Culver, software developer on the .NET Runtime team, will introduce you to a new managed library that allows you to automate inspection tasks and access even more debugging information. --Immo

Today we are excited to announce the beta release of the Microsoft.Diagnostics.Runtime component (called ClrMD for short) through the NuGet Package Manager.

ClrMD is a set of advanced APIs for programmatically inspecting a crash dump of a .NET program much in the same way as the SOS Debugging Extensions (SOS). It allows you to write automated crash analysis for your applications and automate many common debugger tasks.

We understand that this API won’t be for everyone -- hopefully debugging .NET crash dumps is a rare thing for you. However, our .NET Runtime team has had so much success automating complex diagnostics tasks with this API that we wanted to release it publicly.

One last, quick note, before we get started: The ClrMD managed library is a wrapper around CLR internal-only debugging APIs. Although those internal-only APIs are very useful for diagnostics, we do not support them as a public, documented release because they are incredibly difficult to use and tightly coupled with other implementation details of the CLR. ClrMD addresses this problem by providing an easy-to-use managed wrapper around these low-level debugging APIs.

Getting Started

Let's dive right into an example of what can be done with ClrMD. The API was designed to be as discoverable as possible, so IntelliSense will be your primary guide. As an initial example, we will show you how to collect a set of heap statistics (objects, sizes, and counts) similar to what SOS reports when you run the command !dumpheap -stat.

The “root” object of ClrMD, and the place to start, is the DataTarget class. A DataTarget represents either a crash dump or a live .NET process. In this example, we will attach to a live process named “HelloWorld.exe”, allowing up to 5 seconds for the attach to succeed:

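        // Requires: using System.Diagnostics; using Microsoft.Diagnostics.Runtime;
        // Look up the target process by name (throws if no such process is running).
        int pid = Process.GetProcessesByName("HelloWorld")[0].Id;

        // Attach to the live process with a timeout of 5000 ms.
        using (DataTarget dataTarget = DataTarget.AttachToProcess(pid, 5000))
        {
            // Locate the DAC that matches the first CLR version found in the target.
            string dacLocation = dataTarget.ClrVersions[0].TryGetDacLocation();
            ClrRuntime runtime = dataTarget.CreateRuntime(dacLocation);

            // ...
        }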

You may wonder what the TryGetDacLocation method does. The CLR is a managed runtime, which means that it provides additional abstractions, such as garbage collection and JIT compilation, over what the operating system provides. The bookkeeping for those abstractions is done via internal data structures that live within the process. Those data structures are specific to the CPU architecture and the CLR version.

In order to decouple debuggers from the internal data structures, the CLR provides a data access component (DAC), implemented in mscordacwks.dll. The DAC has a standardized interface and is used by the debugger to obtain information about the state of those abstractions, for example, the managed heap. It is essential to use the DAC that matches the CLR version and the architecture of the process or crash dump you want to inspect.

For a given CLR version, the TryGetDacLocation method tries to find a matching DAC on the same machine. If you need to inspect a process for which you do not have a matching CLR installed, you have another option: you can copy the DAC from a machine that has that version of the CLR installed. In that case, you provide the path to the alternate mscordacwks.dll to the CreateRuntime method manually. You can read more about the DAC on MSDN.

Note that the DAC is a native DLL and must be loaded into the program that uses ClrMD. If the dump or the live process is 32-bit, you must use the 32-bit version of the DAC, which, in turn, means that your inspection program needs to be 32-bit as well. The same is true for 64-bit processes. Make sure that your program’s platform matches what you are debugging.
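The platform requirement above can be checked up front, before attaching. The following is a minimal sketch, not part of ClrMD: the PlatformCheck class and its methods are hypothetical names, and the sketch assumes you already know the pointer size of the target (4 bytes for a 32-bit process or dump, 8 bytes for a 64-bit one). How you obtain that value is left open here.

```csharp
using System;

// Hypothetical helper (not a ClrMD type): verifies that the inspecting
// program has the same bitness as the process or dump it wants to inspect.
static class PlatformCheck
{
    // targetPointerSize: 4 for a 32-bit target, 8 for a 64-bit target.
    public static bool MatchesCurrentProcess(int targetPointerSize)
    {
        // IntPtr.Size is 4 in a 32-bit process and 8 in a 64-bit process.
        return targetPointerSize == IntPtr.Size;
    }

    // Throws with a descriptive message when the bitness differs,
    // so the mismatch surfaces before any DAC is loaded.
    public static void EnsureMatch(int targetPointerSize)
    {
        if (!MatchesCurrentProcess(targetPointerSize))
        {
            throw new PlatformNotSupportedException(string.Format(
                "Target is {0}-bit but this inspection program is {1}-bit.",
                targetPointerSize * 8, IntPtr.Size * 8));
        }
    }
}
```

Calling PlatformCheck.EnsureMatch(4) from a 64-bit inspection program throws; the fix in that case is to rebuild the inspection program for x86 rather than to look for a different DAC.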

Analyzing the Heap

Once you have attached to the process, you can use the runtime object to inspect the contents of the GC heap:

        ClrHeap heap = runtime.GetHeap();
        foreach (ulong obj in heap.EnumerateObjects())
        {
            ClrType type = heap.GetObjectType(obj);
            if (type == null)
                continue; // No type could be resolved (e.g. heap corruption); skip this object.

            ulong size = type.GetSize(obj);
            Console.WriteLine("{0,12:X} {1,8:n0} {2}", obj, size, type.Name);
        }
    

This produces output similar to the following:

         23B1D30       36 System.Security.PermissionSet
         23B1D54       20 Microsoft.Win32.SafeHandles.SafePEFileHandle
         23B1D68       32 System.Security.Policy.PEFileEvidenceFactory
         23B1D88       40 System.Security.Policy.Evidence
    

However, the original goal was to output a set of heap statistics. Using the data above, you can use a LINQ query to group the heap by type and sort by total object size:

        var stats = from o in heap.EnumerateObjects()
                    let t = heap.GetObjectType(o)
                    where t != null                 // skip objects whose type cannot be resolved
                    group o by t into g
                    let size = g.Sum(o => (uint)g.Key.GetSize(o))
                    orderby size                    // ascending: largest types are listed last
                    select new
                    {
                        Name = g.Key.Name,
                        Size = size,
                        Count = g.Count()
                    };

        foreach (var item in stats)
            Console.WriteLine("{0,12:n0} {1,12:n0} {2}", item.Size, item.Count, item.Name);
    

This outputs data like the following: per-type statistics showing which objects take up the most space on the GC heap of your process. Because the query sorts in ascending order, the largest types appear last:

           564           11 System.Int32[]
           616            2 System.Globalization.CultureData
           680           18 System.String[]
           728           26 System.RuntimeType
           790            7 System.Char[]
         5,788          165 System.String
        17,252            6 System.Object[]
    

ClrMD Features and Functionality

Of course, there’s a lot more to this API than simply printing out heap statistics. You can also walk every managed thread in a process or crash dump and print out a managed callstack. For example, this code prints the managed stack trace for each thread, similar to what the SOS !clrstack command would report (and similar to the output in the Visual Studio stack trace window):

        foreach (ClrThread thread in runtime.Threads)
        {
            Console.WriteLine("ThreadID: {0:X}", thread.OSThreadId);
            Console.WriteLine("Callstack:");

            foreach (ClrStackFrame frame in thread.StackTrace)
                Console.WriteLine("{0,12:X} {1,12:X} {2}", frame.InstructionPointer, frame.StackPointer, frame.DisplayString);

            Console.WriteLine();
        }
    

This produces output similar to the following:

        ThreadID: 2D90
        Callstack:
                   0       90F168 HelperMethodFrame
            660E3365       90F1DC System.Threading.Thread.Sleep(Int32)
              C70089       90F1E0 HelloWorld.Program.Main(System.String[])
                   0       90F36C GCFrame
    

Each ClrThread object also has a CurrentException property. It may be null; when it is not, it holds the last exception thrown on that thread. This exception object contains the full stack trace, message, and type of the exception thrown.
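As a rough sketch of how CurrentException might be used, the following prints the exception on every thread that has one. It needs the ClrMD package and a ClrRuntime obtained as shown earlier. The ExceptionReport class is a hypothetical name, and the member names Type, Message, and StackTrace on the exception object are assumptions based on the description above; check them with IntelliSense against the package version you install.

```csharp
using System;
using Microsoft.Diagnostics.Runtime;

// Hypothetical helper: reports the last thrown exception of each managed thread.
static class ExceptionReport
{
    public static void PrintCurrentExceptions(ClrRuntime runtime)
    {
        foreach (ClrThread thread in runtime.Threads)
        {
            // CurrentException is null when the thread has no exception to report.
            ClrException ex = thread.CurrentException;
            if (ex == null)
                continue;

            // Assumed members: Type (a ClrType), Message, and StackTrace.
            Console.WriteLine("Thread {0:X}: {1}: {2}", thread.OSThreadId, ex.Type.Name, ex.Message);

            foreach (ClrStackFrame frame in ex.StackTrace)
                Console.WriteLine("{0,12:X} {1,12:X} {2}", frame.InstructionPointer, frame.StackPointer, frame.DisplayString);

            Console.WriteLine();
        }
    }
}
```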

ClrMD also provides the following features:

  • Gets general information about the GC heap:
    • Whether the GC is workstation or server
    • The number of logical GC heaps in the process
    • Data about the bounds of GC segments
  • Walks the CLR’s handle table (similar to !gchandles in SOS).
  • Walks the application domains in the process and identifies which modules are loaded into them.
  • Enumerates threads, callstacks of those threads, the last thrown exception on threads, etc.
  • Enumerates the object roots of the process (as the GC sees them for our mark-and-sweep algorithm).
  • Walks the fields of objects.
  • Gets data about the various heaps that the .NET runtime uses to see where memory is going in the process (see ClrRuntime.EnumerateMemoryRegions in the ClrMD package).

All of this functionality can generally be found on the ClrRuntime or the ClrHeap objects, as seen above. IntelliSense can help you explore the various properties and functions when you install the ClrMD package. In addition, you can also use the attached sample code.

Please use the comments under this post to let us know if you have any feedback!

Attachment: ClrMDSample.cs
  • To answer my own comment - it seems you can self-debug. Seems to work - would be nice to know if it is supported.

    BTW, there is an error in the sample - I assume it should be:

    // If we don't have the dac installed, we will use the long-name dac in the same folder.
    if (string.IsNullOrEmpty(dacLocation))  // ***** without '!' ? ******
       dacLocation = version.DacInfo.FileName;

  • @Hrvoje, yep that's an error, sorry about that, it should not have the '!'.  :(

    The self-debug case is not a supported scenario because there's not a sensible way to make it work.  For example, you can attempt to inspect your own heap with it, but the ClrMD API itself will be allocating objects, which will trigger GCs, which in turn will cause your heap walk to fail when a GC rearranges the heap while you are walking it.

    This should always be used to inspect another process (or crash dump).

  • How can I call this to dump all objects under a given class or namespace from code?  I want to dump all objects under a given dialogue window when that window is supposedly closed and deallocated.  This would greatly help in finding objects that have not been garbage collected.  

  • I also want to take a memory snapshot of allocated objects by full type name and object id and then at a later time compare that to the current memory snapshot.  I'd want only the objects in the second snapshot that do not exist in the first one to be printed.  This helps for code that should clean up all of its resources when it exits.

    I've used this in C++ in the past to put in automatic debug only checks for memory leaks (e.g.,  snapshot, call method A, snapshot, compare snapshots,  if snapshots differ, break in debug mode).

  • @Tom

    ClrType instances have a .Name which contains the namespace as well as the typename.  You can use this to separate out the heap by namespace (though I suppose it would be better to provide a Namespace property instead of making you parse out the name...that's not currently in the API).

    As to your second question about doing heap diffs, the main obstacle to doing this is that the GC relocates objects, and an object can still be alive between two snapshots, but the object got moved...so you don't know the instance is the same.  To solve this, we use a heuristic which basically does a diff of the type statistics (100 Foo objects in snapshot 1, 110 Foo objects in snapshot 2, 10 Foo objects difference).

    In fact, PerfView's memory diagnostics already do this today:  www.microsoft.com/.../details.aspx

    (Memory diagnostic in PerfView is actually built on top of ClrMD.)
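The type-statistics heuristic described in the reply above can be sketched in plain C#. This is only an illustration under stated assumptions, not how PerfView implements it: the HeapDiff class is a hypothetical name, and each snapshot is assumed to have already been reduced to a dictionary mapping type names to object counts (for example with a grouping query like the one in the post).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: compares two snapshots by per-type object counts
// instead of by object address, since the GC may relocate live objects.
static class HeapDiff
{
    // Returns typeName -> (count in 'after' minus count in 'before')
    // for every type whose count changed between the two snapshots.
    public static Dictionary<string, long> DiffCounts(
        IDictionary<string, long> before, IDictionary<string, long> after)
    {
        var result = new Dictionary<string, long>();
        foreach (string name in before.Keys.Union(after.Keys))
        {
            long oldCount, newCount;
            before.TryGetValue(name, out oldCount); // stays 0 if the type is absent
            after.TryGetValue(name, out newCount);  // stays 0 if the type is absent
            if (newCount != oldCount)
                result[name] = newCount - oldCount;
        }
        return result;
    }
}
```

With 100 Foo objects in the first snapshot and 110 in the second, DiffCounts reports a difference of 10 for Foo. Types whose counts did not change are left out, and a type that disappeared entirely shows up with a negative difference.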

  • Is there any limitation on the kind of process we can attach to (e.g. not running as admin, and so on)?

    I tried to attach to one of my own processes and got an exception:

    Could not attach to pid 514, HRESULT: 0xd00000bb

  • Hi,

    I wrote a DLL in an ATL project with a function whose parameter, of type BYTE*, is used for both input and output:

    STDMETHODIMP CMSDllServer::sum22(BYTE* aa,SHORT len)
    {
    AFX_MANAGE_STATE(AfxGetStaticModuleState());
    for(int i=1;i<len;i++)
    {
    aa[i]=i;
    }
     return S_OK;
    }

    When I call this function from a C# Windows application, there is no problem and the array values are returned correctly:

    byte[] Packet = new byte[5];
    dllServer.sum22(ref Packet[0],5);
          //out put
            1,2,3,4,5

    But when the same function is called from a web service, only the first array element is returned, which is a very big problem:

    byte[] Packet = new byte[5];
    dllServer.sum22(ref Packet[0],5);
          //out put
            1,0,0,0,0

    Please help me.

    Thanks

  • Seems my original comment never made it through.

    Our use case seems to be a bit simpler than this dll was intended to be used. We produce mission critical software (high availability and fault tolerance required, low installation count) which sometimes presents a challenge to monitor and diagnose.

    I view ClrMD as a possibility to implement a miniature adtools-like component that would always be present with the deployment of our software to cover these use cases:

    - monitor the target process and take a "call stack dump" if CPU usage > 90% for more than X ms

    - monitor the target process and take a "memory object count" if memory usage > X

    - monitor the target process (GUI application) and take a "call stack dump" if the UI thread is blocked for more than 250ms (there was a MS visual studio extension that did the same thing, but produced minidumps - there was a nice video about it on channel9, but I can't seem to find it. Basically, microsoft was using it to detect usability problems in production).

    However, I would also like to see the following:

    - integrated ability to take usable minidump and full dumps of attached process, something like clrdump (google it, first result - seems the comments delete any of my posts that contains a link without warning...)

    - ability to take a snapshot of native call stacks - 95% of our threads are .net, but there are a couple of native threads we integrate with through C++/CLI and we would really like to see both native and managed callstacks. There should be an easy way to convert the call stack to a readable format with the matching symbols (symbol server support). I know this may not be your primary use case, but it would complete our needed feature set.

  • Hi,

    Nice ! :-)

    It's really nice to see this kind of things released publicly since all analysis needs (automated or not) are not covered by SOS or even by other extensions like Steve Johnson's SOSEX (www.stevestechspot.com/SOSEXUpdatendashReadyForNET45.aspx).

    Gaël

  • @Hrvoje

    > ability to take a snapshot of native call stacks

    You can do this using the IDebug* interfaces provided by DbgEng (full details too long for a comment here, but you can search for IDebugControl::GetStackTrace).  ClrMD provides a (mostly) complete wrapper around those interfaces:

           using Microsoft.Diagnostics.Runtime;
           ...
           IDebugControl control = (IDebugControl)dataTarget.DebuggerInterface;
           control.GetStackTrace(...);

    > integrated ability to take usable minidump and full dumps of attached process, something like clrdump.

           IDebugClient client = (IDebugClient)dataTarget.DebuggerInterface;
           client.WriteDumpFile(...);

    You can use the IDebug* interfaces to fully control your process (go, step, etc), but again...that's more detail than I can put in this comment.  The API is still in beta too.  =)

  • @li-raz

    I should have pointed out in the post that attaching to a process requires debugger privileges (in this case, that almost certainly means running the program as administrator).

    You do not need admin privileges to load a crash dump.

  • Is the ClrMD .NET library on a path to be fully supported and part of .NET 4.x or later?  We can use beta version code in our development environment but not in our production environment, given that the production environment has many different long-running server processes.

    Here is the quote from the blog post:

    The ClrMD managed library is a wrapper around CLR internal-only debugging APIs. Although those internal-only APIs are very useful for diagnostics, we do not support them as a public, documented release because they are incredibly difficult to use and tightly coupled with other implementation details of the CLR. ClrMD addresses this problem by providing an easy-to-use managed wrapper around these low-level debugging APIs.

  • @Tom: The current version of ClrMD is a pre-release, which means its license doesn’t allow usage in production environments. Once we turn ClrMD into a stable release it will allow production use.

  • Is there any plan to provide one for .Net 3.5 framework (CLR 2)?

  • Thanks for this Immo. I've actually spent the past few months writing exactly the same library using the CorDebug, MetaData and Symbol server APIs. I've actually got it all up and running, although I've targeted crash dumps (full and partial) as opposed to a live process.

    Do you have any thoughts on ClrMD vs CorDebug? Obviously CorDebug is geared towards debugging and happens to support crash dumps as an extra bonus, while ClrMD is focussed on analysis and not debugging.

    It's great that ClrMD takes away all of the grunt labour I've had to do in order to get CorDebug up and running for crash dumps, like implementing ICorDebugDataTarget (great fun for partial dumps!) and parsing MetaData binary signatures, which is a truly painful experience.

    It's awesome to have an officially supported way of doing this now, but any thoughts on whether CorDebug will continue to support crash dumps? And any future plans for ClrMD, or is this just the beginning of ClrMD? Really excited by this so would love to hear anything you have to say :)

    ps - is the team hiring? I've got experience in CorDebug, MetaData and the Symbol Server API's :D
