.NET Crash Dump and Live Process Inspection

Analyzing crash dumps can be complicated. Although Visual Studio supports viewing managed crash dumps, you often have to resort to more specialized tools like the SOS debugging extensions or WinDbg. In today’s post, Lee Culver, software developer on the .NET Runtime team, will introduce you to a new managed library that allows you to automate inspection tasks and access even more debugging information. --Immo

Today we are excited to announce the beta release of the Microsoft.Diagnostics.Runtime component (called ClrMD for short) through the NuGet Package Manager.

ClrMD is a set of advanced APIs for programmatically inspecting a crash dump of a .NET program much in the same way as the SOS Debugging Extensions (SOS). It allows you to write automated crash analysis for your applications and automate many common debugger tasks.

We understand that this API won’t be for everyone -- hopefully debugging .NET crash dumps is a rare thing for you. However, our .NET Runtime team has had so much success automating complex diagnostics tasks with this API that we wanted to release it publicly.

One last, quick note, before we get started: The ClrMD managed library is a wrapper around CLR internal-only debugging APIs. Although those internal-only APIs are very useful for diagnostics, we do not support them as a public, documented release because they are incredibly difficult to use and tightly coupled with other implementation details of the CLR. ClrMD addresses this problem by providing an easy-to-use managed wrapper around these low-level debugging APIs.

Getting Started

Let's dive right into an example of what can be done with ClrMD. The API was designed to be as discoverable as possible, so IntelliSense will be your primary guide. As an initial example, we will show you how to collect a set of heap statistics (objects, sizes, and counts) similar to what SOS reports when you run the command !dumpheap -stat.

The “root” object of ClrMD is the DataTarget class. A DataTarget represents either a crash dump or a live .NET process. In this example, we attach to a live process named “HelloWorld.exe”, allowing up to 5 seconds (5000 milliseconds) for the attach to succeed:

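        // Requires: using System.Diagnostics; using Microsoft.Diagnostics.Runtime;
        // Take the first process named HelloWorld (this throws if none is running).
        int pid = Process.GetProcessesByName("HelloWorld")[0].Id;

        // Attach to the process, waiting at most 5000 ms.
        using (DataTarget dataTarget = DataTarget.AttachToProcess(pid, 5000))
        {
            // Locate a DAC matching the first CLR found in the target, then create the runtime.
            string dacLocation = dataTarget.ClrVersions[0].TryGetDacLocation();
            ClrRuntime runtime = dataTarget.CreateRuntime(dacLocation);

            // ...
        }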

You may wonder what the TryGetDacLocation method does. The CLR is a managed runtime, which means that it provides additional abstractions, such as garbage collection and JIT compilation, over what the operating system provides. The bookkeeping for those abstractions is done via internal data structures that live within the process. Those data structures are specific to the CPU architecture and the CLR version. In order to decouple debuggers from the internal data structures, the CLR provides a data access component (DAC), implemented in mscordacwks.dll. The DAC has a standardized interface and is used by the debugger to obtain information about the state of those abstractions, for example, the managed heap. It is essential to use the DAC that matches the CLR version and the architecture of the process or crash dump you want to inspect. For a given CLR version, the TryGetDacLocation method tries to find a matching DAC on the same machine. If you need to inspect a process for which you do not have a matching CLR installed, you have another option: you can copy the DAC from a machine that has that version of the CLR installed. In that case, you provide the path to the alternate mscordacwks.dll to the CreateRuntime method manually. You can read more about the DAC on MSDN.
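The same steps apply to a crash dump. The following is a minimal sketch, not a definitive implementation: it assumes the beta package exposes DataTarget.LoadCrashDump, that TryGetDacLocation returns null when no matching DAC is found locally, and that ClrInfo has a Version property. The dumpPath and fallbackDacPath parameters are hypothetical names used only for illustration.

```csharp
using System;
using Microsoft.Diagnostics.Runtime;

static class DumpLoader
{
    // dumpPath:        path to a crash dump file (hypothetical example parameter).
    // fallbackDacPath: path to an mscordacwks.dll copied from a machine that has
    //                  the matching CLR installed (hypothetical example parameter).
    public static void Inspect(string dumpPath, string fallbackDacPath)
    {
        using (DataTarget dataTarget = DataTarget.LoadCrashDump(dumpPath))
        {
            if (dataTarget.ClrVersions.Count == 0)
            {
                Console.WriteLine("No CLR was found in this dump.");
                return;
            }

            ClrInfo version = dataTarget.ClrVersions[0];

            // Prefer a DAC found on this machine; otherwise use the copied one.
            // Assumption: TryGetDacLocation returns null when nothing matches.
            string dacLocation = version.TryGetDacLocation() ?? fallbackDacPath;

            ClrRuntime runtime = dataTarget.CreateRuntime(dacLocation);
            Console.WriteLine("Created runtime for CLR {0} using DAC at {1}",
                version.Version, dacLocation);
        }
    }
}
```

Falling back to an explicit path keeps the common case (a matching CLR installed on the same machine) automatic while still covering dumps taken on machines with a different CLR build.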

Note that the DAC is a native DLL and must be loaded into the program that uses ClrMD. If the dump or the live process is 32-bit, you must use the 32-bit version of the DAC, which, in turn, means that your inspection program needs to be 32-bit as well. The same is true for 64-bit processes. Make sure that your program’s platform matches what you are debugging.
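One way to catch a platform mismatch early is to compare pointer sizes before creating the runtime. This is a small sketch using only the base class library; the PlatformCheck helper is a hypothetical name, and the usage note assumes that DataTarget exposes the target's pointer size through a PointerSize property.

```csharp
using System;

static class PlatformCheck
{
    // Returns true when this inspection process has the same pointer size
    // (4 bytes for 32-bit, 8 bytes for 64-bit) as the target being debugged.
    public static bool MatchesTarget(uint targetPointerSize)
    {
        return (uint)IntPtr.Size == targetPointerSize;
    }
}
```

With the assumption above, a call such as PlatformCheck.MatchesTarget(dataTarget.PointerSize) placed before CreateRuntime lets the program fail with a clear message instead of failing while loading the DAC.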

Analyzing the Heap

Once you have attached to the process, you can use the runtime object to inspect the contents of the GC heap:

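        ClrHeap heap = runtime.GetHeap();
        foreach (ulong obj in heap.EnumerateObjects())
        {
            // GetObjectType can return null (for example, if the heap is corrupted).
            ClrType type = heap.GetObjectType(obj);
            if (type == null)
                continue;

            // Print the object's address, its size in bytes, and its type name.
            ulong size = type.GetSize(obj);
            Console.WriteLine("{0,12:X} {1,8:n0} {2}", obj, size, type.Name);
        }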
    

This produces output similar to the following:

         23B1D30       36 System.Security.PermissionSet
         23B1D54       20 Microsoft.Win32.SafeHandles.SafePEFileHandle
         23B1D68       32 System.Security.Policy.PEFileEvidenceFactory
         23B1D88       40 System.Security.Policy.Evidence
    

However, the original goal was to output a set of heap statistics. Using the data above, you can use a LINQ query to group the heap by type and sort by total object size:

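        // Requires: using System.Linq;
        var stats = from o in heap.EnumerateObjects()
                    let t = heap.GetObjectType(o)
                    where t != null   // skip objects whose type cannot be resolved
                    group o by t into g
                    let size = g.Sum(o => (uint)g.Key.GetSize(o))
                    orderby size      // ascending, so the largest types are printed last
                    select new
                    {
                        Name = g.Key.Name,
                        Size = size,
                        Count = g.Count()
                    };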

        foreach (var item in stats)
            Console.WriteLine("{0,12:n0} {1,12:n0} {2}", item.Size, item.Count, item.Name);
    

This will output data like the following -- a collection of statistics about what objects are taking up the most space on the GC heap for your process:

           564           11 System.Int32[]
           616            2 System.Globalization.CultureData
           680           18 System.String[]
           728           26 System.RuntimeType
           790            7 System.Char[]
         5,788          165 System.String
        17,252            6 System.Object[]
    

ClrMD Features and Functionality

Of course, there’s a lot more to this API than simply printing out heap statistics. You can also walk every managed thread in a process or crash dump and print out a managed callstack. For example, this code prints the managed stack trace for each thread, similar to what the SOS !clrstack command would report (and similar to the output in the Visual Studio stack trace window):

        foreach (ClrThread thread in runtime.Threads)
        {
            Console.WriteLine("ThreadID: {0:X}", thread.OSThreadId);
            Console.WriteLine("Callstack:");

            foreach (ClrStackFrame frame in thread.StackTrace)
                Console.WriteLine("{0,12:X} {1,12:X} {2}", frame.InstructionPointer, frame.StackPointer, frame.DisplayString);

            Console.WriteLine();
        }
    

This produces output similar to the following:

        ThreadID: 2D90
        Callstack:
                   0       90F168 HelperMethodFrame
            660E3365       90F1DC System.Threading.Thread.Sleep(Int32)
              C70089       90F1E0 HelloWorld.Program.Main(System.String[])
                   0       90F36C GCFrame
    

Each ClrThread object also contains a CurrentException property, which may be null, but if not, contains the last thrown exception on this thread. This exception object contains the full stack trace, message, and type of the exception thrown.
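As an illustration, the sketch below prints the last thrown exception for every thread that has one. It assumes the ClrException object exposes Type, Message, and StackTrace members as described above; the ExceptionReport class is a hypothetical name.

```csharp
using System;
using Microsoft.Diagnostics.Runtime;

static class ExceptionReport
{
    public static void PrintLastExceptions(ClrRuntime runtime)
    {
        foreach (ClrThread thread in runtime.Threads)
        {
            // CurrentException is null when the thread has no last thrown exception.
            ClrException exception = thread.CurrentException;
            if (exception == null)
                continue;

            Console.WriteLine("ThreadID: {0:X}", thread.OSThreadId);
            Console.WriteLine("Exception: {0}", exception.Type.Name);
            Console.WriteLine("Message: {0}", exception.Message);

            // The exception carries its own stack trace, separate from the
            // thread's current call stack.
            foreach (ClrStackFrame frame in exception.StackTrace)
                Console.WriteLine("{0,12:X} {1}", frame.InstructionPointer, frame.DisplayString);

            Console.WriteLine();
        }
    }
}
```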

ClrMD also provides the following features:

  • Gets general information about the GC heap:
    • Whether the GC is workstation or server
    • The number of logical GC heaps in the process
    • Data about the bounds of GC segments
  • Walks the CLR’s handle table (similar to !gchandles in SOS).
  • Walks the application domains in the process and identifies which modules are loaded into them.
  • Enumerates threads, callstacks of those threads, the last thrown exception on threads, etc.
  • Enumerates the object roots of the process (as the GC sees them for our mark-and-sweep algorithm).
  • Walks the fields of objects.
  • Gets data about the various heaps that the .NET runtime uses to see where memory is going in the process (see ClrRuntime.EnumerateMemoryRegions in the ClrMD package).
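A few of these features can be sketched together as follows. This is an illustrative outline rather than a reference implementation: the member names used here (ServerGC, HeapCount, Segments, AppDomains, Modules, EnumerateHandles, and the properties read from each item) are assumptions based on the beta package, so check them with IntelliSense against the version you install. The RuntimeOverview class is a hypothetical name.

```csharp
using System;
using System.Linq;
using Microsoft.Diagnostics.Runtime;

static class RuntimeOverview
{
    public static void Print(ClrRuntime runtime)
    {
        ClrHeap heap = runtime.GetHeap();

        // General GC information: workstation or server, number of logical heaps.
        Console.WriteLine("GC mode: {0}, logical heaps: {1}",
            runtime.ServerGC ? "Server" : "Workstation", runtime.HeapCount);

        // Bounds of each GC segment.
        foreach (ClrSegment segment in heap.Segments)
        {
            string kind = segment.IsEphemeral ? "Ephemeral"
                        : (segment.IsLarge ? "Large" : "Gen2");
            Console.WriteLine("{0,12:X} {1,12:X} {2}", segment.Start, segment.End, kind);
        }

        // Application domains and the modules loaded into them.
        foreach (ClrAppDomain domain in runtime.AppDomains)
        {
            Console.WriteLine("AppDomain {0}: {1}", domain.Id, domain.Name);
            foreach (ClrModule module in domain.Modules)
                Console.WriteLine("    {0}", module.Name ?? "<unnamed module>");
        }

        // The CLR's handle table (similar to !gchandles in SOS).
        foreach (ClrHandle handle in runtime.EnumerateHandles())
            Console.WriteLine("{0,12:X} {1,12:X} {2}",
                handle.Address, handle.Object, handle.HandleType);

        // Memory regions grouped by kind, largest total first.
        var regions = from r in runtime.EnumerateMemoryRegions()
                      group r by r.Type into g
                      let total = g.Sum(m => (long)m.Size)
                      orderby total descending
                      select new { Type = g.Key, Total = total, Count = g.Count() };

        foreach (var region in regions)
            Console.WriteLine("{0,12:n0} {1,8:n0} {2}", region.Total, region.Count, region.Type);
    }
}
```

Passing the ClrRuntime created in the Getting Started example to RuntimeOverview.Print would print one section per feature.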

All of this functionality can generally be found on the ClrRuntime or the ClrHeap objects, as seen above. IntelliSense can help you explore the various properties and functions when you install the ClrMD package. In addition, you can also use the attached sample code.

Please use the comments under this post to let us know if you have any feedback!

Attachment: ClrMDSample.cs
  • @Alexey: Can you provide me your code for calculating the heap graph? (Mail:  toni.wenzel@googlemail.com)

    I'm currently investigating a memory leak of our own application. It would be interesting how you managed this.

    THX!

  • What is ClrRoot.Address used for? Does it point to the same thing as ClrRoot.Object?

    How can I get the following information:

    - The object (handle) which pins an object

    - The object which is pinned by the root (I guess ClrRoot.Object)

    I would like to know which object prevents which other object from being collected (or relocated by the GC).

  • What is the ClrType.GetFieldForOffset() "inner" parameter used for?

  • Great work!

    But when I tried it out on a production dump, I didn't get the same answer from the sample code as I got from WinDbg for the command "!dumpheap -stat".

    Example for strings

    The sample code returns: 16 318 082      199 815 System.String

    But in WinDbg I get :        21004872  191564     System.String

    I miss 3 Mb of string objects?!

    And when I try to search for “ClaimsPrincipal” objects, I can locate 46 of them with WinDbg but none with ClrMD?

    Is it something I have missed?

  • Wow, I wish I knew about this weeks ago. This is a fantastic little library and it's making my deep investigations into many millions of objects much more bearable. Thanks kindly :)

  • So, I might be doing something wrong, but I'm having a hard time working with array fields while trying to browse an object. I'm currently using ClrInstanceField.GetFieldValue(parentObjectAddress) which I was hoping would give me the address of the Array, since that is what it does for objects. Instead it seems to be returning something else? It also seems like it thinks the array in every generic List<T> is an Object[] but this would imply that Generic collections don't prevent boxing, which I know to be false.

    I'm also curious that when I use GetFieldValue on an Object type, the address it gives back seems to work fine with field.Type,  but heap.GetObjectType for the same address returns null or sometimes strange values. I only stumbled this way when trying to account for polymorphism while browsing referenced objects deeper than my starting point, since I figured ClrInstanceField.Type would reflect the general type definition, not necessarily the actual type stored in a particular instance (e.g. field definition type: IEnumerable, instance reference: ArrayList).

    Maybe you could provide some more sample code now that this has been in the wild for a while? Without documentation it has been hard to infer how one might dig deep into an object graph, especially regarding fields that aren't primitive values (structs/arrays/objects/etc.). There are very few deep resources online, though the ScriptCs module and a few other blogs have been helpful, I am encountering plenty of things that require a lot of trial and error, which is costing me more time than I was hoping this tool would save me. I still think the knowledge will benefit me in the long run, but a followup would be nice. Maybe some of those internal automated diagnostics might be safe to share with the public?

    On a positive note, I've had great success combining some work I did automating against dbgeng and SOS with this library and they appear to be complementing each other well (since I already have some SOS parsing implemented).

  • I love this tool, but would also like to use an app written with it against some dumps containing unmanaged code from old legacy apps to automate large numbers of repetitive actions. I'm thinking the tool can do it because DebugDiag v2 uses ClrMD and it can open unmanaged dumps. But I can't figure out how to load the required unmanaged-compatible clr10\sos from ClrMD-based code. The code seems to require the FindDAC step and, of course, there are no CLR versions in the dump at all.

    How can I get ClrMd to use the Clr10\Sos and let me use the ReadMemory, thread-related, and other convenient debugging commands?

    Thanks!

    -Bob

  • I realize now that I didn't put my name with my question, but I've further detailed the question above on StackOverflow. Sadly, I don't think there are many people using this extensively yet, so I'm concerned by the fact that the question is already well below the average number of viewers for a new question. I'm posting the link here both for experts that might see this as well as others who might have the same question:

    stackoverflow.com/.../how-to-properly-work-with-non-primitive-clrinstancefield-values-using-clrmd

  • Hi,

    We need to parse a dictionary of type <string, List<objects>> using ClrMD. As we understand it, dictionaries are stored in memory as System.Collections.Hashtable+bucket[]. We tried to parse the dictionary by enumerating the internal Hashtable+bucket[] objects, but without success. We are able to parse the dictionary values (List<objects>) as individual arrays, but we cannot tell which key each array belongs to. To find that out, we need to parse the dictionary itself.

    Can you please give us some pointers on how to parse a dictionary using ClrMD? Sample code would be helpful.
