Kirk Evans is a Microsoft Architect for the Azure Center of Excellence.
Introduction to SharePoint and Azure IaaS
Building SharePoint Apps with Windows Azure Platform as a Service
SharePoint Solutions and Architectures on Windows Azure Infrastructure Services
Understanding Authentication and Permissions with Apps for SharePoint and Office
In this post, we’ll see some steps about how to troubleshoot a production SharePoint site using WinDBG and PssCor2.dll. For an introduction to setting up your environment and getting started, see my blog post “Intro to WinDBG for .NET Developers”. In this post, we’ll apply some of our WinDBG skills to a SharePoint environment and show you how you can figure out what’s going on in your site.
In this example, a customer has raised the issue that their SharePoint site is very slow. Let’s assume that you have looked at IIS logs and notice server render times are abnormally long and you want to find out what the server is doing.
The first thing we need to do is figure out which process we need to capture memory for. In Windows Server 2008, open a command window to “c:\windows\system32\inetsrv” and type the following command:
appcmd list wp
This will list the worker processes for IIS, showing you the process ID (aka PID) and its associated application pool. Here’s the output on my machine.
Once you know the PID (in this screenshot, I want to troubleshoot PID 5292), open up task manager on the machine. Select the checkbox “Show processes from all users” and find the W3WP.exe process with the ID you just found. Right-click and choose “Create Dump File”.
Note that this will pause all threads for the process while the memory dump file is created, the length of time depends on the amount of memory the process is using.
Open WinDBG (making sure that you’ve already configured your environment for PssCor2.dll and set your symbol path) and choose “Open Crash Dump”. Open the .DMP file that task manager created for you. I copied PssCor2.dll into the same directory as WinDBG.exe, so I type:
Next, type “!help” to make sure that the psscor2.dll extension was loaded. Once you see the help instructions, we can start evaluating the memory dump file.
So far, this is my favorite command in the psscor2.dll extension. This handy little guy asks for all the .ASPX pages in memory. It shows the URL to the page, how long the thread has been running, what the thread ID is, and what the HTTP verb is for the response.
Looking at our screen shot, I can see a bunch of requests to “SlowWebPart.aspx” that have been running for 20 or more seconds. Most have a thread ID of “XXX”, but a few of them have a thread ID that we can inspect (like 50 and 55). One of the first things I do when trying to analyze a slow SharePoint site is to run this command to see how long things are taking. If you see one page frequently at the top, it is running the longest and is the culprit for consuming resources. If you see lots of different pages, you can look at the callstacks to see what everyone is doing and what they are waiting on.
When you hit a breakpoint in Visual Studio, you can see the call stack, showing the chain of methods that were called up until the breakpoint was hit. If you want to do something similar in WinDBG, you need to tell it thread contains the CLR stack you are interested in. You can get a list of managed threads with the command:
In WinDBG, you need to switch to a specific thread context to see its stack. In the previous !aspxpages example, we already know that there are threads 50 and 55 that we can take a look at, so we would issue the following statement:
The output is shown below.
Notice the yellow highlights in the screen shot. The first:
The command prompt is “0:000>”, and we issued the command to switch the thread context to thread ID 50. The second:
Notice the command prompt has changed to “0:050>”. This tells us that the current context is thread id 50. Looking at the stack, we see that our ASPX page was loaded, the CreateChildControls method was called on an object called MyWebPart.VisualWebPart1.VisualWebPart1, and then we see “MyWebPart.ServiceReference1.ServiceClient.GetData(Int32)” called before a bunch of WCF stuff. We can infer from looking at the stack that a web part or control is in the middle of calling a WCF service. We also saw earlier by using !aspxpages that this thread has been running for 24 seconds.
We will follow the steps in Tess Ferrandez’ demo video (watch the entire video, but scroll to 24:48 in the video) to list the CLR stack for every thread in memory.
~* e !clrstack
This statement says “for every thread, dump the CLR stack”. The output is very long, but we see a bunch of CLR stacks that look just like the screen shot above. This is our most likely suspect, a web part that is calling a web service synchronously and is waiting on a response.
Since we switched our context thread to thread ID 50, we want to inspect objects in the stack, kind of like looking at the Locals window in Visual Studio.
The output looks like:
I highlighted some of the results. Notice that the objects are shown as they are pushed onto the stack (oldest on stack at the bottom of the page). We can see our web part user control, and we can see some WCF stuff. Just like you would poke around in Visual Studio locals window to inspect various objects, we can inspect objects in WinDBG.
Notice the underlined item at the top of the WCF box. When you use !dso to dump the stack objects, the second column has a header column that says “Object”. This is actually the address for the object. We can use that piece of information to inspect the objects. For instance, we can inspect the ASCX user control by using its object address.
The output is:
Look at the very bottom of that output, we see that the user control contains a Label control. Want to see what’s in it? We can use the !dumpfield command. To see how to use it, use !help dumpfield to see the syntax. When we used !do, our object address was 00000001479219e0. We use that to inspect specific fields on that object, like drilling down into an object in the locals window in Visual Studio.
!dumpfield -field Label1 -string 00000001479219e0
The output shows the fields for the Label control. Pretty cool! We can use this trick to inspect the various objects in the stack. In our screenshot above (under the !DumpStackObjects heading), I underlined a System.String object with address 0000000141389b58. We can use the !dumpobject command to inspect it:
I highlighted in the result that the URI for the web service we are calling is “http://localhost:5555/service.svc”. Now we not only know that we have a bunch of web parts that are waiting on a response (and have been for a long time) from a service, but we also know the URI of the service. We can then do some tests to confirm that the web service is responding quite slowly!
We can then use this to determine a course of action. We have a web part that is blocking threads, waiting on a response from a web service that is not under our control. We shouldn’t block the thread waiting on a synchronous response from a web service, instead we should use an asynchronous request with a timeout. Our plan is then to change the web service call to be asynchronous, load test it using a tool like the Web Performance Test in Visual Studio 2010, and confirm that it now meets our performance expectations even when the web service we are calling is performing poorly.
Lest you think this example is contrived, let me assure you that it is not. This same behavior happens if you use HttpWebRequest synchronously, consume an RSS feed synchronously, call a WCF service synchronously, etc. I have seen a lot of people make web service calls from ASP.NET controls synchronously, this can kill your SharePoint site’s performance.
Video introduction to WinDBG for .NET Developers by Tess Ferrandez
.NET Debugging Demos Lab 4: High CPU Hang – Review also by Tess
Intro to WinDBG for .NET Developers
Drill Into .NET Framework Internals to See How the CLR Creates Runtime Objects
Thanks for your efforts here. I've always been intrigued as to what is consuming the memory in a SharePoint App. For example I develop and maintain a few large-ish 207 & 2010 publishing sites and sometimes see their memory usage hit 2GB. Are there any commands that can inspect the associated dump and report on how this memory usage is distributed? I'm particularly interested in what and how much data is stored in the Asp.Net/Publishing cache. Can you offer any tips?