This is the second in a series of 3 posts which discuss troubleshooting SQL Server Management Studio when it encounters errors or unexpected behaviour.

In this post we collect a memory dump of SSMS based upon specific .NET exception codes, for subsequent analysis.

As a reminder, here are the symptoms of the problem we are looking at:

1. Unable to view Job Activity in the SQL Server Agent Job Activity Monitor

2. Progress bar shows “Cannot display activity data”

3. Clicking on more details throws a message box with the error

Failed to retrieve data for this request. (Microsoft.SqlServer.Management.Sdk.Sfc)

with the additional details

Hour, Minute, and Second parameters describe an un-representable DateTime. (mscorlib)

It actually looks like this:

image

4. The error repros with all versions of SSMS (2005 and 2008)

5. The error repros for all users whatever workstation they connect from.

The debugging approach

Initially I was thinking that since I had a .NET error, I could run adplus and use a config file to configure the debugger to capture the .NET errors. This would work and I’ve done it before, but then Tess reminded me that I could use DebugDiag and it would probably be easier and quicker. So I tried this:

Firstly I installed the DebugDiag tool on the machine where I was repro’ing the problem. I started it up and created a new rule, the properties of which are shown below. Since I didn’t know exactly what was going on at this stage, I elected to log everything to a file to begin with rather than creating dumps, so I could get a better idea of what was really happening in the process. I expected that I was getting some .NET errors based upon the message box shown, but if I just created a dump on all .NET exceptions, I might miss something else that came before them. The rule looked like this:

1. Type = Crash

2. Process = ssms.exe (I was using SQL 2008 tools; if I’d been using SQL 2005 tools the process name would be sqlwb.exe)

3. Action type for unconfigured first chance exceptions = Log Stack Trace

4. Exceptions - CLR (.NET) Exception - All Exception Types = Log Stack Trace

5. Action Limits = 0 (since I’m only logging to file at this stage)

Here’s what it looks like in the GUI:

image

I saved the rule, enabled it straight away, and then repro’d the problem again. I then disabled the rule and browsed for the log file, which in my case was held here:

C:\Program Files (x86)\DebugDiag\Logs\

It was called:

Ssms__PID__2912__Date__10_19_2009__Time_03_35_46PM__898__Log.txt

Review of this file showed that 3 exceptions (of 2 different types) had actually occurred, in this order:

[10/19/2009 3:35:55 PM] First chance exception - 0xe0434f4d caused by thread with system id 5824
[10/19/2009 3:35:55 PM] CLR Exception Type - 'System.ArgumentOutOfRangeException'

then

[10/19/2009 3:35:55 PM] First chance exception - 0xe0434f4d caused by thread with system id 5824
[10/19/2009 3:35:55 PM] CLR Exception Type - 'Microsoft.SqlServer.Management.Sdk.Sfc.EnumeratorException'

then finally

[10/19/2009 3:35:55 PM] First chance exception - 0xe0434f4d caused by thread with system id 5824
[10/19/2009 3:35:55 PM] CLR Exception Type - 'Microsoft.SqlServer.Management.Sdk.Sfc.EnumeratorException'

It may be that in this case all these errors are related. However, the approach of logging first proved valuable, because I want to create a dump at the point when the very first error occurs. I now know that I can edit my rule to tell DebugDiag to create a full user dump when it encounters a CLR Exception Type of 'System.ArgumentOutOfRangeException', and that it should only create the dump once. The value of logging first is also backed up by the fact that repro’ing this problem on an x86 machine throws a different chain of .NET exceptions (although the first one is still 'System.ArgumentOutOfRangeException').
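If the log is long, picking out the order of the exceptions by eye gets tedious. The following is only a rough sketch of how the "CLR Exception Type" lines could be pulled out in order with Python. It is not part of DebugDiag; the function name clr_exceptions_in_order is my own, and the pattern assumes the log lines have exactly the shape shown above.

```python
import re

# Matches DebugDiag log lines of the shape shown above, e.g.
# [10/19/2009 3:35:55 PM] CLR Exception Type - 'System.ArgumentOutOfRangeException'
# (Assumption: other DebugDiag versions may format these lines differently.)
CLR_TYPE = re.compile(
    r"^\[(?P<stamp>[^\]]+)\]\s+CLR Exception Type - '(?P<type>[^']+)'"
)


def clr_exceptions_in_order(log_text):
    """Return (timestamp, exception type) pairs in the order they were logged."""
    found = []
    for line in log_text.splitlines():
        match = CLR_TYPE.match(line.strip())
        if match:
            found.append((match.group("stamp"), match.group("type")))
    return found


# The six log lines quoted in this post, used here as sample input.
SAMPLE_LOG = """\
[10/19/2009 3:35:55 PM] First chance exception - 0xe0434f4d caused by thread with system id 5824
[10/19/2009 3:35:55 PM] CLR Exception Type - 'System.ArgumentOutOfRangeException'
[10/19/2009 3:35:55 PM] First chance exception - 0xe0434f4d caused by thread with system id 5824
[10/19/2009 3:35:55 PM] CLR Exception Type - 'Microsoft.SqlServer.Management.Sdk.Sfc.EnumeratorException'
[10/19/2009 3:35:55 PM] First chance exception - 0xe0434f4d caused by thread with system id 5824
[10/19/2009 3:35:55 PM] CLR Exception Type - 'Microsoft.SqlServer.Management.Sdk.Sfc.EnumeratorException'
"""

if __name__ == "__main__":
    exceptions = clr_exceptions_in_order(SAMPLE_LOG)
    for stamp, exc_type in exceptions:
        print(stamp, exc_type)
    # The first entry is the one worth targeting with a dump sub-rule.
    print("First exception:", exceptions[0][1])
```

To use it against a real log, read the file contents into a string and pass that in place of SAMPLE_LOG. The first entry returned is the exception type to put into the sub-rule.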

That said, I’m now ready to capture a full user dump of the ssms.exe process, so I can debug it properly and see the true root of the problem. I edit the rule I created earlier and use the Exceptions dialogue box to add an additional exception sub-rule, which looks like this:

image

and has the properties

1. Exceptions - CLR (.NET) Exception - .NET Exception Types = System.ArgumentOutOfRangeException

2. Action Type = Full Userdump

3. Action Limit = 1

I add this exception rule, click OK a couple of times and re-enable the rule. I then run the repro again, and notice the SSMS dialogue hang for a couple of seconds whilst it opens. This is because DebugDiag is creating the memory dump.

I disable the rule again, and then open the log file. The output is similar to before, except that after the first exception and its stack trace are logged, I see lines like the following:

[10/20/2009 3:14:08 PM] Created dump file C:\Program Files (x86)\DebugDiag\Logs\Crash rule for all instances of Ssms.exe\Ssms__PID__2912__Date__10_20_2009__Time_03_14_02PM__806__First Chance System.ArgumentOutOfRangeException.dmp
[10/20/2009 3:14:08 PM] Action limit of 1 reached for Exception 0xE0434F4D:SYSTEM.ARGUMENTOUTOFRANGEEXCEPTION.

This confirms that I have successfully created a dump, and I have created it at the point when the first exception was encountered. This technique ought to work for any CLR Exception that SSMS can encounter. As I mentioned above it would also be just as acceptable to use adplus with a config file to create the dump.
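The same check can be scripted if you would rather not scroll through the log. This is a minimal sketch in Python, assuming the "Created dump file" and "Action limit" lines always look like the two shown above; the function name dump_summary is hypothetical and nothing here is a DebugDiag feature.

```python
import re

# Patterns modelled on the two log lines quoted above.
# (Assumption: the wording of these lines is stable across DebugDiag logs.)
DUMP_LINE = re.compile(r"^\[[^\]]+\]\s+Created dump file (?P<path>.+\.dmp)$")
LIMIT_LINE = re.compile(
    r"^\[[^\]]+\]\s+Action limit of (?P<limit>\d+) reached for Exception (?P<exc>\S+?)\.?$"
)


def dump_summary(log_text):
    """Return (dump file paths, [(action limit, exception id), ...]) found in the log."""
    dumps, limits = [], []
    for line in log_text.splitlines():
        line = line.strip()
        match = DUMP_LINE.match(line)
        if match:
            dumps.append(match.group("path"))
            continue
        match = LIMIT_LINE.match(line)
        if match:
            limits.append((int(match.group("limit")), match.group("exc")))
    return dumps, limits


# The two log lines quoted in this post, used here as sample input.
SAMPLE_LOG = "\n".join([
    r"[10/20/2009 3:14:08 PM] Created dump file C:\Program Files (x86)\DebugDiag\Logs\Crash rule for all instances of Ssms.exe\Ssms__PID__2912__Date__10_20_2009__Time_03_14_02PM__806__First Chance System.ArgumentOutOfRangeException.dmp",
    r"[10/20/2009 3:14:08 PM] Action limit of 1 reached for Exception 0xE0434F4D:SYSTEM.ARGUMENTOUTOFRANGEEXCEPTION.",
])

if __name__ == "__main__":
    dumps, limits = dump_summary(SAMPLE_LOG)
    for path in dumps:
        print("Dump written:", path)
    for limit, exc in limits:
        print("Action limit", limit, "reached for", exc)
```

An empty list of dumps after a repro would suggest the sub-rule never fired, which is worth knowing before moving on to analysis.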

In Part 3, it’s time to analyze that dump.