SQL in Stockholm

SQL Server thoughts from the support team in Stockholm

SQL Server troubleshooting thoughts from Graham Kent, (ex) Senior Support Escalation Engineer at Microsoft Sweden.

  • SQL in Stockholm

    Troubleshooting SQL Server Management Studio with SQL Profiler and Debugging Tools – Part 2

    • 0 Comments

    This is the second in a series of 3 posts which discuss troubleshooting SQL Server Management Studio when it encounters errors or unexpected behaviour.

    In this post we collect a memory dump of SSMS based upon specific .NET exception codes, for subsequent analysis.

    As a reminder, here are the symptoms of the problem we are looking at:

    1. Unable to view Job Activity in the SQL Server Agent Job Activity Monitor

    2. Progress bar shows “Cannot display activity data”

    3. Click on more details throws a message box with the error

    Failed to retrieve data for this request. (Microsoft.SqlServer.Management.Sdk.Sfc)

    with the additional details

    Hour, Minute, and Second parameters describe an un-representable DateTime. (mscorlib)

    It actually looks like this (click to enlarge)

    image

    4. The error repros with all versions of SSMS (2005 and 2008)

    5. The error repros for all users whatever workstation they connect from.

    The debugging approach

    Initially I was thinking that since I had a .NET error, I could run adplus and use a config file to configure the debugger to capture the .NET errors. This would work and I’ve done it before, but then Tess reminded me that I could use DebugDiag and it would probably be easier and quicker. So I tried this:

    Firstly I installed the DebugDiag tool on the machine where I was repro’ing the problem. I started it up and created a new rule, the properties of which are shown below. Being as I didn’t know exactly what was going on at this stage, I just elected to log everything to a file to begin with, rather than creating dumps at first, so I could get a better idea of what was really happening in the process. I expected that I was getting some .NET errors based upon the message box shown, but if I just created a dump on all .NET exceptions, I might miss something else that came before it. The rule looked like this:

    1. Type = Crash

    2. Process = ssms.exe (i was using SQL 2008 tools, if I’d been using SQL 2005 tools the process name would be sqlwb.exe)

    3. Action type for unconfigured first chance exceptions = Log Stack Trace

    4. Exceptions - CLR (.NET) Exception - All Exception Types = Log Stack Trace

    5. Action Limits = 0 (since I’m only logging to file at this stage)

    Here’s what it looks like in the GUI (click to enlarge)

    image

    I saved the rule, enabled it straight away, and then repro’d the problem again. I then disabled the rule and browsed for the log file which in my case was held here:

    C:\Program Files (x86)\DebugDiag\Logs\

    it was called:

    Ssms__PID__2912__Date__10_19_2009__Time_03_35_46PM__898__Log.txt

    Review of this file showed that actually 3 different exceptions had occurred in this order:

    [10/19/2009 3:35:55 PM] First chance exception - 0xe0434f4d caused by thread with system id 5824
    [10/19/2009 3:35:55 PM] CLR Exception Type - 'System.ArgumentOutOfRangeException'

    then

    [10/19/2009 3:35:55 PM] First chance exception - 0xe0434f4d caused by thread with system id 5824
    [10/19/2009 3:35:55 PM] CLR Exception Type - 'Microsoft.SqlServer.Management.Sdk.Sfc.EnumeratorException'

    then finally

    [10/19/2009 3:35:55 PM] First chance exception - 0xe0434f4d caused by thread with system id 5824
    [10/19/2009 3:35:55 PM] CLR Exception Type - 'Microsoft.SqlServer.Management.Sdk.Sfc.EnumeratorException'

    It may be in this case that all these errors are related, however the theory of only logging information first proved valuable, as I want to create a dump of when the very first error occurs. Therefore I now know that I can edit my rule to tell DebugDiag to create a full user dump, when it encounters a CLR Exception Type - 'System.ArgumentOutOfRangeException', and that it should only do the dump once. The theory of logging first is also backed up by the fact that repro’ing this problem on a x86 machine throws a different chain of .NET exceptions (although the first one is still 'System.ArgumentOutOfRangeException')

    That said, I’m now ready to capture a full user dump of the ssms.exe process, based upon the above rule, so I can debug it properly, and see the true root of the problem. I can edit my rule created earlier, and change the Exceptions dialogue box to add an additional Exception sub-rule, which looks like this (click to enlarge)

    image

    and has the properties

    1. Exceptions - CLR (.NET) Exception - .NET Exception Types = System.ArgumentOutOfRangeException

    2. Action Type = Full Userdump

    3. Action Limit = 1

    I add this exception rule, click OK a couple of times and re-enable the rule. Then run the repro again, and I notice the SSMS dialogue hang for a couple of seconds whilst it opens. This is because DebugDiag is creating the memory dump.

    I disable the rule again, and then open the log file.The output is similar to before, except that after the first exception is logged, and after the stack trace of this exception, I note like the following:

    [10/20/2009 3:14:08 PM] Created dump file C:\Program Files (x86)\DebugDiag\Logs\Crash rule for all instances of Ssms.exe\Ssms__PID__2912__Date__10_20_2009__Time_03_14_02PM__806__First Chance System.ArgumentOutOfRangeException.dmp
    [10/20/2009 3:14:08 PM] Action limit of 1 reached for Exception 0xE0434F4D:SYSTEM.ARGUMENTOUTOFRANGEEXCEPTION.

    This confirms that I have successfully created a dump, and I have created it at the point when the first exception was encountered. This technique ought to work for any CLR Exception that SSMS can encounter. As I mentioned above it would also be just as acceptable to use adplus with a config file to create the dump.

    in Part 3, it’s time to analyze that dump.

  • SQL in Stockholm

    Troubleshooting SQL Server Management Studio with SQL Profiler and Debugging Tools – Part 1

    • 1 Comments

    This is the first in a series of 3 posts which discuss troubleshooting SQL Server Management Studio when it encounters errors or unexpected behaviour.

    In this post we look at using profiler, in the 2nd and 3rd we debug SSMS using public Microsoft debugging tools.

    Here are the symptoms of the problem we are looking at:

    1. Unable to view Job Activity in the SQL Server Agent Job Activity Monitor

    2. Progress bar shows “Cannot display activity data”

    3. Click on more details throws a message box with the error

    Failed to retrieve data for this request. (Microsoft.SqlServer.Management.Sdk.Sfc)

    with the additional details

    Hour, Minute, and Second parameters describe an un-representable DateTime. (mscorlib)

    It actually looks like this

    image

    4. The error repros with all versions of SSMS (2005 and 2008)

    5. The error repros for all users whatever workstation they connect from.

    These last 2 points are particularly relevant. Whenever you are looking at an SSMS problem, you always need to be asking these 2 questions. Straight at the off, you have potentially then eliminated whole areas of research (such as corruptions of .NET framework on specific workstations, incompatibilities in processor architecture specific versions of the framework, and localising whether the problem is with the SSMS client or the server instance being managed, or a combination of both)

    The initial approach – using SQL Profiler and some common sense

    When I initially came to this problem, the customer was experiencing the errors above. Normally when I get any errors or problems in SSMS, rather than reaching for debugger, I reach for SQL Profiler first (having asked the above 2 questions about where and when the issue repros), and this is a good rule of thumb. At the end of the day SSMS is just a GUI which issues TSQL commands to achieve functions in SQL Server. Whenever I get unexpected results in SSMS, the first thing I therefore do is to run a profiler trace against the server which SSMS is connected to. This means that you can see the TSQL which is issued, and you can sometimes find the problem straight off, if for example it issues some piece of bad code which obviously causes your error.

    In this case if you run the SQL profiler GUI tool (you can simply point it at the appropriate server instance and use the default options), you’ll note that the SSMS issues the following SQL batches.

    SET LOCK_TIMEOUT 10000
    select SERVERPROPERTY(N'servername')
    declare @RegPathParams sysname
                    declare @Arg sysname
                    declare @Param sysname
                    declare @MasterPath nvarchar(512)
     
                    declare @LogPath nvarchar(512)
                    declare @ErrorLogPath nvarchar(512)
                    declare @n int
     
                    select @n=0
                    select @RegPathParams=N'Software\Microsoft\MSSQLServer\MSSQLServer'+'\Parameters'
                    select @Param='dummy'
                    while(not @Param is null)
                    begin
                        select @Param=null
                        select @Arg='SqlArg'+convert(nvarchar,@n)
     
                        exec master.dbo.xp_instance_regread N'HKEY_LOCAL_MACHINE', @RegPathParams, @Arg, @Param OUTPUT
                        if(@Param like '-d%')
                        begin
                            select @Param=substring(@Param, 3, 255)
                            select @MasterPath=substring(@Param, 1, len(@Param) - charindex('\', reverse(@Param)))
                        end
                        else if(@Param like '-l%')
                        begin
                            select @Param=substring(@Param, 3, 255)
                            select @LogPath=substring(@Param, 1, len(@Param) - charindex('\', reverse(@Param)))
                        end
                        else if(@Param like '-e%')
                        begin
                            select @Param=substring(@Param, 3, 255)
                            select @ErrorLogPath=substring(@Param, 1, len(@Param) - charindex('\', reverse(@Param)))
                        end
                        
                        select @n=@n+1
                    end
                
     
     
                    declare @SmoRoot nvarchar(512)
                    exec master.dbo.xp_instance_regread N'HKEY_LOCAL_MACHINE', N'SOFTWARE\Microsoft\MSSQLServer\Setup', N'SQLPath', @SmoRoot OUTPUT
                
     
     
    SELECT
    CAST(case when 'a' <> 'A' then 1 else 0 end AS bit) AS [IsCaseSensitive],
    CAST(FULLTEXTSERVICEPROPERTY('IsFullTextInstalled') AS bit) AS [IsFullTextInstalled],
    @LogPath AS [MasterDBLogPath],
    @MasterPath AS [MasterDBPath],
    @ErrorLogPath AS [ErrorLogPath],
    @@MAX_PRECISION AS [MaxPrecision],
    @SmoRoot AS [RootDirectory],
    SERVERPROPERTY(N'ProductVersion') AS [VersionString],
    CAST(SERVERPROPERTY(N'MachineName') AS sysname) AS [NetName],
    CAST(SERVERPROPERTY(N'Edition') AS sysname) AS [Edition],
    CAST(SERVERPROPERTY(N'ProductLevel') AS sysname) AS [ProductLevel],
    CAST(SERVERPROPERTY('IsClustered') AS bit) AS [IsClustered],
    CAST(SERVERPROPERTY('IsSingleUser') AS bit) AS [IsSingleUser],
    CAST(SERVERPROPERTY('EngineEdition') AS int) AS [EngineEdition],
    SERVERPROPERTY(N'ResourceVersion') AS [ResourceVersionString],
    SERVERPROPERTY(N'ResourceLastUpdateDateTime') AS [ResourceLastUpdateDateTime],
    SERVERPROPERTY(N'BuildClrVersion') AS [BuildClrVersionString],
    SERVERPROPERTY(N'ComputerNamePhysicalNetBIOS') AS [ComputerNamePhysicalNetBIOS],
    SERVERPROPERTY(N'CollationID') AS [CollationID],
    SERVERPROPERTY(N'ComparisonStyle') AS [ComparisonStyle],
    SERVERPROPERTY(N'SqlCharSet') AS [SqlCharSet],
    SERVERPROPERTY(N'SqlCharSetName') AS [SqlCharSetName],
    SERVERPROPERTY(N'SqlSortOrder') AS [SqlSortOrder],
    SERVERPROPERTY(N'SqlSortOrderName') AS [SqlSortOrderName],
    convert(sysname, serverproperty(N'collation')) AS [Collation]
    exec sp_executesql N'SELECT
    dtb.collation_name AS [Collation],
    dtb.name AS [DatabaseName2]
    FROM
    master.sys.databases AS dtb
    WHERE
    (dtb.name=@_msparam_0)',N'@_msparam_0 nvarchar(4000)',@_msparam_0=N'msdb'    Microsoft SQL Server Management Studio    grahamk    EUROPE\grahamk    0    2    0    0    2912    52    2009-10-19 14:28:13.143    2009-10-19 14:28:13.143    0X00000000040000001A00730070005F006500780065006300750074006500730071006C002E01000082001A00E7206E0076006100720063006800610072002800310033003100290006010000530045004C004500430054000A006400740062002E006300    
    create table #tmp_sp_help_category
    (category_id int null, category_type tinyint null, name nvarchar(128) null)
    insert into #tmp_sp_help_category exec msdb.dbo.sp_help_category
    SELECT
    sv.name AS [Name],
    CAST(sv.enabled AS bit) AS [IsEnabled],
    tshc.name AS [Category],
    sv.category_id AS [CategoryID],
    tshc.category_type AS [CategoryType],
    null AS [CurrentRunStatus],
    null AS [CurrentRunStep],
    null AS [HasSchedule],
    null AS [HasStep],
    null AS [HasServer],
    null AS [LastRunDate],
    null AS [NextRunDate],
    null AS [LastRunOutcome],
    CAST(sv.job_id AS nvarchar(100)) AS [job_id]
    FROM
    msdb.dbo.sysjobs_view AS sv
    INNER JOIN #tmp_sp_help_category AS tshc ON sv.category_id = tshc.category_id
     
    drop table #tmp_sp_help_category
            

    and finally:

    exec msdb.dbo.sp_help_job

     

    These batches don’t appear to return any errors or warnings to profiler. The logical next step was to take a look at what this code does. Once you read through the various batches, you notice 2 types of batch. The first 5 batches are just required to get certain properties about the server instance and to populate properties of the tree view in SSMS Object Explorer. Whenever you profile SSMS you see lots of batches like this, which tend to precede the “real” work.

    The you see the final single batch, which executes msdb.dbo.sp_help_job. This is the real work that is required to populate the dialogue box that SSMS is opening. Since these batches are all just queries, and nothing is being updated, we can easily re-run them in SSMS query windows to see if anything else is thrown up, or see if weird results are returned that might explain our strange date time error.

    We tried this and it returned a wide record set of several hundred rows, which on initial glance was all well and good. At this stage, the pragmatists amongst you may feel it’s appropriate to just scan through this data and see if there are any glaring inconsistencies which can be noticed which might cause a DateTime related error. The truth is that there was such a piece of data, if you could be bothered to read all the data manually, but it was only a hypothesis. The stored procedure in question returns some columns that are related to dates and times in a true DateTime representation. However it also returns some columns as text representations of date parts. In this record set, these columns should always be populated with values, but in this case there were a number of rows which showed the value “0” in these columns. We therefore made the guess that this data was the problem. We removed this data and tried the repro again, and everything worked ok.

    We were lucky in this situation that the record set to review was only a few hundreds rows, and that the data corruption was obvious. On the basis that neither of these situations is always true, part 2 will show how to continue to troubleshoot this with debugging tools, to categorically prove what the problem was.

  • SQL in Stockholm

    SQL Server Database Mirroring Pre-requisites and guidelines

    • 0 Comments

    I noticed that we have posted a really good summary of the above points through the new fast publish system for KBs. (Note the disclaimer at the bottom which is not normally shown on full KBs). There are some really good points here to consider about setting up and configuring a mirroring environment.

    Things to consider when setting up database mirroring in SQL Server.

    In addition to this of course, you should remember my previous points made on this blog about Virtual Log File fragmentation which I recently saw in a production environment again.

    1413 error when starting Database Mirroring – How many virtual log files is too many?

    Slow recovery times and slow performance due to large numbers of Virtual Log Files

Page 1 of 1 (3 items)