SQL in Stockholm

SQL Server thoughts from the support team in Stockholm

SQL Server troubleshooting thoughts from Graham Kent, (ex) Senior Support Escalation Engineer at Microsoft Sweden.

  • SQL in Stockholm

    The end of the line

    • 0 Comments

    Today is my last day at Microsoft. I’m off to do something completely different, joining a small Scandinavian tech company called Basefarm.

    You can continue to find me online here:

    http://se.linkedin.com/in/globalgoat

    www.globalgoat.com

    Enjoy your SQL Servers ! It’s been a pleasure.

    Graham

  • SQL in Stockholm

    SQL Server blocking caused by database mirroring wait type DBMIRROR_DBM_EVENT

    • 0 Comments

    I was looking at a problem this morning about long blocking chains caused by the wait type DBMIRROR_DBM_EVENT, and it makes an interesting case study about database mirroring, blocking, undocumented wait types and general troubleshooting techniques for database mirroring. The information following relates ONLY to SQL 2005, SQL 2008 and SQL 2008 R2. However since the undocumented wait type is not supported, it is not guaranteed to be relevant to future versions higher than SQL 2008 R2 and considering the enhancements coming in this area of functionality, it may well be completely different.

    First off some thoughts about looking at root cause analysis of database mirroring problems. We see quite a few of these and often people do not have the appropriate data with which is make a full analysis. Typical examples being unexpected failovers, slow performance and sometimes blocking. Obviously the more data you have the easier, but some of key parts for looking at DBM problems historically are this:

    The perfmon counters in the SQLServer:Database Mirroring and SQLServer:Service Broker objects.

    Default profiler trace files (people often forget these but they have very useful DBM state change audit information in them)

    SQL Server Error Logs

    windows event logs

    backup of msdb database (this contains useful meta data such as the size of every log backup and the DBM monitor meta data if you've enabled it)

    memory dumps (if any were produced)

    Obviously if you've also proactively collected MPSReports or SQLDIAG as well, then this will be no bad thing. Back to my specific scenario.....

    In this example the symptom was that a huge blocking chain built on a high volume OLTP database where all the head blockers were waiting on DBMIRROR_DBM_EVENT. Here is the sequence of events I went through:

    1. Review the blocking chain itself - ho help here as all we can see is that we're waiting on DBMIRROR_DBM_EVENT

    2. Review the source for the undocumented wait type. Obviously you can't do this outside of MS, but I can say that at the time of writing this wait type represents the wait used when the principal is waiting for the mirror to harden an LSN, meaning that the transaction it's part of cannot commit. This immediately points quite specifically to the problem that the principal cannot commit transactions as it's waiting on the mirror. Now we need to investigate why the mirror is not committing transactions or why the principal doesn't know whether it is.

    3. Review the msdb system tables

    (a) Look at the [backupset] table to see if the size of the logs produced at the time of the problem are significantly higher then normal. If they were exceptionally large it may be that the mirror was flooded with transactions and could simply not keep up with the volume. This is why books online will tell you sometimes to disable mirroring if you need to do an exceptionally large logged operation such as an index rebuild. (reference for why this is at http://technet.microsoft.com/en-us/library/cc917681.aspx). Here i used the following TSQL

    SELECT backup_set_id,backup_start_date,database_name,has_bulk_logged_data,backup_size / 1000
    FROM [backupset]
    where backup_start_date between '2011-01-05 14:00:00' and '2011-01-05 19:30:00'
    go

    select round((AVG(backup_size)/1000),0)
    FROM [backupset]
    where database_name = 'mydatabase'

    (b) secondly I looked at the data in the tables [dbm_monitor_data]. The key here is to locate the timeframe in which we had a problem and then see if we were significant experiencing changes in any of the following:

    log_flush_rate
    send_queue_size
    send_rate
    redo_queue_size
    redo_rate

    These are all indicators similar to part (a) in that they might show a component or piece of architecture that wasn't responding. For example if the send_queue suddenly starts to grow but the re_do queue doesn't grow, then it would imply that the the principal cannot send the log records to the mirror so you'd want to look at connectivity maybe, or the service broker queues dealing with the actual transmissions.

    In this particular scenario we noted that all the counters appeared to have strange values, in that there were log backups going on of normal sizes, but there were no status changes, 0 send queue, 0 redo queue, a flat send rate and a flat redo rate. This is very strange as it implies that the DBM Monitor could not record any values from anywhere over the problem period.

    4. Review the SQL Server error logs. In this case there were no errors or information messages whatsoever, but in other scenarios such as this, it’s very common for errors in the 1400 range to be reported, examples of which you can find in other places in my other mirroring blogs, such as this Error 1413 example

    5. Review the default trace files – in this scenario I was not provided the default traces, however they are fantastic sources of DBM problem information, as they record state change events on all the partners.This is documented here:

    Database Mirroring State Change Event Class

    This often gives you a great picture of scenarios such as when network connectivity failed between one or all of the partners and then what the state of the partnership became afterwards.

    CONCLUSIONS:

    In this particular scenario I’m currently missing 2 key points of data, but that apart I can still make a reasonable hypothesis on the above information. We certainly can say that the blocking was caused by the fact that DBM was enabled to the due the blockers all waiting on the DBMIRROR_DBM_EVENT wait type. Since we know we didn’t flood the mirror with a large logged operation and that this deployment normally runs happily in this mode, we can exclude unusual large operations. This means that we have 2 potential candidates at this stage:

    1. Hardware problems on the connectivity between some or all of the partners.

    2. CPU exhaustion on the mirror server – simply unable to keep up with redos – the CPU exhaustion could itself be from a process outside of SQL Server or outside of the this mirror partnership.

    3. A problem with the mirroring code itself (we’d really need some memory dumps to confirm this though).

    Based upon experience I’d suspect 1 or 2, but I always keep an open mind about 3 as well, we’re trying to collect some more data now to look at this problem in more detail.

    As a final note though I noted some similar scenarios on the forums, and one particular one of note where the poster said that everything worked fine until they enabled mirroring, at which point the system ground to a halt with this wait type. This is far more likely to be a non-optimal configuration where some section of the architecture can simply not keep up with the volume of requests required by the partnership, and is a perfect example of why one must benchmark mirror deployments very carefully. In that scenario you should review this KB first : http://support.microsoft.com/kb/2001270

    Graham

  • SQL in Stockholm

    Compatibility problems between DMO and SMO connectivity on newer versions of SQL Server

    • 1 Comments

    We've recently been looking at a problem which triggered this error:

    [Microsoft][ODBC SQL Server Driver][SQL Server]To connect to this server you must use SQL Server Management Studio or SQL Server Management Objects (SMO)

    In our case it was actually being thrown by the SQL Server Web Data Administrator which is available on codeplex, but it could just as well be any application that you've written yourselves. In this case the application in question was running on an old Windows 2003 server and had been working fine for a long time, when connecting to old SQL 2000 instances. However when they started trying to connect to later versions of SQL they were getting the above error within the following stack:

    at SqlAdmin.ISqlServer.Connect(Object ServerName, Object Login, Object Password)
    at SqlAdmin.SqlServer.Connect()
    at SqlWebAdmin.databases.Page_Load(Object sender, EventArgs e)
    at System.Web.UI.Control.OnLoad(EventArgs e)
    at System.Web.UI.Control.LoadRecursive()
    at System.Web.UI.Page.ProcessRequestMain()

    Being as you can check out the source of this application I did check it out, but excluded this as a problem fairly early on. We came back to focus on the SQLDMO.dll itself and via some rigorous structured testing in the lab we managed to reproduce the exact error and provide an appropriate solution. (If you wonder what I mean by rigorous structured troubleshooting, I mean that I build a windows 2003 virtual machine in a lab and then applied updates to it sequentially checkpointing everytime something changed, and testing various hypotheses that I had on each checkpoint. When you're working with DLL versioning problems I find this to be a very effective - and in my opinion necessary methodology).

    Anyway the customer web server in question was runing the following version of SQLDMO.DLL on the web server

    Module[945] [C:\PROGRAM FILES\WINDOWS\SYSTEM32\SQLDMO.DLL]
      Company Name:      Microsoft Corporation
      File Description:  Distributed Management Objects OLE DLL for SQL Enterprise Manager
      Product Version:   (8.0:760.0)
      File Version:      (2000.80:760.0)
      File Size (bytes): 4215360
      File Date:         ti joulu 17 19:25:22 2002
        Module TimeDateStamp = 0x3dffbfbc - Wed Dec 18 02:22:20 2002
        Module Checksum      = 0x00411b8a
        Module SizeOfImage   = 0x00405000
      Module Pointer to PDB = [dll\sqldmo.pdb]
        Module PDB Signature = 0x3dff938b
        Module PDB Age = 0x9e

    (I was running MPSReports during my tests to extract and save exact configurations). This is a build from SQL Server 2000 SP3a, a fairly recognisable version number for those of us who have been around SQL for a few years!

    To cut a long story short all SQL 2000 versions of this DLL are incompatible with higher major versions of SQL Server, and you need to patch this DLL. There are a number of articles about installing SQLDMO on the web, but I didn;t find any of them to be particularly 100% reliable for all circumstances, which is why I tested it so thoroughly. I wanted to be sure that I could patch this DLL and only this DLL without changing anything else on a production machine. (For example it was hypothesized that installing SQL2005 or higher would resolve the problem - this in fact turned out to be true, but we didn;t want to do this. I also didn't want to be manually registering DLLs.)

    We instead used the backward compatibility MSI from SQL 2005, which can be downloaded and run indepently here:

    Feature Pack for Microsoft SQL Server 2005 - November 2005

    When running this we removed all the features apart from SQLDMO, and this ensured that we correctly upgraded SQLDMO.DLL (and its related objects such as SQLDMO.DLL) in the SQL 2000 directory here

    C:\Program Files\Microsoft SQL Server\80\Tools\Binn

    This upgrades SQLDMO.DLL to a version of this format

    8.05.2312

    The 05 being the key identifier here. Everything will then work fine.

    The main point being here that one can install and run a version of DMO which will be compatible with all major versions of SQL higher than 2000, and you can do it in a controlled and easy manner, without resorting to the manual registrations listed in older KBs like these

    http://support.microsoft.com/kb/326613
    http://support.microsoft.com/kb/248241
    http://msdn.microsoft.com/en-us/library/aa274817(v=sql.80).aspx

    Hope this helps.

    Graham

     

  • SQL in Stockholm

    Extended Event to track and trace syntax errors

    • 0 Comments

    This might sounds like a strange requirement, but it can be converted to do a number of different things. The scenario is that for reason I won't discuss here, I want to track any user / application which issues an SQL batch which contains invalid syntax. The value here is that this would be extremely difficult to do on this system with tools like profiler due to the large volumes going through the server. This XEVENT will happily sit and monitor all batches for the error codes 102, 447 and 448 which are various derivatives of invalid syntax or invalid language usage. The output is written to a binary file which can then be queried using standard TSQL / XQuery. Alternatively if you don't like the scripting part of this you could look to use the Extended Events SSMS Add in from code plex.

    http://extendedeventmanager.codeplex.com/

    and here is the piece of code

    -- XEvent Session looking for errors 447,448 and 102
    -- log examples to files so can be correlated back to netmon traces

    if exists(select * from sys.server_event_sessions where name='SharePoint_Errors')
         DROP EVENT SESSION SharePoint_Errors ON SERVER
    go

    CREATE EVENT SESSION SharePoint_Errors ON SERVER
    ADD EVENT sqlserver.error_reported
    (
    ACTION
    (
    sqlserver.session_id,
    sqlserver.tsql_stack,
    sqlserver.sql_text,
    sqlserver.client_hostname,
    sqlserver.client_pid
    )
    WHERE error = 102
    OR error = 447
    OR error = 448
    )
    ADD TARGET package0.asynchronous_file_target(
    SET filename='c:\temp\SharePoint_Errors.xet', metadatafile='c:\temp\SharePoint_Errors.xem')
    WITH (MAX_MEMORY = 4096KB,
    EVENT_RETENTION_MODE = ALLOW_MULTIPLE_EVENT_LOSS,
    MAX_DISPATCH_LATENCY = 1 SECONDS,
    MAX_EVENT_SIZE = 0KB,
    MEMORY_PARTITION_MODE = NONE,
    TRACK_CAUSALITY = OFF,
    STARTUP_STATE = OFF)

    go

    ALTER EVENT SESSION SharePoint_Errors ON SERVER
    state=start
    go

    --stop the session if necessary
    --ALTER EVENT SESSION SharePoint_Errors ON SERVER
    --state=stop

    -- Cleanup the session
    --drop EVENT SESSION SharePoint_Errors ON SERVER

    ********
    and here is the sample script to query the output files
    ********

    --get values from XET binary files

    --straight select, view the XML manually
    select CAST(event_data as XML) , *
    from sys.fn_xe_file_target_read_file
    (
    'C:\temp\SharePoint_Errors_0_129291097895820000.xet',
    'C:\temp\SharePoint_Errors_0_129291097896520000.xem',
    null,
    null
    )

    --details

    SELECT --*
    xdata.value('(/event/data[3]/@name)[1]', 'varchar(100)') AS [name],
    xdata.value('(/event/action[@name="sql_text"]/value)[1]', 'varchar(max)') AS [name]

    --insert other XML nodes from the output as required here
    --you can then add condition checks on their values as well
    FROM
     (
      select CAST(event_data as XML)
      from sys.fn_xe_file_target_read_file
      (
      'C:\temp\SharePoint_Errors_0_129291097895820000.xet',
      'C:\temp\SharePoint_Errors_0_129291097896520000.xem',
      null,
      null
      )
     ) as xmlr(xdata)

     
    ********* 
    remember to change the file unique identifiers to those of your output files
    *********

  • SQL in Stockholm

    SQL Server performance problems due to high privileged CPU time when running Windows Server 2008 R2

    • 1 Comments

    It's very rare that I tell people to blindy apply service packs or hotfixes (especially windows ones) without first analyzing where their performance problem lies, but if you are running SQL Server 2008 on Windows Server 2008 R2 (specifically this OS only) then you should absolutely ensure that you've installed the following fix:

    976700  An application stops responding, experiences low performance, or experiences high privileged CPU usage if many large I/O operations are performed in Windows 7 or Windows Server 2008 R2
    http://support.microsoft.com/default.aspx?scid=kb;EN-US;976700

    In this scenario, the key symptom which you normally ought to notice is that you are getting high kernel times, which can you see in the followiing perfmon counter:

    Process(sqlservr)\% Privileged Time

    We've seen a trend of this issue in recent months, so it's something to look to apply in your next maintenance window if you can.

     

  • SQL in Stockholm

    The Forefront TMG/UAG team in Stockholm is hiring

    • 0 Comments

    Following the success of my previous post about recruitment, we're advertising another position from a different support group also located here in Stockholm. You can read the details here

    http://blogs.technet.com/b/isablog/archive/2010/09/15/forefront-tmg-uag-help-wanted-at-microsoft-in-stockholm.aspx

    ....and you might notice the common language to my previous post :-)

    Please don't contact me about this one though, ensure you email the contact listed in the new article.

     

     

  • SQL in Stockholm

    The SQL Server Support Team in Stockholm is hiring

    • 0 Comments

    Update 23th August

    This position has now been filled - thanks for everything who submitted their CVs.

    The Microsoft SQL Server support team in Stockholm is looking to hire a contractor to help out over the next year or so, to cover some staff absences. If you’d be interested to come and work with us and learn more about SQL Server internals than you probably would anywhere else in the world (apart form Redmond maybe), then this might be the contract for you.

    Although you’ll get some training and access to innumerable amounts of internal only documentation, we’re looking for someone very experienced who can hit the ground running and could be expected to work the majority of our easier support cases without too much assistance. Therefore you’d need a minimum of 5 years experience (preferably more) working with SQL Server day to day in an administrative capacity (probably as a production DBA). Specialities of particular interest to us would be:

    Setup / configuration / migration

    Administration of HA solutions (clusters, DBM, replication etc)

    Performance

    Crashes, Hangs and Unexpected behaviour

    Beyond that if you have experience of other Microsoft server products (Windows, SharePoint, BizTalk, SCCM, Navision), windbg, c# or c++, it won’t do you any harm.

    If you think you’d be interested, then please send a brief summary (it’ll come out in plain text so keep it simple) of your experience via the “email blog author” section (to reduce my spam levels!). If we’re interested then we’ll get back to you with proper email details so you can send in a full CV and talk to our agency. You need to be resident in Sweden (Swedish language or other Nordic languages would be of use, but are not essential) and available to work full time in Kista / Akalla from some point around August.

  • SQL in Stockholm

    Script to run in SQL Server Agent to monitor DMVs, log results and take action depending upon results

    • 0 Comments

    Here’s a script to play with which is useful for simple management tasks. I wrote it to help someone with a TokenAndPermUserStore cache issue (http://support.microsoft.com/kb/927396/) which are relatively uncommon nowadays. I actually sent it to someone today which is why it came to mind, but it occurred to me that the template of the script itself is quite useful for any such TSQL SQL Agent based task, i.e. it does these tasks:

    Runs a DMV query

    Checks results of query and if appropriate takes an action

    Writes what it did to the SQL error log as an info message

    I use the theory for many different tasks. Feel free to use and alter it as you will, subject to the usual legal caveats of course!

  • SQL in Stockholm

    Access Denied error when running SQL Server Transfer Objects Task with a Proxy Account

    • 0 Comments

    Should you try to configure an SSIS package which uses the Transfer Objects Task, and then execute this package through SQL Agent, when you choose to run the task as a proxy credential, as opposed to the normal SQL Agent user, you may encounter a failure. The typical full text of the error message from the SQL Agent history is shown here.

    Executed as user: GRAHAMK20\SSIS_Proxy. Microsoft (R) SQL Server Execute Package Utility  Version 10.0.2531.0 for 64-bit  Copyright (C) Microsoft Corp 1984-2005. All rights reserved.    Started:  4:55:19 PM  Error: 2010-02-04 16:55:19.65     Code: 0xC0016016     Source:       Description: Failed to decrypt protected XML node "PassWord" with error 0x80070002 "The system cannot find the file specified.". You may not be authorized to access this information. This error occurs when there is a cryptographic error. Verify that the correct key is available.  End Error  Error: 2010-02-04 16:55:19.67     Code: 0xC0016016     Source:       Description: Failed to decrypt protected XML node "PassWord" with error 0x80070002 "The system cannot find the file specified.". You may not be authorized to access this information. This error occurs when there is a cryptographic error. Verify that the correct key is available.  End Error  Error: 2010-02-04 16:55:20.00     Code: 0xC002F325     Source: Transfer SQL Server Objects Task Transfer SQL Server Objects Task     Description: Execution failed with the following error: "Access to the path 'C:\Users\Default\AppData\Local\Microsoft\SQL Server\Smo' is denied.".  End Error  DTExec: The package execution returned DTSER_FAILURE (1).  Started:  4:55:19 PM  Finished: 4:55:20 PM  Elapsed:  0.562 seconds.  The package execution failed.  The step failed.

    Depending on how you view this message, you could decide it as being a number of different causes, but the actual important point to note is the red highlighted section. This is the lowest level error and the actual cause of the failure, the other errors are thrown further up the stack. You can confirm this by running Process Monitor on the server where the package is executing. You’ll see failures here

    - C:\Users\Default\AppData\Local\Microsoft\SQL Server (create file)
    - C:\Users\Default\AppData\Local\Microsoft\SQL Server\Smo (create file)

    However if you run the same package, but in the context of the SQL Agent account, you will notice that the same create file runs on a different directory:

    - C:\Users\<sqlagentuserid>\AppData\Local\Microsoft\SQL Server\Smo

    The package requires a temporary location to process some files when it prepared to run and there is a bug in the way the proxy account credential decides where it should place these files. This has been passed to the product group and should be fixed in SQL 11.

    The reason that it is not being fixed immediately is that there is a relatively simple workaround which is to manually create the directory structure in the default user profile directory, and grant write modify permissions to the user or group which represents the proxy credential account. So in my specific example I logged on with an adminstrator account to the server, created the following directory

    C:\Users\Default\AppData\Local\Microsoft\SQL Server\Smo

    and then granted the users group (of which my proxy credential account is a member of) the following permissions:

    Modify, Write, Read & Execute

    The package will then execute successfully under the proxy credential. This problem occurs on both SQL 2005 and SQL 2008.

  • SQL in Stockholm

    Troubleshooting SQL Server Management Studio with SQL Profiler and Debugging Tools – Part 3

    • 0 Comments

    (An alternate title for this post could be “How to debug SSMS managed code”)

    In the first 2 posts of this topic we used SQL Profiler and some common sense to diagnose a problem within SSMS. Then we used DebugDiag to collect a memory dump when a specific exception was thrown. Now in the 3rd part we’ll analyze that dump used windbg and public symbols from the Microsoft symbols server.

    This post assumes a certain familiarity with managed debugging using windbg. If you need some background on this topic, a good place to start is by reading the blog of Tess.

    To understand the context of this debugging, you should really read the first 2 posts in the series, but as a brief reminder, we collect a dump of the SSMS process (ssms.exe in SQL 2008 and sqlwb.exe in SQL 2005) when a specific condition occurs – in this case an exception thrown when trying to view SQL Agent job history. The dump below is from a sqlwb.exe 2005 client, but the theory is identical. The main point which this post demonstrates is that unlike when debugging the database engine process of SQL Server (sqlservr.exe), using purely public symbols , interesting information can be gleaned from the dump because most of the interesting structures are in managed code which can be interpretted with the windbg extension sos.dll.

    My base setup for this debugging:

    A blank hyper-V machine running Windows 7 Ultimate, connected to the internet

    A default installation of debugging tools for windows downloaded from here:

    http://www.microsoft.com/whdc/devtools/debugging/installx86.Mspx

    A symbol path set exactly as per the example “getting started” page here:

    http://www.microsoft.com/whdc/devtools/debugging/debugstart.mspx

    i.e. the symbol path used is:

    SRV*c:\websymbols*http://msdl.microsoft.com/download/symbols

    So here’s the debugging output with notes:

    * First off we load the public sos.dll so we can do managed debugging

     

    0:018> .load c:\windows\microsoft.net\framework\v2.0.50727\sos.dll

     

    * We’ve opened directly to the thread where the exception occurred

    * let’s look at the CLRSTACK

     

    0:018> !clrstack

    OS Thread Id: 0x6ac (18)

    ESP       EIP    

    0822ef3c 77e4bef7 [HelperMethodFrame: 0822ef3c]

    0822efe0 799d766a System.DateTime.TimeToTicks(Int32, Int32, Int32)

    0822eff0 2ec8e92c Microsoft.SqlServer.Management.Smo.PostProcessCreateDateTime.GetDateTime(System.Object, System.Object)

    0822f010 2ec91584 Microsoft.SqlServer.Management.Smo.PostProcessJobActivity.GetColumnData(System.String, System.Object, Microsoft.SqlServer.Management.Smo.DataProvider)

    0822f064 2ec86ced Microsoft.SqlServer.Management.Smo.DataProvider.ManipulateRowDataPostProcess()

    0822f07c 2ec872b9 Microsoft.SqlServer.Management.Smo.DataProvider.GetTable()

    0822f08c 2ec84436 Microsoft.SqlServer.Management.Smo.ExecuteSql.ExecuteWithResults(System.Collections.Specialized.StringCollection, System.Object, Microsoft.SqlServer.Management.Smo.StatementBuilder)

    0822f098 2ec804db Microsoft.SqlServer.Management.Smo.SqlObjectBase.FillData(Microsoft.SqlServer.Management.Smo.ResultType, System.Collections.Specialized.StringCollection, System.Object, Microsoft.SqlServer.Management.Smo.StatementBuilder)

    0822f0b8 2ec803c4 Microsoft.SqlServer.Management.Smo.SqlObjectBase.FillDataWithUseFailure(Microsoft.SqlServer.Management.Smo.SqlEnumResult, Microsoft.SqlServer.Management.Smo.ResultType)

    0822f0f8 2ec80339 Microsoft.SqlServer.Management.Smo.SqlObjectBase.BuildResult(Microsoft.SqlServer.Management.Smo.EnumResult)

    0822f138 2ec7fea3 Microsoft.SqlServer.Management.Smo.SqlObjectBase.GetData(Microsoft.SqlServer.Management.Smo.EnumResult)

    0822f148 2eb5daf8 Microsoft.SqlServer.Management.Smo.Environment.GetData()

    0822f178 2eb5da5c Microsoft.SqlServer.Management.Smo.Environment.GetData(Microsoft.SqlServer.Management.Smo.Request, System.Object)

    0822f18c 2eb5caef Microsoft.SqlServer.Management.Smo.Enumerator.GetData(System.Object, Microsoft.SqlServer.Management.Smo.Request)

    0822f1b0 2eb5ccd6 Microsoft.SqlServer.Management.Smo.Enumerator.Process(System.Object, Microsoft.SqlServer.Management.Smo.Request)

    0822f1e4 30983b38 Microsoft.SqlServer.Management.SqlManagerUI.JobFetcher.FetchJobs(System.String)

    0822f200 3098349a Microsoft.SqlServer.Management.SqlManagerUI.JobFetcher.FetchJobs(Microsoft.SqlServer.Management.SqlManagerUI.JobActivityFilter)

    0822f214 3098230e Microsoft.SqlServer.Management.SqlManagerUI.JobsPanel.RefreshJobs()

    0822f240 3097f028 Microsoft.SqlServer.Management.SqlManagerUI.JobActivityMonitor.Microsoft.SqlServer.Management.UI.ActivityMonitor.IActivityMonitor.RefreshSnapshot()

    0822f244 053f08f7 Microsoft.SqlServer.Management.UI.ActivityMonitor.ActivityMonitorUserControl.WorkerEntryPoint(System.Object)

    0822f280 792c9e4f System.Threading._ThreadPoolWaitCallback.WaitCallback_Context(System.Object)

    0822f288 792f5681 System.Threading.ExecutionContext.runTryCode(System.Object)

    0822f6b8 79e71b4c [HelperMethodFrame_PROTECTOBJ: 0822f6b8] System.Runtime.CompilerServices.RuntimeHelpers.ExecuteCodeWithGuaranteedCleanup(TryCode, CleanupCode, System.Object)

    0822f720 792f5577 System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)

    0822f73c 792e01c5 System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)

    0822f754 792ca3b3 System.Threading._ThreadPoolWaitCallback.PerformWaitCallbackInternal(System.Threading._ThreadPoolWaitCallback)

    0822f768 792ca249 System.Threading._ThreadPoolWaitCallback.PerformWaitCallback(System.Object)

    0822f8f8 79e71b4c [GCFrame: 0822f8f8]

     

    * this shows us some fairly expected output that we have a stack which indicates we are using SMO to populate some data to do with SQL Agent jobs (Microsoft.SqlServer.Management.SqlManagerUI.JobFetcher.FetchJobs)

     

    * let’s look at the exception to check it’s what we wanted

    * it shows the details of the exception which SSMS reported (Hour, Minute, and Second parameters describe an un-representable DateTime)

     

    0:018> !pe

    Exception object: 016588b4

    Exception type: System.ArgumentOutOfRangeException

    Message: Hour, Minute, and Second parameters describe an un-representable DateTime.

    InnerException: <none>

    StackTrace (generated):

    <none>

    StackTraceString: <none>

    HResult: 80131502

     

    * now we want to look at what these functions were actually doing

    * dump all objects on the stack to look back through recent history (most recent is at the top)

    * scanning down here to look for anything of interest

    * we see that we have a System.DBNull sitting next to a System.String named NextRunDate – this might be of interest

    * in addition to that we can see the server connection object which will contain the details of the connection and also the actual TSQL executed

     

    0:018> !dso

    OS Thread Id: 0x6ac (18)

    ESP/REG  Object   Name

    0822eed0 016588b4 System.ArgumentOutOfRangeException

    0822ef1c 016588b4 System.ArgumentOutOfRangeException

    0822ef60 016588b4 System.ArgumentOutOfRangeException

    0822efbc 0156b85c System.String    ArgumentOutOfRange_BadHourMinuteSecond

    0822efc4 016588b4 System.ArgumentOutOfRangeException

    0822efe0 016588b4 System.ArgumentOutOfRangeException

    0822eff8 015cd46c Microsoft.SqlServer.Management.Smo.PostProcessJobActivity

    0822effc 0165889c System.Int32

    0822f00c 016588a8 System.Int32

    0822f010 01658880 System.Data.RBTree`1+RBTreeEnumerator[[System.Data.DataRow, System.Data]]

    0822f014 01603e84 System.Data.DataRow

    0822f018 015cc064 System.String    NextRunDate

    0822f01c 015cd46c Microsoft.SqlServer.Management.Smo.PostProcessJobActivity

    0822f044 015d34f0 System.Data.DataRow

    0822f048 015cd46c Microsoft.SqlServer.Management.Smo.PostProcessJobActivity

    0822f04c 015cdf84 Microsoft.SqlServer.Management.Smo.DataProvider

    0822f05c 015cdf84 Microsoft.SqlServer.Management.Smo.DataProvider

    0822f060 011ee95c System.DBNull

    0822f064 015cc064 System.String    NextRunDate

    0822f068 01597a6c Microsoft.SqlServer.Management.SqlManagerUI.JobFetcher

    0822f06c 015cdf84 Microsoft.SqlServer.Management.Smo.DataProvider

    0822f094 015ccd18 Microsoft.SqlServer.Management.Smo.StatementBuilder

    0822f0b0 01594dbc Microsoft.SqlServer.Management.Common.ServerConnection

    0822f0c0 015cd964 System.Collections.Specialized.StringCollection

    0822f0c4 015cca78 Microsoft.SqlServer.Management.Smo.SqlEnumResult

    0822f0c8 015c8a00 Microsoft.SqlServer.Management.Smo.SqlObject

    0822f0e0 01597a6c Microsoft.SqlServer.Management.SqlManagerUI.JobFetcher

    0822f124 01597a6c Microsoft.SqlServer.Management.SqlManagerUI.JobFetcher

    0822f148 015cc7bc System.Collections.ArrayList+ArrayListEnumeratorSimple

    0822f14c 015cca78 Microsoft.SqlServer.Management.Smo.SqlEnumResult

    0822f164 01597a6c Microsoft.SqlServer.Management.SqlManagerUI.JobFetcher

    0822f168 015c462c Microsoft.SqlServer.Management.Smo.Environment

    0822f188 01594dbc Microsoft.SqlServer.Management.Common.ServerConnection

    0822f194 01594dbc Microsoft.SqlServer.Management.Common.ServerConnection

    0822f19c 01597a6c Microsoft.SqlServer.Management.SqlManagerUI.JobFetcher

    0822f1cc 01597a6c Microsoft.SqlServer.Management.SqlManagerUI.JobFetcher

    0822f1e0 015c4590 Microsoft.SqlServer.Management.Smo.Request

    0822f1e4 0150cb08 System.String    /Job

    0822f1e8 011d1198 System.String   

    0822f1ec 0158101c Microsoft.SqlServer.Management.SqlManagerUI.JobActivityFilter

    0822f1f0 01597a6c Microsoft.SqlServer.Management.SqlManagerUI.JobFetcher

    0822f204 01591e78 Microsoft.SqlServer.Management.SqlManagerUI.JobsPanel

    0822f214 01591e78 Microsoft.SqlServer.Management.SqlManagerUI.JobsPanel

    0822f21c 01217510 System.Runtime.CompilerServices.RuntimeHelpers+CleanupCode

    0822f224 014f9c50 System.Threading.ExecutionContext

    0822f250 015a5174 Microsoft.SqlServer.Management.UI.ActivityMonitor.ActivityMonitorUserControl+BackgroundRefreshJobDescription

    0822f254 01581054 Microsoft.SqlServer.Management.UI.ActivityMonitor.ActivityMonitorUserControl

    0822f344 012174f0 System.Runtime.CompilerServices.RuntimeHelpers+TryCode

    0822f410 01217510 System.Runtime.CompilerServices.RuntimeHelpers+CleanupCode

    0822f4b8 012174f0 System.Runtime.CompilerServices.RuntimeHelpers+TryCode

    0822f638 01217510 System.Runtime.CompilerServices.RuntimeHelpers+CleanupCode

    0822f6f4 012174f0 System.Runtime.CompilerServices.RuntimeHelpers+TryCode

    0822f6f8 01217510 System.Runtime.CompilerServices.RuntimeHelpers+CleanupCode

    0822f6fc 015a561c System.Threading.ExecutionContext+ExecutionContextRunData

    0822f724 011f4ecc System.Threading.ContextCallback

    0822f72c 014f9c50 System.Threading.ExecutionContext

    0822f738 015a51cc System.Threading._ThreadPoolWaitCallback

    0822f750 015a51cc System.Threading._ThreadPoolWaitCallback

     

    * dump the server connection object – this is always a useful object to be able to locate and look at, because within it you can see all manner of properties about the connection and the batch executing

     

    0:018> !do 01594dbc

    Name: Microsoft.SqlServer.Management.Common.ServerConnection

    MethodTable: 2e2bda28

    EEClass: 2e283d48

    Size: 144(0x90) bytes

     (C:\WINDOWS\assembly\GAC_MSIL\Microsoft.SqlServer.ConnectionInfo\9.0.242.0__89845dcd8080cc91\Microsoft.SqlServer.ConnectionInfo.dll)

    Fields:

          MT    Field   Offset                 Type VT     Attr    Value Name

    793044cc  4000036       40       System.Boolean  1 instance        1 m_BlockUpdates

    79330a00  4000037        4        System.String  0 instance 01594eac m_ServerInstance

    79330a00  4000038        8        System.String  0 instance 011d1198 m_Login

    7995d420  4000039        c ...rity.SecureString  0 instance 00000000 m_Password

    793044cc  400003a       41       System.Boolean  1 instance        1 m_LoginSecure

    79330a00  400003b       10        System.String  0 instance 011d1198 m_ConnectAsUserName

    79330a00  400003c       14        System.String  0 instance 011d1198 m_ConnectAsUserPassword

    793044cc  400003d       42       System.Boolean  1 instance        0 m_ConnectAsUser

    793044cc  400003e       43       System.Boolean  1 instance        1 m_NonPooledConnection

    79332c4c  400003f       28         System.Int32  1 instance        0 m_PooledConnectionLifetime

    79332c4c  4000040       2c         System.Int32  1 instance        0 m_MinPoolSize

    79332c4c  4000041       30         System.Int32  1 instance      100 m_MaxPoolSize

    79332c4c  4000042       34         System.Int32  1 instance       15 m_ConnectTimeout

    2e2bc5b4  4000043       38         System.Int32  1 instance        8 m_NetworkProtocol

    79330a00  4000044       18        System.String  0 instance 01594e4c m_ApplicationName

    79330a00  4000045       1c        System.String  0 instance 011d1198 m_WorkstationId

    79330a00  4000046       20        System.String  0 instance 011d1198 m_DatabaseName

    79332c4c  4000047       3c         System.Int32  1 instance     4096 m_PacketSize

    7995d420  4000048       24 ...rity.SecureString  0 instance 00000000 m_ConnectionString

    793044cc  4000049       44       System.Boolean  1 instance        0 m_MultipleActiveResultSets

    793044cc  400004a       45       System.Boolean  1 instance        0 shouldEncryptConnection

    6523d7a0  400004b       48 ...ent.SqlConnection  0 instance 01594f10 m_SqlConnectionObject

    793044cc  400004c       46       System.Boolean  1 instance        1 m_InUse

    79332c4c  400004d       68         System.Int32  1 instance        0 m_LoginFailedClients

    2e2bc50c  400004e       4c ...ssageEventHandler  0 instance 00000000 m_RemoteLoginFailedHandler

    656629a8  400004f       50 ...ssageEventHandler  0 instance 00000000 m_SqlInfoMessageHandler

    2e2bd89c  4000050       54 ...ommon.CapturedSql  0 instance 01594ee0 m_CapuredSQL

    2e2bc8d4  4000051       6c         System.Int32  1 instance        1 m_AutoDisconnectMode

    793044cc  4000052       47       System.Boolean  1 instance        0 bIsUserConnected

    2e2be0b0  4000053       58 ...mon.ServerVersion  0 instance 015c172c m_serverVersion

    79332c4c  4000054       70         System.Int32  1 instance       10 lockTimeout

    2e2bc50c  4000055       5c ...ssageEventHandler  0 instance 00000000 ServerMessageInternal

    2e2bc480  4000056       60 ...ementEventHandler  0 instance 00000000 StatementExecuted

    2e2bc50c  4000057       64 ...ssageEventHandler  0 instance 00000000 RemoteLoginFailedInternal

    79332c4c  400005b       80         System.Int32  1 instance        0 m_StatementTimeout

    79330a00  400005c       74        System.String  0 instance 01428dcc m_BatchSeparator

    79332c4c  400005d       84         System.Int32  1 instance        0 m_TransactionDepth

    2e2bc6f4  400005e       88         System.Int32  1 instance        1 m_ExecutionMode

    6523db4c  400005f       78 ...Client.SqlCommand  0 instance 01594f48 m_SqlCommand

    79330a00  4000060       7c        System.String  0 instance 015c1a70 m_TrueName

     

    *dump the SqlCommand object

     

    0:018> !do 01594f48

    Name: System.Data.SqlClient.SqlCommand

    MethodTable: 6523db4c

    EEClass: 6515c250

    Size: 132(0x84) bytes

     (C:\WINDOWS\assembly\GAC_32\System.Data\2.0.0.0__b77a5c561934e089\System.Data.dll)

    Fields:

          MT    Field   Offset                 Type VT     Attr    Value Name

    7933061c  400018a        4        System.Object  0 instance 00000000 __identity

    7a5d2a58  40008c3        8 ...ponentModel.ISite  0 instance 00000000 site

    7a5ecd2c  40008c4        c ....EventHandlerList  0 instance 00000000 events

    7933061c  40008c2      108        System.Object  0   static 01201278 EventDisposed

    79332c4c  40016e3       58         System.Int32  1 instance       58 ObjectID

    79330a00  40016e4       10        System.String  0 instance 014e3850 _commandText

    6523e698  40016e5       5c         System.Int32  1 instance        0 _commandType

    79332c4c  40016e6       60         System.Int32  1 instance        0 _commandTimeout

    6565989c  40016e7       64         System.Int32  1 instance        3 _updatedRowSource

    793044cc  40016e8       78       System.Boolean  1 instance        0 _designTimeInvisible

    6567ecdc  40016e9       14 ...ent.SqlDependency  0 instance 00000000 _sqlDep

    793044cc  40016ea       79       System.Boolean  1 instance        0 _inPrepare

    79332c4c  40016eb       68         System.Int32  1 instance       -1 _prepareHandle

    793044cc  40016ec       7a       System.Boolean  1 instance        0 _hiddenPrepare

    6523e8b4  40016ed       18 ...rameterCollection  0 instance 00000000 _parameters

    6523d7a0  40016ee       1c ...ent.SqlConnection  0 instance 01594f10 _activeConnection

    793044cc  40016ef       7b       System.Boolean  1 instance        0 _dirty

    656616a4  40016f0       6c         System.Int32  1 instance        0 _execType

    793041d0  40016f1       20      System.Object[]  0 instance 00000000 _rpcArrayOf1

    6524233c  40016f2       24 ...t._SqlMetaDataSet  0 instance 015f9100 _cachedMetaData

    65241e94  40016f3       28 ...+CachedAsyncState  0 instance 015c19fc _cachedAsyncState

    79332c4c  40016f4       70         System.Int32  1 instance       -1 _rowsAffected

    6567d4b8  40016f5       2c ...tificationRequest  0 instance 00000000 _notification

    793044cc  40016f6       7c       System.Boolean  1 instance        1 _notificationAutoEnlist

    6567da8c  40016f7       30 ...nt.SqlTransaction  0 instance 00000000 _transaction

    65659328  40016f8       34 ...letedEventHandler  0 instance 00000000 _statementCompletedEventHandler

    65241814  40016f9       38 ...ParserStateObject  0 instance 00000000 _stateObj

    793044cc  40016fa       7d       System.Boolean  1 instance        0 _pendingCancel

    793044cc  40016fb       7e       System.Boolean  1 instance        0 _batchRPCMode

    00000000  40016fc       3c                       0 instance 00000000 _RPCList

    793041d0  40016fd       40      System.Object[]  0 instance 00000000 _SqlRPCBatchArray

    00000000  40016fe       44                       0 instance 00000000 _parameterCollectionList

    79332c4c  40016ff       74         System.Int32  1 instance        0 _currentlyExecutingBatch

    656732b0  4001700       48 ...miRequestExecutor  0 instance 00000000 _smiRequest

    65672df8  4001701       4c ...Server.SmiContext  0 instance 00000000 _smiRequestContext

    6567e144  4001702       50 ...+CommandEventSink  0 instance 00000000 _smiEventSink

    65672f4c  4001703       54 ...DeferedProcessing  0 instance 00000000 _outParamEventSink

    79332c4c  40016e2      878         System.Int32  1   static       59 _objectTypeCount

    793041d0  4001704      78c      System.Object[]  0   static 01427968 PreKatmaiProcParamsNames

    793041d0  4001705      790      System.Object[]  0   static 014279b4 KatmaiProcParamsNames

     

    * dump the command text

     

    0:018> !do 014e3850

    Name: System.String

    MethodTable: 79330a00

    EEClass: 790ed64c

    Size: 68(0x44) bytes

     (C:\WINDOWS\assembly\GAC_32\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll)

    String: exec msdb.dbo.sp_help_job

    Fields:

          MT    Field   Offset                 Type VT     Attr    Value Name

    79332c4c  4000096        4         System.Int32  1 instance       26 m_arrayLength

    79332c4c  4000097        8         System.Int32  1 instance       25 m_stringLength

    793316e0  4000098        c          System.Char  1 instance       65 m_firstChar

    79330a00  4000099       10        System.String  0   shared   static Empty

        >> Domain:Value  001831e0:011d1198 <<

    79331630  400009a       14        System.Char[]  0   shared   static WhitespaceChars

        >> Domain:Value  001831e0:011d17d4 <<

     

    * and we have now managed to get the actual TSQL batch executed by SSMS which caused the exception to happen

    * this would be enough information to then leave the debugging and go and investigate this from another direction.

    * this is effectively what we did in post of the series, by tracing the command through SQL Profiler

     

    So we've seen that we've managed to get the code which is executing, and we've also noted that there is a System.DBNull on the stack next to the NextRunDate object. The point here really though is that because we're dealing with the managed code within SSMS alot more is exposed to us using a memory dump. In this case you could continue the debugging to look at the actual data table which is returned by the stored procedure, and then read through it see which row of the data table contains the Null, which then leads you right back to the source data corruption.

    The thing that should be really occurring to you now though is that you can get all others sorts of information out , from this simple example, or from any dump which you might trigger of SSMS, if you ever get unexpected behavior.

    For example, imagine the problem you had captured a dump for was more complex involving connections to multiple servers, you can review some of the other structures in the Connection object.

    * dump the connection object again to remind us of its contents

     

    0:018> !do 01594dbc

    Name: Microsoft.SqlServer.Management.Common.ServerConnection

    MethodTable: 2e2bda28

    EEClass: 2e283d48

    Size: 144(0x90) bytes

     (C:\WINDOWS\assembly\GAC_MSIL\Microsoft.SqlServer.ConnectionInfo\9.0.242.0__89845dcd8080cc91\Microsoft.SqlServer.ConnectionInfo.dll)

    Fields:

          MT    Field   Offset                 Type VT     Attr    Value Name

    793044cc  4000036       40       System.Boolean  1 instance        1 m_BlockUpdates

    79330a00  4000037        4        System.String  0 instance 01594eac m_ServerInstance

    79330a00  4000038        8        System.String  0 instance 011d1198 m_Login

    7995d420  4000039        c ...rity.SecureString  0 instance 00000000 m_Password

    793044cc  400003a       41       System.Boolean  1 instance        1 m_LoginSecure

    79330a00  400003b       10        System.String  0 instance 011d1198 m_ConnectAsUserName

    79330a00  400003c       14        System.String  0 instance 011d1198 m_ConnectAsUserPassword

    793044cc  400003d       42       System.Boolean  1 instance        0 m_ConnectAsUser

    793044cc  400003e       43       System.Boolean  1 instance        1 m_NonPooledConnection

    79332c4c  400003f       28         System.Int32  1 instance        0 m_PooledConnectionLifetime

    79332c4c  4000040       2c         System.Int32  1 instance        0 m_MinPoolSize

    79332c4c  4000041       30         System.Int32  1 instance      100 m_MaxPoolSize

    79332c4c  4000042       34         System.Int32  1 instance       15 m_ConnectTimeout

    2e2bc5b4  4000043       38         System.Int32  1 instance        8 m_NetworkProtocol

    79330a00  4000044       18        System.String  0 instance 01594e4c m_ApplicationName

    79330a00  4000045       1c        System.String  0 instance 011d1198 m_WorkstationId

    79330a00  4000046       20        System.String  0 instance 011d1198 m_DatabaseName

    79332c4c  4000047       3c         System.Int32  1 instance     4096 m_PacketSize

    7995d420  4000048       24 ...rity.SecureString  0 instance 00000000 m_ConnectionString

    793044cc  4000049       44       System.Boolean  1 instance        0 m_MultipleActiveResultSets

    793044cc  400004a       45       System.Boolean  1 instance        0 shouldEncryptConnection

    6523d7a0  400004b       48 ...ent.SqlConnection  0 instance 01594f10 m_SqlConnectionObject

    793044cc  400004c       46       System.Boolean  1 instance        1 m_InUse

    79332c4c  400004d       68         System.Int32  1 instance        0 m_LoginFailedClients

    2e2bc50c  400004e       4c ...ssageEventHandler  0 instance 00000000 m_RemoteLoginFailedHandler

    656629a8  400004f       50 ...ssageEventHandler  0 instance 00000000 m_SqlInfoMessageHandler

    2e2bd89c  4000050       54 ...ommon.CapturedSql  0 instance 01594ee0 m_CapuredSQL

    2e2bc8d4  4000051       6c         System.Int32  1 instance        1 m_AutoDisconnectMode

    793044cc  4000052       47       System.Boolean  1 instance        0 bIsUserConnected

    2e2be0b0  4000053       58 ...mon.ServerVersion  0 instance 015c172c m_serverVersion

    79332c4c  4000054       70         System.Int32  1 instance       10 lockTimeout

    2e2bc50c  4000055       5c ...ssageEventHandler  0 instance 00000000 ServerMessageInternal

    2e2bc480  4000056       60 ...ementEventHandler  0 instance 00000000 StatementExecuted

    2e2bc50c  4000057       64 ...ssageEventHandler  0 instance 00000000 RemoteLoginFailedInternal

    79332c4c  400005b       80         System.Int32  1 instance        0 m_StatementTimeout

    79330a00  400005c       74        System.String  0 instance 01428dcc m_BatchSeparator

    79332c4c  400005d       84         System.Int32  1 instance        0 m_TransactionDepth

    2e2bc6f4  400005e       88         System.Int32  1 instance        1 m_ExecutionMode

    6523db4c  400005f       78 ...Client.SqlCommand  0 instance 01594f48 m_SqlCommand

    79330a00  4000060       7c        System.String  0 instance 015c1a70 m_TrueName

     

    * let’s assume we wanted to check and confirm what server we were actually connecting to

    * and what version it was running

     

    0:018> * dump the m_serverVersion object

    0:018> !do 015c172c

    Name: Microsoft.SqlServer.Management.Common.ServerVersion

    MethodTable: 2e2be0b0

    EEClass: 2e28425c

    Size: 20(0x14) bytes

     (C:\WINDOWS\assembly\GAC_MSIL\Microsoft.SqlServer.ConnectionInfo\9.0.242.0__89845dcd8080cc91\Microsoft.SqlServer.ConnectionInfo.dll)

    Fields:

          MT    Field   Offset                 Type VT     Attr    Value Name

    79332c4c  400008c        4         System.Int32  1 instance        9 m_nMajor

    79332c4c  400008d        8         System.Int32  1 instance        0 m_nMinor

    79332c4c  400008e        c         System.Int32  1 instance     4262 m_nBuildNumber

     

    * now dump the m_ServerInstance object

     

    0:018> !do 01594eac

    Name: System.String

    MethodTable: 79330a00

    EEClass: 790ed64c

    Size: 50(0x32) bytes

     (C:\WINDOWS\assembly\GAC_32\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll)

    String: grahamk20\gk2005

    Fields:

          MT    Field   Offset                 Type VT     Attr    Value Name

    79332c4c  4000096        4         System.Int32  1 instance       17 m_arrayLength

    79332c4c  4000097        8         System.Int32  1 instance       16 m_stringLength

    793316e0  4000098        c          System.Char  1 instance       67 m_firstChar

    79330a00  4000099       10        System.String  0   shared   static Empty

        >> Domain:Value  001831e0:011d1198 <<

    79331630  400009a       14        System.Char[]  0   shared   static WhitespaceChars

        >> Domain:Value  001831e0:011d17d4 <<

     

    * and you can see the name of the server I ran this command against

     

    and of course you can follow this theory on and on depending on the actual problem you are troubleshooting. I hope this provides some insights into some of things you can do when looking at SSMS problems. It’s not something I deal with that often as mostly I focus on the databas engine, but sometimes its interesting to do some managed debugging as opposed to native, and of course its always easier to demonstrate something when you can see real values and references with public symbols alone.

  • SQL in Stockholm

    Changed behaviour of OPTION RECOMPILE syntax in SQL Server 2008 SP1 cumulative update #5

    • 0 Comments

    On Monday we released Cumulative update package 5 for SQL Server 2008 Service Pack 1, under KB975977. Contained in this was a very interesting fix to a long running issue with regards to the option recompile syntax and its changed behaviour in SQL Server 2008. This fix has its own KB article at KB976603.

    In SQL Server 2008 RTM we introduced behaviour that allowed potentially better plans to be created when using the OPTION RECOMPILE syntax. The way this worked in principal was that if the OPTION RECOMPILE syntax was used, SQL could assume that the plan for the statement in question would not need to re-used, since it was always going to recompile. Therefore the query optimizer could safely use the actual parameter values passed to the statement when choosing a plan, as opposed to just using a parameterised template. In certain circumstances this can lead to dramatic performance gains, especially when you have parameter values which vary wildly and need different plans dependent upon the values passed.

    The well known MVP Erland Sommarskog demonstrates these techniques (along with many others) in his excellent article

    Dynamic Search Conditions in T-SQL

    However currently this article has been edited to remove reference to this technique (but I believe that Erland will soon add it back). We were forced to withdraw this functionality as part of Cumulative update package 4 for SQL Server 2008. The reason for this was that on rare occasions batches using this syntax could return incorrect results if multiple connections were running the same batch simultaneously, due to the fact that the query optimizer associated such plans with the specific thread upon which they were compiled, whereas the actual execution of the batch could occur subsequently on a different thread.

    Therefore if you run SQL Server 2008 RTM CU#4 or above, or SQL Server 2008 SP1 between release and CU#4, the OPTION RECOMPILE syntax works as it did in SQL Server 2005.

    However this latest update releases a full and thorough fix for the original problem. It has re-instated the enhanced functionality of parameter embedding optimization, and resolves the incorrect results problem seen earlier. The reason that this fix took a while to come out in this form was down to the fact that the changes required in the query optimizer were relatively complex. Initially it was not viable to release them quickly due to the amount of testing required, so this was why we were forced to just withdraw the functionality, because the potential for incorrect results is one of the most serious types of problem SQL Server can encounter.

    In summary, here is a matrix showing the different builds and functionality:

    SQL 2005 – no parameter embedding optimization

    SQL 2008 RTM to SQL 2008 CU#3 – new functionality , but potential for incorrect results

    SQL 2008 CU#4 onwards - no parameter embedding optimization

    SQL 2008 SP1 RTM to SQL 2008 SP1 CU#4 - no parameter embedding optimization

    SQL 2008 SP1 CU#5 onwards – new functionality enabled again, no possibility of incorrect results

    Remember that RTM and SP1 are separate branches, so even though the fix is now out in the SP1 branch, it is not in the latest cumulative update on the RTM branch, and it is unlikely to ever be put in there.

  • SQL in Stockholm

    Troubleshooting SQL Server Management Studio with SQL Profiler and Debugging Tools – Part 2

    • 0 Comments

    This is the second in a series of 3 posts which discuss troubleshooting SQL Server Management Studio when it encounters errors or unexpected behaviour.

    In this post we collect a memory dump of SSMS based upon specific .NET exception codes, for subsequent analysis.

    As a reminder, here are the symptoms of the problem we are looking at:

    1. Unable to view Job Activity in the SQL Server Agent Job Activity Monitor

    2. Progress bar shows “Cannot display activity data”

    3. Click on more details throws a message box with the error

    Failed to retrieve data for this request. (Microsoft.SqlServer.Management.Sdk.Sfc)

    with the additional details

    Hour, Minute, and Second parameters describe an un-representable DateTime. (mscorlib)

    It actually looks like this (click to enlarge)

    image

    4. The error repros with all versions of SSMS (2005 and 2008)

    5. The error repros for all users whatever workstation they connect from.

    The debugging approach

    Initially I was thinking that since I had a .NET error, I could run adplus and use a config file to configure the debugger to capture the .NET errors. This would work and I’ve done it before, but then Tess reminded me that I could use DebugDiag and it would probably be easier and quicker. So I tried this:

    Firstly I installed the DebugDiag tool on the machine where I was repro’ing the problem. I started it up and created a new rule, the properties of which are shown below. Being as I didn’t know exactly what was going on at this stage, I just elected to log everything to a file to begin with, rather than creating dumps at first, so I could get a better idea of what was really happening in the process. I expected that I was getting some .NET errors based upon the message box shown, but if I just created a dump on all .NET exceptions, I might miss something else that came before it. The rule looked like this:

    1. Type = Crash

    2. Process = ssms.exe (i was using SQL 2008 tools, if I’d been using SQL 2005 tools the process name would be sqlwb.exe)

    3. Action type for unconfigured first chance exceptions = Log Stack Trace

    4. Exceptions - CLR (.NET) Exception - All Exception Types = Log Stack Trace

    5. Action Limits = 0 (since I’m only logging to file at this stage)

    Here’s what it looks like in the GUI (click to enlarge)

    image

    I saved the rule, enabled it straight away, and then repro’d the problem again. I then disabled the rule and browsed for the log file which in my case was held here:

    C:\Program Files (x86)\DebugDiag\Logs\

    it was called:

    Ssms__PID__2912__Date__10_19_2009__Time_03_35_46PM__898__Log.txt

    Review of this file showed that actually 3 different exceptions had occurred in this order:

    [10/19/2009 3:35:55 PM] First chance exception - 0xe0434f4d caused by thread with system id 5824
    [10/19/2009 3:35:55 PM] CLR Exception Type - 'System.ArgumentOutOfRangeException'

    then

    [10/19/2009 3:35:55 PM] First chance exception - 0xe0434f4d caused by thread with system id 5824
    [10/19/2009 3:35:55 PM] CLR Exception Type - 'Microsoft.SqlServer.Management.Sdk.Sfc.EnumeratorException'

    then finally

    [10/19/2009 3:35:55 PM] First chance exception - 0xe0434f4d caused by thread with system id 5824
    [10/19/2009 3:35:55 PM] CLR Exception Type - 'Microsoft.SqlServer.Management.Sdk.Sfc.EnumeratorException'

    It may be in this case that all these errors are related, however the theory of only logging information first proved valuable, as I want to create a dump of when the very first error occurs. Therefore I now know that I can edit my rule to tell DebugDiag to create a full user dump, when it encounters a CLR Exception Type - 'System.ArgumentOutOfRangeException', and that it should only do the dump once. The theory of logging first is also backed up by the fact that repro’ing this problem on a x86 machine throws a different chain of .NET exceptions (although the first one is still 'System.ArgumentOutOfRangeException')

    That said, I’m now ready to capture a full user dump of the ssms.exe process, based upon the above rule, so I can debug it properly, and see the true root of the problem. I can edit my rule created earlier, and change the Exceptions dialogue box to add an additional Exception sub-rule, which looks like this (click to enlarge)

    image

    and has the properties

    1. Exceptions - CLR (.NET) Exception - .NET Exception Types = System.ArgumentOutOfRangeException

    2. Action Type = Full Userdump

    3. Action Limit = 1

    I add this exception rule, click OK a couple of times and re-enable the rule. Then run the repro again, and I notice the SSMS dialogue hang for a couple of seconds whilst it opens. This is because DebugDiag is creating the memory dump.

    I disable the rule again, and then open the log file.The output is similar to before, except that after the first exception is logged, and after the stack trace of this exception, I note like the following:

    [10/20/2009 3:14:08 PM] Created dump file C:\Program Files (x86)\DebugDiag\Logs\Crash rule for all instances of Ssms.exe\Ssms__PID__2912__Date__10_20_2009__Time_03_14_02PM__806__First Chance System.ArgumentOutOfRangeException.dmp
    [10/20/2009 3:14:08 PM] Action limit of 1 reached for Exception 0xE0434F4D:SYSTEM.ARGUMENTOUTOFRANGEEXCEPTION.

    This confirms that I have successfully created a dump, and I have created it at the point when the first exception was encountered. This technique ought to work for any CLR Exception that SSMS can encounter. As I mentioned above it would also be just as acceptable to use adplus with a config file to create the dump.

    in Part 3, it’s time to analyze that dump.

  • SQL in Stockholm

    Troubleshooting SQL Server Management Studio with SQL Profiler and Debugging Tools – Part 1

    • 0 Comments

    This is the first in a series of 3 posts which discuss troubleshooting SQL Server Management Studio when it encounters errors or unexpected behaviour.

    In this post we look at using profiler, in the 2nd and 3rd we debug SSMS using public Microsoft debugging tools.

    Here are the symptoms of the problem we are looking at:

    1. Unable to view Job Activity in the SQL Server Agent Job Activity Monitor

    2. Progress bar shows “Cannot display activity data”

    3. Click on more details throws a message box with the error

    Failed to retrieve data for this request. (Microsoft.SqlServer.Management.Sdk.Sfc)

    with the additional details

    Hour, Minute, and Second parameters describe an un-representable DateTime. (mscorlib)

    It actually looks like this

    image

    4. The error repros with all versions of SSMS (2005 and 2008)

    5. The error repros for all users whatever workstation they connect from.

    These last 2 points are particularly relevant. Whenever you are looking at an SSMS problem, you always need to be asking these 2 questions. Straight at the off, you have potentially then eliminated whole areas of research (such as corruptions of .NET framework on specific workstations, incompatibilities in processor architecture specific versions of the framework, and localising whether the problem is with the SSMS client or the server instance being managed, or a combination of both)

    The initial approach – using SQL Profiler and some common sense

    When I initially came to this problem, the customer was experiencing the errors above. Normally when I get any errors or problems in SSMS, rather than reaching for debugger, I reach for SQL Profiler first (having asked the above 2 questions about where and when the issue repros), and this is a good rule of thumb. At the end of the day SSMS is just a GUI which issues TSQL commands to achieve functions in SQL Server. Whenever I get unexpected results in SSMS, the first thing I therefore do is to run a profiler trace against the server which SSMS is connected to. This means that you can see the TSQL which is issued, and you can sometimes find the problem straight off, if for example it issues some piece of bad code which obviously causes your error.

    In this case if you run the SQL profiler GUI tool (you can simply point it at the appropriate server instance and use the default options), you’ll note that the SSMS issues the following SQL batches.

    SET LOCK_TIMEOUT 10000
    select SERVERPROPERTY(N'servername')
    declare @RegPathParams sysname
                    declare @Arg sysname
                    declare @Param sysname
                    declare @MasterPath nvarchar(512)
     
                    declare @LogPath nvarchar(512)
                    declare @ErrorLogPath nvarchar(512)
                    declare @n int
     
                    select @n=0
                    select @RegPathParams=N'Software\Microsoft\MSSQLServer\MSSQLServer'+'\Parameters'
                    select @Param='dummy'
                    while(not @Param is null)
                    begin
                        select @Param=null
                        select @Arg='SqlArg'+convert(nvarchar,@n)
     
                        exec master.dbo.xp_instance_regread N'HKEY_LOCAL_MACHINE', @RegPathParams, @Arg, @Param OUTPUT
                        if(@Param like '-d%')
                        begin
                            select @Param=substring(@Param, 3, 255)
                            select @MasterPath=substring(@Param, 1, len(@Param) - charindex('\', reverse(@Param)))
                        end
                        else if(@Param like '-l%')
                        begin
                            select @Param=substring(@Param, 3, 255)
                            select @LogPath=substring(@Param, 1, len(@Param) - charindex('\', reverse(@Param)))
                        end
                        else if(@Param like '-e%')
                        begin
                            select @Param=substring(@Param, 3, 255)
                            select @ErrorLogPath=substring(@Param, 1, len(@Param) - charindex('\', reverse(@Param)))
                        end
                        
                        select @n=@n+1
                    end
                
     
     
                    declare @SmoRoot nvarchar(512)
                    exec master.dbo.xp_instance_regread N'HKEY_LOCAL_MACHINE', N'SOFTWARE\Microsoft\MSSQLServer\Setup', N'SQLPath', @SmoRoot OUTPUT
                
     
     
    SELECT
    CAST(case when 'a' <> 'A' then 1 else 0 end AS bit) AS [IsCaseSensitive],
    CAST(FULLTEXTSERVICEPROPERTY('IsFullTextInstalled') AS bit) AS [IsFullTextInstalled],
    @LogPath AS [MasterDBLogPath],
    @MasterPath AS [MasterDBPath],
    @ErrorLogPath AS [ErrorLogPath],
    @@MAX_PRECISION AS [MaxPrecision],
    @SmoRoot AS [RootDirectory],
    SERVERPROPERTY(N'ProductVersion') AS [VersionString],
    CAST(SERVERPROPERTY(N'MachineName') AS sysname) AS [NetName],
    CAST(SERVERPROPERTY(N'Edition') AS sysname) AS [Edition],
    CAST(SERVERPROPERTY(N'ProductLevel') AS sysname) AS [ProductLevel],
    CAST(SERVERPROPERTY('IsClustered') AS bit) AS [IsClustered],
    CAST(SERVERPROPERTY('IsSingleUser') AS bit) AS [IsSingleUser],
    CAST(SERVERPROPERTY('EngineEdition') AS int) AS [EngineEdition],
    SERVERPROPERTY(N'ResourceVersion') AS [ResourceVersionString],
    SERVERPROPERTY(N'ResourceLastUpdateDateTime') AS [ResourceLastUpdateDateTime],
    SERVERPROPERTY(N'BuildClrVersion') AS [BuildClrVersionString],
    SERVERPROPERTY(N'ComputerNamePhysicalNetBIOS') AS [ComputerNamePhysicalNetBIOS],
    SERVERPROPERTY(N'CollationID') AS [CollationID],
    SERVERPROPERTY(N'ComparisonStyle') AS [ComparisonStyle],
    SERVERPROPERTY(N'SqlCharSet') AS [SqlCharSet],
    SERVERPROPERTY(N'SqlCharSetName') AS [SqlCharSetName],
    SERVERPROPERTY(N'SqlSortOrder') AS [SqlSortOrder],
    SERVERPROPERTY(N'SqlSortOrderName') AS [SqlSortOrderName],
    convert(sysname, serverproperty(N'collation')) AS [Collation]
    exec sp_executesql N'SELECT
    dtb.collation_name AS [Collation],
    dtb.name AS [DatabaseName2]
    FROM
    master.sys.databases AS dtb
    WHERE
    (dtb.name=@_msparam_0)',N'@_msparam_0 nvarchar(4000)',@_msparam_0=N'msdb'    Microsoft SQL Server Management Studio    grahamk    EUROPE\grahamk    0    2    0    0    2912    52    2009-10-19 14:28:13.143    2009-10-19 14:28:13.143    0X00000000040000001A00730070005F006500780065006300750074006500730071006C002E01000082001A00E7206E0076006100720063006800610072002800310033003100290006010000530045004C004500430054000A006400740062002E006300    
    create table #tmp_sp_help_category
    (category_id int null, category_type tinyint null, name nvarchar(128) null)
    insert into #tmp_sp_help_category exec msdb.dbo.sp_help_category
    SELECT
    sv.name AS [Name],
    CAST(sv.enabled AS bit) AS [IsEnabled],
    tshc.name AS [Category],
    sv.category_id AS [CategoryID],
    tshc.category_type AS [CategoryType],
    null AS [CurrentRunStatus],
    null AS [CurrentRunStep],
    null AS [HasSchedule],
    null AS [HasStep],
    null AS [HasServer],
    null AS [LastRunDate],
    null AS [NextRunDate],
    null AS [LastRunOutcome],
    CAST(sv.job_id AS nvarchar(100)) AS [job_id]
    FROM
    msdb.dbo.sysjobs_view AS sv
    INNER JOIN #tmp_sp_help_category AS tshc ON sv.category_id = tshc.category_id
     
    drop table #tmp_sp_help_category
            

    and finally:

    exec msdb.dbo.sp_help_job

     

    These batches don’t appear to return any errors or warnings to profiler. The logical next step was to take a look at what this code does. Once you read through the various batches, you notice 2 types of batch. The first 5 batches are just required to get certain properties about the server instance and to populate properties of the tree view in SSMS Object Explorer. Whenever you profile SSMS you see lots of batches like this, which tend to precede the “real” work.

    The you see the final single batch, which executes msdb.dbo.sp_help_job. This is the real work that is required to populate the dialogue box that SSMS is opening. Since these batches are all just queries, and nothing is being updated, we can easily re-run them in SSMS query windows to see if anything else is thrown up, or see if weird results are returned that might explain our strange date time error.

    We tried this and it returned a wide record set of several hundred rows, which on initial glance was all well and good. At this stage, the pragmatists amongst you may feel it’s appropriate to just scan through this data and see if there are any glaring inconsistencies which can be noticed which might cause a DateTime related error. The truth is that there was such a piece of data, if you could be bothered to read all the data manually, but it was only a hypothesis. The stored procedure in question returns some columns that are related to dates and times in a true DateTime representation. However it also returns some columns as text representations of date parts. In this record set, these columns should always be populated with values, but in this case there were a number of rows which showed the value “0” in these columns. We therefore made the guess that this data was the problem. We removed this data and tried the repro again, and everything worked ok.

    We were lucky in this situation that the record set to review was only a few hundreds rows, and that the data corruption was obvious. On the basis that neither of these situations is always true, part 2 will show how to continue to troubleshoot this with debugging tools, to categorically prove what the problem was.

  • SQL in Stockholm

    SQL Server Database Mirroring Pre-requisites and guidelines

    • 0 Comments

    I noticed that we have posted a really good summary of the above points through the new fast publish system for KBs. (Note the disclaimer at the bottom which is not normally shown on full KBs). There are some really good points here to consider about setting up and configuring a mirroring environment.

    Things to consider when setting up database mirroring in SQL Server.

    In addition to this of course, you should remember my previous points made on this blog about Virtual Log File fragmentation which I recently saw in a production environment again.

    1413 error when starting Database Mirroring – How many virtual log files is too many?

    Slow recovery times and slow performance due to large numbers of Virtual Log Files

  • SQL in Stockholm

    Using Xquery to query Extended Events asynchronous file target results

    • 1 Comments

    This is the follow up (part 2) to the following post:

    http://blogs.msdn.com/grahamk/archive/2009/09/23/realistic-troubleshooting-example-of-extended-events-xevent-usage-in-sql-server-2008-part-1.aspx

    In the original post we configured an extended event session and issued a very simple query to review the results as XML. Here are some more intelligent XQuery examples. I’m documenting these purely because I hate XQuery so much, and it takes so damn long to do (well it does for me anyway). Please feel free to post in the comments section if you think XQuery is the best damn query language ever invented :-)

    Hopefully the code comment explain everything, I have attempted to use very descriptive alias names as well

    -- get information from XET binary files representing extended event session date
     
    -- as per part 1 blog post:
    -- straight select, view the XML manually in an SSMS window by clicking on the XML results column
    select CAST(event_data as XML) , * 
    from sys.fn_xe_file_target_read_file
    (
    'C:\temp\ArithmeticErrors_0_128981071730370000.xet', 
    'C:\temp\ArithmeticErrors_0_128981071730390000.xem', 
    null, 
    null
    )
     
    -- Get only specific nodes
    -- 2 different methods shown of the xquery, you can specify the name of the node
    -- or you can specify its location in the structure
    -- this means that you return every row for every event captured in the file
    -- the results are then converted to SQL data types
     
    SELECT
    xdata.value('(/event/data[4]/value)[1]', 'varchar(100)') AS [error],
    xdata.value('(/event/data[@name="severity"]/value)[1]', 'int') AS [severity],
    xdata.value('(/event/action[@name="sql_text"]/value)[1]', 'varchar(max)') AS [sql_text]
     
    FROM
        (
            select CAST(event_data as XML)
            from sys.fn_xe_file_target_read_file
            (
            'C:\temp\ArithmeticErrors_0_128981071730370000.xet', 
            'C:\temp\ArithmeticErrors_0_128981071730390000.xem', 
            null, 
            null
            )
        ) as xmlr(xdata)
     
    -- using the above technique, you can now wrap the entire query and give it an alias
    -- and then perform more useful T-SQL language queries on the results
    -- here is an example of an aggregate
     
    SELECT
    convert(varchar(100), outer_results.[error]) as 'error',
    convert(int, outer_results.[severity]) as 'severity',
    convert(varchar(max), outer_results.[sql_text]) as 'sql_text',
    count(*) as [count_occurances]
    FROM
        (
        SELECT
        xdata.value('(/event/data[4]/value)[1]', 'varchar(100)') AS [error],
        xdata.value('(/event/data[@name="severity"]/value)[1]', 'int') AS [severity],
        xdata.value('(/event/action[@name="sql_text"]/value)[1]', 'varchar(max)') AS [sql_text]
     
        FROM
            (
                select CAST(event_data as XML)
                from sys.fn_xe_file_target_read_file
                (
                'C:\temp\ArithmeticErrors_0_128981071730370000.xet', 
                'C:\temp\ArithmeticErrors_0_128981071730390000.xem', 
                null, 
                null
                )
            ) as xmlr(xdata)
        ) as outer_results
    group by error, severity,sql_text
     
    -- finally using the same technique as above , here we have simple predicates as well
     
    -- add a predicate into the outer query
    SELECT
    convert(varchar(100), outer_results.[error]) as 'error',
    convert(int, outer_results.[severity]) as 'severity',
    convert(varchar(max), outer_results.[sql_text]) as 'sql_text',
    count(*) as [count_occurances]
    FROM
        (
        SELECT
        xdata.value('(/event/data[4]/value)[1]', 'varchar(100)') AS [error],
        xdata.value('(/event/data[@name="severity"]/value)[1]', 'int') AS [severity],
        xdata.value('(/event/action[@name="sql_text"]/value)[1]', 'varchar(max)') AS [sql_text]
     
        FROM
            (
                select CAST(event_data as XML)
                from sys.fn_xe_file_target_read_file
                (
                'C:\temp\ArithmeticErrors_0_128981071730370000.xet', 
                'C:\temp\ArithmeticErrors_0_128981071730390000.xem', 
                null, 
                null
                )
            ) as xmlr(xdata)
        ) as outer_results
     
    -- here is the extra predicate we have added
    where outer_results.severity < 17
    group by error, severity,sql_text
     

    I fully accept that this may not be the most reasonable way, or the most efficient way to do this, but it’s the way that I’ll be using for now.

  • SQL in Stockholm

    SQL Server 2008 SDK directory does not contain the opends60.lib file required for the creation of extended stored procedures

    • 0 Comments

    If you are trying to create extended stored procedures in SQL Server 2008, you will probably have read the BOL article here:

    Creating Extended Stored Procedures

    Hopefully you will have noticed that this feature is deprecated, and so firstly you’ll only be using this functionality for legacy requirements. That said you may have noticed that you will require the file opends60.lib to be able to create your extended stored procedure DLL. You may have noticed that you cannot find this after installing SQL Server 2008 (even if you have installed the SDK).

    The default location for the SDK directory in SQL Server 2008 is here:

    C:\Program Files\Microsoft SQL Server\100\SDK\

    However it does not contain the opends60.lib. If you need to acquire a copy of this file, currently you need to obtain it from an alternate source, namely a previous version of SQL Server SDK or from an install of a previous version of visual studio. For example the x64 version of the file can be found in the SQL Server 2005 SDK directory here:

    C:\Program Files\Microsoft SQL Server\90\SDK\Lib\x64

    You should be able to use one of these other versions instead.

  • SQL in Stockholm

    SQL Server DMV script for monitoring USERSTORE_TOKENPERM size by token type and number

    • 0 Comments

    Here’s a script I was using last week. I wrote this so that it could be used in a custom PSSDIAG / SQLDIAG task, but you can also run it in other automated methods as well.

    To include it in SQLDIAG / PSSDIAG I added a custom group section into the XML config file (see previous posts on this blog if you don’t know how to do this), and saved the file in the root of the SQLDIAG directory as exec_userstore1.sql.

    <CustomGroup name="userstore" enabled="true" />
    <CustomTask enabled="true" groupname="userstore" taskname="detailed_scripts" type="TSQL_Script" point="Startup" wait="No" cmd="exec_userstore1.sql"/>

    Here’s the script itself:

    /*
    detailed USERSTORE_TOKENPERM analysis for SQL 2005
    script designed to be packaged into PSSDIAG and run over long time frame to monitor
    but can be run individually at a fixed point to capture a snapshot
    can be used in association with script number 2 which is verbose and does a snapshot
    of all logins and token in cache, so can be HUGE.
    graham kent - microsoft css - 15th September 2009
    */
     
    SET NOCOUNT ON
    SET QUOTED_IDENTIFIER ON
    GO
     
    -- get the token type distribution
     
    WHILE 1=1
    BEGIN
     
    select 'start time:', GETDATE();
     
    SELECT 
    convert(varchar(50),[name]) as 'SOS StoreName', 
    convert(varchar(50),[TokenName]) as 'TokenName',
    convert(varchar(50),[Class]) as 'Class',
    convert(varchar(50),[SubClass]) as 'SubClass',
    count(*) as [Num Entries]
    FROM
    (SELECT name,
    x.value('(//@name)[1]', 'varchar (100)') AS [TokenName],
    x.value('(//@class)[1]', 'varchar (100)') AS [Class],
    x.value('(//@subclass)[1]', 'varchar (100)') AS [SubClass]
    FROM
    (SELECT CAST (entry_data as xml),name
    FROM sys.dm_os_memory_cache_entries
    WHERE type = 'USERSTORE_TOKENPERM') 
    AS R(x,name)
    ) a
    group by a.name,a.TokenName,a.Class,a.SubClass
     
    -- loop on 1 minute basis
    WAITFOR DELAY '00:01:00'
     
    END

    When run in this format you’ll get a text file created in the output directory used by SQLDIAG which will contain the results from the script. You can then monitor the individual items in the cache over long periods of time to see if you’re encountering any of the known issues around this security cache.

  • SQL in Stockholm

    SQL Server DMV script for searching the plan cache by query text, plan handle or execution time

    • 0 Comments

    I guess everyone has their own set of DMV scripts to hand when working with SQL Server nowadays. I thought I’d post a few of mine. Pretty much as the title says: comment in whichever clauses suit you and then run the query. Returns a result set with an XML column that can be clicked on if you want, to view the plan itself in SSMS. Works with 2005 and 2008.

    select 
    bucketid,
    a.plan_handle,
    refcounts, 
    usecounts,
    execution_count,
    size_in_bytes,
    cacheobjtype,
    objtype,
    text,
    query_plan,
    creation_time,
    last_execution_time,
    execution_count,
    total_elapsed_time,
    last_elapsed_time
     
    from sys.dm_exec_cached_plans a 
          inner join sys.dm_exec_query_stats b on a.plan_handle=b.plan_handle
        cross apply sys.dm_exec_sql_text(b.sql_handle) as sql_text
        cross apply sys.dm_exec_query_plan(a.plan_handle) as query_plan
    where 1=1
    and text like '%mybadproc%'
    -- and a.plan_handle = 0x06000B00C96DEC2AB8A16D06000000000000000000000000
    and b.last_execution_time between '2009-09-24 09:00' and '2009-09-25 17:00'
    order by last_execution_time desc
  • SQL in Stockholm

    Realistic troubleshooting example of extended events (XEvent) usage in SQL Server 2008 – Part 1

    • 0 Comments

    Here’s an example of how to collect and consume extended events in SQL Server 2008. It’s something that we in CSS are starting to use more and more frequently with the uptake of SQL 2008, and I believe it’s something that everyone who troubleshoots SQL can gain benefit from.

    Here’s a relatively simple example which is based upon real world experiences of looking at this bug in Connect, which I was looking at recently. The main thing I want to show here is how to review the results of collection yourself, as obviously the setup of the actual event collection is already described on the connect site.

    The scenario is relatively simple, and a typical one for extended event usage. You have a process which produces an arithmetic overflow at an unknown point. In the example of this bug, it’s within the SSIS package managing data upload for the performance data warehouse. It could be any large process application or tool though, which would prove more difficult to trace with something like profiler or any of the other troubleshooting tools.

    From the connect site, a colleague in the product group has asked the reporting customer to configure and run the following event session:

    -- XEvent Session Definition Arithmetic error, Log additional info
     
    -- Sample Query: select convert(bigint,21568194 * 100)
    if exists(select * from sys.server_event_sessions where name='ArithmeticError')
         DROP EVENT SESSION ArithmeticError ON SERVER
    go
     
    CREATE EVENT SESSION ArithmeticError ON SERVER
    ADD EVENT sqlos.exception_ring_buffer_recorded (
    ACTION (sqlserver.tsql_stack, sqlserver.sql_text)
                     WHERE (error = 8115 ))
    ADD TARGET package0.ring_buffer,
    ADD TARGET package0.asynchronous_file_target(
    SET filename='c:\temp\ArithmeticErrors.xet', metadatafile='c:\temp\ArithmeticErrors.xem')
    WITH (MAX_MEMORY = 4096KB, 
    EVENT_RETENTION_MODE = ALLOW_MULTIPLE_EVENT_LOSS, 
    MAX_DISPATCH_LATENCY = 1 SECONDS, 
    MAX_EVENT_SIZE = 0KB, 
    MEMORY_PARTITION_MODE = NONE, 
    TRACK_CAUSALITY = OFF, 
    STARTUP_STATE = OFF)
     
    go
     
    ALTER EVENT SESSION ArithmeticError ON SERVER
    state=start
    go
     
    -- Cleanup the session 
    -- drop EVENT SESSION ArithmeticError ON SERVER
     
     

    You can run this code as is on a test server to try out the demo. You’ll need to ensure that you have a c:\temp directory for the results to output to a file. There are several different ways that you can consume or record the output from an extended event, but here we are writing to a file as said files can then easily be uploaded to us here at Microsoft. You can read about some of the other alternatives in the Books Online link above if you want.

    Once you have the event set up, you can trigger the 8115 error, by un-commenting the “sample query” line at the top of the script. When you run it in SSMS, you should encounter the following error:

    Msg 8115, Level 16, State 2, Line 1
    Arithmetic overflow error converting expression to data type int.

    If you then look in the c:\temp directory, you ought to notice the files the session is writing to have been created, however at this stage, the .xet file (which actually holds the event data as opposed to the xem file which holds the event meta data required to decode the event itself) is actually empty. This is because the session is still active and the data has not been flushed to the file. If you run the following to stop (but not drop) the event

     
    ALTER EVENT SESSION ArithmeticError ON SERVER
    state=stop
    go
    Then the data will be written to the file. You’ve now capture enough data for the purposes of this test, so you can drop the event session, the syntax for which is
    drop EVENT SESSION ArithmeticError ON SERVER

    So the question then comes, what to do with the files. I have to admit it took me a while to work this out, as it wasn’t obviously documented. I also needed the help of my good friend and pool partner, Spike, to write some XQuery in a nice way, as I don’t really like XQuery. The function is documented in Books Online though to be fair (although not the XQuery!)

    There are 2 different ways you can go about reading this type of file in my opinion. Firstly you can just run the following query in SSMS and you’ll get an XML column in the results pain, if you output to grid. You can then just click on the column and you’ll get to look at the XML document in a new SSMS window. Quick and undeniably dirty yes, but it works quite effectively for a simple scenario like this:

    -- straight select, view the XML manually
    select CAST(event_data as XML) , * 
    from sys.fn_xe_file_target_read_file
    (
    'C:\temp\ArithmeticErrors_0_128981030796910000.xet', 
    'C:\temp\ArithmeticErrors_0_128981030796920000.xem', 
    null, 
    null
    )

    Note that the actual file names have been modified automatically by the event engine to guarantee uniqueness. When you run this, you should get an output grid which looks like this:

    image

    If you then click on the first XML column, you’ll open up a new tab with the XML document for viewing, which will look something like this:

    <event name="exception_ring_buffer_recorded" package="sqlos" id="21" version="1" timestamp="2009-09-22T14:33:06.537Z">
      <data name="id">
        <value>0</value>
        <text />
      </data>
      <data name="timestamp">
        <value>0</value>
        <text />
      </data>
      <data name="task_address">
        <value>0x00000000000ffdc8</value>
        <text />
      </data>
      <data name="error">
        <value>8115</value>
        <text />
      </data>
      <data name="severity">
        <value>16</value>
        <text />
      </data>
      <data name="state">
        <value>2</value>
        <text />
      </data>
      <data name="user_defined">
        <value>false</value>
        <text />
      </data>
      <data name="call_stack">
        <value />
        <text />
      </data>
      <action name="tsql_stack" package="sqlserver">
        <value>&lt;frame level='1' handle='0x01000100FC030E3850E97786000000000000000000000000' line='1' offsetStart='0' offsetEnd='0'/&gt;</value>
        <text />
      </action>
      <action name="sql_text" package="sqlserver">
        <value>select convert(bigint,21568194 * 100)</value>
        <text />
      </action>
    </event>

    Most of this XML ought to be self explanatory. The bit that is most interesting to us here is the “sql_text” section which shows the actual text being executed. In a real world scenario where we didn’t know what code was causing our error, we’ve now tracked it down.

    The tsql_stack section is also useful. You can pass the handle you get back from this section into a DMV to get further details about the object or statement being executed (if it’s part of a stored procedure for example). A typical syntax for this would be

    SELECT * FROM sys.dm_exec_sql_text(0x01000100FC030E3850E97786000000000000000000000000)

    However, you should note that in this very simple example, this will return nothing. This is because a plan has not been cached for our simple select statement that only does a mathematical  calculation. This is of course possible in other scenarios where the plan which was used at the time the event captured it, is no longer cached, due to it being aged out, or the server being re-started.

    If you want to follow this example and be able to look at the sql_text plan handle, then you need to write an actual query which fetches data and has an arithmetic overflow. For example, simple procedure like this

    create procedure mybadproc
     
    as
     
    select *
    from sys.servers
     
    select convert(bigint,999999999 * 100)

    will cache a plan when executed, and will be captured by the same event session we set up above. If you execute this, or other more complex statements / procedures, you can get back multiple items in the tsql_stack, each with separate handles which you can then pass into DMV based queries which accept plan handles. If you capture the execution of the above procedure in this event session, you’ll get back a slightly different XML document, where the tsql_stack has multiple values and the sql_text is now the name of the procedure that we executed. You can then pass any of the handles into DMV queries which accept such parameters.

    <action name="tsql_stack" package="sqlserver">
       <value>&lt;frame level='1' handle='0x03000D00E7A4787D959020018C9C00000100000000000000' line='8' offsetStart='132' offsetEnd='-1'/&gt;
              &lt;frame level='2' handle='0x01000D00D8E5380F60E77786000000000000000000000000' line='1' offsetStart='0' offsetEnd='-1'/&gt;</value>
       <text />
     </action>
     <action name="sql_text" package="sqlserver">
       <value>mybadproc</value>
       <text />
     </action>
    /event>

    So to summarise part 1 of this post, we’ve looked at how to setup an extended event session and capture the output to a file. We’ve then seen a very simple way of looking at the output to manually identify a condition captured by our session. In the next part I’ll show some XQuery to review the output in more detail.

  • SQL in Stockholm

    Troubleshooting SQL Server performance problems associated with low memory conditions – should I use sys.dm_os_ring_buffers

    • 0 Comments

    I recently received a question through this blog about the use of the sys.dm_os_ring_buffers DMV with regard to troubleshooting performance problems and low memory conditions. Here are some excerpts – edits from my reply which might provide some ideas for you.

    Here's an interesting article you might like to read about monitoring memory usage with ring buffers:

    http://blogs.msdn.com/slavao/archive/2005/02/19/376714.aspx

    In terms of my personal preferences regarding performance monitoring in SQL Server 2005, I find it rare that I have to use the ring buffers. They are of course undocumented externally (which is why you are having trouble finding information on them!), and so I tend to discuss with customers the usage of published documented tools as in 99% of cases these are enough to diagnose memory problems (and most others).

    I would potentially take a different approach. As I often write in my blog I like to use the SQLDIAG tool to collect a variety of data when I am experiencing / monitoring a problem. In this case it might be useful to follow my post about collecting the DBCC MEMORYSTATUS output, in addition to the standard data when you are experiencing this problem. If you have experiencing a memory issue somewhere, this will give you the breakdown to diagnose it pretty much in all cases, and all the results of this output are publically documented. Alternatively though you could change the script which collects DBCC MEMORYSTATUS to collect some other hybrid selection of DMV based output, or even the ring buffers output.

    The other benefit of this approach is that you collect a whole set of data which allows you to keep an open mind (which I like to do) about what the cause of the performance problem really is.

    So in summary, should a customer come to me with a rough issue description like this (I have performance problems and I believe I am experiencing low memory conditions, my initial action plan would be as follows:

    Run SQLDIAG for 30 minutes when the problem was occurring

    Collect the SQL 2005 perfstats scripts

    Collect DBCC MEMORYSTATUS add in

    Collect a lightweight profiler (do not include statement level events or showplans)

    Collect perfmon with all SQL Server counters like Buffer Manager added in .

    And then I would review all the data. You could of course consider using SQLNEXUS to do this, or alternatively review the output manually as its all from documented in sources.

    Here is a particularly good article which covers a lot of these documented issues, and has an excellent section of memory troubleshooting.

    http://technet.microsoft.com/en-us/library/cc966540.aspx

    Finally I just remembered that there is this public article as well which does document a good use for the sy.dm_os_ring_buffer output for a specific condition.

    How to reduce paging of buffer pool memory in the 64-bit version of SQL Server

    Update: another more detailed post about resource manager and the ring buffer records it writes is now available here:

    http://blogs.msdn.com/psssql/archive/2009/09/17/how-it-works-what-are-the-ring-buffer-resource-monitor-telling-me.aspx

    That said my comments still remain the same when you are going into a more generic performance problem.

    Graham

  • SQL in Stockholm

    Fixed - incorrect CPU time reported by sys.dm_exec_query_stats for parallel plans

    • 0 Comments

    I was discussing with Elisabeth back in this blog post about the above problem, and made comments that there were no plans currently to backport this to SQL 2005 or 2008. This changed following feedback both from customers and from us within CSS, and the fix was made part of SQL Server 2008 SP1. You'll note if you look at the KB article for SP1, that it doesn't have it's own KB, but it does get a mention in the fix list.

    Enjoy!

  • SQL in Stockholm

    Troubleshooting SQL Server deadlocks and performance when running SharePoint

    • 2 Comments

    I’ve worked on a lot of cases this year involving SharePoint. Here are some example problems I encountered recently. If you want to read the background or the formal advice first, here are some good starting links

    Planning and Monitoring SQL Server Storage for Office SharePoint Server: Performance Recommendations and Best Practices (white paper)

    Plan for performance and capacity (Office SharePoint Server)

    In addition to this, you need to consider what you can and can’t do from a SQL Server perspective, if you’re trying to optimize or troubleshoot your SharePoint deployment. You need to pay detailed attention to these articles:

    http://support.microsoft.com/kb/932744/

    http://support.microsoft.com/kb/841057/

    In simple terms, don’t go amending the schema or the objects! So back to my main point…

    Things to do first:

    1. Collect some decent SQL Server diagnostic data – my preferred option,as ever, SQLDIAG with perfstats scripts

    2. Turn on trace flag 1222 to output deadlock information to the errorlog, or capture deadlock graphs in your profiler traces.

    3. Collect this data at period of both good and bad performance, so you have a benchmark to compare against

    4. Aggregate the profiler output. Personally I use readtrace.exe directly from RMLUtilities, but other like to use SQLNexus as well.

    Here are the 2 real examples of problems I encountered recently, and diagnosed using the above data collection:

    1. Are you experiencing significant parallel waits – SQL Server installations will have a default Max Degree of Parallelism set to 0. I have observed certain MOSS deployments this year where this is not the best MAXDOP, dependent upon the specifics of the data distribution and deployment. If you look in the perfstats output you might see evidence like this:

    -- requests --
    runtime                        session_id request_id  ecid        blocking_session_id blocking_ecid task_state      wait_type
    ------------------------------ ---------- ----------- ----------- ------------------- ------------- --------------- ---------
    2009-05-27T15:06:38.060                68           0           4                   0             0 RUNNING         CXPACKET 
    2009-05-27T15:06:38.060                68           0           0                   0             4 SUSPENDED       CXPACKET 
    2009-05-27T15:06:38.060                96           0           0                   0             0 SUSPENDED       WAITFOR  
    2009-05-27T15:06:38.060               116           0           0                   0             0 RUNNING                  
    2009-05-27T15:06:38.060               128           0          12                   0             0 RUNNING         CXPACKET 
    2009-05-27T15:06:38.060               128           0           9                   0             0 RUNNABLE        CXPACKET 
    2009-05-27T15:06:38.060               128           0           0                   0             0 SUSPENDED       CXPACKET 
    2009-05-27T15:06:38.060               128           0           5                   0             0 SUSPENDED       CXPACKET 
    2009-05-27T15:06:38.060               128           0          11                   0             0 RUNNABLE        CXPACKET 
    2009-05-27T15:06:38.060               128           0           6                   0             0 SUSPENDED       CXPACKET 
    2009-05-27T15:06:38.060               128           0           2                   0             0 SUSPENDED       CXPACKET 
    2009-05-27T15:06:38.060               128           0           3                   0             0 SUSPENDED       CXPACKET 
    2009-05-27T15:06:38.060               128           0           4                   0             0 SUSPENDED       CXPACKET 
    2009-05-27T15:06:38.060               128           0           7                   0             0 SUSPENDED       CXPACKET 
    2009-05-27T15:06:38.060               128           0           8                   0             0 SUSPENDED       CXPACKET 
    2009-05-27T15:06:38.060               128           0           1                   0             0 SUSPENDED       CXPACKET 
    2009-05-27T15:06:38.060               128           0          10                   0             0 RUNNABLE        CXPACKET 
    2009-05-27T15:06:38.060               138           0          10                   0             0 SUSPENDED       CXPACKET 
    2009-05-27T15:06:38.060               138           0          12                   0             0 RUNNABLE        CXPACKET 
    2009-05-27T15:06:38.060               138           0          11                   0             0 SUSPENDED       CXPACKET 
    2009-05-27T15:06:38.060               138           0           9                   0             0 SUSPENDED       CXPACKET 
    2009-05-27T15:06:38.060               138           0           0                   0             3 SUSPENDED       CXPACKET 
    2009-05-27T15:06:38.060               138           0           3                   0             8 SUSPENDED       CXPACKET 
    2009-05-27T15:06:38.060               138           0           1                   0             8 SUSPENDED       CXPACKET 
    2009-05-27T15:06:38.060               138           0           2                   0             8 SUSPENDED       CXPACKET 
    2009-05-27T15:06:38.060               138           0           4                   0             8 SUSPENDED       CXPACKET 
    2009-05-27T15:06:38.060               138           0           7                   0            12 SUSPENDED       CXPACKET 
    2009-05-27T15:06:38.060               138           0           5                   0            12 SUSPENDED       CXPACKET 
    2009-05-27T15:06:38.060               138           0           6                   0            12 SUSPENDED       CXPACKET 
    2009-05-27T15:06:38.060               138           0           8                   0            12 SUSPENDED       CXPACKET 
    2009-05-27T15:06:38.060               163           0           0                   0             0 RUNNABLE                 
    2009-05-27T15:06:38.060               170           0           0                 163             0 SUSPENDED       LCK_M_X  

    This is a classic example, where a long blocking chain starts to build due to CXPACKET waits. Note that the same sessionid is reported on multiple lines, but the ecid column changes. This indicates that the batch (sessionid) is being processed in parallel by more than one worked (ecid). This extract is very early on in the problem, from here the chains grow and grow, and eventually can result in multiple deadlocks. In this scenario the batches being executed by sessions 128 and 138 (which are both running PROC_LISTUNGHOSTEDFIELDTEMPLATESINLIST ) are picking parallel plans inappropriately. We worked around this by changing MAXDOP on the server to a value of 1. This is definitely something that you should test in your environment, since it’s a server wide setting. Currently according to the above KB articles there is no fixed recommendation about MAXDOP for MOSS deployments, but things can change, so you should check the fixed requirements of your version. In this scenario we saw significant other number of procedures which encountered the same issue, procedures such as proc_CheckDirectoryTree, proc_DeplGetListItemLinks and PROC_DEPLGETLISTITEMANDRESERVEID amongst others. Typical errors you might see in the application logs for SharePoint would be

    Unexpected query execution failure, error code 1205. Additional error information from SQL Server is included below. "Transaction (Process ID 163) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction."

    A global change of MAXDOP = 1 resolved this specific situation.

    2. Are you encountering long blocking chains accessing the AllDocs table? This object is commonly accessed by a number of different stored procedures, and I’ve seen some scenarios where it becomes a point of contention. The most problematic one I came across recently resulted in numerous deadlocks, which firstly throw the above 1205 error to the SharePoint application log, and then in addition they throw the following error:

    InitializePublishingWebDefaults failed. Microsoft.SharePoint.SPException: Operation aborted (Exception from HRESULT: 0x80004004 (E_ABORT)) ---> System.Runtime.InteropServices.COMException (0x80004004): Operation aborted (Exception from HRESULT: 0x80004004 (E_ABORT))     at Microsoft.SharePoint.Library.SPRequestInternalClass.SetWebProps

    This second error occurs as SharePoint does not roll back properly after encountering the deadlock. If you have collected your SQLDIAG data as noted above, you’ll see symptoms like this in the perfstats script output:

    a) long blocking chains waiting on LCK_M_U

    -- requests --
    runtime                        session_id request_id  ecid        blocking_session_id blocking_ecid task_state      wait_type                                          wait_duration_ms     
    ------------------------------ ---------- ----------- ----------- ------------------- ------------- --------------- -------------------------------------------------- -------------------- 
    2009-05-28T10:01:40.973                60           0           0                 197             0 SUSPENDED       LCK_M_X                                                            3201 
    2009-05-28T10:01:40.973                72           0           0                  60             0 SUSPENDED       LCK_M_S                                                            3224 
    2009-05-28T10:01:40.973                77           0           0                   0             0 SUSPENDED       WAITFOR                                                            9087 
    2009-05-28T10:01:40.973                84           0           0                  72             0 SUSPENDED       LCK_M_S                                                            1018 
    2009-05-28T10:01:40.973               197           0           0                   0             0 RUNNING                                                                               0 
    2009-05-28T10:01:40.973               268           0           0                 270             0 SUSPENDED       LCK_M_U                                                           36299 
    2009-05-28T10:01:40.973               270           0           0                 197             0 SUSPENDED       LCK_M_U                                                           96753 
    2009-05-28T10:01:40.973               277           0           0                 270             0 SUSPENDED       LCK_M_U                                                           44290 
    2009-05-28T10:01:40.973               280           0           0                 270             0 SUSPENDED       LCK_M_U                                                           60461 
    2009-05-28T10:01:40.973               292           0           0                 270             0 SUSPENDED       LCK_M_U                                                           52389 

    b) the notableactivequeries part of the perfstats script will potentially note that the plan_total_exec_count is very low. This indicates that you appear to be running a freshly compiled plan for the blocking statement. This will be all the more likely if your system was running well, and then suddenly started degrading performance and encountering errors

    -- notableactivequeries --
    runtime                        session_id request_id  plan_total_exec_count plan_total_cpu_ms    plan_total_duration_ms plan_total_physical_reads plan_total_logical_writes plan_total_logical_reads 
    ------------------------------ ---------- ----------- --------------------- -------------------- ---------------------- ------------------------- ------------------------- ------------------------ 
    2009-05-28T10:01:40.973               197           0                     5                33057                 406079                      2550                         0                  2352015 
    2009-05-28T10:01:40.973               268           0                     5                33057                 406079                      2550                         0                  2352015 
    2009-05-28T10:01:40.973               270           0                     5                33057                 406079                      2550                         0                  2352015 
    2009-05-28T10:01:40.973               277           0                     5                33057                 406079                      2550                         0                  2352015 
    2009-05-28T10:01:40.973               280           0                     5                33057                 406079                      2550                         0                  2352015 
    2009-05-28T10:01:40.973               292           0                     5                33057                 406079                      2550                         0                  2352015 

     

     

     

     

    c) when reviewing the error logs you note that the same statement shown below, being executed within the procedure proc_UpdateTpWebMetaData, is always involved in the deadlocking

    UPDATE
        Docs
    SET
        Docs.BuildDependencySet = NULL
    FROM
        Docs INNER JOIN BuildDependencies
    ON
        BuildDependencies.SiteId = Docs.SiteId AND
        BuildDependencies.LeafName = Docs.LeafName AND
        BuildDependencies.DirName = Docs.DirName
    WHERE
        BuildDependencies.SiteId = @SiteId AND
        BuildDependencies.TargetDirName = @PreviousCustomMasterDir AND
        BuildDependencies.TargetLeafName = @PreviousCustomMasterLeaf AND
        Docs.BuildDependencySet IS NOT NULL

    The reason for this problem, in this particular scenario, is that a new plan has been compiled recently for this statement which is using a HASH join, as opposed to a nested loop join which it was using earlier. This is caused by a lack of up to date statistics on the AllDocs table. We noted scenarios where the regular update statistics jobs (issued by SharePoint as part of the changes in 2007 SP2) were not thorough enough, and we were forced to run update statistics with the fullscan option on the AllDocs table.

    The bottom line here being that although SharePoint does now do database maintenance, as shown in KB932744, there are still times potentially where you will need to do more detailed targeted maintenance.

  • SQL in Stockholm

    Download location for SQLDIAG Perfstats DMV scripts

    • 1 Comments

    I refer to this many times in my other posts, so here’s a direct link to the download location of the perfstats scripts for SQLDIAG which you can use yourself. I use this everyday, you should too!

    http://blogs.msdn.com/psssql/archive/2007/02/21/sql-server-2005-performance-statistics-script.aspx

  • SQL in Stockholm

    SQL Server 2005 Error: 701, Severity: 17, State: 193

    • 1 Comments

    Here’s another interesting example which we worked recently in the Stockholm office, which relates also back to one of my previous entries. We had a customer whose error log was filled with a variety of critical errors such as these:

    Error: 701, Severity: 17, State: 193. There is insufficient system memory to run this query.

    Error: 1204, Severity: 19, State: 2. The instance of the SQL Server Database Engine cannot obtain a LOCK resource at this time. Rerun your statement when there are fewer active users. Ask the database administrator to check the lock and memory configuration for this instance, or to check for long-running transactions.

    Error: 17803, Severity: 20, State: 13. There was a memory allocation failure during connection establishment. Reduce nonessential memory load, or increase system memory. The connection has been closed. [CLIENT: <local machine>]

    This was a fairly high end system with 64GB RAM, 16 logical CPUs and was generally well configured in terms of its Max Server Memory and other basic sp_configure settings. As you may well be aware, there are innumerable causes that can actually cause you to receive a memory related error such as this, and some of the more basic troubleshooting tips are shown in the following MSDN article

    http://msdn.microsoft.com/en-us/library/aa337311(SQL.90).aspx

    However in this case none of the basics were really of any help. Typical starting questions to ask yourself in this scenario are:

    1. Is there a pattern to when this occurs

    2. Do I know of any particular workloads which are running when I see this problem

    …just to understand if you have any particular potential causes in your own code. Our customer in this scenario was not aware of any, and unfortunately we didn’t have any dumps created which might allow us to see retrospectively what batches were running when the problem occurred.

    However we did have the partial output of from DBCC MEMORYSTATUS which SQL had kindly written to the error log for us. An examination of this showed the following suspicious looking section:

    OBJECTSTORE_LOCK_MANAGER (node 2)
        VM Reserved = 0 KB
        VM Committed = 0 KB
        AWE Allocated = 0 KB
        SM Reserved = 0 KB
        SM Committed = 0 KB
        SinglePage Allocator = 29398856 KB
        MultiPage Allocator = 0 KB

    Global Memory Objects:
            Resource=    709
            Locks=    3676149
            SE Schema Mgr=    1646
            SQLCache=    261
            Replication=    2
            ServerGlobal=    109
            XPGlobal=    2
            Xact=    101
            SETLS=    16
            DatasetMemObjs=    32
            SubpDescPmos=    16
            SortTables=    2

    From this we can infer that we seemed to experience an equivalent problem to the one I wrote about here

    http://blogs.msdn.com/grahamk/archive/2009/02/03/compilation-bottlenecks-error-8628-severity-17-state-0-part-1.aspx

    although manifesting distinctly different symptoms.

    At the end of the day one of the NUMA nodes has taken up about 29GB of memory just managing lock resources, and this is confirmed by the number of locks which are reported in the global memory objects. Remember that the majority of the output from DBCC MEMORYSTATUS is publically documented here:

    How to use the DBCC MEMORYSTATUS command to monitor memory usage on SQL Server 2005

    http://support.microsoft.com/kb/907877

    This 29GB of memory purely managing locks appears to be placing an undue load of the memory resources of the system, and what we appear to have here is a runaway or huge batch of some type, or maybe a very large open transaction.

    The way to resolve this problem is to identify the batch and stop it from running. One good way to identify this problem proactively might be to use the tiny DMV query I wrote in my previous post about this type of situation and attach a monitor to it, warning you if the value goes above 1 million say, or some other high number of your choice.

    select request_session_id, count(*)
    from sys.dm_tran_locks
    group by request_session_id

    You could then easily identify the problem session_id before it got completely out of hand. There are of course many different alternative ways of capturing such data or doing such monitoring. Maybe you’d prefer to create a custom SQLDIAG package to monitor your server in more detail.

    The moral of this story being though, that even in the most critical system failures such as this, there may well be enough info to get a hint of where the problem lies, if you make sure you go through the error logs in detail. Remember not to be overwhelmed by the sheer volume of errors that might be reported; ensure you go through all the lines removing the duplicates to see whether something of interest is hidden amongst all the noise.

  • SQL in Stockholm

    SQL Server Error: 7320 Cannot execute the query "xxx" against OLE DB provider "DB2OLEDB" for linked server "xxx"

    • 0 Comments

    Here's a little something that me and my colleagues were looking at this week. Our customer was executing queries against a DB2 linked server using the latest provider and occasionally got these errors. One of my colleagues suggested to set a dump trigger on the error 7320 so that we could capture a sqlservr.exe dump when the problem next occurred. We achieved this by running

    dbcc dumptrigger('SET',7320)

    on the server. By default this will collect a mini-dump, unless you have enabled any other trace flags previously. When we had encountered the error and analyzed the dump, it showed up the following stack

    sqlservr!ex_raise+0x52
    sqlservr!COledbError::HandleExecuteError+0x11b
    sqlservr!COledbRangeRowset::FDoPerformSchemaCheckForQuery+0xc7
    sqlservr!COledbRangeRowset::FPerformSchemaCheck+0x1fb
    sqlservr!COledbRangeRowset::FCheckSchema+0x2a
    sqlservr!CEnvOledbRange::XretSchemaChanged+0x16
    sqlservr!CXStmtDML::XretDMLExecute+0x41037b
    sqlservr!CMsqlExecContext::ExecuteStmts<1,1>+0xace
    sqlservr!CMsqlExecContext::FExecute+0x439
    sqlservr!CSQLSource::Execute+0x355
    sqlservr!CStmtExecStr::XretExecStrExecute+0x48d
    sqlservr!CXStmtExecStr::XretExecute+0x20
    sqlservr!CMsqlExecContext::ExecuteStmts<1,1>+0x39e
    sqlservr!CMsqlExecContext::FExecute+0x439
    sqlservr!CSQLSource::Execute+0x355
    sqlservr!CStmtExecProc::XretLocalExec+0x125
    sqlservr!CStmtExecProc::XretExecExecute+0x132
    sqlservr!CXStmtExec::XretExecute+0x20
    sqlservr!CMsqlExecContext::ExecuteStmts<1,1>+0x39e
    sqlservr!CMsqlExecContext::FExecute+0x439
    sqlservr!CSQLSource::Execute+0x355
    sqlservr!process_request+0x312
    sqlservr!process_commands+0x3fd

     

    Before going into any in depth debugging which would involve monitoring the remote server and the components in between, my colleague thought it would be sensible to check the setting surrounding Schema Validation as we were obviously throwing the error in a routine related to this functionality (FDoPerformSchemaCheckForQuery). So we changed the option of the linked server to say enable "Lazy Schema Validation", using this syntax:

    exec sp_serveroption xxxxx, 'lazy schema validation', true

    If you look this option up in MSDN under "Configuring OLE DB Providers for Distributed Queries" you'll note the following regarding this option:

    "If this option is set to false, the default value, SQL Server checks for schema changes that have occurred since compilation in remote tables. This check occurs before query execution. If there is a change in the schema, SQL Server recompiles the query with the new schema.

    If this option is set to true, schema checking of remote tables is delayed until execution. This can cause a distributed query to fail with an error if the schema of a remote table has changed between query compilation and execution.

    You may want to set this option to true when distributed partitioned views are being used against a linked server that is running SQL Server. A specified table that participates in the partitioned view may not be actually used in a specific execution of a query against the view. Therefore, deferring the schema validation may improve performance."

    Now in all truth this was quite a pragmatic approach, in that we hadn't fully debugged the problem, we were just taking an educated guess, that we were experiencing a problem surrounding schema validation, and so far, since implementing this option, the errors have not re-occurred, so we've managed to workaround the problem.

    For reference the customer in question was also experiencing other errors in the 7300 to 7320 range, which have also stopped occurring since this option was set. Errors such as

    Error: 7350 Cannot get the column information from OLE DB provider "DB2OLEDB" for linked server "xxx"

    Error: 7321 An error occurred while preparing the query "xxx" for execution against OLE DB provider "DB2OLEDB" for linked server "xxx"

    Error: 7303 Cannot initialize the data source object of OLE DB provider "DB2OLEDB" for linked server "xxx"

    The moral of this story is that linked servers can be very difficult and time consuming to troubleshoot, due to the number of distributed components, including ones outside of the SQL Server area completely. This is one time where I am a fan of testing some of the different settings in the provider to see if the problem can be worked around initially, before doing some in depth analysis. In most other cases I'd far prefer to know why, and we may yet revisit this particular situation to fully debug it, but sometimes a workaround and an error free production environment is also nice for a quick win.

    For future reference another interesting quick diagnostic that can be performed on linked servers is to run Process Monitor on the server in question to see whether you are having any obscure permission of registry problems. I've seen this give some quick wins as well.

Page 1 of 3 (64 items) 123