SQL in Stockholm

SQL Server thoughts from the support team in Stockholm

SQL Server troubleshooting thoughts from Graham Kent, (ex) Senior Support Escalation Engineer at Microsoft Sweden.


    SQL Server Assertions and Dumps – a typical example and some proactive help


    I recently worked with a customer who was seeing the following error in their error log:

    SQL Server Assertion: File: <qxcntxt.cpp>, line=1069 Failed Assertion = 'cref == 0'
    Could not find an entry for table or index with partition ID 491897996509184 in database 2

    As you would expect, a mini dump was produced at the same time, which detailed the stored procedure that was being run when the assertion occurred.

    In the case of such errors, I would always advise you to contact us here at PSS so that we can analyze the dump and provide an explanation or solution. However, this particular case reminded me of an interesting side issue where you can proactively help yourself before contacting PSS.

    You should consider running the SQL Server Best Practices Analyzer on your server. You can download it from here. We regularly see cases where running the tool could have identified known issues in advance. When you run the tool initially, it checks for updates to the tool itself, but more importantly it checks for updated configuration files. For your reference (and if you're nosey like me), you can look at the configuration file it has downloaded here:

    C:\Program Files (x86)\Microsoft SQL Server 2005 Best Practices Analyzer\EN\SqlBPA.config.XML

    (remove the (x86) from the directory path if you don’t have a 64-bit installation of Windows)

    Within PSS nowadays, we add rules to this tool when we find interesting new issues, or when we identify common causes of the problems our customers experience. If you run the tool regularly, you’ll pick up our latest updates and so automatically check for the things that we in PSS think are important. A large number of the support cases that I work on involve known configuration issues and the like, which could have been avoided, or at least highlighted, by this tool.

    The reason that I mention this now is that the tool currently contains a rule for a known issue with a driver from Dell which can cause random data corruptions which show up as assertions and errors similar to the one above.

    For reference, the driver in question is LSI_SAS.SYS with a version lower than 1.21.26.01. If you notice that your environment has the matching configuration (more than 4 GB of system memory, a server with a SAS backplane, Microsoft Windows 2000 or 2003, and the Dell OpenManage Server Administrator Storage Management software), and is therefore susceptible to this problem, you can download the update which resolves it here.

    It actually turned out that this was not the cause in this case, but it’s always good to check these things, and the BPA is a great way of doing it.

    The key to actually finding the cause of this particular error was to note the rest of the SQL Server error logs, which were littered with errors 602, 676, and 677. For example:

    Error: 602, Severity: 21, State: 30.
    Could not find an entry for table or index with partition ID 491897996509184 in database 2. This error can occur if a stored procedure references a dropped table, or metadata is corrupted. Drop and re-create the stored procedure, or execute DBCC CHECKDB. 

    Error 676
    Error occurred while attempting to drop worktable with partition ID 491897996509184

    Error 677
    Unable to drop worktable with partition ID 491897996509184 after repeated attempts. Worktable is marked for deferred drop. This is an informational message only. No user action is required.

    In themselves, these errors are not completely conclusive, as further memory dumps may be produced, which again would also need to be analyzed. However, sometimes they are caused by the problem described here: http://support.microsoft.com/default.aspx?scid=kb;EN-US;937343

    If you are getting errors such as these, or others like them, you should consider what the errors themselves recommend. You should run DBCC CHECKDB to check your data integrity (a data corruption can cause all manner of different assertions depending upon its nature) and you should check your hardware for possible problems as well.
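
    As a minimal illustration of that first recommendation (the database name below is just a placeholder for your own database), a basic integrity check looks like this:

    -- Check the logical and physical integrity of every object in the named database.
    -- NO_INFOMSGS suppresses informational output so that any genuine errors stand out.
    DBCC CHECKDB (N'YourDatabase') WITH NO_INFOMSGS, ALL_ERRORMSGS;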

    I guess what I’m trying to say in conclusion is that there doesn’t have to be one single simple cause when you look at particular errors such as these. The basics still always apply:

    You should run DBCC CHECKDB regularly (this is another BPA rule; it will highlight it if you haven’t).
    You should keep an eye on the overall health of your environment, especially the storage.
    You should keep an eye on your SQL Server error logs and consider all the events together.
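
    On that last point, a quick way to scan the current error log for the errors mentioned above is something like the following. Bear in mind that xp_readerrorlog is undocumented and the parameters it accepts vary between versions and builds, so treat this purely as a sketch:

    -- Read the current SQL Server error log (first parameter 0 = current log,
    -- second parameter 1 = SQL Server error log rather than the Agent log)
    -- and return only the entries containing the search string.
    EXEC master.dbo.xp_readerrorlog 0, 1, 'Error: 602';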

    The answer to this actual problem? Well, it was the hotfix noted above, but we had to collect some dumps to prove this definitively, and that’s something that you do need PSS involvement with.


    SQL Server Performance Assessment – What are my top 20 expensive batches


    Previously, here, I discussed the nature of collecting data when you are experiencing a SQL Server performance problem. Performance problems sometimes intimidate people, as they are not sure where to start. This post details one of my preferred ways to take a quick overview of the situation.

    Let’s assume that, using one of the techniques described, you have some SQL Profiler data. One of the first things that I like to do is aggregate the data into a database, so that I can then simply query it with TSQL, either for an overview of aggregated problems or for the details of individual batches, quickly and easily.

    My current preferred method of doing this is to use the readtrace tool that installs as part of RML Utilities. You might choose to run readtrace.exe from the command line (as I do), or you might prefer to also install SQLNexus and set the options there to have it run readtrace for you. Either way, at the end of it you’ll be left with a database that contains the readtrace schema.

    What I like to discover first off is the answer to the question:

    “Are there actually SQL batches that are taking a long time to run?”

    This may sound obvious, but considering the number of different types of performance problems, and the different subsets of data you might want to analyze, this provides an immediate split of the problem into separate areas. With this data you can follow a line of questioning like this:

    1. Do I have long running SQL batches?
    2. If yes – is it the same batches that are long running all the time?
    3. Is it a small number or even a single batch that is long running?
    4. Can I recognize a pattern to this behavior? Is it the result of a recent application change?
    5. Is this a crazy query being issued manually by a user?

    And so on. The point is also that if the answer to the first question is “no”, then maybe you should be looking elsewhere within your application architecture for the problem, or maybe you should be collecting additional or different diagnostics. There’s no hard and fast rule here, but it’s always nice to have some sort of idea of the direction you might wish to proceed in.

    Consider the following example. You make a small change to a previously healthy application and suddenly all your users start complaining of unbearably slow performance. You collect the profiler data and run it through readtrace. When you query the readtrace data you find that a single SQL batch is always at the top of every list of the most expensive batches, under every category. This is a stored procedure which acts as a common entry point for all application functionality. Knowing this you can now work out that every user is experiencing application slowness because this common stored procedure always runs slowly. You can now focus on why this specific piece of code is going slow and deal with it as necessary.

    So anyway, the script is attached to this post. It’s based upon one of the old scripts which we used to use internally before the advent of SQLNexus, and has been amended by me to suit my requirements. I often use it to send customers summaries of their badly performing code by email, and so it’s designed to have results output to text. This is also why it only prints the initial 100 characters of each batch, but you can change this easily enough if you have long SQL batches.
    You might note that it only does batches, and not statements. This is because, first off, when I have a random performance problem I don’t tend to collect statement level data in profiler, as it’s so much more expensive to collect and can sometimes exacerbate the situation. I work on the basis of taking a summary collection and then going back for more targeted data if need be. readtrace does know how to analyze statement level data if you collect it (in fact you’ll note that it warns you if you don’t), so you can always write your own version for statement level data by following this principle and looking at the readtrace schema, which comes with the documentation.
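
    To give a flavour of what such a query looks like, here is a minimal sketch of a batch-level aggregation. This is not the attached script; the ReadTrace.tblBatches and ReadTrace.tblUniqueBatches table and column names are taken from my recollection of the readtrace schema, so verify them against the schema documentation that ships with RML Utilities:

    -- Top 20 batches by total duration, aggregated per unique (normalized) batch text.
    SELECT TOP 20
        MIN(LEFT(u.NormText, 100)) AS BatchText,   -- first 100 characters only, as in the attached script
        COUNT(*)                   AS Executions,
        SUM(b.Duration)            AS TotalDuration,
        AVG(b.Duration)            AS AvgDuration,
        SUM(b.Reads)               AS TotalReads,
        SUM(b.Writes)              AS TotalWrites,
        SUM(b.CPU)                 AS TotalCPU
    FROM ReadTrace.tblBatches AS b
    JOIN ReadTrace.tblUniqueBatches AS u
        ON b.HashId = u.HashId
    GROUP BY b.HashId
    ORDER BY TotalDuration DESC;

    Swapping the ORDER BY column for TotalReads or TotalCPU gives you the same top-20 view under a different cost category.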

    Finally, you might ask: why not do this with DMVs and look at the data in real time? Well, you can, and sometimes I would, but that’s not what this post is about! There are both advantages and disadvantages to that method, and I do have an archive of DMV queries which I’m ready to run should I ever do online troubleshooting of this nature. These are the type of queries you can also see if you download the performance dashboard reports. But as I say, I really like readtrace because once you have the data in this format you can run your stock queries, like the one attached here, and if you find something out of the ordinary you can write something custom to analyze the data in more detail.
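
    For reference, a typical query of that type (not one of mine specifically, just the standard sys.dm_exec_query_stats pattern; remember the figures are cumulative since each plan was cached, so it is not a like-for-like replacement for trace data) looks like this:

    -- Top 20 cached statements by total elapsed time since their plans were cached.
    SELECT TOP 20
        qs.total_elapsed_time,
        qs.execution_count,
        qs.total_worker_time   AS total_cpu_time,
        qs.total_logical_reads,
        SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
            ((CASE qs.statement_end_offset
                  WHEN -1 THEN DATALENGTH(st.text)
                  ELSE qs.statement_end_offset
              END - qs.statement_start_offset) / 2) + 1) AS statement_text
    FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
    ORDER BY qs.total_elapsed_time DESC;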
