Welcome to MSDN Blogs Sign in | Join | Help

I hear the complaint a lot that after a time period of no calls to Reporting Services, the first call to the Reporting Services instance is very slow.  This is totally expected because the application pool generally gets spun down and we need to restart everything.

If you wire up some basic web service calls, you can see that (at least in my testing) the RS2005 web service takes approximately 30 seconds to start up.  The only database work done here is the bare minimum necessary to retrieve the encryption keys and configuration settings stored in the catalog database.  There is no access to actual datasources or reports.

First, I wrote a  basic web method that just returns an empty dataset:

Public Function ReturnBlankDataSet() As Data.DataSet

     Dim m_ds As New Data.DataSet
     Return m_ds

End Function

Then, I wrote some code that had the ability to make a call to a web service and timed how long it took to return from the web method call.  I ran the test against each web service five times.  In between each test, I did an IIS reset to ensure a cold start. 

My test harness code looked like this:

        Dim sr As New IO.StreamWriter("c:\ReportingServicesMetrics.txt")

        'let's loop through 5 times
        For i As Integer = 0 To 4
            Dim ts As DateTime
            Dim te As DateTime


            'let's reset IIS to make sure everything has to start from scratch
            sr.WriteLine("Doing an IISReset - Loop #" & (i + 1).ToString)
            Dim proc As New Diagnostics.Process
            proc.StartInfo.FileName = "iisreset"
            proc.Start()
            proc.WaitForExit()
            sr.WriteLine("IIS has been reset")

            'RS
            sr.WriteLine("Getting ready to instantiate the RS web service")
            ts = Now()
            Dim rs As New RS2005.ReportingService2005
            Dim creds As New Net.NetworkCredential
            creds = Net.CredentialCache.DefaultCredentials
            rs.Credentials = creds
            rs.Url = "http://servername/reportserver2005/reportservice2005.asmx"
            rs.ListChildren("/", False)
            sr.WriteLine("Instantiated the RS web service")
            te = Now()
            sr.WriteLine("RS took " & te.Subtract(ts).TotalMilliseconds.ToString)

            'dummy web service
            sr.WriteLine("Getting ready to instantiate the dummy web service")
            ts = Now()
            Dim dws As New DummyWebService.Service
            dws.ReturnBlankDataSet()
            sr.WriteLine("Instantiated the dummy web service")
            te = Now()
            sr.WriteLine("The dummy web service took " & te.Subtract(ts).TotalMilliseconds.ToString)

        Next
        sr.Close()


I did a FileMon on the startup for each process.  The dummy web service did about 130 file reads, while Reporting Services did almost 17000 reads.  This is because Reporting Services must load up the traditional ASP.NET assemblies, plus all of the specialized Reporting Services assemblies.  It also loads up all the localization files for all the various supported languages.


Reporting Services does a good bit of logging for both ReportServer, Report Manager, and the Report Server Windows Service.  I did not do test with logging turned off because we ship with a default level of logging, you could save some start up time by turning off all logging.  However, it is  not recommended to turn off logging  because of the value traditionally derived from standard logging.

Other things that cause additional overhead when Reporting Services starts up:

  1. Every time the SRS 2005 web service loads, it also has to read and decrypt the rsreportserver.config file
  2. Since there are no connections in the connection pool, we have to physically open up a socket connection between the two servers, plus log into the database instance
  3. The web service has to make RPC calls into the Windows Service to get the symmetric encryption key

This explanation is to try and provide an overview of some of the things that are going on during Reporting Services initial start up.  Remember, most of these things probably do not happen in traditional web applications. Again, all of this is completely expected behavior.

As discussed earlier, if this behavior causes some business issues, you could consider modifying the recycle options on your IIS process.  You can either increase the recycle time (causing them to be recycled less frequently) or schedule the recycle to occur at a non-peak time.  You could then combine this second option with a "ping" process that hits the process shortly after the recycle.  This will "wake" the ReportServer processes so that your initial customer doesn’t see the initialization time.  If you combine these options with turning off the idle worker process shutdown, you can significantly minimize the instances where a user would run into the startup delay. 

The only downside I can see to not idling the worker process is that the process could consume more resources than necessary.  Recycling from time to time is recommended because it cleans up the worker process if you have any leaked memory or fragmentation but it is not necessary from a technical perspective.

Unfortunately, none of this information is documented in any KB articles.  However, some of it is addressed in www.microsoft.com/technet/prodtechnol/sql/2005/pspsqlrs.mspx.  Beyond the counters in the performance document, you could also track the performance of your Reporting Services instance using some execution log reports (http://msdn2.microsoft.com/en-us/library/ms161561.aspx).

Posted By: Evan Basalik

An Analysis Services 2005 query against a calculation on a cell may report the error:  "The expression contains a function that cannot operate on a set with more than 4,294,967,296 tuples."  This may be because there are simply more underlying cells in the query than the supported maximum, but in some cases, the query can be rewritten to avoid the error by breaking out parts of a clause into other clauses.  Specifically, an IIF() statement apparently considers the total number of cells for calculation from both action clauses when counting the number of underlying cells.  But two IIF statements run within the same query are not counted together when the server code checks for this limit.  Each appear to be evaluated seperately, and so long as neither exceeds the 4.2B tuple limit, the query will pass the check and not fail in this way.

 

Here is an example of a calculation that produced this error when run for a high level cell in a relatively large database:

 

IIF(

MDX Condition=true,

MDX Expression 1,

MDX Expression 2

)

 

Replacing the query with the following achieved the same results, but avoided the error:

 

IIF (

MDX Condition = true,

NULL,

MDX Expression 2

)

+

IIF(

MDX Condition = true,

MDX Expression 1,

NULL

)

 

This approach allows Analysis Services to take advantage of internal optimizations since it can eliminate all cells for the first condition with NULL results, regardless of the context of the calculation.  The server is able to use "block mode" evaluation in the formula engine that does not require each result to be evaluated on a cell by cell basis.  This optimization may also improve performance in some cases for the same reason.

 

Posted By: Jon Burchel

 

Many times, the key to solving these errors is a network packet trace.  The problem is that these errors are often thrown intermittently.  This can obviously make it problematic to capture a network trace. 

 

Network Monitor 3 (AKA Netmon3) solves this problem quite nicely.  Netmon3 has the built-in ability to capture rolling traces.  This allows you to start the trace and then walk away and be sure that you will have captured the error.

 

Here are my steps for configuring Netmon3 for a rolling trace:

 

Please see the steps below in order to configure Network Monitor:

 

1.    Download Network Monitor 3

a.    (http://www.microsoft.com/downloads/details.aspx?FamilyID=18b1d59d-f4d8-4213-8d17-2f6dde7d7aac&DisplayLang=en)

 

2.    Install Network Monitor on both the client and server experiencing the problem.  If you are not able to install it on the server you can install it on another machine that is connected to the same hub as the server or to the admin port of the switch that the server is on.  If you go this route you must synchronize the time between the client, server, and this third machine.

 

3.    Synchronize the clocks on the Client and SQL machines

a.    At one of the machines bring up a command prompt and execute the following:

 

                                         i.    Net time <\\machinenamewewanttosynchwith> /set /yes

 

4.    Turn off TCP Chimney if any of the machines are Windows 2003

a.    bring up a command prompt and execute the following:

 

                                         i.    Netsh int ip set chimney DISABLED

 

5.    Using the NMCap command-line utility, start the chained trace (chained = create a new capture file after reaching the “Temporary capture file size” rather than overwriting the current capture)

a.    [Netmon 3 installation folder]\NMCap /network * /capture /file test.chn:100M  (creates 100 MB chained files)

 

                                         i.    NOTE:  YOU MUST USE THE .CHN EXTENSION TO CAPTURE CHAINED TRACES

 

                                        ii.    NOTE:  The target folder must exist for NMCap to create the trace file    

 

b.    More information can be found:

                                         i.    http://blogs.technet.com/netmon/archive/2006/10/24/nmcap-the-easy-way-to-automate-capturing.aspx

                                        ii.    Or by running the command-line switch /examples

                                       iii.    Or by running the command-line switch /?

 

6.    Get the IP addresses of the client and the server

a.    ipconfig /all on the client > client.txt

b.    ipconfig /all on the server > server.txt

 

7.    Reproduce the problem, please note the exact time the problem reproduces.  Also, please note the exact error message.

 

8.    Stop the trace by hitting Ctrl-c in the command-line window

 

Please note that Netmon3 does not run on Windows 2000.  Also, step #6 is probably the single most important step (besides actually starting the trace).  If you don’t know the timestamp of the error, it is extremely difficult to find the offending packets in the trace.

 

Happy captures!

 

Posted By: Evan Basalik

In this series of posts, we’re going to take a look at some of the tools we at Microsoft Developer Support -- Database commonly use, how to use them, and the types of issues they can be used to resolve. 

 

In this article, Enamul Khaleque starts down the road of using WinDbg and AdPlus. Future articles will move beyond identification, and discuss step-by-step approach of using WinDbg to solve wide array of issue including but not limited to: memory leaks, high & low CPU hangs, heap corruption, race conditions and crashes/exceptions in both managed and unmanaged environments.

 

The Purpose

You may be wondering why I am talking about WinDbg and AdPlus while you already have Visual Studio debugger. If you ask your network/system admin to install Visual Studio in a production environment, the odds of hearing a big NO is 99.99% if not 100%. That’s because Visual Studio is an extensive development environment and not suited for a production environment. Fortunately, we can capture runtime state in a file (called a memory dump or simply dump) using light-weight tools such as AdPlus and later analyze them using WinDbg. For this article, I will assume you have no prior experience in WinDbg or AdPlus.

 

 

Installation

 

WinDbg and AdPlus are part of Debugging Tools for Windows. Both 32-bit and 64-bit versions are downloadable from following URL:

http://www.microsoft.com/whdc/devtools/debugging/default.mspx

 

 

AdPlus Basics

 

Imagine that after upgrading your application in production, you are now seeing significant response delay. You are not able to reproduce this behavior in a Dev, Test or QA environment, so your only option is to capture the application state while the latency is occurring at runtime. With AdPlus, you can capture a “hang mode” memory dump to determine what the process is waiting on.

 

Instead of seeing delays, let’s say this application is crashing in the production and you can’t determine why. With AdPlus, you can also capture “crash mode” memory dump to determine what causes the process to crash.

 

Typically in a hang or a memory leaks scenario, you take multiple dumps in few minutes interval. This helps to understand and analyze the state of the application over a period of time.  One thing you need to keep in mind that a dump file contains only the state of the process at the time the dump was taken, so multiple dumps are useful when we want to see the process state over a time span.

 

Live debugging should be your first option since dump is limited to show conditions only of what already happened. Also you can’t set conditions in dump as you can do in live debugging. You capture dump when application is running in an environment live debugging is not at all possible. Here is a list of typical scenarios when we want to capture a dump file:

 

·         Application hangs with high or low CPU cycle

·         Application is leaking memory

·         Application is crashing or throwing exceptions

·         Heap corruption

 

The ADPlus.vbs script itself does not require any specific configuration to use it, however you might be prompted to change your default script interpreter from WScript.exe to CScript.exe. Allowing ADPlus.vbs to set CScript as the default script interpreter is strongly recommended. If you continue to use WScript as your default script engine, then you will see a separate window when you run a command against ADPlus.vbs. Optionally, you can place CScript at the beginning of your command line, but you must include the ‘.vbs’ extension on ADPlus.vbs.

 

AdPlus creates dump files. These dumps contain stack information about the process(s) you are monitoring. This dump file and stack information relies on symbols (.pdb) to resolve that stack information. If a symbol path is not specified in the command line, it will use the _NT_SYMBOL_PATH environment variable if one is set, otherwise stack information in the dump will be limited. Here is how you can set the symbol path at the machine level. 

Set  _NT_SYMBOL_PATH=SRV*http://msdl.microsoft.com/download/symbols

To use AdPlus, you must specify a series of command-line options or arguments to the script. At a minimum, AdPlus requires two command-line options: one that specifies the mode of operation, and one that specifies a target process to operate against.

C:\> Adplus.vbs –hang –p 123 –o C:\Dumps

First, a run mode is required when passing arguments; here we used –hang. The run mode has two options –hang and –crash. The second mandatory parameter is the process to monitor. Here we are using the –p switch to tell ADPlus.vbs to capture a hang dump of a process with the process ID of 123. When AdPlus is running in hang mode, AdPlus must be started after the process hangs or is consuming high CPU utilization. 

C:\> Adplus.vbs –crash –p 123 –o C:\Dumps

Crash mode is supported in a Terminal Server session on Windows XP and later versions of Windows. Hang mode will work inside a Terminal Server session on any platform. When AdPlus is running in crash mode, AdPlus must be started before the process crashes or becomes unstable.

 

When ADPlus.vbs is running in crash mode, a debugger remains attached to each process that is specified on the command line for the lifetime of that process. To manually detach the debugger from the process, you must maximize the debugger window and press CTRL+C to break into the debugger.

 

WinDbg Basics

 

WinDbg is the ultimate debugger both for User mode and Kernel mode debugging. WinDbg is used mostly for analyzing dumps, yet you can capture dumps with this just like AdPlus.

 

In our next article on this series, we will see step-by-step examples of capturing dumps with AdPlus and then analyzing the dump with WinDbg.

 

What’s Next

 

In future posts, we will deal with this vast subject with one scenario at a time including but not limited to memory leak, High & low CPU hang, Heap corruption, Race condition and crash/exception both managed and unmanaged environments. We will also see how to leverage WinDbg with extension DLLs to make our debugging more pleasant.

 

References

 

HOWTO: Use AdPlus to troubleshoot "hangs" and "crashes" http://support.microsoft.com/default.aspx?scid=kb;en-us;Q286350

 

Posted By: Enamul Khaleque

How often should I expect to hear from my Support Engineer?

This will vary based on the issue and its impact to your business.  If you have specific preferences for frequency of contact, time of day, or method (prefer phone vs. e-mail), please discuss this with your support engineer.

 

 

My Support Engineer is not available and I need help now.

If you need immediate assistance and your engineer is unavailable, please contact the engineer’s manager.  Manager contact information is in the engineer’s signature block and in the initial e-mail. 

 

If you are unable to reach the manager, you can call the Microsoft Support line (800-936-5800), provide your case number, and request to speak with the next available support engineer.  You will be routed to the phone queue for that specialty and connected to an engineer.  The Support Engineer receiving the call may need time to review the case history and steps taken to date, depending on the duration and complexity of the issue.

 

 

My support issue is not progressing well.

Each Support Engineer should provide a management contact in his/her signature block and in the initial e-mail.  If you have:

 

·         Schedule conflicts with your assigned engineer’s availability.

·         Communications issues.

·         Need a faster resolution.

·         Or are otherwise dissatisfied with the support provided, please engage the engineer’s manager. 

 

We strive to deliver outstanding customer service, and we appreciate you letting us know if we are not meeting your expectations.

 

 

I need a different engineer.

As mentioned in the preceding section, “My support issue is not progressing well”, you may contact the engineer’s manager and request a different engineer.

 

 

My Support Engineer has suggested that we archive the case.  What does this mean?

If there is going to be an extended delay before you can implement a change or collect requested data, we may ask to archive your case.  This means that the case will be closed on our end.  You can reopen the case if you encounter the same error on the same instance/server again, or if you have the data needed to continue troubleshooting.

 

In order to reopen the case, contact the support engineer or his/her manager directly.  You may be asked to provide feedback on support.  Please assess the support based on the service received to date, with the understanding that Microsoft does not consider an archived case resolved.

 

 

We thought that our issue was solved, but the problem has recurred.  Can I re-open my support case?

As long as the problem and the server are the same as in the case, you can re-open the case.  To do this, first try to reach the previous engineer or his/her manager.  If unable to reach them, contact the Microsoft Support line (800-936-5800), provide your case number, and request that Microsoft re-open the case. 

 

Note that in some cases, the same symptoms can be caused by multiple issues.  For example, if a server experienced slow performance and the disk I/O subsystem was fixed and performance restored, and later performance degraded again, this could be due to out of date statistics or other issues.  In these ambiguous cases, you may be asked to open a new case.

 

 

What do the Microsoft Case Severities mean?

A – Critical – 1 hour initial response goal

·         High-impact problem in which production, operations, or development is proceeding but is severely impacted, or where production and/or profitability will be severely impacted within days.

 

B – Urgent – 4 hour initial response goal

·         High-impact problem where production is proceeding, but in a significantly impaired fashion.

·         Time sensitive issue important to long-term productivity that is not causing an immediate work stoppage.

 

C – Important - 24 hour initial response goal

·         Important issue which does not have significant current productivity impact for the customer.

 

D – Monitor

·         This severity is used for support issues that are waiting, usually to confirm a provided solution or when waiting for an intermittent problem to recur, in order to collect needed data.

 

The Support Engineer for my support issue said that I need to open an additional support incident.  Why should I have to open another incident?

Microsoft defines an incident or case as:

·         A single support issue and the commercially reasonable efforts to resolve it

·         A problem that cannot be broken down into subordinate problems

 

NOTE:  The Customer and the Support Engineer must agree on the case resolution.  Also be aware that Microsoft does not charge for code defects, documentation errors, and product “wishes”.

 

As documented on Microsoft’s public website at http://support.microsoft.com/?LN=en-us&sd=tech&scid=gp%3Ben-us%3Bofferprophone&x=7&y=16#faq607

 

EXAMPLE:  A support issue is opened for slow performance of an application with a SQL Server back-end.  Troubleshooting identifies a specific stored procedure as the problem, and updates to statistics and an increase in covering indexes resolves the issue. 

 

However, the same server is also experiencing slow performance for merge replication.  This is considered a different issue, which is unrelated to the original issue for which the support case was opened.

 

 

I’m interested in a higher level of service and a closer relationship with Microsoft.

Microsoft offers a managed support relationship known as Premier Support, designed to meet the needs of an enterprise customer.  For an overview of the service elements, please see http://www.microsoft.com/services/microsoftservices/srv_prem.mspx  or call 1-800-936-3500 to request contact by a Premier Support representative.

 

 

Advisory vs. Break-Fix

Microsoft provides support services for customers with problems encountered implementing or using Microsoft products.  These are commonly known as “break-fix issues”.  Occasionally customers need short-term (less than 40 hours) consultative assistance, as opposed to the standard break-fix service.

 

This short-term consultative assistance is known as Advisory service.  This is available to Microsoft’s Premier customers as an element in their support contracts.  It is also available to other customers on an hourly charge basis.

 

For more information on Advisory Services, please see the Advisory Services section at http://support.microsoft.com/default.aspx?scid=fh;EN-US;OfferProPhone or go to the Advisory Services page, http://support.microsoft.com/gp/advisoryservice .

 

Hope this helps!

 

Posted By: Microsoft SQL Server Support Team

 

In this series of posts, we’re going to take a look at some of the tools we at Microsoft Developer Support -- Database commonly use, how to use them, and the types of issues they can be used to resolve. 

 

In this article, Adam Saxton starts down the long road of double-hop troubleshooting by providing a gentle introduction to Kerberos, and how to recognize Kerberos+SQL related issues.  Future articles will move beyond identification, and discuss the tools and processes we use to troubleshoot such issues.

 

Kerberos

We receive numerous calls where the problem falls under Kerberos.  This will present itself when trying to establish a connection to SQL Server (or a Linked Server).  I’m going to discuss what Kerberos is, how to recognize when there is a problem with it, and go over some of the items that need to be looked at in order to resolve a Kerberos issue with SQL Server.  As there is a lot of information on this topic, I’ll be splitting it up over a few blog posts.

 

 

What is Kerberos?

Kerberos is a secure method for authenticating a request for a service in a computer network, and is used to verify a user or host identity. Kerberos lets a user request an encrypted "ticket" from an authentication process that can then be used to request a particular service from a server. The user's password does not have to pass through the network.

 

Put simply, it is a more secure form of Windows Authentication versus NTLM.  Also, understand that Kerberos is not a Windows specific technology.  Windows will always try to use Kerberos first over TCP.  If that doesn’t work, it will drop down to NTLM.  The exception to this is if you are trying to access a resource that is local.  NTLM will always be used when accessing local resources.  Also note that Kerberos cannot currently be used over Named Pipes or Shared Memory.

 

 

How does Kerberos work?

There are a lot of details into how Kerberos works.  I’m not going to get that deep into it within this blog series.  I’m going to relate this to how it works with SQL.  To do that, I’m going to use the classic example of a Double-hop with a Web Server.  We use the work Double-hop to explain that the IIS Server is using a resource that is located on a different server.  In this case the first “hop” is from the web browser client to the IIS Server (ASP or ASPX Page); the second hop is to the SQL Server on a different machine. The server on the second hop is also referred to as the Backend Server.

 

In order to successfully establish a connection using Kerberos, a SQL ServicePrincipalName (SPN) needs to exist within Active Directory.  I will talk more about SPN’s and where they are located in a later post as the topic is complex enough to deserve its own.  I would say about 80-90% of calls we receive relating to Kerberos involve either missing SPN or duplicate SPN’s.

 

When you log onto your machine, you will get what we call a Kerberos Ticket.  To be more specific, you get a Ticket-Granting Ticket (TGT).  You use the TGT as a master ticket to access all Kerberos services on a network. A Kerberos ticket includes all the user credentials or computer credentials in an encrypted format. These credentials are used to identify a specific user or a specific computer on a network for access to Kerberos services.  When you access a Kerberos service, that service uses the ticket that you received to authenticate you.  After the service verifies your identity and authenticates you, the service issues a service ticket. This is where the SPN’s come into play.  Think of the SPN as a pointer for Kerberos so it knows where to go.  That’s why, if it’s missing or there are duplicates, it doesn’t know what to do.

 

 

How to recognize a Kerberos error with SQL Server:

There are a few errors that we see where we can make an educated guess that a connectivity issue is Kerberos related. 

 

Cannot Generate SSPI Context

 

Login failed for user ‘(null)’

 

Login failed for user ‘NT AUTHORITY\ANONYMOUS’

 

Those are the three biggest errors we see in relation to Kerberos with SQL Server.  These errors occur when the credentials from the first hop cannot be passed to the second hop.  Usually, this is because we are failing over to NTLM or our SPN configuration is not correct.  There are also other settings within Active Directory that come into play as well as certain machine configuration based on how services are setup.  For now, let’s focus on the basics, so I’ll get more into those setting and machine configurations in a later post.

 

One of the last things I’d like to mention is that I’ve seen customers get the following error confused with a Kerberos issue:

 

Login failed for user ‘SomeUser’

 

The SomeUser could be whatever user (SQL or Windows) that you are trying to connect with.  If you see the actual username in the error that means the credentials actually got to the SQL Server.  The issue at that point lies in the SQL Security configuration itself (either the server as a whole, or the database you are trying to connect to).  This is not a Kerberos related issue, but instead is a much simpler login failure.

 

Resources:

 

Kerberos in Windows 2000

 

Kerberos in Windows 2003

How to use Kerberos authentication in SQL Server


Posted By:       Adam Saxton

In this series of posts, we’re going to take a look at some of the tools we at Microsoft Developer Support -- Database commonly use, how to use them, and the types of issues they can be used to resolve.

 

This post might seem like a bit of a stretch.  For one thing, it’s not about a SQL related tool.  That’s not such a big deal – there are plenty of tools that we use that have applications beyond SQL.  You can expect more posts in this series about tools that have applications far beyond troubleshooting databases and database connectivity.

 

However, this isn’t about a specific, pre-existing tool either.  Call it a meta-tool, I guess.  Pseudo-tool.  Something along those lines.  One way or the other, even if it isn’t already built for you, a reproduction can be one of the most invaluable and time-saving tools you’ll use to troubleshoot your most difficult issues.

The Repro

In the Product Support Services world at Microsoft, we refer to a simplified, stripped down series of steps that lead to a specific behavior as a Repro.  This may simply be a step-by-step to recreate an error, or a simple application that recreates an unusual behavior.  When we are working on an issue that is buried in a hulking, production application, we often like to tackle it is by attempting to recreate the issue outside of the in vivo environment.  There are a lot of benefits to investing some time in developing a Repro.