Welcome to MSDN Blogs Sign in | Join | Help

Hey There – It’s been a while since I woke up my blog, and I've seen a pattern of questions in the forums and some of the report writers I work with, so I thought I'd post some suggestions for dealing with tricky TSQL problems in Report Datasets.

Types of problems I see people encounter

Most report writers start off their efforts with relatively straitforward SELECT statements, format it in a table and, voila!, a killer report. Then inevitably, a report consumer will respond with something similar to "Well, it looks pretty good, but I'd like to have it........", and the next thing you know, the reports have long lists of parameters, hundreds of lines of TSQL in report queries, spread across numerous dimensions of linked / dependent reports, parameter customized TSQL and so on; it can get out of control quickly. To be more specific:

Multi Value Parameters

A multi value parameter must be built into a list that is used as "IN (@parmname)" in a dataset query. When SQL Server executes the statement, it needs the list to look very neat and clean similar to (1,2,3,4...) or (a,b,c...). This works fine when you have a nice clean list of multi-parameter values like "1,2,3" or "a,b,c" but when you start populating the multi-value parameter with a query, and the query returns non-int datatypes, nulls, and strings that contain spaces, quotes, or commas, this can become a hairpulling, frustrating experience.

Large TSQL Batches

Several of my reports use queries that join 10+ tables, with derived tables, temp tables, cursors and so on. Given the size of the query, it can become a maintenance nightmare when I need to tweak it a bit. The report designers have pretty primitive TSQL query editors, and I always seem to forget all the places they need to be changed across numerous datasets and reports.

Hierarchical Reports/Queries

I often have a main report that shows aggregated information for a Group, it will have drillthroughs to a report aggregated for a Team, with another drillthrough to the details for an Individual. Considering that these three reports all use the same basic "Large TSQL Batch" I described above, and they only differ in the aggregation level of the query and the grouping design in the report, when a minor tweak needs to be made to the query logic, the maintenance nightmare is compounded by the number of affected/related reports.

Static TSQL Limitations

When the complexity of the report grows with many parameters and sorting/filtering logic requirements, static TSQL queries can be limiting and can start performing very poorly. For example, parameters can be used to simulate ad-hoc query logic similar to "show me the data for the last 7 days or for the last 24 hours", "I want to sort by columns a,b,c or by c,b,a", "I would like all the data between x and y, or > x, or <= y". Trying to support these types of ad-hoc logic in a static query can either require lots of "OR" logic (which is bad for getting good query plans from the SQL Server query optimizer), or may just be impossible.

Tips for dealing with them

Code Your Queries in SQL Management Studio, not in the Report Designers

I've spent a great deal of my database career living with the various TSQL interfaces and editors; first was the DOS prompt OSQL/ISQL command line editors ("They're all I'll EVER need"-yeah right), then the revolutionary Windows-based ISQLw editor that was basically Notepad with the ability to run the query, then the SQL Enterprise Manager allowed you to write queries and manage the database with a GUI, and finally today's SQL Server Management Studio (aka SSMS here) does all that with live, context-sensitive syntax verification, auto-suggested object names, and efficient handling of large result sets.

The reason I mention all this is to point out that my peers in the next building have built some very powerful query writing tools, and while my peers down the hall have built powerful report creation tools, the report designers are not meant to be powerful query writing tools. If you find youself coding a lot of large queries in the report designers, please, get into the habit of copying/pasting the report queries to/from the designer and SSMS. Once you get used to declaring/setting parameter values when running in SSMS, and then commenting them out in the designers, life will become much more pleasant for you.

Pre-Calculating Parameter Values in TSQL Variables

Report parameters are most commonly used in queries similar to

SELECT .... WHERE datecolumn < @maxdateparm"

 In the "show me the data for the last 7 days or for the last 24 hours" example above, this could be coded in the dataset query as:

SELECT ... FROM ... WHERE

(@datenameparm = 'Days' and datecolumn > DATEADD(dd,@datevalueparm,GETDATE())

or

(@datenameparm = 'Hours' and datecolumn > DATEADD(hh,@datevalueparm,GETDATE())

However when you start OR'ing the queries like this they become much less efficient. An alternative would be to declare and set a TSQL variable in the dataset query outside the SELECT statement, and then use it in the SELECT statement, for example:

DECLARE @mindate DATETIME

SELECT @mindate =

CASE WHEN @datenameparm = 'Days' THEN DATEADD(dd,@datevalueparm,GETDATE())

ELSE DATEADD(hh,@datevalueparm,GETDATE()) end

SELECT ... FROM ... WHERE datecolumn >@mindate

Moving Queries into Views and Stored Procedures

Bear in mind that a dataset query is just a batch of TSQL statements that the report processing engine sends blindly to SQL Server. A long as SQL Server returns a result set, the report processing engine doesn’t know or care what is in the batch; it can be anything from a simple SELECT statement, to a complex script with expressions, temp tables, control of flow, insert/update/delete, database modifications and so on. Reports are generally intended for read-only data retrieval query batches, but a valid batch is a valid batch, and you can get really creative with them.

Once your dataset queries become larger, more complex, interdependent, and re-used in more places, it becomes more and more necessary to apply CS principles and abstract the complexity from the dataset queries.

SQL Server supports “Views” which are basically a packaged SELECT statement that can then be queried with SELECT as if it’s a table, and “Stored Procedures” that can contain any TSQL statements but must be called with the EXEC(UTE) statement. SQL also supports functions (static, dynamic, and table valued) and other programmatic constructs, but views and stored procedures usually fit my reporting needs.

In my “hierarchical-reports/queries” example above, I create a set of hierarchic views that are used in hierarchic reports. I start with the lowest level detail query that contains the bulk of the join and filtering logic along with the rest of the columns that will be used for higher level groupings:

CREATE VIEW PersonBugsView as

SELECT g.GroupName, t.TeamName, p.PersonName, b.Priority,‘BugCount’=count(distinct b.BugId)

FROM Bugs b

JOIN Persons p on p.PersonId = b.PersonId

JOIN Teams t on t.TeamId = p.TeamId

JOIN Groups on g.GroupId = t.GroupId

WHERE b.Status = ‘Active’ and b.Type = ‘Code Defect’ and b.Release = ‘Current Release’

GROUP by g.GroupName, t.TeamName, p.PersonName

Then I create the other hierarchical views on top of this detail view:

CREATE VIEW TeamBugsView as

SELECT GroupName, TeamName, Priority,‘BugCount’=sum(Bugcount)

FROM PersonBugsView

GROUP BY GroupName, TeamName, Priority

 

CREATE VIEW GroupBugsView as

SELECT GroupName, Priority, ‘BugCount’=sum(Bugcount)

FROM TeamBugsView

GROUP BY GroupName, Priority

Once the view are created and tested to be correct (in SSMSJ), I create the set of matching reports. The PersonBugsDetail report would have parameters for GroupName and TeamName, and the dataset query would be:

SELECT PersonName, Priority , BugCount

FROM PersonBugsView

WHERE GroupName = @GroupName and TeamName = @TeamName

The TeamBugsSummary report would have a parameter for GroupName and the dataset query would be:

SELECT TeamName, Priority , BugCount

FROM TeamBugsView

WHERE GroupName = @GroupName

And the report table would have a drillthrough action on the BugCount textbox that calls the PersonBugsDetail report with the GroupName and TeamName parameter values.

And finally, the GroupBugsSummary report would have a dataset that queries the GroupBugsView:

SELECT GroupName, Priority, BugCount

FROM GroupBugsView

And again here, the report table would have a drillthrough action on the BugCount textbox that calls the TeamBugsSummary report with the GroupName parameter value.

The value of views here is that you could reuse them across several different reports, and when (not IF) the query logic needs to be changed, you could just change the detail view, and the effects would automatically be applied to all of the views and reports without having to edit each.

There are a few pitfalls and counter-arguments to this approach. Abstracting the logic away from the report makes them more complex; many of my report plagiarizers like to steal my queries or look at the logic used in the report queries, and they don’t know where the view is stored or how to look at it, this can be a good or a bad thingJ. Another argument is that you could communicate the same information in a single report on the detail query, and let the report table’s grouping expressions conditionally display the per-Group/Team information, and it will work fine for relatively small data sets. But if you’ve read my blog posting here http://blogs.msdn.com/deanka/archive/2009/01/13/pet-peeve-slow-reports.aspx, large datasets along with complex grouping can result in slow running reports, and I hate that.

Another suggestion for dealing with complex dataset queries is to use stored procedures, they can be useful in many situations. For example, as I mentioned in the above blog posting, report queries can run for a long time if they join many tables and require “read” locks on several tables that are heavily used by other processes reading and updating the data. In this case, you can split the operations into stages, store the information from each stage in temp tables or table variables, and return the final data from these tables. And many reporting needs cannot be fulfilled by SELECT statements (from views or tables) alone.

I used this method for a set of daily reports used by my team to communicate the labrun verification needs (and ensure that Reporting Services is bug freeJ!) for our group. Our labrun automation database is used continuously for hundreds of testers and lab machines, and hundreds of thousands of tests. In addition to that, my manager requested that the report data be sorted by the 1-n priority of the labruns, not an easy task with SELECT or report expression logic alone.

In this case, I had to split the query operations into chunks and store the data in temp tables, and write a cursor to loop through the data and assign a priority to each. The final TSQL batch was over 300 lines and had numerous temp tables, control of flow, date conversions and calculations and so on. I shudder at the thought of embedding this in each of the reports that need the same basic data.

Note that if you do use #temp tables in stored procedures used in reports, the query designers may give you an error similar to:

Msg 208, Level 16, State 0, Procedure TestProcedure, Line 3

Invalid object name '#temp'.

I’ve discussed the issues and solutions in a forum posting here:

http://social.msdn.microsoft.com/Forums/en-US/sqlreportingservices/thread/4f62de2a-4efb-4cf2-8c84-cf0f62a99a11

but the net recommendation is to use table variables instead.

And finally, remember to GRANT the necessary SELECT or EXECUTE permissions on the views and procedures created for your reports; I’ve spent a lot of time testing views, procedures, and reports locally only to find they fail miserably on the server when the report’s datasource used a different login from mine, and that login didn’t have permissions.

 

Dynamic SQL Statements - EXEC()

Many reporting situations require ad-hoc query type logic that can be difficult or inefficient with static dataset queries. If the logic only affects the filtering of the data in the WHERE clause of the query, you may be able to get away with using complex OR constructs, but the query will be inefficient. If you need to have SORTing specified dynamically at runtime, you may be able to work with interactive or expression-based sorting in the report definition, but these have their limitations, complexities, and inefficiencies as well. In these cases, dynamic SQL statements can be used.

The basic trick is to build up the SELECT statement as a text string based on parameter values, and execute it using the TSQL “EXEC()” statement. The one reporting specific requirement is that the final result set has to have the same set of column names and datatypes with each permutation of your dynamic query. The number of rows, the values in them, and the order can be dynamic, but the names and types must be consistent.

For example, consider a situation where you want the data to be sorted by any combination of three columns, and you want the data to be filtered by any column name and value, all specified by the user at report run time. For a report based on the PersonBugsView mentioned above, you could define 3 report parameters for sorting and a set for filtering:

ParmName     Available values

SortCol1          GroupName, TeamName, PersonName, Priority, BugCount

SortCol2          GroupName, TeamName, PersonName, Priority, BugCount

SortCol3          GroupName, TeamName, PersonName, Priority, BugCount

FilterColName PersonName, Priority, BugCount

FilterOperator <,>,<=,>=,<>

FilterValue      (dynamic, based on the FiltercolName specified)

DECLARE @sql varchar(1000) – be sure it is big enough to hold the whole query!

SELECT @sql =

‘SELECT GroupName, TeamName, PersonName, Priority, BugCount

FROM PersonBugsView

WHERE ‘ + @FilterColName + ‘ ‘ + @ FilterOperator + ‘ ‘ + @ FilterValue + ‘

ORDER BY ‘ + @SortCol1 + ‘, ‘ + @SortCol2 + ‘, ’ + SortCol3

Print @sql – Note that this needs to be commented in the report but is good for debugging in SSMS

EXEC(@sql) – This will run the query, it better be syntactically correct!

There are a few very important considerations when using this technique, first and foremost is SQL Injection attacks; any time you dynamically build a query string based on user provided strings, you run the risk of a hacker figuring out a way to make the query do something far different (and worse) than you intended. There are a few mitigations to this attack; ie only allow available values you specify, not freetext strings, and ensuring that the login used by the report’s datasource only has limited necessary SELECT permissions in the database. But in any event, you should study up on this attack and understand the implications completely before using this technique.

Another consideration is that the dynamic statement must always be syntactically correct for all possible values specified by the users. In my code example above, the @FilterValue may need single quotes for strings and no quotes for numbers, and the statement would be invalid if any of the parameters send blank values to the expression.

And again, be sure to code and test this in SSMS using the PRINT statement to display the built query so you can see what is being executed, and be sure to comment out the PRINT statement when you’re pasting it into the query designer.

Using SQL Profiler to see what query is being sent to SQL Server

SQL Profiler is an extremely valuable tool for reverse engineering what queries an application is sending to SQL Server, and this applies to the report designer tools and the Reporting Services server applications as well.

SQL Profiler is somewhat of a “man in the middle” between the application and the SQL Server executing its queries. It works by registering a set of events you want to trace in SQL Server, and when those events happen, SQL Server returns information associated with the event. The key events in this context are “Stored Procedures-RPC Completed” and “TSQL-SQL:BatchCompleted”, and the event data returned for these is the SQL statement being sent by the application. In addition to events, you can also customize the columns of information returned by the events, and add filters to restrict the rows of data being returned. Fortunately when you start SQL Profiler (via the Tools menu option in SSMS), the default events, columns, and filters will be sufficient for your needs. If the report queries are run in a server with a lot of other query traffic, you may need to filter to just the login name used in the datasource login.

There are a few key situations in reporting where this is most useful. At the top of the list is working with multi-value parameters. As I mentioned above, SQL Server needs to see a nice clean list of values that it can execute like “…WHERE colname IN (1,2,3)”, but depending on the strings in your multi-value parameter list, it can be difficult to build a syntactically correct list of values that SQL Server can understand. In this case, you can build a report with a multivalue parameter and a datasource query that uses it in a “…WHERE colname IN(@parmname)” fashion, if it works fine, GREAT! But if you get errors, it can be really tough to figure out what the report designer is doing with your parameters and expressions without SQL Profiler. But you can also use it for any data set query that is returning errors for reasons you don’t understand.

 I hope the helps, Thanks! - Dean

As usual, "This blog is provided 'AS IS' with no warranties, and confers no rights."

 

Hey There – I've not blogged in a while, and as a Reporting Services admin I’ve recently taken an interest in report caching issues, so I thought I would share some of my learnings and recommendations.

First of all, I’ll give a quick review of the factors you need to consider when making report caching decisions.

Considerations for Caching Reports

How long does the report take to run?

Caching a long running report will allow the report to be rendered faster and give a better experience for the user, but it will not improve the time for reports that are already fast for live executions. Long running reports usually also have a high processing cost for both the report server that is processing the report, as well as the data source server that is executing the report’s queries. Considering this will allow you to determine if a report should be cached.

How “fresh” does the report data need to be?

Some reports, for example, those that reflect information in a fixed prior date range, will always return the exact same data every time the report queries are executed. These are perfect candidates for caching. Some reports, for example those showing current “status” information are only valid with the most current data available. These are the worst candidates for caching. Most other reports fall somewhere in the middle; the value and accuracy of the report is directly related to the age of the data, some latency is ok, but beyond a certain point it really should be refreshed. Considering this will also allow you to determine if a report should be cached, as well as how long the report should stay in the cache.

How often is the report used, and how many people use it?

If a report is used very frequently, the costs of caching the report will be justified, but if the report is not used very often, the cost of caching it will not have any return.

When do users commonly use the report?

Some reports are repeatedly used at predictable times of the day, week, month etc, for example, a morning report showing the activity for the previous day. Other reports see a flurry of activity at non-predictable times, for example, when your manager sends the group an email questioning the status of a large project. Considering this will allow you to determine how and when the report should be cached.

What parameter values are commonly used for the report?

If it is a report without any parameters, or users commonly choose the same parameters every time the report is rendered, the cached version will be valid for most requests. But if users commonly specify parameters that are not the same as those used in cached versions of the report, the cached version will not be used, and the report’s queries will be executed live.

Will users want to go back and see old versions of the report created at various points in the past?

This is common for daily, weekly, or monthly status reports where users want to see older historic versions of the report.

 

All of these options must be considered together to determine if the report should be cached and how it should be cached.

 

Types of Report  Caching

There are two main types of report caching. When you click on the "Properties" tab of a report in Report Manager and choose the “Execution” option on the left side of the page, you will be configuring “Execution snapshots” that will only be kept in the cache for a specified period of time. When you choose the “History” option, you will be configuring “History Snapshots” that will store several cached versions of the report that can be manually chosen from the “History” tab of Report Manager at some time in the future. History snapshots can also store the cached versions of Execution snapshots.

Execution Snapshot Configuration Options

There are two primary options for configuring “Execution Snapshot” report caching. You will see that by default, the “Always run this report with the most recent data” and “Do not cache temporary copies of this report” options are chosen by default. This of course means that the report’s data set queries will be executed each time the report is rendered. You may be able to avoid the execution time experience for the report user, as well as the reducing the processing cost for the report server and the database server the report is based on, by caching the report.

The first option is controlling how the caching is triggered. You can have the report cached when a user renders the report and a compatible cached version is not available. This is good for reports that are commonly used at unpredictable times of the day/week/month. The first person to run the report will see the time delay for the report queries to be executed live, but it will be cached for later users, and when activity slows down again, the report server and data source servers will not pay the price of executing and caching the report. This is a good option if you want to cache several versions of the report with different parameter values. This is the method that will be used when you choose one of the “Cache a temporary copy of the report…” options in Report Manager.

Or you can have the report server automatically cache a new version on a specified interval or schedule, so a reasonably fresh cached version is always available. This is the method that is used when you choose the “Render this report from a report execution snapshot” option in Report Manager. It is a good option if you know when and how often the report will probably be used, but if the report is not used, the cost of executing and caching the report will be unnecessarily spent for no benefit to report users. This option is only good if the report has no parameters, or all the parameters have defaults, because the report server cannot prepare a cached version of the report if it does not know what users will specify when they run the report. One precaution to be made here is that you need to consider how long the report takes to execute live when you are configuring how often it should be triggered. I mentioned in my blog post about report performance a story about a subscription that fired every 15 minutes for a report that ran for over an hour. This resulted in several executions running in parallel and slamming the SQL Server box that was executing the query. The same situation can happen with report snapshots, and the obvious suggestion is to make sure you don't initiate frequent snapshots for long running reports, or you will make enemies in dba and report server admin circles!

The second option is choosing how long the report will stay in the cache. You can have the cached version expired and removed at set interval or a scheduled basis, depending on when the age of the data in the report becomes outdated. This can be specified by choosing one of the “Cache a temporary copy of the report…” options in Report Manager.

History Snapshot Configuration Options 

The options for history snapshots are exposed via the History option of the Properties tab in Report Manager. They are pretty straight forward and similar to execution snapshots in many aspected; they can be created on a scheduled basis or by active live requests from execution snapshots, and they can be automatically be cleaned up on a scheduled basis. There are a couple key differences; they are exposed and visible for viewing by selecting them in the History tab of Report Manager, whereas execution snapshots are automatically used (or not) depending on if they exist or have compatible parameters. You can also manually generate them (assuming the parameters all have defaults) using the "New Snapshot" option on the History tab.

 

Report Design Considerations for Caching Reports

There are a couple considerations to keep in mind for cached reports. The first issue is that it is important for users to know that they are seeing a cached and potentially outdated view of the report. You can accomplish this by adding a textbox near the top of the report with an expression similar to:

="Report Executed at: " + Now().ToShortTimeString

As I’ve mentioned several times above, the cached version of the report is only valid if the parameter values of the cached report are the same as the values chosen by the person running the report. For this reason, if you want to take advantage of report caching, you should ensure that the report parameters have defaults that will be commonly used by the report users.

 

Pre-loading the Execution Snapshot Cache

One of the drawbacks of non-scheduled execution snapshots is that they are only cached after the report was run once, and the first person to run the live version of the report pays the price of building the cached version. One way to avoid this is to have the report executed via a subscription that fires on a scheduled basis, and its execution caches the report without a user having to pay the price. A static subscription can be used if there are no parameters, all the parameters have defaults, or a single set of parameters will fit the needs of most users. You can also use a data driven subscription with a query that returns multiple sets of parameters to cover the needs of various combinations of parameters.

 

Identifying Existing Reports that are Good Candidates for Caching

As I’ve mentioned in another blog posting, and Robert Bruckner covers extensively in his blog, the report server catalog database has a view called ExecutionLog2 that has entries for every report execution. You can use information in this view to identify the reports that are frequently executed live along with the parameter values that are specified, and are good candidates for caching based on the caching considerations listed above. I created a report to serve this purpose, and it is based on the query below.

 

select top 1000

'ReportName'=reverse(substring(reverse(el.ReportPath),1,charindex('/',reverse(el.ReportPath))-1)),

el.ReportPath, 'LastModBy'=u.UserName, 'Parameters'=convert(varchar(max),el.Parameters),

'TotDurationMin'=Sum(datediff(ss,el.TimeStart, el.TimeEnd)/60.0),

'AvgDurationMin'=Avg(datediff(ss,el.TimeStart, el.TimeEnd)/60.0),

'MinDurationMin'=Min(datediff(ss,el.TimeStart, el.TimeEnd)/60.0),

'MaxDurationMin'=Max(datediff(ss,el.TimeStart, el.TimeEnd)/60.0),

'UserCount'=COUNT(distinct el.UserName),

'ExecCount'=COUNT(*)

from            ReportServer.dbo.ExecutionLog2 el

left outer join ReportServer.dbo.Catalog           c  on c.Path = el.ReportPath

left outer join ReportServer.dbo.Users             u  on u.UserId = c.ModifiedByID

where 

-- report parameters

(@reportpath = 'ALL' or el.ReportPath = @reportpath)

and   el.TimeStart     < DATEADD(dd,@daycount,GetDate())

 

and el.Source        = 'Live' -- only show live executions, not cached executions

and   el.ReportAction  = 'Render' –- only include rendering operations

and   el.ReportPath   != 'Unknown' -- exclude reports that have been deleted after executing

and   el.Status        = 'rsSuccess' --

group by el.ReportPath, u.UserName,  convert(varchar(max),el.Parameters)

-- Only show the report executions that happen frequently, report parameter

having COUNT(*) > @minexeccount

order by Avg(datediff(ss,el.TimeStart, el.TimeEnd)/60.0) desc

 

By putting interactive sorting on the report’s table columns, I can look for the longer running reports, those that are executed frequently with a given set of parameters and so on. And I added drillthrough links that allow me to easily filter to just the data for a given report, as well as a MAILTO: URL that lets me easily contact the person who owns the report.

You can also use a similar query to monitor cached report activity. Reports that are executed live will have an ExecutionLog2 entry with RequestType=Interactive or Subscription, Report Action=Render, and Source=Live. Those that are rendered from an execution snapshot will have Source=Snapshot, and those from a history snapshot will have Source=History. If you have created scheduled execution or history snapshots for expensive reports, but don't see anyone actually using them, you may want to consider canceling the snapshot.

Internal Scheduling and Storage of Report Snapshots

Disclaimer: we do not officially support querying all of the Reporting Services tables mentioned below, the columns and information in them may change in the future. 

In my blog post about monitoring and troubleshooting subscriptions, I described how the scheduling of subscriptions is controlled by a SQL Agent Job that places an entry in the Events table, and it is moved to the Notifications table when the report server is processing the subscription. The same scheduling and delivery process is used by the schedules that refresh execution and history snapshots, with the exception that snapshots are stored in the catalog database rather than being sent to a file or email address. You can use the same basic techniques for monitoring, diagnosing, and troubleshooting snapshots.

I mentioned several times above that the benefits of caching does come with a cost. Part of this is the execution cost of the report's dataset queries in the datasource server and part of this is in the report processing cost in the report server. But it also has a processing cost of managing snapshot cache entries, and a storage cost for storing the cached version of the report in the report server catalog database.

The report snapshot cached entries are stored in the Snapshot  table with information regarding when the cached entry was created, when it will expire, the parameter hash values used to determine if the cached report is valid for a new user request and so on.

The actual cached reports are stored in the Chunkdata table, and large report snapshots can require significant storage. For example, the Chunkdata table in my report server has 258 cached snapshot entries, with about 85mb of total data in the Content column, a max size of 14mb, and an average size of 335kb. The server does not use snapshots particularly heavily, at least until I start asking people to use them:), but this can be a significant storage issue in the report server catalog database if you do cache a lot of large reports, so it is something worth montoring from time to time in your server.

I hope this information helps, take care, thanks, happy reporting, and of course:

"This blog is provided 'AS IS' with no warranties, and confers no rights."

Dean

As the owner of several production reports, I have subscriptions that fire on a daily or weekly basis to remind people to close their bugs or verify their labruns. As usual, the password for the account I use in the report datasources will expire, and I invariably forget to update all of them and I'll get the "I've not received the report for days" email. As the owner of the Report Server, I also don't want my box to waste time and resources processing a bunch of invalid reports or subscriptions.

For these reasons, I want to monitor the Report Server and know what is not working in the reports and subscriptions.

Let me start by giving a high level view of how subscriptions are defined and processed, but first:

Disclaimer: we do not officially support querying all of the Reporting Services tables mentioned below, the columns and information in them may change in the future.

Note: This information refers to the SQL 2008 catalog, this info will generally also apply to SQL 2005, but I've not verified it there yet. Also, unless otherwise specified, the tables mentioned below are in your Report Server catalog database with whatever name you gave it, and yes, you need permissions necessary to query them.

Ok, back to the fun stuff. 

 When you create a subscription several things are added to the RS server:

- A row is placed in the Subscriptions table identifying the name of the report, along with parameter settings, data driven query info and so on to process the subscription

- A row is placed in the Schedule and ReportSchedule tables with the timing of the subscription

- A SQL Server Agent job is created to control the scheduled execution of the report, and this is stored in the sysjobs and sysjobsteps of the MSDB database. The agent job name is a guid value that represents the ScheduleId (yes, in case you've not yet noticed, this makes your Agent Job list messy with a bunch of guids that make it hard to work with your non-subscription jobs, we are hearing you:).

This query shows how the subscription information is stored in the various related tables, along with the most useful information you will need to diagnose issues later:

select

'SubnDesc' = s.Description,

'SubnOwner' = us.UserName,

'LastStatus' = s.LastStatus,

'LastRun' = s.LastRunTime,

'ReportPath' = c.Path,

'ReportModifiedBy' = uc.UserName,

'ScheduleId' = rs.ScheduleId,

'SubscriptionId' = s.SubscriptionID

from ReportServer.dbo.Subscriptions s

join ReportServer.dbo.Catalog c on c.ItemID = s.Report_OID

join ReportServer.dbo.ReportSchedule rs on rs.SubscriptionID = s.SubscriptionID

join ReportServer.dbo.Users uc on uc.UserID = c.ModifiedByID

join ReportServer.dbo.Users us on us.UserID = s.OwnerId

join msdb.dbo.sysjobs j on j.name = CONVERT(nvarchar(128),rs.ScheduleId)

 I use a similar query in a report with parameters to filter to certain reports, subscriptions, owners, and last_status values.

When the subscription runs several things happen:

- The SQL Server Agent job fires and puts a row in the Event table in the RS catalog with the settings necessary to process the subscription

- The RS server service has a limited number of threads (2 per CPU) that poll the Event table every few seconds looking for subscriptions to process

- When it finds an event, it puts a row in the Notifications table and starts processing the subscription, which includes executing the data driven subscription query (if present), processing the report, rendering it to the specified format (HTML, PDF, Excel etc), and delivering the final result to either an email address or a file share. Note that there will also be a row in the ExecutionLog table for the report execution.

This  query shows the subscription processing information that you can use to diagnose problems:

select

'Report' = c.Path,

'Subscription' = s.Description,

'SubscriptionOwner' = uo.UserName,

'SubscriptionModBy' = um.UserName,

'SubscriptionModDate' = s.ModifiedDate,

'ProcessStart' = dateadd(hh,DATEDIFF(hh,Getutcdate(),Getdate()),n.ProcessStart),

'NotificationEntered' = dateadd(hh,DATEDIFF(hh,Getutcdate(),Getdate()),n.NotificationEntered),

'ProcessAfter' = dateadd(hh,DATEDIFF(hh,Getutcdate(),Getdate()),n.ProcessAfter),

n.Attempt,

'SubscriptionLastRunTime' = dateadd(hh,DATEDIFF(hh,Getutcdate(),Getdate()),n.SubscriptionLastRunTime),

n.IsDataDriven,

'ProcessHeartbeat' = dateadd(hh,DATEDIFF(hh,Getutcdate(),Getdate()),n.ProcessHeartbeat),

n.Version,

n.SubscriptionID

from Notifications n

join Subscriptions s on n.SubscriptionID = s.SubscriptionID

join Catalog c on c.ItemID = n.ReportID

join Users uo on uo.UserID = s.OwnerID

join Users um on um.UserID = s.ModifiedByID

Common problems with subscription delivery: 

If your subscription emails are not being delivered to Inboxes across the company, there are several different things that can cause this:

- The SQL Server Agent job is not firing

This could be because the service is not running due to a password expiration or some other cause, and RS does not see anything to process. In this case, get the ScheduleId guid from the above query, find the job with that guid as its name, and check the job history to make sure the job fired. You will also see that the Event and Notifications tables are empty.

- RS is not processing the subscription events

The main issue here is that the RS service event processing threads are not processing the events. The main cause here is that the event processing threads are all tied up, and not processing the events as fast as they arrive. You will see that the agent jobs are firing, but the Events and Notifications tables have a large number of entries. The number of threads is fixed, you cannot increase it, so you need to figure out what they are processing and why they are not firing or keeping up. You can get the subscription and report information, and run the report or the data driven query manually to see if they are healthy. One thing I noticed is that a person had scheduled a report execution snapshot to fire every half hour, but the report took over an hour to process, so at any given time, several instances of the event were running at the same time occupying all of the threads, and nothing else was getting executed.

You may also notice that everything is healthy, but there are common hotspot times in the day (for example, 8am or midnight) when a lot of subscriptions are set to fire at the same time, and they may take a long time to complete, so the subscription processing cannot keep up and falls behind. In this case, you may encourage people to run them at different times if possible, or you could consider adding a second reporting server configured as a two node NLB cluster with the other. In this case, both servers will be polling the events table in the RS catalog and processing the events in parallel.

- SMTP is not delivering the emails

 This could be because the email address is bad, or there are problems with the SMTP configuration. I won't go into SMTP here, I'm a database guy, but RS needs it to be working to deliver the report. The Event and Notifications tables will be empty (RS has done its part), the LastStatus column of the Subscriptions table and the subscription status in Report Manager will show an error similar to "Failure sending mail: The user or group name 'domain\username' is not recognized. Mail will not be resent."

- The report or subscription is invalid

Again, this can usually be tested by running the report manually, but there are a myriad of causes that will also show in the subscription status. For example, the datasource may be invalid due to an expired password or an account for a person who has left the company, a subreport may have been removed or renamed, a server, database, or table used in the query may have been removed or renamed and so on. Statistically, this is the source of over 90+% of failed subscriptions I've seen.

So in summary, you can write reports based on the information in these catalog tables to monitor subscriptions, and take action before you get the "I've not received reports for days" emails, and encourage the report and subscription owners to keep their stuff clean and healthy and be good reporting citizens.

In case its not obvious, you may want to host these monitoring reports on a different server because if subscriptions are not firing, you may not get the subscription based monitoring report to tell you that subscriptions are broken!

I hope this helps, please let me know!

Thanks, Dean

As usual, "This blog is provided 'AS IS' with no warranties, and confers no rights."

 

So for my first technical blog posting, I'd like to take on one of the most common issues I encounter when helping people write reports and when I run them myself. In my experience with SQL Server I've heard numerous application developers claim that SQL Server is slow or doesn't scale, and typically my response is that poor designs and/or poorly written queries are slow, SQL Server's perf is largely dependent on this. I've observed the same when people claim RS is slow, and my response is mostly the same. 

Bear in mind that report execution time is directly related to the query execution time, the amount of data returned by the dataset query, and the complexity of the grouping, sorting, filtering, and aggregations in the report definition. Usually I can review the report, query, table design, or indexes and improve them dramatically. More often than not, the report writer does not have the option of changing table designs or indexes as they may not have admin or dbo permissions, or if they do, design changes would require broader application changes. So working on the dataset query and the report definition is your best bet.

To start, I usually extract the query from the report and run it outside the report to see what's going on, mainly, how long does it take to execute and how many rows are being returned. Often times, the report may take 20 minutes to run, and if the standalone query also takes 20 minutes, there is not much opportunity to address this in the report. I'll review the tables and indexes to make sure the queries are good from that perspective, or if the same results can be obtained more quickly with a different approach in the query. Application developers who work with the database regularly may also be able to help.

Another source of perf problems is contention, locking, and blocking in the database if there are a lot of users or processing using the same table. The resource manager in SQL Server Management Studio, 'SELECT * from SYSPROCESS', or SP_WHO will all show how many active processes are running on the box, and if there are blocked processes or large WAITs on lock resources. If there is, you may be able to benefit from breaking up the query.

For example, in our test automation application, there are typically hundreds of processes running concurrently in the database where the automation application is running tests, testers are verifying labruns, and overzealous report writers like me running expensive reports, and all of them are getting and holding locks on the tables throughout the life of the query. The application has one "hot" table that is used by practically every process, as well as 10 or so other tables that are commonly joined with the hot table in queries. I've found that if I get only the necessary information from the hot table with a well indexed query, load this into a #temp table, and then collect the remaining data from the other tables joined with the #temp table in separate queries, the overall data can be returned to the report much more quickly. Now my peers in the relational engine may cringe if they read this, because it is nearly always better to use SELECT only without #temp tables if possible, but when there are several tables and lock contention going on, this can help dramatically.

One of my first big wins in my reporting career (before I joined RS) was when my manager at the time had a labrun status report that took about 30 minutes to run. I was very familiar with the database, so he asked me to take a look and see if I could make it better. When I ran the query outside the report, it ran in a few minutes, but it returned about 20,000 rows. When I looked at the report, I saw that it was only displaying around 100 rows to the user, and I knew I was on to something. I noticed that the data was being displayed in a group section based on several columns in the report's dataset query, and the fields had several =COUNTROWS() and =SUM(columnname) aggregate functions in the report expressions. I took the columns used in the report table's group definition and added them to a GROUP BY clause in the query along with the other non-aggregated columns, and added the COUNT(*) and SUM(columnname) expressions to the column list of the query, and it executed in about 20 seconds! Then I removed the group from the table in the report, pasted the new query in the report dataset, and displayed the result set columns in the detail section of the table. I'd just converted a 30 minute report to a 20 second report, SCORE!!!

When Reporting Services was running the original report, it had to execute the query, cache the result set, build b-trees and calculate the aggregations, and finally, build and render the report. In the new version of the report much of the data processing was done by SQL Server; Reporting Services only had to execute the query, cache a much smaller dataset, and then build and render the final report, which by the way looked exactly like the original report. While we do everything possible to optimize Reporting Service's processing of data, SQL Server is optimized to this and can do it much more efficiently.

There is one flaw with this approach; report users commonly like to see the detail behind aggregates, so you can return the full data set and display it in a detail section that is hidden by default and toggled at the group level if the user wants to see it. To address this need, I still use the approach above, but provide a drillthrough link on the aggregated number to another report that shows the detail for the aggregated group.

Robert Bruckner covers some of this material in his blog post here, but one thing I'd like to repeat is that you can also use the ExecutionLog2 view in the Report Server catalog to identify and isolate these two issues. It has columns to show how much time is being spent in executing the report's dataset query vs the time spent processing the data. If you are a report server administrator, you can monitor the long running reports and provide my suggestions above to the report owner without going through the process yourself.

I hope this helps, please let me know!

As usual, "This blog is provided 'AS IS' with no warranties, and confers no rights."

 

 

Hey There - Welcome to my new blog, before I start posting, I'd like to tell you about myself so you know a bit about how I think and what I'd like to accomplish with this blog.

I'm a true local, born and raised and educated in the Seattle area. I have a bachelor's degree in Business Management and more or less picked up computer skills along the way. My first real job after college was at a company called Microrim that shipped the first truly relational database product called R:Base System V, which was PC Magazine's Product of the Year around 1988 or so. I've been at Microsoft since 1990, and my first role was supporting Windows 3.0 during its first release. Around 1993 I joined the SQL Server product group as a software test engineer, and was the ONLY tester for the SQL Server Engine and the TSQL programming langauge (it takes about 300 people to fill that role now:), and shipped SQL Server 4.2, 6.0, and 6.5. Then I was a technical writer (gag) during SQL 7.0 owning half of the TSQL Reference and conceptual material. Then a tester in the Replication team during the creation of the Merge Replication feature set for the 8.0 release, and was the test lead when I left. Following that I was a test manager for the SQL "Integration Testing" team, a test lead for the Execution Engine through the 9.0 release. Since then I've been a test lead in the Reporting Services team, and own the "Server and Management Tools" components that includes the RS Windows and Web services, the SOAP and WMI APIs, the RS catalog, setup/upgrade, Report Manager, Config tool, Sharepoint Integration, and several other misc components and features.

To the core I'm a database guy and love working with relational database application develeopment and implementation, but my passion for the last 4 years or so is creating reports with Reporting Services. I'm intimately familiar with all of the databases that support the SQL Business unit, primarily the applications for tracking bugs and managing the test automation, and have written hundreds of reports to help us run the business and ship great software. I also own the production Reporting Services server that supports the business unit, and spend a lot of time monitoring the server and helping others in the unit with their reporting needs.

I have many very talented peers who have much more depth in the core Reporting Services technologies, and also love to blog, so I'd like to take a slightly differently approach than them. What I hope to share with you in this blogs is what I've learned creating reports, using the other RS features, managing a production server, and playing with the catalog.

Please feel free to contact me and give me feedback on my blog, other questions you have about these areas, and any disagreements, corrections, or enhancements to my postings.

Take care, thanks, and of course "This blog is provided 'AS IS' with no warranties, and confers no rights."

 

 
Page view tracker