So for my first technical blog posting, I'd like to take on one of the most common issues I encounter when helping people write reports and when I run them myself. In my experience with SQL Server I've heard numerous application developers claim that SQL Server is slow or doesn't scale, and typically my response is that poor designs and/or poorly written queries are slow, SQL Server's perf is largely dependent on this. I've observed the same when people claim RS is slow, and my response is mostly the same.
Bear in mind that report execution time is directly related to the query execution time, the amount of data returned by the dataset query, and the complexity of the grouping, sorting, filtering, and aggregations in the report definition. Usually I can review the report, query, table design, or indexes and improve them dramatically. More often than not, the report writer does not have the option of changing table designs or indexes as they may not have admin or dbo permissions, or if they do, design changes would require broader application changes. So working on the dataset query and the report definition is your best bet.
To start, I usually extract the query from the report and run it outside the report to see what's going on, mainly, how long does it take to execute and how many rows are being returned. Often times, the report may take 20 minutes to run, and if the standalone query also takes 20 minutes, there is not much opportunity to address this in the report. I'll review the tables and indexes to make sure the queries are good from that perspective, or if the same results can be obtained more quickly with a different approach in the query. Application developers who work with the database regularly may also be able to help.
Another source of perf problems is contention, locking, and blocking in the database if there are a lot of users or processing using the same table. The resource manager in SQL Server Management Studio, 'SELECT * from SYSPROCESS', or SP_WHO will all show how many active processes are running on the box, and if there are blocked processes or large WAITs on lock resources. If there is, you may be able to benefit from breaking up the query.
For example, in our test automation application, there are typically hundreds of processes running concurrently in the database where the automation application is running tests, testers are verifying labruns, and overzealous report writers like me running expensive reports, and all of them are getting and holding locks on the tables throughout the life of the query. The application has one "hot" table that is used by practically every process, as well as 10 or so other tables that are commonly joined with the hot table in queries. I've found that if I get only the necessary information from the hot table with a well indexed query, load this into a #temp table, and then collect the remaining data from the other tables joined with the #temp table in separate queries, the overall data can be returned to the report much more quickly. Now my peers in the relational engine may cringe if they read this, because it is nearly always better to use SELECT only without #temp tables if possible, but when there are several tables and lock contention going on, this can help dramatically.
One of my first big wins in my reporting career (before I joined RS) was when my manager at the time had a labrun status report that took about 30 minutes to run. I was very familiar with the database, so he asked me to take a look and see if I could make it better. When I ran the query outside the report, it ran in a few minutes, but it returned about 20,000 rows. When I looked at the report, I saw that it was only displaying around 100 rows to the user, and I knew I was on to something. I noticed that the data was being displayed in a group section based on several columns in the report's dataset query, and the fields had several =COUNTROWS() and =SUM(columnname) aggregate functions in the report expressions. I took the columns used in the report table's group definition and added them to a GROUP BY clause in the query along with the other non-aggregated columns, and added the COUNT(*) and SUM(columnname) expressions to the column list of the query, and it executed in about 20 seconds! Then I removed the group from the table in the report, pasted the new query in the report dataset, and displayed the result set columns in the detail section of the table. I'd just converted a 30 minute report to a 20 second report, SCORE!!!
When Reporting Services was running the original report, it had to execute the query, cache the result set, build b-trees and calculate the aggregations, and finally, build and render the report. In the new version of the report much of the data processing was done by SQL Server; Reporting Services only had to execute the query, cache a much smaller dataset, and then build and render the final report, which by the way looked exactly like the original report. While we do everything possible to optimize Reporting Service's processing of data, SQL Server is optimized to this and can do it much more efficiently.
There is one flaw with this approach; report users commonly like to see the detail behind aggregates, so you can return the full data set and display it in a detail section that is hidden by default and toggled at the group level if the user wants to see it. To address this need, I still use the approach above, but provide a drillthrough link on the aggregated number to another report that shows the detail for the aggregated group.
Robert Bruckner covers some of this material in his blog post here, but one thing I'd like to repeat is that you can also use the ExecutionLog2 view in the Report Server catalog to identify and isolate these two issues. It has columns to show how much time is being spent in executing the report's dataset query vs the time spent processing the data. If you are a report server administrator, you can monitor the long running reports and provide my suggestions above to the report owner without going through the process yourself.
I hope this helps, please let me know!
As usual, "This blog is provided 'AS IS' with no warranties, and confers no rights."