[This article was contributed by the SQL Azure team.]

As a DBA, I have done a lot of work improving I/O performance for on-premises SQL Server installations. Usually this involves tuning the storage system, balancing databases across RAID arrays, or increasing the number of files that tempdb uses; these are all common techniques for a SQL Server DBA. However, how do you improve I/O performance when you are not in charge of the storage subsystem, as is the case with SQL Azure? You focus on how your queries use I/O and improve the queries. This blog post covers how to detect queries that use a large amount of I/O and how to improve I/O performance on SQL Azure.

Detecting Excessive I/O Usage

With SQL Azure, just like SQL Server, I/O is often a bottleneck for query performance. Before you can make any changes, the first thing you have to do is detect which queries are having trouble.

This Transact-SQL returns the 25 cached queries with the highest total logical I/O (reads plus writes):

-- Averages are per execution; the ranking uses total logical I/O.
SELECT TOP 25
    q.[text],
    (qs.total_logical_reads / qs.execution_count) AS avg_logical_reads,
    (qs.total_logical_writes / qs.execution_count) AS avg_logical_writes,
    (qs.total_physical_reads / qs.execution_count) AS avg_phys_reads,
    qs.execution_count
FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_sql_text(qs.plan_handle) AS q
ORDER BY
    (qs.total_logical_reads + qs.total_logical_writes) DESC;

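The output has one row per cached query: the query text, the average logical reads, logical writes, and physical reads per execution, and the execution count.

[Image: sample output of the query]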

You can modify the ORDER BY clause in the statement above to rank the queries by writes only or by reads only.
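
As a rough sketch of that change, the two variants below rank by reads only and by writes only. They use the same dynamic management views and columns as the statement above; only the ORDER BY clause and the selected average differ.

```sql
-- Top 25 cached queries by total logical reads.
SELECT TOP 25
    q.[text],
    (qs.total_logical_reads / qs.execution_count) AS avg_logical_reads,
    qs.execution_count
FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_sql_text(qs.plan_handle) AS q
ORDER BY qs.total_logical_reads DESC;

-- Top 25 cached queries by total logical writes.
SELECT TOP 25
    q.[text],
    (qs.total_logical_writes / qs.execution_count) AS avg_logical_writes,
    qs.execution_count
FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_sql_text(qs.plan_handle) AS q
ORDER BY qs.total_logical_writes DESC;
```

Ordering by the totals favors queries that run often. To find the queries that are most expensive per execution instead, you could order by the averaged expression.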

Changing the Query

With SQL Azure, it is Microsoft’s job to maintain the data center and the servers, handle the storage, and optimize storage performance. There is nothing you can do to make the I/O faster from a storage subsystem perspective. However, if you can reduce the number of reads and writes to storage, you can increase the performance of your queries.

Reading Too Much Data

One way to overuse I/O is to read data that you are never going to use. A great example of this is:

SELECT * FROM [Table]

This query reads all the columns and all the rows from the [Table]. Here is how you can improve that:

  • Use a WHERE clause to reduce the number of rows to just the ones that you need for your scenario
  • Explicitly name the columns you need from the tables, which hopefully will be fewer than all of them
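
Applied together, the two suggestions above might look like the following. The table dbo.Orders, its columns, and the filter values are hypothetical and only serve as an illustration.

```sql
-- Hypothetical table and columns, for illustration only.
-- Name only the columns the application uses and filter the rows
-- down to the ones the scenario needs.
SELECT OrderId, OrderDate, TotalDue
FROM dbo.Orders
WHERE CustomerId = 42
  AND OrderDate >= '20100101';
```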

Create Covered Indexes

Once you have reduced the number of columns you are returning for each query, you can focus on creating non-clustered covered indexes for the queries that have the most read I/O. Covered indexes are indexes that contain all the columns the query uses, including the columns in the WHERE clause. Note that there might be several covered indexes involved in a single query, since the query might join many tables, each potentially with a covered index. You can determine which columns should go into the index by examining SQL Azure’s execution plan for the query. More information can be found in the MSDN article: SQL Server Optimization.
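
Here is a minimal sketch of a covered index for one query. The table dbo.Orders, its columns, and the index name are hypothetical. The WHERE clause column becomes the index key, and the remaining selected columns are attached with INCLUDE, so the query can be answered from the index alone.

```sql
-- Hypothetical query to be covered:
--   SELECT OrderDate, TotalDue
--   FROM dbo.Orders
--   WHERE CustomerId = 42;

-- CustomerId is the key column used for the seek; OrderDate and
-- TotalDue are stored in the index leaf level via INCLUDE.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerId
    ON dbo.Orders (CustomerId)
    INCLUDE (OrderDate, TotalDue);
```

After creating the index, the execution plan for the query should show an index seek on IX_Orders_CustomerId with no additional lookup into the base table, although the optimizer makes the final choice.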

Just a note: non-clustered indexes (which is what you create when you add a covered index) reduce the performance of your writes. This is because the indexes need to be updated on every insert or update. So you need to balance your covered index creation against the ratio of reads to writes in your database. Databases with a disproportionately high number of reads compared to writes gain the most from covered indexes.
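
One way to get a feel for that balance is to compare how often each non-clustered index is read against how often it is updated. The sketch below assumes that the sys.dm_db_index_usage_stats view is available on your SQL Azure server, as it is in SQL Server. The counters are cumulative and can be reset, so treat the numbers as a rough guide.

```sql
-- Reads (seeks + scans + lookups) versus writes (updates) for each
-- non-clustered index in the current database.
SELECT
    OBJECT_NAME(i.object_id) AS table_name,
    i.name AS index_name,
    us.user_seeks + us.user_scans + us.user_lookups AS reads,
    us.user_updates AS writes
FROM sys.indexes AS i
    JOIN sys.dm_db_index_usage_stats AS us
        ON us.object_id = i.object_id
       AND us.index_id = i.index_id
WHERE us.database_id = DB_ID()
  AND i.type_desc = 'NONCLUSTERED'
ORDER BY us.user_updates DESC;
```

An index with many writes and few reads costs more to maintain than it saves and is a candidate for removal.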

Summary

I have barely touched on the ways to reduce your I/O usage and increase your query performance by modifying your queries. As a rule of thumb, the query optimization techniques that you find on MSDN for on-premises SQL Server installations should work for SQL Azure. The point I am trying to make is that if your query I/O usage is high, you should focus on optimizing your queries; Microsoft is doing a great job of optimizing the storage subsystem behind SQL Azure.

Do you have questions, concerns, comments? Post them below and we will try to address them.