As a DBA I have done a lot of work improving I/O performance for on-premise SQL Server installations. Usually it involves tweaking the storage system, balancing databases across RAID arrays, or increasing the number of files that tempdb uses; these are all common techniques for a SQL Server DBA. However, how do you improve I/O performance when you are not in charge of the storage subsystem, as is the case with SQL Azure? You focus on how your queries use I/O, and improve the queries. This blog post will talk about how to detect queries that use a high amount of I/O and how to increase your I/O performance on SQL Azure.
With SQL Azure, just like SQL Server, I/O is a bottleneck in getting great query performance. Before you can make any changes, the first thing you have to be able to do is detect which queries are having trouble.
This Transact-SQL returns the top 25 queries ranked by their total logical I/O:
SELECT TOP 25
    (total_logical_reads/execution_count) AS avg_logical_reads,
    (total_logical_writes/execution_count) AS avg_logical_writes,
    (total_physical_reads/execution_count) AS avg_phys_reads,
    execution_count,
    q.[text]
FROM sys.dm_exec_query_stats
    CROSS APPLY sys.dm_exec_sql_text(plan_handle) AS q
ORDER BY (total_logical_reads + total_logical_writes) DESC
The output looks like this:
You can modify the ORDER BY clause in the statement above to get just the slowest queries for writes, or the slowest for reads.
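For example, to rank by the average physical reads per execution instead, the last line of the query above can reference the select-list alias:

ORDER BY avg_phys_reads DESC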
With SQL Azure, it is Microsoft’s job to maintain the data center, the servers, handle the storage, and optimize the performance of storage. There is nothing you can do to make the I/O faster from a storage subsystem perspective. However, if you can reduce the amount of read and writes to storage, you can increase the performance of your queries.
One way to overuse I/O is to read data that you are never going to use. A great example of this is:
SELECT * FROM [Table]
This query reads all the columns and all the rows from the [Table]. Here is how you can improve that:
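For example, if the application only needs two of the columns, name them explicitly and restrict the rows (the column names and parameter here are just placeholders):

SELECT [Column1], [Column2]
FROM [Table]
WHERE [Column1] = @SomeValue

This keeps SQL Azure from reading pages for columns and rows that the application will never use.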
Once you have reduced the number of columns you are returning for each query, you can focus on creating non-clustered covered indexes for the queries that have the most read I/O. Covered indexes are indexes that contain all the columns in the query as part of the index, including the columns in the WHERE clause. Note that there might be several covered indexes involved in a single query, since the query might join many tables, each potentially with a covered index. You can determine what columns should go into the index by examining SQL Azure's execution plan for the query. More information can be found in the MSDN article: SQL Server Optimization.
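As a sketch (the table, column, and index names here are hypothetical), a covered index for a query like SELECT [FirstName], [LastName] FROM [Customer] WHERE [City] = @City might look like this:

CREATE NONCLUSTERED INDEX [IX_Customer_City]
ON [Customer] ([City])
INCLUDE ([FirstName], [LastName])

Because every column the query touches is part of the index, SQL Azure can answer the query from the index pages alone, without reading the base table.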
Just a note: non-clustered indexes (which is what you create when you build a covered index) reduce the performance of your writes. This is because on inserts or updates, the indexes need to be updated too. So you need to balance your covered index creation against the ratio of reads to writes in your database. Databases with a disproportionate amount of reads to writes gain more performance from covered indexes.
I have barely touched on the ways to reduce your I/O usage and increase your query performance by modifying your queries. As a rule of thumb, the techniques that you find on MSDN for query optimization on on-premise SQL Server installations should work for SQL Azure. The point I am trying to make is that if your query I/O usage is high, focus on optimizing your queries; Microsoft is doing a great job of optimizing the storage subsystem behind SQL Azure.
Do you have questions, concerns, comments? Post them below and we will try to address them.
Access to the DMV's data (e.g. sys.dm_exec_query_stats) is denied on the master database on SQL Azure. Our product runs hundreds of queries against the master database. What do you recommend in that situation?
Lee: Obviously this article is addressing the issue of running queries against your own database, not improving I/O performance for queries calling the master database. Can you tell me what types of queries you are running? Are you calling the DMVs in these queries?
Hi Wayne, we've written a general purpose management tool to work with SQL Azure (Toad for SQL Server to be specific), so we fetch lots of objects from the master database for general purpose editing and exploring. We don't do anything with DMVs yet. Any idea why SQL Azure blocks access to the master DMVs? Seems unnecessary to me.
The covering index is an interesting part of the overall solution that you mentioned in this blog. My question is: for example, there is one table which is accessed across various scenarios with different sets of columns. I don't think we can create a covering index for each scenario and column combination. My approach would be to see which specific query is used the most and create a covering index only for that, leaving the others. What is your suggestion in this scenario? Creating too many covering indexes on a table would hamper DML operation performance. What is the key to drive this decision on whether to create a covering index or not for a specific scenario? Please suggest.
Phani: These are the same considerations when working with an on-premise SQL Server. It is a balance between write performance and read performance, which depends on your ratio of reads to writes. Usually web sites are very read heavy, and having multiple covered indexes for each of your highest traffic queries is a very valid solution.
My main concern is that the way SQL Azure is billed discourages the use of indexes. A well-indexed database consumes quite a lot of disk space just for the indexes. It's currently better to use as few indexes as possible, saving them for performance-critical sections.
Ryan: There is not much difference between SQL Server and SQL Azure in that regard; resources are scarce and there are real costs associated with them. With an on-premise SQL Server, having lots of indexes is not free: you have to purchase the hard drives, allocate the RAID, tune the performance, and provide redundancy. You are right, there is a balance between performance and financial cost; however, I think you face this regardless of whether you are on-premise or in the cloud.
Wayne: I agree completely, it's just that when you started talking about covering indexes (which can take a lot of space), a little note about disk space / financial consequences would be appropriate :)
I suspect that much of your audience is using smaller SQL Server installations (maybe just single server) with their databases set to auto-grow and thus space considerations aren't a part of their mentality :)
One thing that surprises me about SQL Azure is that there is no incentive to use indexes for the sake of reducing buffer cache or CPU utilization. Servers aren't free. Knowing that my database could be on the same shared server as some jerk who does table scans all day, impacting my performance, is a concern for me...
Ryan: SQL Azure ensures that all tenants get a fair share of the resources. Someone with a large amount of I/O will be first on our list when throttling. If the server runs low on resources, we'll ensure each tenant gets their fair share.
To put it in simple words: if a query is slow then rewrite it or suffer, because in fact there is NO way to improve I/O performance in Azure... The so-called 'fair share of resources' might actually be very miserable, but the customer/developer will never know this... unless he/she runs the problematic query on a local dedicated SQL Server.
@ Lee Grissom - it is obvious why DMV access in master db is disabled.. just look at it from '***fair*** share of resources' point of view..
I would like to understand one thing from this conversation.
@Slawek - Are we saying that we need to suffer from a performance perspective if there is a bad/slow query written by someone else? I agree that if my query is slow, I have to rewrite it or suffer, which is the case even with on-premise. But it is not fair for me to suffer from a performance perspective even if I have the best code in place but my database is hosted on a server where one of the other databases is creating high performance bottlenecks.
How is SQL Azure taking care that it allocates resources appropriately to all databases hosted on a single instance of SQL Server? Could you please throw some light on this if the question is correct?
I am only saying that if you go for a service like SQL Azure, A LOT of things will be out of your control. The point is that you don't know in fact how SQL Azure shares resources. Perhaps MS will soon introduce a new feature: "boost your performance for a few extra $$$ per month!". Nice perspective for those who do have extra $$$ to spend... but how about everyone else? They MIGHT be forced at the end of the day to pay extra $$$ for fair performance...
You can, and of course you should, optimise your queries, but at the end of the day there is NOTHING more that you can do, whether the slowdown is due to the cloud's response time or a slow internet connection.
I can see cloud computing as a very nice solution, from reliability and scalability point of view but only when the entire cloud belongs to you.. Then YOU are in charge, you will not be surprised by 'promotions' offered to you or others sharing cloud services.
As for lower total cost of ownership and lower costs... I cannot see how the sum of indefinitely recurring (hopefully constant! - no warranty here!) instalments can come to less than a one-off expense. It is like using a taxi. It is cheap per trip, but if you use it constantly all day long, sooner or later your costs will exceed the cost of a brand new luxury car plus the cost of everything needed to keep it running smoothly. At the end of the day the taxi driver has to carry the same costs as you, but he needs to add his profit on top of that. If you use a taxi very much it WILL cost more than having your own car.
In my opinion the same logic applies to the cloud computing (and generally software as a service concept), whether it is SQL Azure or another service.
Got your point now. Maybe I read your response from a different perspective :(. Now let us leave it to Wayne to elaborate on how Microsoft takes care of this situation.
Your questions are very legitimate and something that we need to address more. One of our goals is to make sure that each customer gets a fair share of the resources available on the machine; we accomplish this through throttling and load balancing. This is a very complex topic, one that I would like to address in an upcoming blog post. If customers with poorly performing queries and databases could slow other customers over the long term, then the service wouldn't have grown to be the success it is. We also believe that for the price you are getting the best value for resources and storage. There are many ways to improve your query performance after optimizing, one of which is to pay for more resources in the form of another database – this allows you to replicate your data for additional speed. I have a blog post coming up on this; we call it elastic performance. My suggestion is that you try and experiment with the service; if you do hit performance issues there are lots of solutions to help you overcome them. If you do need more resources you will not be alone, and you will find that we will be adding enhancements and features to address the issue of gaining additional performance without recoding or redesigning going forward.
You can also reduce I/O by using filtered indexes that are covered. Filtered indexes were added in SQL Server 2008. The filtered index will be smaller, depending on the filter condition, and it will cause less I/O as the number of index pages decreases.
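As an illustration (the table, column, and index names here are hypothetical), a filtered index that covers queries touching only open orders might look like this:

CREATE NONCLUSTERED INDEX [IX_Orders_Open]
ON [Orders] ([CustomerId])
INCLUDE ([OrderDate], [Total])
WHERE [Status] = 'Open'

Because only the rows matching the filter are stored, the index has fewer pages to read and maintain than an unfiltered covered index on the same columns.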