Authors: Tim Wieman , Sanjay Mishra
Technical Reviewers: Silvano Coriani, Cindy Gross, Chuck Heinzelman, Shep Sheppard, Mike Weiner, Nicholas Dritsas, Steve Howard, James Podgorski, Juergen Thomas, Ewan Fairweather
This blog will demonstrate a real-world customer example of scaling out Microsoft SQL Server TempDB data files across multiple data disks in a Windows Azure Virtual Machine. While this focuses on TempDB, the concepts are the same for scaling out writes to SQL Server data files (MDF / NDF) of other databases as well.
As discussed in the article "Performance Guidance for SQL Server in Windows Azure Virtual Machines", each Windows Azure Virtual Machine disk has I/O capacity limits based on the intrinsic limits of the Windows Azure Storage Blob architecture. As with on-premises Microsoft SQL Server deployments, when you reach capacity limits of a data disk, you can add multiple data disks and scale out the database data files to these multiple data disks.
The customer solution was written with READ COMMITTED SNAPSHOT ISOLATION enabled on the user database, thus the TempDB throughput was very high due to SQL Server maintaining the version store in the TempDB database. We knew we had some database and query tuning to do in order to decrease the lock contention, but before we did that, we took this as a good opportunity to do some TempDB scale-out testing and grab some performance results.
Environment: SQL Server was running on a Windows Azure Extra Large (XL) Virtual Machine (8 cores) running Windows Server 2008 R2 and SQL Server 2012. We used a single, dedicated Windows Azure Storage account for the data disks attached to the virtual machine. We also kept the default setting of geo-redundancy enabled on the storage account.
We started out with standard SQL Server best practices for TempDB (multiple data files, pre-size data files to the same size, etc.). Azure CAT's Cindy Gross does a great job of outlining these here: http://blogs.msdn.com/b/cindygross/archive/2009/11/20/compilation-of-sql-server-tempdb-io-best-practices.aspx. We asked 5 DBAs how many data files to create for TempDB and we got 7 different answers ;-), so we settled on 8 data files; 1 data file for each CPU core on the system. This is consistent with the recommendations in the KB article 2154845, “Recommendations to reduce allocation contention in SQL Server tempdb database”. This also mitigated the PFS (Page Free Space) allocation contention that we saw early in testing. I’ve included links to other good references on TempDB performance tuning in the References section at the end of this post, so we will not rehash all of the TempDB recommendations here.
Now, forget altogether that this was TempDB; just think of this as any SQL Server database that has disk I/O contention running in a Windows Azure Virtual Machine. As we know, the VHDs for the data disks attached to Windows Azure Virtual Machines are stored in Windows Azure Blob storage; each VHD is a Windows Azure storage Blob. From the MSDN topic “Windows Azure Storage Scalability and Performance Targets”, we can find the target throughput for a single blob. At the time of this writing, the target throughput is “Up to 60 MB per second, or up to 500 transactions per second”. Let’s see what our customer scenario was seeing.
1 Data Disk
Disk Write Bytes/sec avg.
SQL Server Batch Requests/sec avg.
As we can see from the write throughput numbers, we’re getting very close to the maximum throughput for a single blob…sounds like we’re maxing out our disk I/O throughput. Let’s see what we can do to alleviate this.
2 Data Disks
Adding the second data disk confirms we were bumping up against our max VHD/blob throughput since we’re still bumping up to the top end of our blob throughput on both disks after adding a second disk.
As you can see from the numbers above, we nearly doubled our write throughput and have nearly perfect write distribution across the data disks. Even though we did not get a doubling of the Batch Requests / sec or the customer application requests / sec, we did get a 45% increase and 43% increase respectively over the single disk configuration.
4 Data Disks
With the data files evenly distributed across 4 data disks, the write throughput to each disk is no longer nearly as close to our maximum VHD/blob throughput. We’ll call this success from a disk I/O perspective! We could have gone further down this path, but we had those pesky locking problems to deal with still, and we knew that alleviating those would also greatly reduce the TempDB usage.
Once again, we have nearly perfect write distribution across the 4 data disks. This time the write throughput has not doubled from the previous 2-disk configuration, *because* we have eliminated the disk I/O bottleneck on the SQL Server data disks! Overall disk I/O for the database has still increased, but at a smaller percentage because the database disk I/O is no longer the main bottleneck. Because of the smaller percentage increase in disk throughput, we also have a smaller increase in the Batch Requests / sec and the customer’s application requests / sec.
The following gives us a graphical display of overall disk I/O throughput and SQL Server throughput (Batch Requests / sec.). The near-linear scaling of disk I/O throughput is a beautiful thing!
Scaling out your SQL Server data file reads and writes applies just as much to SQL Server on Windows Azure Virtual Machines as it does to your on-premises configurations.
This is good. I do have a query about write-order preservation though. Is there anything we need to do to ensure WAL protocol management across these multiple disks? Especially with geographically redundant copies of the virtual disks?
@Arvind: For the disks attached to the Windows Azure Virtual Machine, we guarantee write-order preservation for the "local" disks in the storage account in the local data center. For Geographically Redundant Storage in Windows Azure, we do NOT guarantee write-order preservation for the replicated disks (blobs). So as we mention here, msdn.microsoft.com/.../dn133149.aspx, if you have databases on multiple data disks, you should disable the blob geo-replication.