As of September 1, 2013 we decided to retire the SQLCAT.COM site and use MSDN as the primary vehicle to post new SQL Server content. This was done to minimize reader confusion and to streamline content publication. The MSDN SQLCAT blogs already include most SQLCAT.COM content and will continue to be updated with more SQLCAT learnings. You can also find a collection of our work in the SQLCAT Guidance eBooks.
To make the transition to MSDN smoother, we are in the process of reposting a few of SQLCAT.COM's blogs that are still very actively viewed. The following is a repost of one of the SQLCAT.COM blogs that still draws strong interest. You can find this, and other guidance, in the SQLCAT's Guide to Relational Engine eBook.
Also, follow us on Twitter: we normally use our handles @SQLCAT and @WinAzureCAT to announce news and new content.
Author: Steve Howard
Technical Reviewers: Silvano Coriani, Sanjay Mishra, James Podgorski, Mark Simms
· Do not create artificial keys with IDENTITY when they are not necessary. An IDENTITY column creates a point of contention under heavy, parallel insert operations.
· If old data key values will not expire, use a MERGE operation instead of DELETE and INSERT. This minimizes data operations, index rebalancing and page splits, and the amount of data that must be replicated. If old data key values will expire, then test two operations, a MERGE followed by a deletion of only the expired keys, rather than a DELETE and INSERT of the full data set.
· If not all the data will be changed, modify the “WHEN MATCHED” portion of the MERGE statement to also check whether the data that may change has actually changed, and update only the data that changed. This minimizes the number of rows that are actually modified, and thus minimizes the amount of data that must be replicated to secondaries in Windows Azure SQL Database environments.
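Taken together, the bullets above describe a MERGE that writes only rows whose payload actually differs. A minimal sketch, assuming hypothetical table, TVP, and column names (`DeviceConfig`, `@IncomingData`, `ConfigValue`) that are not from the article:

```sql
-- Hypothetical target table and incoming TVP; names are illustrative.
MERGE dbo.DeviceConfig AS t
USING @IncomingData AS s
    ON  t.DeviceId     = s.DeviceId
    AND t.SubCondition = s.SubCondition
-- Update only when the payload actually differs, so unchanged rows
-- generate no writes and no replication traffic to secondaries.
WHEN MATCHED AND t.ConfigValue <> s.ConfigValue THEN
    UPDATE SET t.ConfigValue = s.ConfigValue
WHEN NOT MATCHED BY TARGET THEN
    INSERT (DeviceId, SubCondition, ConfigValue)
    VALUES (s.DeviceId, s.SubCondition, s.ConfigValue);
```

Note that `<>` never evaluates to true when either side is NULL; if the compared column is nullable, the WHEN MATCHED predicate needs explicit IS NULL handling.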
In a recent engagement, a problem was encountered in performance of inserting and updating data in Windows Azure SQL Database. The scenario was:
· Data from hundreds of thousands of devices needs to be stored in Windows Azure SQL Database
· Each device stores approximately 8000 rows of configuration data across three tables in the database
· Data for each device is updated approximately once per day
· Only the most current data is stored in Windows Azure SQL Database
· The data must be processed at a sustained rate of six devices per second (approximately 48,000 rows per second)
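The article does not show the schema, but a minimal sketch of one of the three per-device tables might look like the following, using hypothetical names and types. The natural composite key follows the first bullet of the guidance by avoiding an IDENTITY surrogate:

```sql
-- Illustrative only: one of the three per-device configuration tables.
-- Column names and types are assumptions, not taken from the article.
CREATE TABLE dbo.DeviceConfig
(
    DeviceId     INT           NOT NULL,
    SubCondition INT           NOT NULL,
    ConfigValue  NVARCHAR(400) NOT NULL,
    -- Natural composite key avoids the IDENTITY insert hot spot.
    CONSTRAINT PK_DeviceConfig PRIMARY KEY (DeviceId, SubCondition)
);
```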
Several optimizations were made to this process.
Performance was tested by running multiple concurrent processes to process data. Elapsed time was measured as only the time it took to execute the three stored procedures that processed the new data. In different tests, the amount of data changed in each incoming data set varied so that measurements could be taken with 10%, 18%, 25%, or 100% modified data. Since 10% was determined to be the most that would ever be seen on any particular processing day, that percentage was used as the main indicator of the improvement the optimizations would yield; the other percentages gave an indication of what might happen should an exceptional day produce much different data.
Original Stored Procedures and Tables
Optimized Stored Procedures and Tables
The question of whether to use SqlBulkCopy or a TVP is not always a question of which operates faster. When not all of the received data actually changes data in the table, using a TVP as a parameter to a stored procedure, and optimizing appropriately, can yield significant performance advantages over methods that must delete and insert full sets.
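The TVP approach favored above can be sketched as a table type plus a processing procedure. All names here are assumptions for illustration; the body performs the conditional MERGE described in the guidance bullets earlier:

```sql
-- Hypothetical table type matching the incoming device rows.
CREATE TYPE dbo.DeviceConfigType AS TABLE
(
    DeviceId     INT           NOT NULL,
    SubCondition INT           NOT NULL,
    ConfigValue  NVARCHAR(400) NOT NULL,
    PRIMARY KEY (DeviceId, SubCondition)
);
GO

CREATE PROCEDURE dbo.ProcessDeviceConfig
    @IncomingData dbo.DeviceConfigType READONLY  -- TVP parameters must be READONLY
AS
BEGIN
    SET NOCOUNT ON;
    -- Merge the batch, touching only rows that actually changed.
    MERGE dbo.DeviceConfig AS t
    USING @IncomingData AS s
        ON t.DeviceId = s.DeviceId AND t.SubCondition = s.SubCondition
    WHEN MATCHED AND t.ConfigValue <> s.ConfigValue THEN
        UPDATE SET t.ConfigValue = s.ConfigValue
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (DeviceId, SubCondition, ConfigValue)
        VALUES (s.DeviceId, s.SubCondition, s.ConfigValue);
END;
```

From the client, the TVP is passed as a structured parameter (for example, a `DataTable` or `IEnumerable<SqlDataRecord>` via ADO.NET), which lets one round trip carry the full batch for a device.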
You could have added a filter on DeviceId to both the source and target to get a merge join with only small parts of the index scanned. Then you call the statement once per device, and as I understand it there are only a few devices per second.
Nice - well written article.
I know it is hard labor, and you deservedly can and should take a lot of pride in working so studiously to get the best outcome for your clients.
Thanks for making it publicly available worldwide.
I agree with xor88: surely if you know the data is from a single DeviceId, then define a parameter @DeviceId, drop the DeviceId column from the table type (the PK on your table type is now just SubCondition), and then use:
WITH MyData AS (
    SELECT *
    FROM dbo.Table1
    WHERE DeviceId = @DeviceId
)
MERGE INTO MyData t
USING @TableParam s ON s.SubCondition = t.SubCondition...
Performance aside, this allows you to use WHEN NOT MATCHED BY SOURCE THEN DELETE to delete any rows that match @DeviceId but are not present in the TVP. Thus a single merge can INSERT, UPDATE and DELETE from your defined range of data (where DeviceId = @DeviceId in this case).
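The commenter's per-device variant can be sketched end to end as follows. The procedure, type, table, and column names are hypothetical; the key points are the CTE that scopes the target to one device and the `WHEN NOT MATCHED BY SOURCE THEN DELETE` clause, which is safe precisely because the CTE limits it to @DeviceId's rows:

```sql
-- Hypothetical sketch of the commenter's suggestion; names are illustrative.
CREATE PROCEDURE dbo.ProcessOneDevice
    @DeviceId   INT,
    @TableParam dbo.SubConditionType READONLY  -- TVP keyed on SubCondition only
AS
BEGIN
    SET NOCOUNT ON;
    WITH MyData AS
    (
        SELECT DeviceId, SubCondition, ConfigValue
        FROM dbo.DeviceConfig
        WHERE DeviceId = @DeviceId   -- scopes the DELETE below to one device
    )
    MERGE INTO MyData AS t
    USING @TableParam AS s
        ON s.SubCondition = t.SubCondition
    WHEN MATCHED AND t.ConfigValue <> s.ConfigValue THEN
        UPDATE SET t.ConfigValue = s.ConfigValue
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (DeviceId, SubCondition, ConfigValue)
        VALUES (@DeviceId, s.SubCondition, s.ConfigValue)
    WHEN NOT MATCHED BY SOURCE THEN
        DELETE;  -- removes this device's rows that are absent from the TVP
END;
```

Merging through an updatable CTE like this is what confines `WHEN NOT MATCHED BY SOURCE` to the filtered range; against the bare table, that clause would delete every other device's rows.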