Very Slow Data Repartitioning in SQL Server Replication with Precomputed Partitions

Chris Skorlinski
Microsoft SQL Server Escalation Services

Here is a great posting by Kevin Locke on a problem he encountered when data modifications occurred near the top of a complicated replication filter configuration.  These changes can cause “partition change” which triggers re-evaluation of all child data; also known as “repartitioning”.  Kevin’s solution was to add an index to the Merge tracking tables, perform the data changes, then remove the indexes.  This looks like a great solution to what should be a rare incident (major repartitioning with 800K and 500K child rows).  We’re continuing to investigate ways to improve “repartitioning”.

I also like his troubleshooting technique.  He investigated Merge Replication performance much like you would any SQL based application.  He identified high IO queries, reviewed top consumer in the query plan, explored indexes to improve query plan, tested new indexes.

--------------Kevin’s BLOG----------

By Kevin Locke
klocke@digitalenginesoftware.com

(reposting with permission)
http://www.digitalenginesoftware.com/blog/archives/65-Very-Slow-Data-Repartitioning-in-SQL-Server-Replication-with-Precomputed-Partitions.html

I am using Merge Replication with SQL Server 2008 R2 (server and clients). I have setup a publication with a rather complex filtering hierarchy rooted in a table with Parameterized Row Filters extended many times through Join Filters. Making modifications to data in a table near the root of this hierarchy would trigger repartitioning of the replicated data which would never complete and would cause deadlock errors in the other connections to the database while it ran (I let it run overnight once in single user mode, but had to kill it after 13 hours...).

Investigation Technical Details

After a lot of watching in SQL Profiler and digging around in the triggers/procedures which perform the repartitioning I found the culprit. The replication DML trigger on the table (MSMerge_op_GUID) called the repartitioning procedure (MSmerge_expand_sp_GUID) which included several queries with the following subquery:

SELECT partition_id
FROM dbo.MSmerge_current_partition_mappings cpm WITH (ROWLOCK)
INNER JOIN dbo.MSmerge_contents mc2 WITH (ROWLOCK) ON cpm.rowguid = mc2.rowguid AND mc2.marker = @marker

Looking at the execution plan for any of the queries showed that this subquery was responsible for at least 40% of the total query cost. Both of these tables are quite large (~800k and ~425k rows respectively in my DB) and neither had indexes to cover this (sub-)query.

Solution

So, of course, I added the following indexes (with naming conventions to match the existing indexes):

CREATE INDEX nc2MSmerge_current_partition_mappings
ON dbo.MSmerge_current_partition_mappings (rowguid, partition_id);
CREATE INDEX nc6MSmerge_contents
ON dbo.MSmerge_contents (marker, rowguid);

After adding these indexes, the repartitioning operations completed in under 20 minutes!

Caveat

Both of these tables are heavily used and often modified (depending on the workload), so adding more indexes may not be the best solution for databases with high-performance requirements where repartitioning is rare and non-repartitioning operations are the norm. If that is the case, I suggest creating the above indexes before major repartitioning operations and removing them once the repartition is complete. However, that being said, I have been able to leave these indexes on the tables with no noticeable performance impact and a significant reduction in execution time for data modifications which involve repartitioning data.