Welcome to MSDN Blogs Sign in | Join | Help

Merge replication and complex filters

I'm currently doing a design review for a SQL CE + SQL 2000 mobile system developed by Thomas Granath and Anders Lundstedt that uses merge replication as the basis for sending data to the device. The exercise was prompted by slow perf seen during replication. It’s a work order processing solution for field engineers dealing with routine maintenance and adhoc breakdown of factory equipment across several European countries. I wanted to share this because I learned a lot of very useful information about the workings of merge replication and specific mobile considerations.

 

Just to give you some idea of the size: The main DB is around ~20GB and has ~30 tables; the biggest is around 400k rows. The structure is not particularly complex with maybe 5 main fact tables including things like Technician, Assignment, Customer. The publication includes most tables and is horizontally and vertically partitioned to reduce the data set for each device. The filtering is very interesting: it uses a dynamic filter on the technician table from HOST_NAME() and uses join filters to pretty much every other table. Using these dynamic filters provides a very powerful way of restricting data size synchronized to a subscriber to exactly what is needed. It results in a typical DB size of around 10MB on the device. When using a dynamic filter there is no initial snapshot used to sync with the device as each user receives different data, so each device sync must pull the initial and subsequent merge data through a set of views that are created by the publication wizard. Further views are created for each table join used in the publication – more later.

 

The customer is seeing the server becoming CPU bound with only a few active replications taking place - the server is a quad proc, 4 gb proliant with next to nothing else running. Average replication time is 4 minutes with around 200 – 400 rows updates being sent to the subscriber, and 10 or so updates inbound from the subscriber. The load appears to be independent of db size (mostly), but is exponentially (its maybe more of a non linear growth rather than true exponential) affected by the number of active subscriptions, which is == to the number of technicians registered. Even when no data is being returned to the client, the merge takes the same length of time. An average technician connects maybe 3 or 4 times a day.

 

The tables are using GUID’s (UniqueIdentifiers columns) rather than identity columns. This does have an impact on perf as a guid requires 16 bytes and a bit more processing compared to a 4 byte integer, but benefits from significantly simpler setup for data inserts – identity column replication requires partitioning of the int range to ensure no clash, but guids are unique for free. Additionally I could see one or two badly placed clustered indexes but nothing to fully account for the drop in perf per technician.

 

I initially suspected the culprit for bad perf to be the complex filters used on the publication, cascading from the technician table and including quite a few joins. With a joined filter the generated view uses a FROM <previousview> clause to join the two views. The replication views that are generated end up looking pretty horrendous. In this case the worst join ended up being 4 views deep.

 

Looking a little deeper at the operation of the join filters in the merge these proved not to be the source of perf issues. We could deduce this because when an initial replication takes place its only using 5 to 10 seconds to send around 40k rows of data! Compare this with a merge replication taking 4 to 6 minutes and updates maybe 300 rows and its quite apparent the views in isolation cant be causing all that delay. What caused some confusion is these long merges only occur as the number of subscriptions rises to around 200 users – still with only ~4 active sync’s taking place at a time. With fewer users the individual merge times drop dramatically.

 

There is a ‘magic’ option in SQL 2000 called Optimize Subscription designed to minimize data transferred to a client - ideal when the client is a Windows Mobile device using a 9.6k link. This option creates 'before image' tables to assist tracking data moving between partitions (e.g. updating a row causes that data to become visible / removed for different subscribers - for example if a customer moved house and data was partitioned on geography). When an article has a change that has the potential to cause a partition change, the merge agent must send a delete row request to every relevant subscriber in order to ensure the rows are correctly removed. Before image tables allow the merge agent to track where data was prior to a partition change and so filter the deletes and remove much of the redundancy prior to sending them to a client. Before image tables are organized by ‘generation’. A generation is a sort of checkpoint in the data that is used to identify where the merge agent needs to start looking for changes for each of the synchronizations. They work like this:

  • lets start with generation 10. OLTP changes are added to the before image tables with that generation
  • when a subscriber replicates data it sends its generation to the merge agent that combines all changes for each subsequent generation since then.
  • When the replication is complete, a new generation is used. Any updates from the merge subscriber are added with the new generation.

This is really the only feature that can be affected directly by the addition of new subscriptions and so has to be the cause of merge slow down – books online has a specific warning about using Optimized Subscription setting for fast changing data for this very reason.

 

When a SQL CE device replicates it makes a single call the SQL CE ISAPI extension running under IIS. That code opens a connection with the database and eventually calls a procedure sp_MSsetupBelongs with a list of generation numbers to process. This is repeated until all generations have been processed. The sp_MSsetupBelongs procedure is moving over the before image tables generating a change table (temporary) that eventually contains all the data to be sent to the client. Obviously the performance of this iterative process is directly affected by the number of generations. It’s also effected by the number of changes per generation, but less so. The change table is updates repeatedly for each generation since the last merge, with rows being added and removed as appropriate.

Each new subscriber to the system will cause 4 or 5 merge processes per day. Each merge causes a new generation to be created and increases the load on every subscriber that subsequently merges thus increasing the delay in a non linear fashion.

 

So the obvious option is to turn off the Optimized Subscription option and stop all this before image table iteration and improve performance. But that takes us back to the original problem that the replication agent cant efficiently identify partition changes and so has to send lots of deletes to every client! The delete command is only 16 byte GUID key (+padding) but when there are ~4000 deletes being generated that’s ~64k per merge, over a 9.6k link gives about a minute just to send the delete commands to the client. The customer is going to investigate if saving in merge time makes this option viable in the short term.

 

The challenge here is they are too slow during merge if Optimization is on and too slow in data transfer if it’s off!

 

So what’s the answer?

 

We talked around several data relation changes to their DB and application to simplify the merge process, specifically looking at reducing the number of partition changes caused by normal processing. For example when a job is complete, don’t physically delete it from the device but instead mark it deleted and leave it on the client. Although the number of partition changes could be significantly improved there are two effects: increased database size and the need for some sort of batch update process.

 

The good news is that SQL 2005 and SQL Mobile bring a few improvements specifically for these problems:

  • Much improved filter performance on SQL 2005 – I haven’t found exact details yet. I will post my finding when I get some
  • Data compression from the IIS agent and the device, so the 64k of delete commands should be significantly improved.
  • Use ‘download only’ tables. Using download only will give them the option of removing several fast changing tables from the merge process and in combination with a delete flag and web service call has the potential to massively reduce partition changes and therefore delete request.
  • SQL Mobile supports multiple db connections giving the option to remove the sync process from user interaction and run it automatically in the background – this will remove (minimize) the user perception of delay which is nearly as good as making it really go fasterJ

It’s worth noting that the system design they currently have to solve their specific business problem is about right with maybe one or two minor modifications I could suggest. Using merge replication is an extremely powerful solution and radically simplifies the amount of custom code needed to build a solution of this complexity. It might be possible to achieve better merge perf using a different approach to the database schema by reducing partition changes but that would come at a significant dev cost for changing back office systems and the device application, and in the end can only become a compromise.

 

The one factor that stands out from the overall design is just how visible the synchronization process is to the user. The current system requires the user to manually initiate and wait for sync to complete – during the sync process the whole application is effectively locked from use. The outcome is that delays in sync due to network, device or server performance are fully visible to the user and form a big part of negative feedback. Ideally a mobile application should ‘just work’, hiding as far as possible the internal workings of network connection, data transmission and retrieval. The current design used in this system is partly historic but also SQL CE 2.0 limitation can impede the developers desire to hide sync operations from the user because of the single db connection limitation. Its still possible to protect the user from much of the network plumbing but requires careful thought and potential design alterations to fully succeed. SQL Mobile brings the welcome improvement of multi connection’s and associated data locking strategies.

 

So all that left for me to do is produce the design report doc for the customer… that means the real fun is over, I hate writing formal docs L

 

Marcus

Published Wednesday, August 24, 2005 9:16 AM by marcpe

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

# re: Merge replication and complex filters

Excellent article!

One of the things we ran into and as a result decided not to use replication for our sync system was that updating database table structures is a real pain because once a database is in replication mode it seems almost impossible to add tables/fields to a table without disabling the replication and re publishing.

Is there a way around this? Optimally I would think that replication should allow the primary database to have tables/fields added and those new fields should be pushed directly to all subscribers automatically. (and should avoid having to do a row push on those new items because of default values that are automatically added etc.)

Is there any way to do this?

Thanks, I know this isn't really on the topic that you're outlining, but it's of critical importance to us and we've never been able to find a work around...
Thursday, August 25, 2005 10:49 AM by James Hancock

# The Mobile Minute 111

..yet another TMM without any&amp;nbsp;banter.&amp;nbsp; Silence&amp;nbsp;is not such a bad thing. . .&amp;nbsp;
Software...
Sunday, August 28, 2005 6:27 PM by Nino.Mobile

# The Mobile Minute 111

..yet another TMM without any&amp;nbsp;banter.&amp;nbsp; Silence&amp;nbsp;is not such a bad thing. . .&amp;nbsp;
Software...
Sunday, August 28, 2005 6:29 PM by Nino.Mobile

# re: Merge replication and complex filters

James, I'm pretty sure SQL Mobile will publish schema changes to merge subscribers. I will check the details and post again on this.

Marcus
Thursday, September 01, 2005 8:07 AM by marcpe

# re: Merge replication and complex filters

This is a very good documentation that author wrote.  It is very similar to what we are experiencing now.  Keep up the good work, and this article is very helpful.
Wednesday, February 15, 2006 10:33 AM by JSuarez

# re: Merge replication and complex filters

Excellent description of the issue.  We have the same configuration you describe and experienced the same situation and solution.

We'll use your description as a reference for future developers working on our system.

Tuesday, February 27, 2007 9:24 AM by MPaolini

# re: Merge replication and complex filters

I have been struggling with the row filter thing for a while. Say I have two tables:

Table Task

{

   TaskID int (identity not null PK)

   JobName nvarchar(50)

}

Task UserTask

{

   UserID nvarchar(50),

   TaskID int (no FK to Task table, but holds the values from Task.TaskID)

}

Task's rowfilter is something like SELECT * FROM Task where TaskID in (SELECT TaskID FROM UserTask WHRER UserID=SUSER_SNAME())

When I do a reinitialize subscription, the Task rows were filtered correctly, but if just do regular synchronization, the filter doesn't seem to work since new rows to UserTask don't have any effect on Task rows to the subscriber.

FYI, it is SQL server and CE 2005. Any input will be appreciated.

Thursday, October 25, 2007 10:20 AM by Roy Luo (rluo@liquidframeworks.com)

# re: Merge replication and complex filters

Excellent article.

Thanks....

what is ur update on SQL 2005 merge?

Is there any user groups/msdn forums specifically for SQL mobile merge replication?

Tuesday, November 06, 2007 3:30 AM by AneeshAbraham

# re: Merge replication and complex filters

Aneesh,

I dont work with SQL at the moment but my comments on the improved merge features in this article covers many of the top level improvements.

Marcus

Tuesday, November 06, 2007 8:32 AM by marcpe

Leave a Comment

(required) 
required 
(required) 
 
Page view tracker