How to Sync Large SQL Server Databases to SQL Azure

How to Sync Large SQL Server Databases to SQL Azure

Rate This
  • Comments 13

Over the past few days I have seen a number of posts from people who have been looking to synchronize large databases to SQL Azure and have been having issues.  In many of these posts, people have looked to use tools like SSIS, BCP and Sync Framework and have run into issues such as SQL Azure closing the transaction due to throttling of the connection (because it took to apply the data) and occasionally local out-of-memory issues as the data is sorted. 

For today’s post I wanted to spend some time discussing the subject of synchronizing large data sets from SQL Server to SQL Azure using Sync Framework.  By large database I mean databases that are larger than 500MB in size.  If you are synchronizing smaller databases you may still find some of these techniques useful, but for the most part you should be just fine with the simple method I explained here.

For this situation, I believe there are three very useful capabilities within the Sync Framework:

1)      MemoryDataCacheSize:  This helps to limit the amount of memory that is allocated to Sync Framework during the creation of the data batches and data sorting.  This typically helps to fix any out-of-memory issues.  In general I typically allocate 100MB (100000) to this parameter as the best place to start, but if you have larger or smaller amounts of free memory, or if you still run out-of-memory, you can play with this number a bit.

RemoteProvider.MemoryDataCacheSize = 100000;

 

2)      ApplicationTransactionSize (MATS):  This tells the Sync Framework how much data to apply to the destination database (SQL Azure in this case) at one time.  We typically call this Batching.  Batching helps us to work around the issue where SQL Azure starts to throttle (or disconnect) us if it takes too long to apply the large set of data changes.  MATS also has the advantage of allowing me to tell sync to pick up where it left off in case a connection drops off (I will talk more about this  in a future post) and has the advantage that it provides me the ability to get add progress events to help me track how much data has been applied.  Best of all it does not seem to affect performance of sync.  I typically set this parameter to 50MB (50000) as it is a good amount of data that SQL Azure can commit easily, yet is small enough that if I need to resume sync during a network disconnect I do not have too much data to resend.

RemoteProvider.ApplicationTransactionSize = 50000;

 

3)      ChangesApplied Progress Event:  The Sync Framework database providers have an event called ChangesApplied.  Although this does not help to improve performance, it does help in the case where I am synchronizing a large data set.  When used with ApplicationTransactionSize I can tell my application to output whenever a batch (or chunk of data) has been applied.  This helps me to track the progress of the amount of data that has been sent to SQL Azure and also how much data is left.

 

RemoteProvider.ChangesApplied += new EventHandler<DbChangesAppliedEventArgs>(RemoteProvider_ChangesApplied);

 

When I combine all of this together, I get the following new code that I can use to create a command line application to sync data from SQL Server to SQL Azure.  Please make sure to update the connection strings and the tables to be synchronized.

using System;

using System.Collections.Generic;

using System.Linq;

using System.Text;

 

using System.Data.SqlClient;

using Microsoft.Synchronization.Data.SqlServer;

using Microsoft.Synchronization.Data;

using Microsoft.Synchronization;

 

namespace SQLAzureDataSync

{

    class Program

    {

        public static string sqlazureConnectionString = "Server=[Your SQL Azure Server].database.windows.net;Database=AdventureWorksLTSQLAzure;User ID=[Your SQL Azure User Name]@[Your SQL Azure Server];Password=[Your SQL Azure Password];Trusted_Connection=False;Encrypt=True;";

        public static string sqllocalConnectionString = "Server=(local);Database=AdventureWorksLT2008;Trusted_Connection=True";

        public static readonly string scopeName = "AllTablesSyncGroup";

        public static int transactionCount;

        public static uint BatchSize = 50000;

        public static uint MemorySize = 100000;

 

        static void Main(string[] args)

        {

 

            // Test if input arguments were supplied:

            if (args.Length == 0)

            {

                System.Console.WriteLine("Please enter an argument.");

                System.Console.WriteLine("Usage: SyncTest.exe -setup");

                System.Console.WriteLine("       SyncTest.exe -sync");

            }

            else if (args[0] == "-setup")

                Setup();

            else if (args[0] == "-sync")

                Sync();

        }

 

        public static void Setup()

        {

            try

            {

 

                SqlConnection sqlServerConn = new SqlConnection(sqllocalConnectionString);

                SqlConnection sqlAzureConn = new SqlConnection(sqlazureConnectionString);

                DbSyncScopeDescription myScope = new DbSyncScopeDescription(scopeName);

 

                DbSyncTableDescription TestSchema1 = SqlSyncDescriptionBuilder.GetDescriptionForTable("TestSchema1", sqlServerConn);

 

                // Add the tables from above to the scope

                myScope.Tables.Add(TestSchema1);

 

                // Setup SQL Server for sync

                SqlSyncScopeProvisioning sqlServerProv = new SqlSyncScopeProvisioning(sqlServerConn, myScope);

                sqlServerProv.CommandTimeout = 60 * 30;

                if (!sqlServerProv.ScopeExists(scopeName))

                {

                    // Apply the scope provisioning.

                    Console.WriteLine("Provisioning SQL Server for sync " + DateTime.Now);

                    sqlServerProv.Apply();

                    Console.WriteLine("Done Provisioning SQL Server for sync " + DateTime.Now);

                }

                else

                    Console.WriteLine("SQL Server Database server already provisioned for sync " + DateTime.Now);

 

                // Setup SQL Azure for sync

                SqlSyncScopeProvisioning sqlAzureProv = new SqlSyncScopeProvisioning(sqlAzureConn, myScope);

                if (!sqlAzureProv.ScopeExists(scopeName))

                {

                    // Apply the scope provisioning.

                    Console.WriteLine("Provisioning SQL Azure for sync " + DateTime.Now);

                    sqlAzureProv.Apply();

                    Console.WriteLine("Done Provisioning SQL Azure for sync " + DateTime.Now);

                }

                else

                    Console.WriteLine("SQL Azure Database server already provisioned for sync " + DateTime.Now);

 

                sqlAzureConn.Close();

                sqlServerConn.Close();

            }

            catch (Exception ex)

            {

                Console.WriteLine(ex);

            }

        }

 

        public static void Sync()

        {

            try

            {

 

                SqlConnection sqlServerConn = new SqlConnection(sqllocalConnectionString);

                SqlConnection sqlAzureConn = new SqlConnection(sqlazureConnectionString);

 

 

                SqlSyncProvider RemoteProvider = new SqlSyncProvider(scopeName, sqlAzureConn);

                SqlSyncProvider LocalProvider = new SqlSyncProvider(scopeName, sqlServerConn);

 

                //Set memory allocation to the database providers

                RemoteProvider.MemoryDataCacheSize = MemorySize;

                LocalProvider.MemoryDataCacheSize = MemorySize;

 

                //Set application transaction size on destination provider.

                RemoteProvider.ApplicationTransactionSize = BatchSize;

 

                //Count transactions

                RemoteProvider.ChangesApplied += new EventHandler<DbChangesAppliedEventArgs>(RemoteProvider_ChangesApplied);

 

                SyncOrchestrator orch = new SyncOrchestrator();

                orch.RemoteProvider = RemoteProvider;

                orch.LocalProvider = LocalProvider;

                orch.Direction = SyncDirectionOrder.UploadAndDownload;

                Console.WriteLine("ScopeName={0} ", scopeName.ToUpper());

                Console.WriteLine("Starting Sync " + DateTime.Now);

                ShowStatistics(orch.Synchronize());

 

                sqlAzureConn.Close();

                sqlServerConn.Close();

            }

            catch (Exception ex)

            {

                Console.WriteLine(ex);

            }

        }

 

        public static void RemoteProvider_ChangesApplied(object sender, DbChangesAppliedEventArgs e)

        {

            transactionCount += 1;

            int totalBytes = transactionCount * Convert.ToInt32(BatchSize);

            Console.WriteLine("Changes Applied event fired: Transaction " + totalBytes+" bytes.");

           

        }

 

        public static void ShowStatistics(SyncOperationStatistics syncStats)

        {

            string message;

 

            message = "\tSync Start Time :" + syncStats.SyncStartTime.ToString();

            Console.WriteLine(message);

            message = "\tSync End Time   :" + syncStats.SyncEndTime.ToString();

            Console.WriteLine(message);

            message = "\tUpload Changes Applied :" + syncStats.UploadChangesApplied.ToString();

            Console.WriteLine(message);

            message = "\tUpload Changes Failed  :" + syncStats.UploadChangesFailed.ToString();

            Console.WriteLine(message);

            message = "\tUpload Changes Total   :" + syncStats.UploadChangesTotal.ToString();

            Console.WriteLine(message);

            message = "\tDownload Changes Applied :" + syncStats.DownloadChangesApplied.ToString();

            Console.WriteLine(message);

            message = "\tDownload Changes Failed  :" + syncStats.DownloadChangesFailed.ToString();

            Console.WriteLine(message);

            message = "\tDownload Changes Total   :" + syncStats.DownloadChangesTotal.ToString();

            Console.WriteLine(message);

        }

    }

}

 

 

Liam Cavanagh

Leave a Comment
  • Please add 1 and 5 and type the answer here:
  • Post
  • I want to synchronize a file system storage from a local office to a server destination.

    · The destination file storage is huge. About 1 TB in 24 million files and folders.

    · The source has only a few MB.

    · The problem I am facing is that the knowlegde file on the destination machine is huge (about 200MB). I mean the syncfile.metadata

    As I understood from the sych algorithm this file is sent to the source when the source want to send files to the destination

    I am seeing that the entire 200MB is sent on the network each time it needs to tell the source it’s knowledge.

    This solution is not good for me since there are about 50 sources trying to sync the server.

    And if I am running the Synctoy 2.1 which is based on the synchronization framework it doesn’t send this knowledge file and is much much faster.

    What should I change in order to make it work normal and fast for a huge destination?

    If the  Synctoy 2.1 is capable of doing it and it is wrapped aroung the sych framework what do i have to change to use this solution in my large enviorment?

    (BTW – I am not using hash for file comparison)

    please answer to

    pini@cliniworks.com

    if you can

    Thanks

  • Hi Pini,

    You mention that the source has only a few MB – so it looks like the local office only needs to synchronize a subset of files on the server.  If that is true, each local office endpoint with sync with the server with a different scope.

    Also, it looks like you only do one-way sync where the local office is always the source and the file server always the destination? Will changes from one local office flow down to another local office? Do you expect conflicts when syncing from source to destination?

    If you make sure that you define a different scope for each local office, and the scope on the server is only a subset of the whole data, you should see smaller metadata files.

    Thanks

    sid

  • Hi Sid, Thanks for replying

    The scenario is as follow:

    The remote offices are the once that generates the files and they should transfer the files to the destination file server only.

    They do not interact with each other. So the sync is a one way from the offices to the main server.

    The local office as you called it has indeed only a few MB but the synchronization with Micrsoft framework works with the synchfile.data which is the knowledge of each side.  during the session initialiaton the server transfers this file to the office so that the office will know what subset to send.

    Since the main server has 1TB of data (Tera) the knoledge file is huge and i cant accept this file being sent each time a synch oocurs.

    After that said does the scope thing you mentioed still valid? can you elaborate on that?

  • Couple more questions:

    if local office A uploads data to the file server, does that data then flow down to local office B?

    does anyone change the copy of the data on the file server?

  • No. The data is transfered only in one direction form the offices up to the main storage server.

    In the file server more data is added and since this storage also works with the same web application the offices are working.

    The full story is that each office has an exact copy of the web application but a local file storage so they can work OCA ( occetionally connected)

  • I am using sync framework since its CTP1.  It will work well for small size of data (either file or database).  For large DB or TB of files it is always a problem.

  • Thanks for the post, but I'm afraid this is not the full story to it.

    We have been using the November CTP of the AzureSyncProvider. It has always worked fine for us, with good performance.

    Given the posts about Azure being supported now natively by SyncFx 2.1, and the fact  that I dislike using CTP code in a production environment, we decided to upgrade. Since then the performance has gone down dramatically.

    We Synchronize SqlCe to Azure. The problem becomes apparent when a large number of new rows are present SqlCe (e.g 50.000). Change enumeration is fast enough (a few seconds), however the change application to Azure takes almost two minutes. Normally when we sync we have about 150.000 - 200.000 new rows, so you can see how this takes too much time.

    By performing a query on the change Tracking table of Azure I noticed a significant difference between elements that were added by the AzureSyncProvider CTP and the ones that were added by the 2.1 SqlSyncProvider: the [last_change_datetime] column of entries that we synced using the AzureSyncProvider all have the same timestamp. This suggests that a single transaction was used. In the elements synced using the SqlSyncProvider however at most 75 entries share the same timestamp. If those timestamps are indeed caused by using separate transactions then I can imagine why it takes a long time to sync (1)50.000 entries.

    For testing I have created both a new SqlCe database and a new Sql Azure database and provisioned these using SyncFx 2.1 to make sure this was not caused by missing stored procedures or anything like that.

    Is this a bug that I need to report, or is it by design somehow? And more important: is there a solution?

  • Hi,

    We have built an application to sync data using MS Sync framework 2.1 . The application runs successfully. However, there is a case when the rowcount in a table is a large as 7.2 millions. We have implemented batching in this application so that it runs successfully. However, it takes around 1.25 hours to successfully sync this table.

    We need to reduce this sync time. Can anyone help us to know the work around for a faster application?

  • Setup() is OK but initial Sync() is very slow (hours).  Using MS Sync FW 2.1 . . . 1.5 MM rows are already staged on local and remote (SQLAzure) servers.

  • One issue we have run into is the extra tables created by MS Sync 2.1. These tables and their indices are quite large and cause us to bump into our SQL Azure size limits. Is there any way around this?

  • Hi,

    I followed all the steps as in the article. Still I see network issues after inserting many records.

    I am seeing this information in the log file, as many records are inserted to database in transaction and later failed with network issue and data is rolling back.

    same issue is reproducing after many retries.

    Please let me know you have any other idea to fix this issue?

  • Unforutnately, even using these additional tuning parameters, my data sync between Azure DB 1 and Azure DB 2 was still just batching files after 7 days.  I don't think that this approach will work for larger Azure databases without being able to easily do the following:

    1) Batch and then transmit and transact.  This way, you don't spend a day or two waiting for all of the batches to be batched and transmitted and then rolled back near the end due to a SQL timeout or throttling issue.  If one batches and transacts in small chunks, one can slowly get the databases synchronized.

    2) Have an easy way to specify how much data to retrieve from each table for each batch, based on, for example, date/time rows.  Many of our tables reference other tables that have the appropriate row.  So, this mechanism would need to be fancier than the current Microsoft Data Sync filtering method, which I believe only allows one to specify columns in the table that one is currently synchronizing.

    I've now probably spent over 2 weeks of real developer time on:

    a) Issues with the SQL Azure Data Sync preview (total failure for synchronizing a larger DB to an on-premises empty copy).

    b) More issues with SQL Data Sync Framework-based application (see above).

    So, I've gone back to an old and proven technology -- SSIS.   Thankfully, we have an on-premises SQL Server and the knowledge of how to write and SSIS package that grabs X amount of data at a time, synchronizes it, and then repeats based on a SQL Server Agent job schedule.  It took me a few days to write thet SSIS package.  But, it succeeded the very first time.

    PS.  Our Azure DB is currently ~ 60 GB in size.

    Katy Leonard

    Senior Software Development Engineer

    Unify Square

  • Dear All,

    When I run SyncDB.exe -sync

    I got error below:

    Microsoft.Synchronization.Data.DbNotProvisionedException: the current operation could not be completed because the database is not provisioned for sync or you not have permission to the sync configuration tables.

    I used sa account but I still get that error.

    Could you help please?

Page 1 of 1 (13 items)