Data Sync Service for SQL Azure allows you to perform bi-directional synchronization between two or more SQL Azure databases. This blog post will discuss the basics of Data Sync Services for SQL Azure.
SQL Azure Data Sync Service runs on the Windows Azure making use of web and worker roles using the Microsoft Sync Framework. The Windows Azure account is one that is owned by the Microsoft Sync Team, and is managed by them; Data Sync Service for SQL Azure is not an application that you install within a role on you Windows Azure account. This service is hosted currently in the data center in South Central US. There is an online web interface that manages the configuration of your synchronization sets, allowing you to configure what databases you want to synchronize on what servers, in what data centers and how often you want them to synchronize. Currently, SQL Azure Data Sync Service is in CTP and is available in SQL Azure Labs.
In every synchronization scenario, there is a single hub database and many member databases. When you setup the synchronization group, you declare the hub database and assign as many empty member databases as you want.
On the first synchronization, Data Sync Service for SQL Azure provisions the member databases by copying the hub databases schema to them. In other words, you do not have to generate the schema on member databases; this is done for you by Data Sync Service for SQL Azure. Changes are also made to the hub database to efficiently track data changes.
When the schema is copied from the hub database to the member databases the foreign key constraints are not copied. This allows the table data to be inserted in any order on the member database. By not enforcing the foreign key constraints on member databases Data Sync Service for SQL Azure doesn’t have to convert the hub database to read-only on the first synchronization, plus it works with data schemas that have circular foreign key references (for more about circular references see this blog post).
After the member databases have been provisioned, on the first synchronization, all the data from the hub database is copied to the member databases table by table. The order of the inserts doesn’t matter, since there are no foreign key constraints on the member databases.
The member database can be on servers that are not in the same database center as Data Sync Service (South Central US). If all your databases are in different data centers, when you synchronize your databases (transfer data) you will be charged for bandwidth for both directions for database that are not in South Central US.
The limitation of only being able to run Data Sync Service from South Central US is a limitation of the CTP. As with the other Windows Azure Platform services, if all your databases are in the same data center and the Data Sync Service is running there, there will be no fee for data exchange.
Only the data that has changed will be transferred on synchronization, not the whole database. However, since that data needs to be synchronized with all member databases and the hub, the number of members is a multiplier in the cost of the data transfer.
For more about pricing click here.
Data Sync Service for SQL Azure allows you to pick the tables you would like to synchronize. You do not have to synchronize the whole database. This allows you to create a synchronization group where the member databases are a subset of the hub. This create an interesting scenario where you can give permissions to a member database that are different than the permission on the hub database, allowing users to access a subset of the data. I will discuss how to do this in a later blog post using PowerPivot.
When Data Sync Service for SQL Azure creates the schema on the member databases it is a copy of the schema on the hub database. The hub databases schema cannot be changed once the synchronization group has been established. In order to change the schema on the hub database you need to remove the synchronization group and recreate a new synchronization group which will reallocate your member databases; re-generating their schemas and moving the data again.
Synchronization can be done on demand or on a set schedule, unlike merge replication there is no real-time synchronization. The smallest increment for schedule replication is currently an hour in the current CTP, more frequent synchronization will be coming in the release version. Synchronization can also be done on demand.
Synchronization on Data Sync Service happens bi-directionally you can write to the member databases and when synchronization occurs, the hub and all member databases are updated. Or, you can write to the hub database and synchronize the member databases. In other words, all the databases are read write.
Data Sync Service for SQL Azure doesn’t allow you to program the handling of conflict resolution as compared to the Microsoft Sync Framework. For this reason it is best if you are inserting row into multiple member databases that you use uniqueidentifier data type as primary keys to your tables, instead of the int IDENTITY data type. This will prevent identity conflicts during the bi-directional synchronization. If you are working with a legacy database or do not want to use the uniqueidentifier data type, there are several other methods of ensuring a unique primary key across all the databases. I will blog more about this later.
Do you have questions, concerns, comments? Post them below and we will try to address them.
This is a really interesting feature. As far as picking tables to synchronize in the partial synchronization scenario, it could be very useful to synch not just a subset of tables but a query or view over a single table. For example, filtering out a single customer's data from a multi-tenant database into their own database where we could then enable OData access for the individual customer.
Right now, one-way synch from a hub to a member db in another datacenter is a decent strategy for redundancy, too.
Pete: You can do the Filter Scenario (row based filtering) using the Sync Framwork, however not from Data Sync Service for SQL Azure currently. You would need to enable your own worker role from Windows Azure and do a little programming.
I m interested to know what underlying feature/technology of SQL Server is being used to determined the changes for sync? Any article pointer please.. Thanks.
If they are...Change Data Capture (CDC) and/or Change Tracking; I would like to know any issues that we need to take care if the system already being using CDC and/or transactional replication features.
Mahesh: Data Sync Service for SQL Azure is using the Sync Framework (msdn.microsoft.com/.../default.aspx). Tables are added to your main database to track changes.
Thanks Wayne fo response. It would be good if we can know the architecture and internal mechanisms of sync process; as I can see it is adding/creating new objects and metadata... it would be good to know beforehand, before I mess with my db. :) . Also need undo process to rollback the same incase of issues.
That is a really good suggestion and I will plan to add a post to our blog (http://blogs.msdn.com/sync) that covers this architecture topic of the Data Sync Service. Just so you know, one of the planned features for the next Service Update to the Data Sync Service is a "clean-up" capability that would remove all the objects we created from your SQL Azure database.
You have mentioned that scheduled synchronization is possible. I am using microsoft sync frame work for creating a backup tool. In Microsoft sync how can we mention the schedules like every 1 hour or daily or weekly? I am using C# code for programming. I would like to know which class/interface provide methods for scheduling sync.
Right nw we are using a worker role for running sync operations and put the thread to sleep for a particular interval, It will be great if we can schedule it in Microsoft sync framework itself.
In my application i am creating dynamic tables like i am storing different customer data in different tables (Dynamic Schema). How well i leverage of Data Sync Service from Azure? I would like to use Active/Active deployment for high availability. How well I use Data sync Service to Sync databases in same time?