Introducing Geo-replication for Windows Azure Storage

We are excited to announce that we are now geo-replicating customers’ Windows Azure Blob and Table data, at no additional cost, between two locations hundreds of miles apart within the same region (i.e., between North Central and South Central US, between North and West Europe, and between East and Southeast Asia). Geo-replication is provided for additional data durability in case of a major data center disaster.

Storing Data in Two Locations for Durability

With geo-replication, Windows Azure Storage now keeps your data durable in two locations. In both locations, Windows Azure Storage constantly maintains multiple healthy replicas of your data.

The location where you read, create, update, or delete data is referred to as the ‘primary’ location. The primary location exists in the region you choose at the time you create an account via the Azure Portal (e.g., North Central US). The location where your data is geo-replicated is referred to as the ‘secondary’ location. The secondary location is automatically determined based on the location of the primary; it is in the other data center that is in the same region as the primary. In this example, the secondary would be located in South Central US (see the table below for the full listing).

Currently, the Azure Portal displays only the primary location; in the future, it will be updated to show both the primary and secondary locations. To view the primary location for your storage account in the Azure Portal, click on the account of interest; the primary region will be displayed on the lower right side under Country/Region, as highlighted below.

[Screenshot: Azure Portal storage account view, with the primary region highlighted under Country/Region]

The following table shows the primary and secondary location pairings:

Primary             Secondary
North Central US    South Central US
South Central US    North Central US
North Europe        West Europe
West Europe         North Europe
South East Asia     East Asia
East Asia           South East Asia
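
For applications that want to reason about where their data is geo-replicated, the pairings above can be captured as a simple lookup. This is only an illustrative sketch: the names `SECONDARY_FOR` and `secondary_location` are made up for this example and are not part of any Windows Azure API; the location strings are copied from the table.

```python
# Pairings copied from the table above; keys are primary locations.
# SECONDARY_FOR and secondary_location are hypothetical names for this sketch.
SECONDARY_FOR = {
    "North Central US": "South Central US",
    "South Central US": "North Central US",
    "North Europe": "West Europe",
    "West Europe": "North Europe",
    "South East Asia": "East Asia",
    "East Asia": "South East Asia",
}


def secondary_location(primary: str) -> str:
    """Return the paired secondary location for a given primary location."""
    try:
        return SECONDARY_FOR[primary]
    except KeyError:
        raise ValueError(f"unknown primary location: {primary!r}") from None
```

Because the pairing is symmetric, looking up the secondary of a secondary returns the original primary.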

Geo-Replication Costs and Disabling Geo-Replication

Geo-replication is included in current pricing for Azure Storage.

If you do not want your data geo-replicated, you can disable geo-replication for your account. To turn geo-replication off, please contact Microsoft Windows Azure Support. Note that there are no cost savings for turning geo-replication off.

When you turn geo-replication off, the data will be deleted from the secondary location. If you decide to turn geo-replication on again after you have turned it off, there is a re-bootstrap egress bandwidth charge (based on the data transfer rates) for copying your existing data from the primary to the secondary location to kick start geo-replication for the storage account. This charge will be applied only when you turn geo-replication on after you have turned it off. There is no additional charge for continuing geo-replication after the re-bootstrap is done.

Currently all storage accounts are bootstrapped and in geo-replication mode between primary and secondary storage locations.

How Geo-Replication Works

When you create, update, or delete data in your storage account, the transaction is fully replicated on three different storage nodes across three fault domains and upgrade domains inside the primary location, and only then is success returned to the client. Then, in the background, the primary location asynchronously replicates the recently committed transaction to the secondary location. That transaction is then made durable by fully replicating it across three different storage nodes in different fault and upgrade domains at the secondary location. Because the updates are asynchronously geo-replicated, there is no change in existing performance for your storage account.
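
To make the write path above concrete, here is a toy in-memory model of it. It is a sketch under simplifying assumptions, not how the storage service is implemented: the classes `Location` and `GeoReplicatedAccount` and their methods are invented for illustration, replicas are plain Python lists, and the background geo-replication is triggered by an explicit call rather than a real background process.

```python
from collections import deque

REPLICAS_PER_LOCATION = 3  # from the text: three storage nodes per location


class Location:
    """Toy stand-in for a data center holding three replicas of the data."""

    def __init__(self, name):
        self.name = name
        self.replicas = [[] for _ in range(REPLICAS_PER_LOCATION)]

    def commit(self, txn):
        # The transaction is durable once every replica has recorded it.
        for replica in self.replicas:
            replica.append(txn)


class GeoReplicatedAccount:
    """Hypothetical model: synchronous commit locally, asynchronous geo-replication."""

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary
        self.pending = deque()  # committed at primary, not yet at secondary

    def write(self, txn):
        self.primary.commit(txn)   # replicate within the primary first
        self.pending.append(txn)   # queue for background geo-replication
        return "success"           # the client is acknowledged at this point

    def replicate_pending(self, max_txns=None):
        """Simulate the background step; returns how many transactions were sent."""
        sent = 0
        while self.pending and (max_txns is None or sent < max_txns):
            self.secondary.commit(self.pending.popleft())
            sent += 1
        return sent
```

Anything still sitting in `pending` when the primary is lost corresponds to the recent updates that asynchronous geo-replication has not yet made durable at the secondary.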

Our goal is to keep the data durable at both the primary and secondary location. This means we keep enough replicas in both locations to ensure that each location can recover by itself from common failures (e.g., disk, node, rack, or TOR switch failures) without having to talk to the other location. The two locations only have to talk to each other to geo-replicate the recent updates to storage accounts. They do not have to talk to each other to recover data after common failures. This is important, because it means that if we had to fail over a storage account from the primary to the secondary, then all the data that had been committed to the secondary location via geo-replication would already be durable there.

With this first release of geo-replication, we do not provide an SLA for how long it will take to asynchronously geo-replicate the data, though transactions are typically geo-replicated within a few minutes after they have been committed in the primary location.

How Geo-Failover Works

In the event of a major disaster that affects the primary location, we will first try to restore the primary location. Depending on the nature of the disaster and its impact, on rare occasions we may not be able to restore the primary location, and we would need to perform a geo-failover. When this happens, affected customers will be notified via their subscription contact information (we are investigating more programmatic ways to perform this notification). As part of the failover, the customer’s “account.service.core.windows.net” DNS entry would be updated to point from the primary location to the secondary location. Once this DNS change has propagated, the existing Blob and Table URIs will work. This means that you do not need to change your application’s URIs; all existing URIs will work the same before and after a geo-failover.

For example, if the primary location for a storage account “myaccount” was North Central US, then the DNS entry for myaccount.<service>.core.windows.net would direct traffic to North Central US. If a geo-failover became necessary, the DNS entry for myaccount.<service>.core.windows.net would be updated so that it would then direct all traffic for the storage account to South Central US.
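
The effect of this DNS update can be sketched with a toy lookup table. This is an illustration only: the dictionary stands in for real DNS, and the helper names `blob_uri`, `resolve`, and `geo_failover`, as well as the container and blob names, are assumptions made for this example. The point it shows is that the URI your application uses never changes; only the location behind the host name does.

```python
from urllib.parse import urlparse


def blob_uri(account, container, blob):
    """Build a Blob URI of the form <account>.blob.core.windows.net/<container>/<blob>."""
    return f"https://{account}.blob.core.windows.net/{container}/{blob}"


def resolve(uri, dns_table):
    """Toy DNS lookup: map the URI's host name to the location serving it."""
    return dns_table[urlparse(uri).hostname]


def geo_failover(dns_table, host, new_primary):
    """Simulate the failover step: repoint the host name at the old secondary."""
    dns_table[host] = new_primary


# Hypothetical example values.
dns = {"myaccount.blob.core.windows.net": "North Central US"}
uri = blob_uri("myaccount", "photos", "cat.jpg")

before = resolve(uri, dns)   # served from North Central US
geo_failover(dns, "myaccount.blob.core.windows.net", "South Central US")
after = resolve(uri, dns)    # same URI, now served from South Central US
```

In a real failover the change takes effect only once the DNS update has propagated, so clients may keep resolving the old location for a while.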

After the failover occurs, the location that is accepting traffic (what used to be the secondary) is considered the new primary location for the storage account. This location will remain the primary location unless another geo-failover occurs. In addition, after a storage account fails over to a new primary, we will bootstrap a new secondary, which will also be in the same region. In the future we plan to support the ability for customers to choose their secondary location (when we have more than two data centers in a given region), as well as the ability to swap the primary and secondary locations for a storage account.

Order of Geo-Replication and Transaction Consistency

Geo-replication ensures that all the data within a PartitionKey is committed in the same order at the secondary location as at the primary location. That said, it is also important to note that there are no geo-replication ordering guarantees across partitions. This means that different partitions can be geo-replicating at different speeds. Once all the updates have been geo-replicated and committed at the secondary location, the secondary location will have exactly the same state as the primary location. However, because geo-replication is asynchronous, recent updates can be lost if a major disaster forces a failover.

For example, consider the case where we have two blobs, foo and bar, in our storage account (for blobs, the complete blob name is the PartitionKey). Now say we execute transactions A and B on blob foo, and then execute transactions X and Y against blob bar. It is guaranteed that transaction A will be geo-replicated before transaction B, and that transaction X will be geo-replicated before transaction Y. However, no other guarantees are made about the respective timings of geo-replication between the transactions against foo and the transactions against bar. If a disaster happened and caused recent transactions to not get geo-replicated, it would be possible for transactions A and X to be geo-replicated while transactions B and Y are lost. Or transactions A and B could have been geo-replicated while neither X nor Y made it. Other combinations are also possible; the only guaranteed ordering of geo-replicated transactions is among those to the same blob. The same holds true for operations involving Tables, except that the partitions are determined by the application-defined PartitionKey of the entity instead of the blob name. For more information on partition keys, please see Windows Azure Storage Abstractions and their Scalability Targets.
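
The set of outcomes allowed by this per-partition ordering guarantee can be enumerated with a short sketch. The function name `possible_secondary_states` is made up for this illustration. It models exactly one rule from the text: within a partition, the secondary holds some prefix of the transactions in commit order, and the prefixes for different partitions are independent.

```python
from itertools import product


def possible_secondary_states(committed):
    """Enumerate what the secondary might hold after an unplanned failover.

    `committed` maps each PartitionKey to its transactions in commit order.
    Each partition contributes a prefix (possibly empty, possibly complete)
    of its own history, independently of every other partition.
    """
    keys = list(committed)
    prefix_choices = [
        [tuple(committed[key][:n]) for n in range(len(committed[key]) + 1)]
        for key in keys
    ]
    return [dict(zip(keys, combo)) for combo in product(*prefix_choices)]


states = possible_secondary_states({"foo": ["A", "B"], "bar": ["X", "Y"]})
for state in states:
    print(state)
```

For the foo/bar example this yields nine combinations (three prefixes for each blob), including "A and X only" and "A and B but neither X nor Y". No combination contains B without A or Y without X.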

Because of this, one best practice for getting the most out of geo-replication is to avoid cross-PartitionKey relationships whenever possible. This means you should try to restrict relationships for Tables to entities that have the same PartitionKey value. Since all transactions within a single PartitionKey are geo-replicated in order, this guarantees those relationships will be committed in order on the secondary.

The only multiple-object transaction supported by Windows Azure Storage is the Entity Group Transaction for Windows Azure Tables, which allows clients to commit a batch of entities together as a single atomic transaction, provided they all have the same PartitionKey value. Geo-replication also treats this batch as an atomic operation. Therefore, the whole batch transaction is always committed atomically on the secondary.
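
The two properties described here, a single PartitionKey per batch and all-or-nothing commit, can be illustrated with a small in-memory sketch. This is not the Table service API: `apply_batch` is a hypothetical helper, the table is a plain dictionary keyed by (PartitionKey, RowKey), and entities are plain dictionaries.

```python
def apply_batch(table, batch):
    """Apply a batch of entities atomically to an in-memory 'table'.

    Mirrors two rules from the text: every entity in the batch must share
    one PartitionKey, and either the whole batch is applied or none of it.
    """
    partition_keys = {entity["PartitionKey"] for entity in batch}
    if len(partition_keys) != 1:
        raise ValueError("all entities in a batch must share one PartitionKey")

    # Stage the changes on a copy so a failure part-way leaves `table` untouched.
    staged = dict(table)
    for entity in batch:
        staged[(entity["PartitionKey"], entity["RowKey"])] = entity

    # Commit: only reached if every entity was staged successfully.
    table.clear()
    table.update(staged)
```

A batch that mixes PartitionKey values, or that contains an entity missing its RowKey, raises before anything is written, so the table never shows a partially applied batch.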

Summary

This is our first step in geo-replication, where we are now providing additional durability in case of a major data center disaster. The next steps involve developing features needed to help applications recover after a failover, which is an area we are investigating further.

Brad Calder and Monilee Atkinson

Comments

  • It's a nice feature, though I have a few questions:

    1. After a failover to secondary storage, when would it come back to primary storage again? The secondary storage (which is the new primary after a failover) would increase latency.

    2. Are you planning to provide this with API support to be used in Azure Traffic Manager?

    Keep it up

    -Sachin

    Sachin at cumulux dot com

  • Hi Sachin

    As mentioned above, after failover the old secondary would be the new primary and it would stay that way until there was another geo-failover (for what we have in production today). As you point out, that would increase latency if your compute was still in the old primary location and not in the new primary location. To reduce that latency, some customers may want to move some or all of their compute to the new primary location if there was a storage failover. As also mentioned, in the future we plan to support the ability for you to choose what location to place the new secondary in and the ability to swap the primary and secondary locations for a storage account. I don’t have a timeline for when we will have this capability, but the goal is to allow the customer to switch which location is the primary and which one is the secondary (once a secondary location is re-bootstrapped after the failover).

    For your second question, Azure Storage doesn’t use the Azure Traffic Manager. It just uses standard DNS to direct your URIs to the correct primary location for your storage account.

    Thanks
    Brad

  • Hi

    Great news, thanks for that. I have 3 questions:

    1. Just to understand better: when I write to a table, it is replicated to 3 replicas, and then to another 3 replicas in the secondary location? So our data exists in 6 places?

    2. When will we have this in SQL Azure?

    3. Do we have a GUI for the secondary location's geo-replication?

    thanks for the efforts

    pini

  • Hi Pini

    Here are some answers to your questions:

    1. Correct, we keep multiple copies of your data in each of the 2 locations. This allows us to recover from common hardware failures within the same data center, so if there is a failover we have a durable copy of all of the data that was committed to the secondary ready to be used.

    2. I don’t have a roadmap for SQL Azure. You could ask the question here:

    social.msdn.microsoft.com/.../threads

    3. There isn’t any at this time, but that is a feature request we have.

    Thanks

    Brad

  • This was a must-have feature to ensure your data is available whatever happens. However, I have a few suggestions:

    1. Users should be able to select their 'Secondary', so they can decide where the extra copies of their data are kept. This matters for customers who have regulations to follow, for whom an automatically chosen country/region may become a matter of concern.

    2. There should be an option to maintain a log of transactions that can be replayed on the secondary during failover to ensure the complete data is available.

    Otherwise it is a great step ahead in convincing customers about storage, with 6 copies and no effort on their part.

  • Hi Laxmikant

    For (1), we will look at doing that in the future when we have more than two data centers in a given region.

    For (2), with asynchronous geo-replication there is no way to do that if there is an unplanned failover (major data center disaster), since the whole focus of async is to commit the data quickly on the primary, ack back to the client, and then asynchronously geo-replicate the changes to the secondary. We have had feature requests for providing synchronous geo-replication (RPO of 0), and yes, that would ensure that the update is on both the primary and secondary before returning success back to the client. What some customers do right now to achieve that level of consistency between the primary and secondary is to perform their own form of geo-replication at the application level, where their application logic stores the data in two locations and maintains consistency between them.

    Thanks

    Brad
