Ensuring High Availability of Web Applications in Windows Azure - Windows Azure - Site Home - MSDN Blogs

Ensuring High Availability of Web Applications in Windows Azure


To keep your Windows Azure applications available during platform upgrades, we recommend deploying more than one instance of each web-facing role. Doing so is also a condition of the Windows Azure Compute Service Level Agreement (SLA), which guarantees 99.95% external connectivity for Internet-facing roles when two or more role instances are deployed for a given application. See the full Windows Azure Compute SLA for details.
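To put the 99.95% figure in perspective, the following minimal sketch converts an SLA percentage into permitted downtime. The function name and the 30-day period are assumptions made for illustration; the SLA itself defines the actual measurement period.

```python
def allowed_downtime_minutes(sla_percent: float, days: int = 30) -> float:
    """Minutes of downtime permitted in a period at the given SLA level.

    The 30-day default is an illustrative assumption, not a term of the SLA.
    """
    total_minutes = days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


# Roughly 21.6 minutes in a 30-day period at 99.95%.
print(round(allowed_downtime_minutes(99.95), 1))
```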


Comments
  • Just to add to this point:

    The individual role instances may be restarted at any time due to a number of reasons, including the following:

    a)      Role instance cycling

    b)      Role instance moving to another physical node

    c)      Reboot of the physical node hosting the role instance

    d)      Physical node undergoing an OS upgrade

    With only a single instance, the entire role is unavailable while that instance is being restarted. With more than one instance, Windows Azure ensures that at least one instance is still available during these operations. To maintain higher availability, we recommend running two or more instances of each Internet-facing role (web role) in your Windows Azure application, so that the application does not experience downtime during these restarts.

    - Azure Kid
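The point about restarts can be illustrated with a toy simulation. This is only a sketch under a simplifying assumption: instances are restarted strictly one at a time. The function name is hypothetical and the code does not model how Windows Azure actually schedules restarts.

```python
def min_available_during_rolling_restart(instance_count: int) -> int:
    """Restart instances one at a time and return the lowest number of
    instances that were still serving at any point during the process."""
    if instance_count < 1:
        raise ValueError("need at least one instance")
    running = [True] * instance_count
    lowest = instance_count
    for i in range(instance_count):
        running[i] = False                   # instance i goes down for its restart
        lowest = min(lowest, sum(running))   # count instances still serving
        running[i] = True                    # instance i comes back up
    return lowest


for n in (1, 2, 3):
    print(n, "instance(s): minimum available =", min_available_during_rolling_restart(n))
```

With a single instance the minimum drops to zero, which is the outage described above; with two or more, at least one instance keeps serving throughout.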

  • Hi Team,

    I have seen people complaining that their live website went down completely for several minutes because of a Windows Azure platform OS upgrade, and came back up after some time. They had only one role instance in their service. The Azure support team told them to run more than one instance.

    Questions for you:

    ==============

    1. Why can't you notify customers beforehand when you are performing an OS upgrade? I believe this is a planned process initiated by Microsoft engineers.

    2. Suggesting that customers run more than one instance is like telling them to carry a spare tyre in case the original gets punctured. A customer who adds instances ends up paying more. What if the customer just wants to proceed with one role instance? What is your solution for not being affected during these OS upgrades?

    Thanks in advance for your reply

    Azure Kid

  • @AzureKid, load balancing would be used with on-premises servers too, even when a web application has only a moderate need for availability. So requiring at least two role instances is a fair constraint for providing availability. With respect to OS upgrades, AFAIK the customer can explicitly specify the OS version, in which case OS upgrades are not done automatically by MS and the customer can choose to upgrade the OS manually with planned downtime. HTH.

  • @AzureDeveloper: Thanks a lot. I did not know that customers can explicitly specify the OS version and do the upgrade on their own. Thanks for the info.

    Any further explanations or experiences are also welcome.

  • Just checking: I was assuming, as you say, that load balancing would make running two instances come in at very close to the same CPU and storage fees. But I worry about some SMB apps I have planned, where there is little activity for hours and low concurrency. What effect, if any, does this have on fees?

    thanks

  • Why don't you offer to pay for the second instance? The whole point of cloud computing is for reliability and scalability. As a customer, I shouldn't have to pay extra to get this.

  • I have a lot of respect for the Azure platform and the people working on it, but the SLA really comes off as a marketing feature rather than something that can actually help companies guarantee 99.95% role uptime.

    A quote from the SLA:

    "'Connectivity downtime' is the total accumulated minutes that deployed Internet facing roles that have not been stopped by action from Customer have no external connectivity during a five minute period, as measured and aggregated in five minute intervals."

    As I interpret this, there is no guarantee that every other transaction won’t be lost.  Also, the other parts of the SLA seem to imply a company using the instances is completely responsible for recording, reporting and proving that a role is actually suffering downtime and not just having network connectivity issues.
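To make that reading concrete, here is a minimal sketch of how downtime might be tallied under one interpretation of the quoted definition: a five-minute interval counts as downtime only if no probe in it succeeded. The probe format and function name are hypothetical, and this is not the measurement method Microsoft uses.

```python
from collections import defaultdict

INTERVAL_SECONDS = 5 * 60


def connectivity_downtime_minutes(probes):
    """probes: iterable of (timestamp_seconds, reachable) pairs.

    Assumed reading of the SLA text: an interval is downtime only when
    it contains probes and none of them succeeded.
    """
    by_interval = defaultdict(list)
    for timestamp, reachable in probes:
        by_interval[timestamp // INTERVAL_SECONDS].append(reachable)
    down_intervals = sum(1 for results in by_interval.values() if not any(results))
    return down_intervals * 5


# Every other probe fails for ten minutes: no interval is fully down.
alternating = [(t, (t // 30) % 2 == 0) for t in range(0, 600, 30)]
# Every probe fails for ten minutes: both intervals count.
all_down = [(t, False) for t in range(0, 600, 30)]

print(connectivity_downtime_minutes(alternating))
print(connectivity_downtime_minutes(all_down))
```

Under this reading, losing every other request registers as zero downtime, while a complete ten-minute outage registers as ten minutes.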

    Seems to me that this SLA will never be exercisable realistically. If a company is small and only has 2 instances, they probably won’t have the tools or the resources to be able to record downtime at 5 minute intervals.  If a company is large and has 10 instances per role they are much less likely to have role downtime since all their instances must be unreachable in order to observe downtime.
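A rough way to see why more instances make observed downtime less likely: if each instance were unreachable independently with the same probability, the chance that all are unreachable at once shrinks geometrically. Independence is a strong assumption made only for illustration (a datacenter-wide fault would violate it), and the 1% figure is invented.

```python
def prob_all_unreachable(p_single: float, instances: int) -> float:
    """Probability that every instance is unreachable at the same moment,
    assuming independent failures with identical probability p_single."""
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    if instances < 1:
        raise ValueError("need at least one instance")
    return p_single ** instances


# Hypothetical 1% chance that any one instance is unreachable.
for n in (1, 2, 10):
    print(n, "instance(s):", prob_all_unreachable(0.01, n))
```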

    I realize that this does give Microsoft an incentive to keep datacenters up, but I submit that the incentive is no more than the fear of customer disappointment and damaged brand integrity. Realistically, if customers actually get less than 99.95% uptime from their two-instance setup, it will be very difficult to prove.
