Building Reliable Cloud Systems on Microsoft Azure

Sometimes mission critical systems may have higher SLA requirements than those offered by the cloud platform services. It is possible to build systems whose availability is higher than the constituent services by incorporating necessary measures into the system architecture. Airplanes, satellites, nuclear reactors, oil refineries and other mission critical real world systems have been building highly available architectures from components that are prone to fail.

These systems employ a varying degree of redundancy based on the tolerance for failure. For instance, Space Shuttle used 4 active redundant flight control computers with the fifth one on standby reduced functionality mode. Through proper parallel architecture for redundancy, software systems can attain the necessary reliability on general purpose cloud platforms like Windows Azure.

Building systems with reliability of 99.99 or above is a lot of hard work and can get expensive pretty fast. Cloud platforms like Microsoft Azure gives you the necessary tools but you have to do the hard work of designing the necessary redundancy into the system layers.

Just published a paper on this subject at: https://download.microsoft.com/download/1/C/2/1C23BE8E-D0D8-448C-BF38-A0708C9EF9F5/Building_Mission_Critical_Systems_on_Cloud_Platforms.pdf.

Technorati Tags: Software Reliability,Microsoft Azure,99.99,99.999,misssion critical cloud systems,software redundancy