There are several ways to create virtual machines (VMs) in Windows Azure. One option is to create a new VM from the Azure management portal using a platform image from the Image Gallery. You can also create an image in an on-premises environment, upload the .vhd file to Windows Azure, and then use it to create a virtual machine. The VHD format used in Windows Azure is the same as in an on-premises environment, which allows you to move .vhd disks between on-premises environments and Windows Azure.

Note: Two terms are important when working with .vhd files: image and disk. An image is a generalized template that you can use to create a new VM. A disk is a VHD that is used by a specific, already configured VM; it can be an OS disk or a data disk. For more details, see Virtual Machine building blocks.

Once a VM is created (from an image or booted from an OS disk), you can attach additional data disks; thus, multiple disks can be associated with a single VM. (For details on creating a VM in Windows Azure, see Create a Virtual Machine Running Windows Server.)

The VHDs for VMs in Windows Azure reside in Azure Storage as page blobs. Hence, Virtual Machines benefit from the scalability and durability of the underlying storage service (e.g. with up to 6 replicas of a VHD). There is another interesting aspect to this: the Windows Azure Storage service provides an API that lets you work directly with blobs, and for page blobs you can write to a specific range of bytes within the blob. Thus, if you follow the VHD Image Format Specification, you can create a valid VHD directly through the Azure API.
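
To make "following the VHD Image Format Specification" a little more concrete, here is a minimal Python sketch of the 512-byte footer that ends a fixed-size VHD (raw disk data followed by the footer). It is not the code used in the POC. The function names, the creator tag "poc " and the creator version are hypothetical, and the sketch assumes the caller supplies the CHS geometry (the specification describes how to derive it from the disk size).

```python
import struct
import time
import uuid

SECTOR = 512
VHD_EPOCH = 946684800  # 2000-01-01 00:00:00 UTC expressed as a Unix timestamp


def vhd_checksum(footer: bytes) -> int:
    """One's complement of the byte sum, with the checksum field (bytes 64-67) zeroed."""
    zeroed = footer[:64] + b"\x00" * 4 + footer[68:]
    return ~sum(zeroed) & 0xFFFFFFFF


def build_fixed_vhd_footer(disk_size, cylinders, heads, sectors_per_track,
                           disk_id=None, now=None):
    """Return the 512-byte footer for a fixed-size VHD holding disk_size bytes of data."""
    if disk_size <= 0 or disk_size % SECTOR:
        raise ValueError("disk size must be a positive multiple of 512")
    disk_id = disk_id or uuid.uuid4()
    timestamp = int((time.time() if now is None else now) - VHD_EPOCH)
    head = struct.pack(
        ">8sIIQI4sI4sQQHBBI",      # big-endian, 64 bytes in total
        b"conectix",               # cookie
        0x00000002,                # features (reserved bit is always set)
        0x00010000,                # file format version 1.0
        0xFFFFFFFFFFFFFFFF,        # data offset: unused for fixed disks
        timestamp,                 # seconds since 2000-01-01 UTC
        b"poc ",                   # creator application (hypothetical tag)
        0x00010000,                # creator version (hypothetical)
        b"Wi2k",                   # creator host OS: Windows
        disk_size,                 # original size
        disk_size,                 # current size
        cylinders, heads, sectors_per_track,
        2,                         # disk type 2 = fixed
    )
    tail = disk_id.bytes + b"\x00" + b"\x00" * 427  # unique id, saved state, reserved
    checksum = vhd_checksum(head + b"\x00" * 4 + tail)
    return head + struct.pack(">I", checksum) + tail
```

With a footer like this written to the last 512 bytes, the page blob would need to be `disk_size + 512` bytes long.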

We conducted a proof of concept (POC) to validate the feasibility of this approach. The POC scenario was defined and implemented as follows:

  • Create a VHD on-premises
  • Mount it as a drive on-premises and create folders and files
  • Unmount the VHD and replicate it to an Azure page blob (initial sync)
  • After the initial sync, mount the VHD on-premises again and change files
  • Send page updates for the changed offsets to the page blob in Azure (incremental sync)
  • Stop writing changes to the VHD
  • Create a VM in Windows Azure and add the VHD from the page blob as a data disk
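
The incremental sync step needs a list of changed byte ranges. This post does not describe how the POC detected them, so the following Python sketch makes an assumption: it compares an old and a new snapshot of the VHD page by page and merges adjacent dirty pages into ranges. Page blob writes must be aligned to 512 bytes and a single Put Page request can carry at most 4 MiB, so ranges are capped at that size. The function name is hypothetical.

```python
PAGE = 512                      # page blob writes are aligned to 512 bytes
MAX_PUT_PAGE = 4 * 1024 * 1024  # upper limit for one Put Page request


def changed_page_ranges(old: bytes, new: bytes, max_len: int = MAX_PUT_PAGE):
    """Return (offset, length) tuples covering every 512-byte page that differs."""
    if len(old) != len(new) or len(new) % PAGE:
        raise ValueError("snapshots must have equal, 512-aligned sizes")
    ranges = []
    start = None  # offset where the current run of dirty pages began
    for offset in range(0, len(new), PAGE):
        dirty = old[offset:offset + PAGE] != new[offset:offset + PAGE]
        if dirty and start is None:
            start = offset
        elif start is not None and (not dirty or offset - start >= max_len):
            # Close the run at a clean page or when it reaches the size cap.
            ranges.append((start, offset - start))
            start = offset if dirty else None
    if start is not None:
        ranges.append((start, len(new) - start))
    return ranges
```

Holding two full snapshots in memory only works for small disks; for a real VHD you would read both files in chunks or track writes at the source.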

After implementing and testing this scenario, we verified that a) we could use the VHD in Azure as a data disk and b) all file changes were properly synchronized. While this scenario is fairly artificial, it demonstrates that you can directly modify VHDs in page blobs and that, as long as you keep or produce a valid VHD format, you can use the page blob as a .vhd disk for an Azure VM.

In this POC we used the REST API Put Blob operation to create the page blob and initialize it with the size of the VHD. To write content to the page blob, we used Put Page, specifying the byte range (offset and length) to be written.
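
As an illustration of these two operations, the Python sketch below builds the request descriptions for Put Blob and Put Page. It is not the POC code. The helper names are hypothetical, and the sketch assumes the blob URL already carries a shared access signature, so the authorization, date and version headers are left out. Nothing is sent over the network.

```python
PAGE = 512
MAX_PUT_PAGE = 4 * 1024 * 1024


def put_blob_request(blob_url: str, size: int) -> dict:
    """Describe the Put Blob call that creates an empty page blob of `size` bytes."""
    if size % PAGE:
        raise ValueError("page blob size must be a multiple of 512")
    return {
        "method": "PUT",
        "url": blob_url,
        "headers": {
            "x-ms-blob-type": "PageBlob",
            "x-ms-blob-content-length": str(size),  # total size of the blob
            "Content-Length": "0",                  # the request itself has no body
        },
        "body": b"",
    }


def put_page_request(blob_url: str, offset: int, data: bytes) -> dict:
    """Describe the Put Page call that writes `data` at `offset`."""
    if offset % PAGE or len(data) % PAGE or not 0 < len(data) <= MAX_PUT_PAGE:
        raise ValueError("range must be 512-aligned and at most 4 MiB")
    separator = "&" if "?" in blob_url else "?"
    return {
        "method": "PUT",
        "url": f"{blob_url}{separator}comp=page",
        "headers": {
            "x-ms-page-write": "update",
            "x-ms-range": f"bytes={offset}-{offset + len(data) - 1}",  # inclusive end
            "Content-Length": str(len(data)),
        },
        "body": data,
    }
```

Any HTTP client can then send the returned method, URL, headers and body.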

A single-threaded replication of changes to a page blob is fairly simple to implement. Depending on throughput requirements, it might be necessary to replicate with multiple threads (in the POC, multi-threading did increase throughput). The order of writes is an important consideration: if operations run in parallel on multiple threads, you need to guarantee the write sequence. The Azure API supports this through the blob sequence number (the "x-ms-blob-sequence-number" header); however, for every write you need to make another API call to update the blob sequence number, which might reduce the overall throughput.
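
One possible way to use the sequence number, sketched in Python under assumptions: each write is sent as a Put Page that only succeeds if the blob's sequence number equals the expected value, followed by a Set Blob Properties call that increments the number so the next write can proceed. The function name and the exact protocol are illustrative and not necessarily what the POC did; only the header names come from the storage REST API. Authentication headers are omitted.

```python
PAGE = 512


def plan_ordered_write(offset: int, length: int, expected_seq: int) -> list:
    """Return the two request descriptions needed for one ordered page write."""
    if offset % PAGE or length % PAGE or length <= 0:
        raise ValueError("range must be 512-aligned and non-empty")
    put_page = {
        "query": "comp=page",
        "headers": {
            "x-ms-page-write": "update",
            "x-ms-range": f"bytes={offset}-{offset + length - 1}",
            # The service rejects the write unless the blob's current
            # sequence number equals this value.
            "x-ms-if-sequence-number-eq": str(expected_seq),
        },
    }
    bump_sequence = {
        "query": "comp=properties",
        # Second API call per write: advance the blob sequence number by one.
        "headers": {"x-ms-sequence-number-action": "increment"},
    }
    return [put_page, bump_sequence]
```

A writer whose Put Page is rejected because the condition is not met would wait and retry, which is where the extra round trip starts to cost throughput.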

Another common requirement is compression of data for the transfer. This could be implemented in the following way: the client can compress the data and then write it to a temporary blob in Azure; a worker role running in Azure can take care of decompressing the data from the temporary blob and then transferring it to the final page blob.
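
A minimal Python sketch of the compression step, under assumptions: the client wraps each page update in a small envelope (target offset and uncompressed length, followed by zlib-compressed data) before writing it to the temporary blob, and the worker role reverses this. The envelope layout and function names are hypothetical, and zlib stands in for whatever compression you prefer.

```python
import struct
import zlib

# Hypothetical envelope header: 8-byte target offset, 4-byte uncompressed length.
HEADER = struct.Struct(">QI")


def pack_update(offset: int, data: bytes) -> bytes:
    """Client side: compress one page update for upload to a temporary blob."""
    return HEADER.pack(offset, len(data)) + zlib.compress(data)


def unpack_update(payload: bytes):
    """Worker side: recover (offset, data) before writing to the final page blob."""
    offset, length = HEADER.unpack_from(payload)
    data = zlib.decompress(payload[HEADER.size:])
    if len(data) != length:
        raise ValueError("decompressed length does not match the envelope header")
    return offset, data
```

Mostly empty VHD regions compress very well, which is where the bandwidth savings would come from.
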
This pattern can also be combined with the previous requirement of guaranteeing the sequence of writes: the worker role can put the requests in order before committing each write to the final blob. This is harder than it sounds, because it also requires an additional queue and/or storage table for control purposes. That is a general pattern for multi-threaded data transfer with guaranteed request order and goes beyond the scope of this post. The pattern does add the cost of a worker role running in Azure (plus a queue and/or table, whose cost is probably negligible), but it is more flexible and lets you address additional, application-specific requirements. Compression is a good example of such a requirement: it reduces bandwidth consumption and may increase the overall transfer throughput.
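
The ordering part of the worker role can be sketched as a small in-memory reorder buffer in Python. This is an illustration only: the class name is hypothetical, it assumes the client numbers its updates consecutively, and it leaves out the queue or table that a real worker would need to survive restarts.

```python
class ReorderBuffer:
    """Release items strictly in sequence order, whatever order they arrive in."""

    def __init__(self, first_seq: int = 0):
        self.next_seq = first_seq  # sequence number that may be written next
        self.pending = {}          # early arrivals, keyed by sequence number

    def add(self, seq: int, item):
        """Store one arrival and return the items that are now ready, in order."""
        if seq >= self.next_seq:
            # Updates that were already released and duplicates are ignored.
            self.pending.setdefault(seq, item)
        ready = []
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready
```

The worker would call `add` for each temporary blob it picks up and apply the returned updates to the final page blob one after another.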

In this blog post I described the outcome of a POC that demonstrated the ability to create or modify a VHD directly in a page blob through the Azure API and then use it as a disk for an Azure VM. In addition to the standard ways of creating a new VM directly in Azure or uploading an existing one from an on-premises environment, this may open up interesting ways of creating disks and VMs in Windows Azure.