The Windows Azure Blob service provides mechanisms to ensure data integrity at both the application and transport layers. This post details these mechanisms from the service and client perspectives. MD5 checking is optional on both PUT and GET operations; however, it provides a convenient way to ensure data integrity across the network when using HTTP. Because HTTPS already provides transport layer security, additional MD5 checking is redundant when connecting over HTTPS.
To ensure data integrity, the Windows Azure Blob service uses MD5 hashes of the data in a couple of different ways. It is important to understand how these values are calculated, transmitted, stored, and eventually enforced in order to design your application to use them appropriately.
Please note that the Windows Azure Blob service provides a durable storage medium and uses its own integrity checking for stored data. The MD5 values used when interacting with an application are provided for checking the integrity of the data while it is transferred between the application and the service via HTTP. For more information regarding the durability of the storage system, please refer to the Windows Azure Storage Architecture Overview.
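To make the calculation concrete, here is a minimal sketch of how an MD5 value can be produced in the form used throughout this post: the Base64 encoding of the 16-byte MD5 digest. The class and method names (ContentMd5Example, ComputeContentMd5) are our own illustration and are not part of the Storage Client Library.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class ContentMd5Example
{
    // Returns the Base64-encoded MD5 digest of the payload. This is the
    // string form that is compared against a Content-MD5 value.
    public static string ComputeContentMd5(byte[] payload)
    {
        using (MD5 md5 = MD5.Create())
        {
            return Convert.ToBase64String(md5.ComputeHash(payload));
        }
    }

    public static void Main()
    {
        byte[] payload = Encoding.UTF8.GetBytes("hello world");
        Console.WriteLine(ComputeContentMd5(payload)); // XrY7u+Ae7tCTyyK7j1rNww==
    }
}
```

The same string is what a client would later compare against the Content-MD5 value returned with the blob.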
The following table shows the Windows Azure Blob service REST APIs and the MD5 checks provided for them:
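| REST API | Header | Value | Validated By | Notes |
|---|---|---|---|---|
| Put Blob | x-ms-blob-content-md5 | MD5 value of the blob's bits | Server | Full blob |
| Put Blob | Content-MD5 | MD5 value of the blob's bits | Server | Full blob. If x-ms-blob-content-md5 is present, Content-MD5 is ignored |
| Put Block | Content-MD5 | MD5 value of the block's bits | Server | Validated prior to storing the block |
| Put Page | Content-MD5 | MD5 value of the page's bits | Server | Validated prior to storing the page |
| Put Block List | x-ms-blob-content-md5 | MD5 value of the blob's bits | Client on subsequent download | Stored as the Content-MD5 blob property, to be downloaded with the blob for client-side checks |
| Set Blob Properties | x-ms-blob-content-md5 | MD5 value of the blob's bits | Client on subsequent download | Sets the blob Content-MD5 property |
| Get Blob | Content-MD5 | MD5 value of the blob's bits | Client | Returns the Content-MD5 property if one was stored/set with the blob |
| Get Blob (range) | Content-MD5 | MD5 value of the blob's range bits | Client | If the client specifies x-ms-range-get-content-md5: true, the Content-MD5 header is dynamically calculated over the range of bytes requested. This is restricted to <= 4 MB range requests |
| Get Blob Properties | Content-MD5 | MD5 value of the blob's bits | Client | Returns the Content-MD5 property if one was stored/set with the blob |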
Table 1 : REST API MD5 Compatibility
From the perspective of the Windows Azure Blob service, the only MD5 values that are explicitly calculated and validated on each transaction are the transport layer (HTTP) MD5 values. MD5 checking is optional on both PUT and GET operations, and it is redundant over HTTPS, which already provides transport layer security. We will be discussing two separate MD5 values, which provide checks at different layers: the transport layer value, sent in the Content-MD5 HTTP header and checked for the individual request, and the application layer value, which is stored with the blob as its Content-MD5 property and can be validated by the client on download.
We have already discussed how the Windows Azure Blob service can provide transport layer security via the Content-MD5 HTTP header or HTTPS. In addition, the client can store and manually validate MD5 hashes of the blob data at the application layer. The Windows Azure Storage Client Library provides this calculation functionality via the exposed object model and relevant abstractions such as BlobWriteStream.
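The difference between the two layers is easiest to see when a blob is uploaded in blocks: each block has its own MD5 that can travel in that request's Content-MD5 header, while the MD5 of the whole blob has to be accumulated by the client as the blocks go by. The following is a minimal sketch of that bookkeeping, not the library's actual implementation; BlockHashingSketch and HashBlocks are hypothetical names, and no network calls are made.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

public static class BlockHashingSketch
{
    // Splits data into blocks of at most blockSize bytes. Returns the Base64
    // MD5 of each block (transport layer) and sets blobMd5 to the Base64 MD5
    // of the entire payload (application layer).
    public static List<string> HashBlocks(byte[] data, int blockSize, out string blobMd5)
    {
        if (blockSize <= 0) throw new ArgumentOutOfRangeException("blockSize");

        var blockMd5s = new List<string>();
        using (MD5 blobHash = MD5.Create())
        using (MD5 blockHash = MD5.Create())
        {
            for (int offset = 0; offset < data.Length; offset += blockSize)
            {
                int count = Math.Min(blockSize, data.Length - offset);

                // Per-block hash: covers only the bytes of this block.
                blockMd5s.Add(Convert.ToBase64String(blockHash.ComputeHash(data, offset, count)));

                // Running hash: accumulates every block into one digest.
                blobHash.TransformBlock(data, offset, count, null, 0);
            }
            blobHash.TransformFinalBlock(new byte[0], 0, 0);
            blobMd5 = Convert.ToBase64String(blobHash.Hash);
        }
        return blockMd5s;
    }

    public static void Main()
    {
        string blobMd5;
        List<string> blocks = HashBlocks(Encoding.UTF8.GetBytes("hello world"), 4, out blobMd5);
        Console.WriteLine("Blocks hashed: " + blocks.Count);
        Console.WriteLine("Whole-blob MD5: " + blobMd5);
    }
}
```

In this sketch the per-block values correspond to what would be sent with each Put Block, and blobMd5 corresponds to the value that would be supplied once for the whole blob.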
When utilizing the CloudBlob Convenience layer methods in most cases the library will automatically calculate and transmit the application layer MD5 value. However, there is an exception to this behavior when a call to an upload method results in
In both of the above cases, an MD5 value is not passed in to be checked, so in this scenario a client that requires data integrity checking must use HTTPS. (HTTPS can be enabled when constructing a CloudStorageAccount via the constructor, or by specifying HTTPS as part of the baseAddress when manually constructing a CloudBlobClient.)
All other blob upload operations from the convenience layer in the SDK send MD5 values that are checked at the blob service.
In addition to the exposed object methods, you can also provide the x-ms-blob-content-md5 header via the Protocol layer on a PutBlob or PutBlockList request.
The table below lists the convenience functions used to upload blobs, which of them support sending MD5 checks, and when the MD5 values are sent.
| Layer | Method | Notes |
|---|---|---|
| Convenience | CloudBlob.OpenWrite | MD5 is sent. Note, this function is not currently supported for PageBlob |
| Convenience | CloudBlob.UploadByteArray, CloudBlob.UploadFile, CloudBlob.UploadText, CloudBlob.UploadFromStream | MD5 is sent if: |
Table 2 : Blob upload methods MD5 compatibility
The CloudBlob Download methods do not provide application layer MD5 validation; as such it is up to the application to verify the Content-MD5 returned against the data returned by the service. If the application layer MD5 value was specified on upload the Windows Azure Storage Client Library will populate it in CloudBlob.Properties.ContentMD5 on any download (i.e. DownloadText, DownloadByteArray, DownloadToFile, DownloadToStream, and OpenRead).
The example below shows how a client can validate the blob's MD5 hash once all the data is retrieved.
Example
    using System;
    using System.IO;
    using System.Security.Cryptography;
    using Microsoft.WindowsAzure;
    using Microsoft.WindowsAzure.StorageClient;

    // Initialization
    string blobName = "md5test" + Guid.NewGuid().ToString();
    long blobSize = 8 * 1024 * 1024;
    StorageCredentialsAccountAndKey creds = new StorageCredentialsAccountAndKey(AccountName, AccountKey);
    CloudStorageAccount account = new CloudStorageAccount(creds, false);
    CloudBlobClient bClient = account.CreateCloudBlobClient();

    // Set CloudBlobClient.SingleBlobUploadThresholdInBytes, all blobs above this
    // length will be uploaded using blocks
    bClient.SingleBlobUploadThresholdInBytes = 4 * 1024 * 1024;

    // Create Blob Container
    CloudBlobContainer container = bClient.GetContainerReference("md5blobcontainer");
    Console.WriteLine("Validating the Container");
    container.CreateIfNotExist();

    // Populate Blob Data
    byte[] blobData = new byte[blobSize];
    Random rand = new Random();
    rand.NextBytes(blobData);
    MemoryStream retStream = new MemoryStream(blobData);

    // Upload Blob
    CloudBlob blobRef = container.GetBlobReference(blobName);

    // Any upload method will work here: byte array, file, text, stream
    blobRef.UploadByteArray(blobData);

    // Download will re-populate the client MD5 value from the server
    byte[] retrievedBuffer = blobRef.DownloadByteArray();

    // Validate MD5 Value
    var md5Check = MD5.Create();
    md5Check.TransformBlock(retrievedBuffer, 0, retrievedBuffer.Length, null, 0);
    md5Check.TransformFinalBlock(new byte[0], 0, 0);

    // Get Hash Value
    byte[] hashBytes = md5Check.Hash;
    string hashVal = Convert.ToBase64String(hashBytes);

    if (hashVal != blobRef.Properties.ContentMD5)
    {
        throw new InvalidDataException("MD5 Mismatch, Data is corrupted!");
    }

Figure 1: Validating a Blob's MD5 value
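For large blobs it may be preferable not to hold the entire blob in memory before checking it. As a rough sketch of one alternative, the helper below hashes the data incrementally while copying it from one stream to another and then compares the result with a stored MD5 value. StreamingMd5Check, CopyAndHash, and Matches are hypothetical names of our own; in a real application the source stream and the stored value would presumably come from the blob (for example OpenRead and Properties.ContentMD5), which this sketch replaces with in-memory data.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class StreamingMd5Check
{
    // Copies source to destination in chunks, hashing each chunk as it is
    // read. Returns the Base64 MD5 of all bytes copied.
    public static string CopyAndHash(Stream source, Stream destination)
    {
        byte[] buffer = new byte[64 * 1024];
        using (MD5 md5 = MD5.Create())
        {
            int read;
            while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
            {
                md5.TransformBlock(buffer, 0, read, null, 0);
                destination.Write(buffer, 0, read);
            }
            md5.TransformFinalBlock(buffer, 0, 0);
            return Convert.ToBase64String(md5.Hash);
        }
    }

    // Returns false when no MD5 was stored with the blob, because in that
    // case there is nothing to validate against.
    public static bool Matches(string storedContentMd5, string computedMd5)
    {
        if (string.IsNullOrEmpty(storedContentMd5)) return false;
        return string.Equals(storedContentMd5, computedMd5, StringComparison.Ordinal);
    }

    public static void Main()
    {
        // Stand-in for blob content and its stored MD5 (assumption: in-memory data).
        byte[] original = new byte[200000];
        new Random(1).NextBytes(original);
        string stored;
        using (MD5 md5 = MD5.Create())
        {
            stored = Convert.ToBase64String(md5.ComputeHash(original));
        }

        using (var source = new MemoryStream(original))
        using (var destination = new MemoryStream())
        {
            string computed = CopyAndHash(source, destination);
            Console.WriteLine(Matches(stored, computed) ? "MD5 verified" : "MD5 mismatch or no stored MD5");
        }
    }
}
```

Treating a missing stored value as "not verified" rather than as success is a design choice of this sketch; an application may prefer to fall back to HTTPS in that case.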
Page blobs are designed to provide a durable storage medium that can sustain a high rate of IO. Data can be accessed in 512-byte pages, allowing a high rate of non-contiguous transactions to complete efficiently. If HTTP must be used with MD5 checks, the application should pass in the Content-MD5 header on PutPage and then request x-ms-range-get-content-md5 on each subsequent GetBlob, using ranges less than or equal to 4 MB.
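Because a range-level MD5 is only available for ranges of at most 4 MB, a larger read has to be split into several range requests. The sketch below shows one way to plan those ranges. PageRangePlanner and Plan are hypothetical names, and the 512-byte alignment check is a choice made for this sketch (it mirrors the page size mentioned above) rather than a documented requirement for reads.

```csharp
using System;
using System.Collections.Generic;

public static class PageRangePlanner
{
    public const int PageSize = 512;
    public const long MaxMd5RangeBytes = 4L * 1024 * 1024;

    // Returns (offset, count) pairs that cover [start, start + length) with
    // no range larger than 4 MB, so each one can carry a range MD5.
    public static List<Tuple<long, long>> Plan(long start, long length)
    {
        if (start < 0 || start % PageSize != 0)
            throw new ArgumentException("start must be a non-negative multiple of 512", "start");
        if (length <= 0 || length % PageSize != 0)
            throw new ArgumentException("length must be a positive multiple of 512", "length");

        var ranges = new List<Tuple<long, long>>();
        long end = start + length;
        for (long offset = start; offset < end; offset += MaxMd5RangeBytes)
        {
            long count = Math.Min(MaxMd5RangeBytes, end - offset);
            ranges.Add(Tuple.Create(offset, count));
        }
        return ranges;
    }

    public static void Main()
    {
        // Plan reads for a 10 MB page blob starting at offset 0.
        foreach (Tuple<long, long> range in Plan(0, 10L * 1024 * 1024))
        {
            Console.WriteLine("offset=" + range.Item1 + " count=" + range.Item2);
        }
    }
}
```

Since 4 MB is itself a multiple of 512, every planned range stays page-aligned as long as the starting offset is.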
Considerations
Currently the convenience layer of the Windows Azure Storage Client Library does not support passing in MD5 values for PageBlobs, nor returning Content-MD5 for getting PageBlob ranges. As such, if your scenario requires data integrity checking at the transport level it is recommended that you use HTTPS or utilize the Protocol Layer and add the additional Content-MD5 header.
In the following example we will show how to perform page blob range GETs with an optional x-ms-range-get-content-md5 via the protocol layer in order to provide transport layer security over HTTP.
    using System;
    using System.IO;
    using System.Net;
    using Microsoft.WindowsAzure;
    using Microsoft.WindowsAzure.StorageClient;
    using Microsoft.WindowsAzure.StorageClient.Protocol;

    // Initialization
    string blobName = "md5test" + Guid.NewGuid().ToString();
    long blobSize = 8 * 1024 * 1024; // Must be divisible by 512
    int writeSize = 1 * 1024 * 1024;
    StorageCredentialsAccountAndKey creds = new StorageCredentialsAccountAndKey(AccountName, AccountKey);
    CloudStorageAccount account = new CloudStorageAccount(creds, false);
    CloudBlobClient bClient = account.CreateCloudBlobClient();
    bClient.ParallelOperationThreadCount = 1;

    // Create Blob Container
    CloudBlobContainer container = bClient.GetContainerReference("md5blobcontainer");
    Console.WriteLine("Validating the Container");
    container.CreateIfNotExist();

    int uploadedBytes = 0;

    // Upload Blob
    CloudPageBlob blobRef = container.GetBlobReference(blobName).ToPageBlob;
    blobRef.Create(blobSize);

    // Populate Blob Data
    byte[] blobData = new byte[writeSize];
    Random rand = new Random();
    rand.NextBytes(blobData);
    MemoryStream retStream = new MemoryStream(blobData);

    // Write the same 1 MB buffer repeatedly until the blob is full
    while (uploadedBytes < blobSize)
    {
        blobRef.WritePages(retStream, uploadedBytes);
        uploadedBytes += writeSize;
        retStream.Position = 0;
    }

    HttpWebRequest webRequest = BlobRequest.Get(
        blobRef.Uri,        // URI
        90,                 // Timeout
        null,               // Snapshot (optional)
        1024 * 1024,        // Start Offset
        3 * 1024 * 1024,    // Count
        null);              // Lease ID (optional)

    // Ask the service to calculate Content-MD5 over the requested range
    webRequest.Headers.Add("x-ms-range-get-content-md5", "true");
    bClient.Credentials.SignRequest(webRequest);
    WebResponse resp = webRequest.GetResponse();
Figure 2: Transport Layer security via optional x-ms-range-get-content-md5 header on a PageBlob
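Requesting the range MD5 only helps if the client actually compares it with the bytes it received. As a minimal sketch of that last step, the helper below reads a response body and checks it against a Content-MD5 header value. RangeResponseValidator and ReadAndValidate are hypothetical names, and the sample in Main uses in-memory data rather than a live response.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

public static class RangeResponseValidator
{
    // Reads the whole body and throws InvalidDataException if its MD5 does
    // not match the Base64 value taken from the Content-MD5 header.
    public static byte[] ReadAndValidate(Stream body, string contentMd5Header)
    {
        if (string.IsNullOrEmpty(contentMd5Header))
            throw new InvalidDataException("Response carried no Content-MD5 header.");

        using (var buffer = new MemoryStream())
        {
            body.CopyTo(buffer);
            byte[] data = buffer.ToArray();
            using (MD5 md5 = MD5.Create())
            {
                string actual = Convert.ToBase64String(md5.ComputeHash(data));
                if (!string.Equals(actual, contentMd5Header, StringComparison.Ordinal))
                    throw new InvalidDataException("MD5 mismatch, range data is corrupted.");
            }
            return data;
        }
    }

    public static void Main()
    {
        // Stand-in for a range response body and its header (assumption).
        byte[] payload = Encoding.UTF8.GetBytes("hello world");
        using (var body = new MemoryStream(payload))
        {
            byte[] validated = ReadAndValidate(body, "XrY7u+Ae7tCTyyK7j1rNww==");
            Console.WriteLine("Validated " + validated.Length + " bytes");
        }
    }
}
```

With the response object from Figure 2, the call would look something like ReadAndValidate(resp.GetResponseStream(), resp.Headers["Content-MD5"]).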
This article has detailed various strategies for using MD5 values to provide data integrity. As in many cases, the correct solution depends on your specific scenario.
We will be evaluating this topic in future releases of the Windows Azure Storage Client Library as we continue to improve the functionality offered. Please leave comments below if you have questions.
Joe Giardino