Disclaimer: The procedures described in this document are current as of September 2014 and are subject to change without notice.
Of utmost concern to enterprise customers (indeed, all customers) who are considering deploying their applications to Microsoft Azure is the security of their underlying data. A sometimes overlooked aspect of data protection is assuring that when disk space is freed and reallocated to other customers, the new owners cannot read the data that was there when the space was released. An extreme form of this arises when the drives themselves are removed from the data center for disposal or repurposing. The most straightforward way to make such a guarantee would be to overwrite the freed space with zeros or some other pattern before freeing it. Such overwrites can significantly impact performance, so Azure (like most systems) uses more complex but more efficient mechanisms.
In this post, we look at the practices implemented in Microsoft Azure and SQL Database to prevent data leakage, or exposure of one customer’s data to another customer, upon deletion of a Microsoft Azure virtual machine instance, virtual machine drives, Drives, Storage data, SQL Database data, or a SQL Database instance itself. The mechanisms vary in detail but are all conceptually similar: no user is ever allowed to read a location on disk that he or she has not previously written.
For a fuller treatment of Data Protection in Microsoft Azure, please see the latest “Protecting Data in Microsoft Azure” whitepaper.
In practice, disks are allocated sparsely. This means that when a virtual disk is created, disk space is not allocated for its entire capacity. Instead, a table is created that maps addresses on the virtual disk to areas on the physical disk and that table is initially empty. The first time a customer writes data on the virtual disk, space on the physical disk is allocated and a pointer to it is placed in the table. We can see this conceptually in the progression of diagrams below:
Figure 1: Data Blocks Allocated to Users
In Figure 1 above, two users each have two data blocks that have been allocated to them on disk based on their respective write requests.
Figure 2: User Frees Up Data Block
In Figure 2 above, one user “deletes” data, which frees up a data block. The data block is marked as free, but is otherwise unaffected.
Figure 3: User Allocated Recently Freed Data Block
In Figure 3 above, upon a write request, a new user is allocated a recently freed data block as well as a data block that has not been previously allocated. The previously freed data block is still unaffected. Essentially, the process is that when a user makes a write request to disk it must be determined whether there is already space on an existing data block allocated to that user that can store the new data. If so, then the new data overwrites the data in the existing block. If not, then a new data block is allocated and data is written to the new block. The logic can be seen in the figure below.
Figure 4: User Requests to Write Data to Disk
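The write path described above can be illustrated with a minimal sketch. This is not Azure's implementation: the class names `PhysicalDisk` and `VirtualDisk`, the four-byte block size, and the simplification that every write covers exactly one whole block are all assumptions made for illustration.

```python
BLOCK_SIZE = 4  # bytes per block; tiny on purpose so the example is easy to follow


class PhysicalDisk:
    """Hypothetical shared physical disk: a pool of fixed-size blocks."""

    def __init__(self, num_blocks):
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]
        self.free = list(range(num_blocks))  # indices of blocks marked as free

    def allocate(self):
        # Hand out a free block. Its previous contents are NOT erased here.
        return self.free.pop(0)

    def release(self, index):
        # Freeing only marks the block as free; the bytes stay where they are.
        self.free.insert(0, index)


class VirtualDisk:
    """Hypothetical per-customer virtual disk backed by a sparse mapping table."""

    def __init__(self, physical):
        self.physical = physical
        self.table = {}  # virtual block number -> physical block index

    def write(self, vblock, data):
        if len(data) != BLOCK_SIZE:
            raise ValueError("this sketch only supports whole-block writes")
        if vblock not in self.table:
            # First write to this virtual address: allocate physical space now.
            self.table[vblock] = self.physical.allocate()
        # New or existing, the block is overwritten with the caller's data.
        self.physical.blocks[self.table[vblock]] = bytes(data)

    def free(self, vblock):
        self.physical.release(self.table.pop(vblock))


disk = PhysicalDisk(num_blocks=4)
alice, bob = VirtualDisk(disk), VirtualDisk(disk)

alice.write(0, b"AAAA")
alice.free(0)            # block 0 is marked free but still holds b"AAAA"
stale = disk.blocks[0]

bob.write(7, b"BBBB")    # Bob is handed the recently freed block, then overwrites it
```

In this sketch Bob receives the block that Alice freed, but the only operation that ever touches it on Bob's behalf is the write that replaces its contents, which mirrors Figures 2 and 3.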
Now there is the question of whether one customer could read the deleted data of another customer, or whether an Azure administrator could read a customer’s deleted data. If anyone tries to read a region of a virtual disk that they have not yet written to, physical space will not have been allocated for that region, and so only zeros are returned. The figure below shows this logic and its result. Only an Azure administrator could read blocks marked as free, and there are no utilities that would help the administrator figure out who the previous owner of a block was.
Figure 5: User Makes Read Request
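The read logic can be sketched in the same spirit. The names below (`physical`, `free_blocks`, `VirtualDisk`) are hypothetical, and the stale `b"OLD!"` bytes stand in for data left behind by a previous owner; the point is only that a read of an unmapped virtual address never reaches the physical disk.

```python
BLOCK_SIZE = 4

# Hypothetical physical blocks. Block 0 still holds a previous owner's data.
physical = [b"OLD!", bytes(BLOCK_SIZE)]
free_blocks = [0, 1]


class VirtualDisk:
    def __init__(self):
        self.table = {}  # virtual block number -> physical block index

    def read(self, vblock):
        index = self.table.get(vblock)
        if index is None:
            # Never written: no physical block is mapped, so return a zeroed buffer.
            return bytes(BLOCK_SIZE)
        return physical[index]

    def write(self, vblock, data):
        if vblock not in self.table:
            self.table[vblock] = free_blocks.pop(0)
        physical[self.table[vblock]] = bytes(data)


vd = VirtualDisk()
before = vd.read(0)   # zeros, even though physical block 0 holds stale data
vd.write(0, b"NEW!")  # block 0 is mapped and overwritten in one step
after = vd.read(0)
```

Because the mapping table is consulted on every read, there is no virtual address the new owner can use to name the stale bytes before they have been overwritten.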
Conceptually, this applies regardless of the software that keeps track of reads and writes. In the case of SQL Database, it is the SQL Database service that does this enforcement. In the case of Storage, it is the Storage service. In the case of non-durable drives of a VM, it is the VHD handling code of the host OS. Since customer software only addresses virtual disks (the mapping from virtual to physical address takes place outside of the customer VM), there is no way to express a request to read from or write to a physical address that is allocated to a different customer or a physical address that is free.
Note: in some cases, the logic for writes (see Figure 4) is modified so that when a block is written a second time, the data is not overwritten in place on disk. Instead, a new block is allocated and the data is written there; the old block is then marked as free. This approach is often referred to as a log-structured file system. It may sound inefficient, but it allows most writes to go to consecutive locations on the physical disk, which minimizes seek times and yields better performance. This detail is transparent to the customer, but it is relevant because it means that even if a customer were to explicitly overwrite every block on the virtual disk with zeros before freeing it, that would not assure that the customer’s data was no longer present on the physical disk.
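A small sketch shows why overwriting with zeros does not scrub the physical disk under this scheme. `LogStructuredDisk` is a hypothetical toy, not the Azure code; it assumes whole-block writes and models the physical disk as a Python list that only ever grows.

```python
BLOCK_SIZE = 4


class LogStructuredDisk:
    """Hypothetical append-only store: every write goes to the next physical slot."""

    def __init__(self):
        self.log = []      # physical blocks, in the order they were written
        self.table = {}    # virtual block number -> index into self.log
        self.free = set()  # physical slots that no pointer references any more

    def write(self, vblock, data):
        old = self.table.get(vblock)
        self.log.append(bytes(data))            # always write sequentially
        self.table[vblock] = len(self.log) - 1  # repoint the virtual block
        if old is not None:
            self.free.add(old)                  # old copy is only marked free

    def read(self, vblock):
        index = self.table.get(vblock)
        return self.log[index] if index is not None else bytes(BLOCK_SIZE)


disk = LogStructuredDisk()
disk.write(0, b"KEY!")             # a secret
disk.write(0, bytes(BLOCK_SIZE))   # the customer "overwrites" it with zeros
visible = disk.read(0)
```

After the second write the customer can only see zeros, yet `disk.log[0]` still holds the original bytes in a slot marked as free.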
Data destruction techniques vary depending on the type of data object being destroyed, whether it be whole subscriptions themselves, storage, virtual machines, or databases. In a multi-tenant environment such as Microsoft Azure, careful attention is paid to ensuring that one customer’s data does not “leak” into another customer’s data and that, when a customer deletes data, no customer (including, in most cases, the customer who once owned the data) can gain access to that deleted data.
If a subscription has expired or is terminated, all associated customer data (including storage accounts) is held for a period before it is actually deleted, so that an accidental subscription cancellation can be reversed. That suspension period is 90 days. In no more than 180 days after expiration or termination, Microsoft will disable the account and delete customer data. If a storage account is terminated within an existing subscription (or when a subscription deletion has reached its timeout), the storage account is not actually deleted for another two weeks (again, to guard against mistakes). But once a storage account is finally deleted, or when blob or table data is deleted outside the context of a storage account deletion, the sectors on disk become immediately available for reuse. How long it takes before they are actually overwritten varies, depending on the fullness and activity rate of the server, but it is rarely more than two days. This is very different from the experience with a file deleted from a file system such as NTFS, where the space could be reallocated immediately but in practice the data may linger for months. A property of log-structured file systems is that all unallocated sectors of the disk are overwritten routinely in the course of copying and compressing logs.
NOTE: If you want to make storage data unrecoverable faster, you should delete tables and blobs individually before deleting the storage account or subscription.
As mentioned above, Storage is built on a log-structured file system, where all writes are written sequentially to disk. This can be more efficient than a conventional file system because it minimizes the number of disk “seeks”, though that benefit is paid for somewhat by the need for updating the pointers to objects every time they are written. But the new versions of the pointers are written sequentially to disk as well. Disk reads do remain random access, but the read load can be reduced through judicious use of caching. A side effect of this design is that if there is a secret on disk, you can’t ensure it is gone by overwriting with other data. The original data will remain on the disk and the new value will be written sequentially. Pointers will be updated such that there is no way to find the deleted value anymore.
Once the disk is full, however, the system has to write new logs onto disk space that has been freed up by the deletion of old data. Instead of allocating log files directly from disk sectors, log files are created in a file system running NTFS. A background thread running on Storage nodes frees up space by going through the oldest log file and copying blocks that are still referenced from it to the current log file (updating all pointers as it goes). It then deletes the oldest log file. So there are two categories of free space on the disk: space that NTFS knows is free, from which it allocates new log files; and space within those log files that Azure Storage knows is free because there are no current pointers to it.
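The background cleaning pass can be sketched as follows. The function name `clean_oldest_log`, the dictionary-of-lists representation of log files, and the `(log_id, offset)` pointer format are assumptions chosen for illustration; the actual on-disk structures are not described in this post.

```python
def clean_oldest_log(logs, pointers):
    """Copy still-referenced blocks out of the oldest log, then delete that log.

    logs:     dict mapping log id -> list of blocks, inserted oldest first
    pointers: dict mapping object key -> (log id, offset) of its live block
    """
    log_ids = list(logs)  # dicts preserve insertion order, so oldest comes first
    oldest_id, current_id = log_ids[0], log_ids[-1]
    if oldest_id == current_id:
        return  # only one log exists; nothing to clean

    for key, (log_id, offset) in list(pointers.items()):
        if log_id == oldest_id:
            # Live block: append it to the current log and repoint the object.
            logs[current_id].append(logs[oldest_id][offset])
            pointers[key] = (current_id, len(logs[current_id]) - 1)

    # Unreferenced blocks are simply dropped along with the file. In the real
    # system the file's space returns to NTFS and may keep its bytes until reused.
    del logs[oldest_id]


logs = {1: [b"a-v1", b"b-v1"], 2: [b"a-v2"]}
pointers = {"a": (2, 0), "b": (1, 1)}  # b"a-v1" is no longer referenced
clean_oldest_log(logs, pointers)
```

Here the superseded block `b"a-v1"` is never copied forward, so it leaves the set of log files when log 1 is deleted, while the live block for `"b"` moves into log 2.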
Now when a user asks “Is my deleted data still recoverable from the disk?” it could mean one of two different things:
This latter category has no fixed expiration date, and is more akin to a “half-life”. After X number of days, there is a 50% chance it is still on the disk; after 2X number of days, there is a 25% chance it is still on disk, and so on. The probability never goes to zero, and there is no guarantee when particular data is gone, but Microsoft assures customers that their data will indeed be protected from unauthorized access.
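The half-life description corresponds to a simple exponential decay, sketched below. The function name and the example value of 10 days are hypothetical; the post does not state what X actually is.

```python
def survival_probability(days, half_life_days):
    """Chance that deleted data is still on disk after `days`, under a half-life model."""
    return 0.5 ** (days / half_life_days)


X = 10  # hypothetical half-life in days; the real value is not given in this post
after_one = survival_probability(X, X)      # one half-life elapsed
after_two = survival_probability(2 * X, X)  # two half-lives elapsed
```

The value halves with every additional X days and approaches zero without ever reaching it, which is why no fixed expiration date can be quoted.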
Virtual machines are stored in Storage as blobs, so the deletion rules apply as explained above. When a VM is deleted, the space on disk that held the contents of its local virtual disk is marked as free, but is not zeroed. The space will eventually be used to hold data for some other object, but there is no upper bound on the amount of time the obsolete contents may stay there. The virtualization mechanism, however, is designed to ensure that those spots on the disk cannot be read by another customer (or even the same customer) until data is written again, thus ensuring there is no threat of data leakage. When a new virtual disk is created for a VM, it will appear to the VM to be zeroed, but that illusion is created by explicitly zeroing the buffers when a portion of the virtual disk is read before it is written. If a VM instance is reinitialized in place, it’s the same as if it had been moved to new hardware.
There are two kinds of virtual drives that might be accessible to a VM instance in Microsoft Azure. The C:, D:, and E: drives that exist for Web and Worker roles are backed by disks that are local to the compute node. The data on them is not stored redundantly and must be considered ephemeral. In the event of a hardware failure, the VM instance is moved to a different node and the virtual disk contents are reset to their initial values. If a VM instance is reinitialized in place, the C:, D:, and E: drives revert to their initial states, the same as if it had been moved to new hardware.
Drives (aka “X-Drives”) are also implemented as blobs in Storage. X-Drives are persistent and are not reset unless the customer takes some explicit action to replace them. This data is stored redundantly and survives hardware failures. Deleting a VM instance does not cause data to be deleted in an associated X-Drive. An X-Drive is deleted by deleting the blob itself (or by deleting the storage account containing the blob).
With SQL Database, deleted data is marked for deletion, but it is not zeroed. If an entire database is deleted, it is the equivalent of deleting its entire contents. The SQL Database implementation is designed to ensure user data is never leaked by disallowing all access to the underlying storage except via the SQL Database API. That API allows users to read, write, and delete data, but does not have any way to express the idea of reading data that the user has not previously written.
In the common case, customers want assurance that there is no unauthorized access to their data. In some cases, however, they would like assurance that there is not even any authorized access to deleted data. While there is no way through interfaces exposed to customers to retrieve data once it has been deleted or changed, that data may remain on disks for an extended period and it would theoretically be possible to recover it with internal forensic tools (though the likelihood of the deleted data being present would decline over time). Ultimately, any physical disk is either completely erased or destroyed after being removed from the production environment.
We are contemplating possible future features that would allow deleted data to be recovered (and changed data restored) for a limited period of time without customers having to make explicit backups. With such tools in place, a customer could not be assured that data would, for practical purposes, be inaccessible to authorized parties immediately after it is deleted. Any such tools will be designed to make deleted data retrievable only for a limited period of less than 30 days, unless the customer opts into some longer backup period. As of this writing, there are unexposed tools that may permit recovery of data deleted from SQL Database databases for 14-21 days. There are no such tools for Storage or Compute ephemeral disks.