Isolation of Virtual Machines
By Brandon Baker, Microsoft
Hi, Brandon Baker here. As the lead security engineer for Hyper-V, I’ve been heads-down in this technology for many years. I’d like to take a moment to share some of my thoughts and experiences on a topic near and dear to my heart: isolation of virtual machines.
I regularly hear questions along the lines of “how do I know VMs will stay isolated?” or “just how much attack surface is there underneath a VM, and do I need to worry about it?” These questions demonstrate that people are thinking seriously about virtualization security. I touched on these points in my Black Hat 2007 presentation, and I’ll discuss them more here.
When we talk about virtual machine security we can mean many things: securing the images of virtual machines on a host, securing access to the administration of virtual machines, securing software inside virtual machines, ensuring patches for software inside a VM remain up-to-date, and responding to compromises of software inside virtual machines. These items aren’t unique to virtualization and should be given the same kind of consideration you give to managing the security of physical machines.
Another thing we can mean by virtual machine security, something that is unique to virtualization, is the isolation of virtual machines from each other. By this I mean “what happens if the operating system inside a VM gets compromised?” And more importantly “are the other operating systems and data in other VMs on the same physical system safe?” Ideally the answer to these questions should be the same as if we were talking about two separate physical machines. But it’s not that simple when there’s virtualization underneath. Ensuring that a compromised VM stays isolated requires a great deal of rigor and correctness in the virtual machine monitor (a hypervisor in the case of Hyper-V) and all of the software in the host that interacts with a virtual machine. It’s through the use of sound security practices that we can reduce the risk of compromise of these components and provide greater assurance that a VM stays isolated.
The reason this matters so much is that a compromised physical machine almost always has only one way to get at other machines - through the network. Ideally we want virtual machines to have this same security model, but if you have lots of channels out of a VM and a wide attack surface in your virtualization software then you present new ways for malicious software to jump the gap. It would be like having to worry about software from one physical machine finding its way across the carpet into all of the other machines in the same room. This is not something that a system administrator or security professional wants to worry about.
Instead, security professionals talk about one common theme: reducing risk. An established way of doing this is through reducing your attack surface. The logic here is clear - if you present a smaller surface to an adversary you stand a better chance of defending yourself. We’ve taken that to heart in Hyper-V in several ways: separating our hypervisor from the rest of the virtualization software and device drivers that run in the dedicated service, or “root,” partition; limiting the number of channels and touch points into and out of a VM; running all of our emulated devices in per-VM worker processes that live way up in unprivileged processes in the root partition, far away from the hypervisor; and providing tools to minimize the environment in the root partition with Server Core. We’ve also made it easy and advisable to not run any additional software in the root partition. I was quoted as saying “The last thing I ever want to hear as a security engineer is ‘it’s a feature, not a vulnerability’,” and I firmly believe this is the right attitude when thinking about virtualization security.
In the development of Hyper-V, we spent a lot of time validating all of the points through which a VM can send data out to the host. I’ve personally enumerated every single one of these throughout our threat modeling. I have a diagram outside my office showing all of the channels, or data flows, from a VM to the hypervisor. This includes all of the intercepts, hypercalls, instruction completion points, synthetic interrupts, memory reads and writes, and page table accesses we do. With each data flow represented as a line, it sort of looks like a giant squid. Someone even drew a toothy grin on the circle that represents the hypervisor. In each of these data flows we have to be concerned about a flaw in parsing VM data that could lead to a compromise of the system. To be crystal clear - a compromise of the hypervisor means a complete compromise of the whole machine. All VMs owned. Game over. So we really, really care a great deal about all of these channels. We want as few channels as reasonably possible, and we want the code behind each of these channels to be as simple as possible. I firmly believe that complexity is the enemy of security, and virtualization is one place that already has enough complexity. Fortunately, our hypervisor is small, around 600 KB, and will only get smaller as time goes by and more of what it does gets done by hardware. So the assurance we can give our customers is that the software on the end of those channels is owned by us, has been vetted by us and by other security professionals, and that we take responsibility for fixing any flaws found as quickly as possible.
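To make the worry about "a flaw in parsing VM data" a little more concrete, here is a minimal sketch in Python of the kind of strict validation the code behind a channel might apply to data arriving from a guest. This is not Hyper-V code and does not reflect its real interfaces: the message layout, the call codes, the size limit, and names such as `parse_guest_message` and `RejectedMessage` are all assumptions made up for illustration. The idea it shows is simply to treat everything from the VM as untrusted, accept only a small allow-list of requests, and check every length before touching the payload.

```python
import struct
from typing import Tuple

# Hypothetical wire format, for illustration only (not Hyper-V's real ABI):
# a 2-byte call code and a 2-byte payload length (little-endian), then payload.
HEADER = struct.Struct("<HH")
MAX_PAYLOAD = 256  # assumed upper bound on payload size
KNOWN_CALLS = {0x0001: "ping", 0x0002: "write_block"}  # assumed allow-list


class RejectedMessage(Exception):
    """Raised when guest-supplied data fails validation."""


def parse_guest_message(raw: bytes) -> Tuple[str, bytes]:
    """Validate an untrusted message and return (call name, payload)."""
    # Never read a header that is not fully present.
    if len(raw) < HEADER.size:
        raise RejectedMessage("message shorter than header")
    code, declared_len = HEADER.unpack_from(raw)
    # Accept only requests on the allow-list; everything else is refused.
    if code not in KNOWN_CALLS:
        raise RejectedMessage("unknown call code")
    # Bound the length the guest claims before using it for anything.
    if declared_len > MAX_PAYLOAD:
        raise RejectedMessage("declared length exceeds limit")
    payload = raw[HEADER.size:]
    # The claimed length must match what was actually received.
    if len(payload) != declared_len:
        raise RejectedMessage("declared length does not match data received")
    return KNOWN_CALLS[code], payload
```

A well-formed message such as `HEADER.pack(0x0002, 3) + b"abc"` is accepted, while a truncated header, an unknown call code, or a length field that disagrees with the data raises `RejectedMessage`. The fewer channels there are, and the simpler each check like this is, the easier it is to review every one of them.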
Our goal is to provide a locked-down environment with as few moving parts as possible, because we feel removing as much risk in this space as possible is the smart thing to do to achieve security and increase assurance in the platform. At the same time, we care about how ISVs, partners, and customers use our virtualization platform, and look for opportunities to improve the hypervisor platform as well as ensure correct and secure usage of the capabilities of the hypervisor. We want to help people think more about using virtualization and less about the security of their virtualization platform.
Brandon Baker, Senior Development Lead, Microsoft Corporation