In my interview with Doug Hauger, General Manager of Windows Azure, he suggested there’d be an “answer” to the 12 cents/hour cost of the small compute instance. At the PDC, the “extra small instance” was announced. This is a VM that costs only 5 cents/hour and has the following features:
This means it’s obviously suited to low CPU workloads and also low memory workloads. The suggestion is that it’s particularly suited for the following:
So how does it differ from the other instance sizes? Well it’s all in the actual virtual-physical architecture. For example XS (extra small) instances will share resources with other XS instances on the same physical host. So you don’t get a dedicated processor and memory is not dedicated to your XS instance. That’s why the first bullet list above says “equivalent to”.
In contrast, the Small/Medium/Large and Extra Large (S/M/L/XL) instances don’t share such resources with other VMs hosted on the same physical server. These instances have, for example, multi-processor support and even support for NUMA (Non-Uniform Memory Access). This is where each processor is allocated a section of memory for its own use. It’s a bit of a favourite topic of mine: I used to work for Digital back in the early 90s, when they did a lot of the work on symmetric multi-processing systems. The basic problem with shared memory is that when two CPUs want to access the same address at the same time, one has to wait. With NUMA, the problem is reduced because each CPU has its own local memory. The CPU architectures support this in hardware.
S/M/L/XL instances also have significantly larger memory allocations, more network bandwidth and more local storage.
The aim with these instances is to push work to the hardware wherever possible. An example of this is the use of, at minimum, AMD-V Rapid Virtualization Indexing (RVI) and Intel VT Extended Page Tables (EPT) technology in the processors chosen to run in the data centres. These technologies support Second Level Address Translation (SLAT). In a standard OS which uses virtual memory, there are tables used to translate the virtual addresses the OS uses into the physical addresses the hardware itself actually uses.
Modern CPU chips cache the address translations in an on-chip cache called the Translation Lookaside Buffer (TLB). You know the way things work in most code: programs work their way through a series of loops. Let’s say you are going to go round a loop 1000 times. If within that loop you access a number of addresses, let’s say 20 addresses, then you’ll effectively need to do 20,000 address translations. The way the TLB works is that any address that is translated is automatically copied into the TLB. Then when the CPU wants to go to a certain virtual address, it first looks in the super-high speed TLB. You can see how this will be a winner in our loop scenario. The first time round, none of the addresses are in the TLB, but on the second and subsequent rounds they are all there.
So there’s a hardware solution to the problems of address translation performance and it’s built into all modern CPU chips: the TLB.
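To put numbers on the loop scenario above, here is a minimal sketch in Python. It models an idealised TLB as a set that never evicts anything, and it assumes 4KB pages with each of the 20 addresses sitting on a different page. Both are simplifying assumptions of mine; real TLBs are small and do evict entries.

```python
PAGE_SIZE = 4096  # assumed 4KB pages


def simulate_tlb(iterations, addresses):
    """Count TLB hits and misses for a loop touching the same addresses each round."""
    tlb = set()  # idealised TLB: unlimited size, never evicts (an assumption)
    hits = misses = 0
    for _ in range(iterations):
        for addr in addresses:
            page = addr // PAGE_SIZE  # translations are cached per page
            if page in tlb:
                hits += 1
            else:
                misses += 1   # full translation needed via the page tables
                tlb.add(page)  # the result is then cached for next time
    return hits, misses


# 20 addresses, each assumed to be on a different page
addresses = [page * PAGE_SIZE for page in range(20)]
hits, misses = simulate_tlb(1000, addresses)
print(hits, misses)  # 19980 20
```

Of the 20,000 lookups, only the 20 in the first round need a full translation; the other 19,980 are served from the cache.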
Well, the problem is doubled up when you have a virtual machine environment. That’s because there are really 3 address spaces:
System physical address (SPA)
Guest physical address (GPA)
Guest virtual address (GVA)
A guest OS (a VM instance in Windows Azure) has to translate its virtual address into what it thinks is a physical address, and then the system has to in turn translate that into the true physical address. The AMD-V Rapid Virtualization Indexing (RVI) and Intel VT Extended Page Tables (EPT) technology in the processors lets the hardware perform and cache this second level of address translation, which is what SLAT refers to.
You can see in the diagram above: the parent OS is actually a special type of guest OS. Take Instance 1 in the top right corner. Its virtual address space has to be mapped first to what the instance “thinks” is the physical address. But because it’s all running on one host, that “physical memory” has to be mapped into the real physical memory in the host. Again, if the hardware can do this second translation and cache the result on the chip, then you get an increase in performance. That’s what SLAT gives you. In Windows Azure, the S/M/L/XL instance sizes require processor chips which support SLAT.
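The two-stage mapping between the three address spaces can be sketched in a few lines of Python. This is purely an illustration: the page tables, page numbers and function names below are hypothetical, real page tables are multi-level structures walked by the hardware, and the `cache` dictionary only stands in for the idea of remembering the combined GVA to SPA result.

```python
PAGE_SIZE = 4096  # assumed 4KB pages

# Hypothetical single-level page tables, mapping page number -> page number
guest_page_table = {0: 7, 1: 3}    # GVA page -> GPA page (kept by the guest OS)
host_page_table = {7: 42, 3: 19}   # GPA page -> SPA page (kept by the hypervisor)


def translate(addr, table):
    """Translate an address through one page table, preserving the page offset."""
    page, offset = divmod(addr, PAGE_SIZE)
    return table[page] * PAGE_SIZE + offset


def gva_to_spa(gva, cache):
    """Translate guest virtual -> system physical, caching the combined mapping."""
    page, offset = divmod(gva, PAGE_SIZE)
    if page not in cache:
        gpa = translate(page * PAGE_SIZE, guest_page_table)  # stage 1: GVA -> GPA
        spa = translate(gpa, host_page_table)                # stage 2: GPA -> SPA
        cache[page] = spa // PAGE_SIZE
    return cache[page] * PAGE_SIZE + offset


cache = {}
print(gva_to_spa(4096 + 5, cache))  # 77829, i.e. SPA page 19, offset 5
print(cache)                        # {1: 19}
```

The first access to a page pays for both translation stages; later accesses to the same page skip straight to the cached result.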
So you can see that, unlike the XS instance size, where there is a lot of sharing, the S/M/L/XL instances push as much as possible to the hardware. With the Windows Azure Hypervisor, there is also large page support: 2MB and also 1GB pages!
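Large pages matter for the same reason the TLB does: the bigger the page, the fewer translations are needed to cover the same memory. A quick back-of-the-envelope calculation in Python shows the difference. The 1GB of memory is just a figure I picked for illustration, and 4KB is the standard small page size on x86.

```python
KB, MB, GB = 1024, 1024**2, 1024**3


def pages_needed(memory_bytes, page_size):
    """Number of pages (and so translations) needed to cover the given memory."""
    return -(-memory_bytes // page_size)  # ceiling division


memory = 1 * GB  # illustrative amount of memory, not an Azure figure
for label, size in [("4KB", 4 * KB), ("2MB", 2 * MB), ("1GB", 1 * GB)]:
    print(label, pages_needed(memory, size))
# 4KB 262144
# 2MB 512
# 1GB 1
```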
Also – network performance is different for the different sizes of instance:
You can see how small the peak network performance is for the XS instance.
So remember what I said at the start. The XS instance size is particularly suited to these workloads:
The differences from the S/M/L/XL sizes go deeper than a cursory glance at the spec would suggest, because the XS instance doesn’t benefit from the same dedicated resources. So think carefully before you use it in a production system. It definitely has its place, but there are also workloads where it definitely doesn’t belong. You need to be clear what you are using it for if you decide to go with that option.