Understanding High-End Video Performance Issues with Hyper-V

A while ago I wrote a relatively short blog post highlighting the fact that there are performance issues with Hyper-V when used with a high-end graphics adapter.  Since then I have been inundated with people asking questions and trying to get their heads around this issue.  Today I would like to take the chance to drill into this:

What is the cause of the problem?

Okay – let’s grab the pertinent text from the original KB article:

This issue occurs when a device driver or other kernel mode component makes frequent memory allocations by using the PAGE_WRITECOMBINE protection flag set while the hypervisor is running. When the kernel memory manager allocates memory by using the WRITECOMBINE attribute, the kernel memory manager must flush the Translation Lookaside Buffer (TLB) and the cache for the specific page. However, when the Hyper-V role is enabled, the TLB is virtualized by the hypervisor. Therefore, every TLB flush sends an intercept into the hypervisor. This intercept instructs the hypervisor to flush the virtual TLB. This is an expensive operation that introduces a fixed overhead cost to virtualization. Usually, this is an infrequent event in supported virtualization scenarios. However, some video graphics drivers may cause this operation to occur very frequently during certain operations. This significantly magnifies the overhead in the hypervisor.

Usually when I talk to people about this – their eyes start to glaze over – so let’s dig in a little.  With the help of Wikipedia we can get some definitions:

  • Write-combining (http://en.wikipedia.org/wiki/Write-combining):

    Write combining (WC) is a computer bus technique for allowing data to be combined and temporarily stored in a buffer -- the write combine buffer (WCB) -- to be released together later in burst mode instead of writing (immediately) as single bits or small chunks.

    Write combining cannot be used for general memory access (data or code regions) due to the 'weak ordering'. Write-combining does not guarantee that the combination of writes and reads is done in the correct order. For example, a Write/Read/Write combination to a specific address would lead to the write combining order of Read/Write/Write which can lead to obtaining wrong values with the first read (which potentially relies on the write before).

    In order to avoid the problem of read/write order described above, the write buffer can be treated as a fully-associative cache and added into the memory hierarchy of the device in which it is implemented. Adding complexity slows down the memory hierarchy so this technique is often only used for memory which does not need 'strong ordering' (always correct) like the frame buffers of video cards.

    In summary, write-combining is a method of accessing memory that is typically only used by video cards.

  • Translation Lookaside Buffer (TLB) (http://en.wikipedia.org/wiki/Translation_Lookaside_Buffer)

    A Translation lookaside buffer (TLB) is a CPU cache that memory management hardware uses to improve virtual address translation speed. It was the first cache introduced in processors. All current desktop and server processors (such as x86) use a TLB. A TLB has a fixed number of slots that contain page table entries, which map virtual addresses to physical addresses. It is typically a content-addressable memory (CAM), in which the search key is the virtual address and the search result is a physical address. If the requested address is present in the TLB, the CAM search yields a match quickly, after which the physical address can be used to access memory. This is called a TLB hit. If the requested address is not in the TLB, the translation proceeds by looking up the page table in a process called a page walk. The page walk is a high latency process, as it involves reading the contents of multiple memory locations and using them to compute the physical address. Furthermore, the page walk takes significantly longer if the translation tables are swapped out into secondary storage, which a few systems allow. After the physical address is determined, the virtual address to physical address mapping and the protection bits are entered in the TLB.

    So the TLB is a CPU cache that speeds up translation from virtual addresses to physical addresses.  Note that these virtual address spaces have nothing to do with virtual machines – they are used to allow multiple applications on an operating system to be isolated from each other.
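
To make the hit / miss / flush vocabulary concrete, here is a minimal toy model of a TLB in Python. It is only a sketch: the `ToyTLB` class, the slot count and the page table contents are made up for illustration, and a real TLB is a hardware structure with very different replacement behavior.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # bytes per page (the common small page size on x86)


class ToyTLB:
    """Hypothetical fixed-size cache of virtual page -> physical frame mappings."""

    def __init__(self, slots, page_table):
        self.slots = slots
        self.page_table = page_table  # stands in for the in-memory page tables
        self.entries = OrderedDict()  # oldest entry first, used for LRU eviction
        self.hits = 0
        self.misses = 0

    def translate(self, virtual_address):
        page, offset = divmod(virtual_address, PAGE_SIZE)
        if page in self.entries:
            self.hits += 1
            self.entries.move_to_end(page)  # mark as most recently used
        else:
            self.misses += 1
            # "Page walk": fall back to the (slow) page table, then cache the result.
            self.entries[page] = self.page_table[page]
            if len(self.entries) > self.slots:
                self.entries.popitem(last=False)  # evict the least recently used entry
        return self.entries[page] * PAGE_SIZE + offset

    def flush(self):
        """Drop every cached translation; the next access to each page misses."""
        self.entries.clear()


page_table = {0: 7, 1: 3, 2: 9}    # made-up mappings
tlb = ToyTLB(slots=2, page_table=page_table)
first = tlb.translate(0x1004)      # page 1: miss, requires a page walk
second = tlb.translate(0x1008)     # page 1 again: hit
tlb.flush()
third = tlb.translate(0x1008)      # miss again, because the flush emptied the cache
print(tlb.hits, tlb.misses)
```

The point to take away is the last three lines: every flush throws away work that then has to be redone through page walks.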

Summarizing all of this – video card drivers tend to use a memory access method (write-combining) that forces the TLB to be flushed very frequently, and under Hyper-V every one of those flushes has to go through the hypervisor.  This is an expensive thing to do in Hyper-V at the best of times.  In fact – the above TLB article on Wikipedia even has a section on the problems of virtualization and the TLB.
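
The chain of events in the KB text can be sketched as a small counting model. Everything here (`ToyHypervisor`, `ToyKernel`, `run_driver`) is a hypothetical stand-in written for this illustration, not real Windows or Hyper-V code; it only shows that each write-combined allocation turns into one hypervisor intercept when the kernel runs on top of a hypervisor.

```python
class ToyHypervisor:
    """Hypothetical hypervisor that owns a virtualized TLB."""

    def __init__(self):
        self.intercepts = 0
        self.virtual_tlb = {}  # pretend translation cache for the partition

    def handle_tlb_flush_intercept(self):
        # Each flush requested by the kernel lands here and costs a round trip.
        self.intercepts += 1
        self.virtual_tlb.clear()


class ToyKernel:
    """Hypothetical kernel; pass a hypervisor to model the Hyper-V role being enabled."""

    def __init__(self, hypervisor=None):
        self.hypervisor = hypervisor
        self.tlb_flushes = 0

    def allocate_page(self, write_combined=False):
        # Per the KB text, a write-combined allocation requires a TLB flush.
        if write_combined:
            self._flush_tlb()

    def _flush_tlb(self):
        self.tlb_flushes += 1
        if self.hypervisor is not None:
            self.hypervisor.handle_tlb_flush_intercept()


def run_driver(kernel, allocations):
    """Model a video driver making frequent write-combined allocations."""
    for _ in range(allocations):
        kernel.allocate_page(write_combined=True)


hypervisor = ToyHypervisor()
native_kernel = ToyKernel()                    # no hypervisor underneath
root_partition_kernel = ToyKernel(hypervisor)  # management OS on top of the hypervisor

run_driver(native_kernel, 1000)
run_driver(root_partition_kernel, 1000)
root_partition_kernel.allocate_page()          # ordinary allocation: no flush at all

print(native_kernel.tlb_flushes, root_partition_kernel.tlb_flushes, hypervisor.intercepts)
```

Both kernels perform the same number of flushes; the difference is that only the one running on the hypervisor pays for an intercept on each of them.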

Now that we have the ground rules in place – let’s head on to some of the other questions.

How could you possibly ship Hyper-V with this issue?  Did you not test this product?

To answer the second question first – I actually was the first person (in the world) to hit this issue.  Early on in development I tried to use Hyper-V as my desktop OS on my home system with a GeForce 8800 video card.  Everything seemed to work okay (though some things were oddly sluggish) until I tried to play Age of Empires III.  I had never played this game before, and the first time I tried to play it was on top of Hyper-V.  In short, it sucked.  Unfortunately I spent most of the weekend trying to tweak my rig and looking for patches to Age of Empires III before I thought to try disabling Hyper-V.

As soon as I realized what was happening I filed a bug and the issue was investigated.

When the issue was determined to be a specific result of the combination of the Hyper-V hypervisor and the Nvidia driver – we decided to leave things as they were for a couple of reasons:

  • Windows Server does not include any video drivers other than the SVGA driver by default
  • Windows Server will not install a high-end video driver automatically at any stage – you need to manually install the Windows 7 drivers

Also, Hyper-V was being developed solely for server virtualization and:

  • We have always recommended that nothing be run in the management operating system, other than basic management tools
  • No server workload that we tested generated anywhere near the rate of TLB flushing that these video drivers cause

Finally, this is a really hard issue to address.  In fact, there are no hypervisor-based virtualization platforms that address this issue today – and while there are several under development I suspect that they will either have specific hardware requirements (I will get to this later) or will have simplifications / limitations to help them mitigate this issue (like only having one virtual machine).

Why does this affect Hyper-V and not Virtual PC?

Here we are seeing the difference between a hypervisor and a hosted VMM type solution.  With a hypervisor-based platform (like Hyper-V) everything runs on top of the hypervisor – even the management operating system.  Whereas with a hosted VMM platform (like Virtual PC) the host operating system still has direct access to the hardware.  To explain this better – here is a diagram:

[Diagram: hypervisor-based architecture (Hyper-V) compared with hosted VMM architecture (Virtual PC)]

Hopefully you can see the difference here.  It should also be noted that all desktop virtualization products available today use an architecture similar to that of Virtual PC.

How do I know if this is affecting my computer?

To check if this is affecting your system – what you need to do is open Performance Monitor (you can do this by running “perfmon” from the start menu).  Select the Performance Monitor node and click on the plus symbol to add a new counter.  Then find the Hyper-V Hypervisor Root Partition entry, expand it, select Virtual TLB Flush Entries/sec and add the Root counter.  This will allow you to keep an eye on the rate of TLB flushing in the management operating system:

[Screenshots: adding the Virtual TLB Flush Entries/sec counter in Performance Monitor]

So what do you look for now?  On my system – the only time I see a significant rate of TLB flushing (>10) is when I start a virtual machine.  A system that has this problem will either generate a continuous rate of TLB flushing above 100 or will generate spikes in the thousands.
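
If you have written down or exported a series of per-second readings of this counter, the rule of thumb above can be sketched as a small Python function. The function name, the labels it returns and the use of the median to mean "continuous" are my own choices for illustration; the thresholds (100 and 1000) are simply the rough figures from the paragraph above, not official limits.

```python
from statistics import median


def classify_flush_rate(samples, sustained_threshold=100, spike_threshold=1000):
    """Classify per-second 'Virtual TLB Flush Entries/sec' readings.

    Hypothetical helper: the thresholds follow the rough guidance in the text.
    """
    if not samples:
        raise ValueError("need at least one sample")
    # "Continuous" is approximated by the median so one quiet second does not hide it.
    if median(samples) > sustained_threshold:
        return "affected: continuous rate above 100"
    if max(samples) >= spike_threshold:
        return "affected: spikes in the thousands"
    return "probably fine"


# Made-up readings, purely to show the three outcomes.
quiet = classify_flush_rate([2, 0, 5, 14, 3])
busy = classify_flush_rate([150, 220, 180, 140])
spiky = classify_flush_rate([3, 2, 4000, 1, 5])
print(quiet, busy, spiky, sep="\n")
```

Keep in mind that a brief burst right after starting a virtual machine is normal, so it is worth sampling while no virtual machines are being started.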

What can I do to stop this / work around it?

There are a couple of options here:

  1. Use the default video driver (SVGA).

    Yes, I know it is not sexy or fun – but if you are planning to just use Hyper-V as a server virtualization platform this is your easiest and simplest option.  It is the way we intended Hyper-V to be used, and it will always give the best performance.

  2. Tone down the use of 3D graphics.

    Some video cards (like the Nvidia Quadro FX 1700M) seem to work fine as long as Aero is not enabled and no 3D applications are running.  If I enable Aero I start to see a fairly frequent rate of spikes in my TLB flush count (which causes annoying lurches in the window animation).  Running a 3D game (like Halo 2) is just terrible.

    This means that for those of you who do not want the high-end driver for 3D graphics, but instead need it for multi-monitor support or for the ability to connect a projector to your laptop (like me) this may work.

  3. Choose your video card carefully.

    As a general rule of thumb – the less capable the video card, the less likely this is to be an issue.  My previous laptop had an integrated graphics controller – which was terrible for gaming – but worked great for Hyper-V.  When I wanted to get my new laptop and found that there was no Intel option – I tracked down a coworker with a similar graphics card in their laptop and tried out Hyper-V on it before going ahead and buying it.

  4. Get a system with Second Level Address Translation (SLAT).

    SLAT is a technology that goes by different names depending on whether you get Intel (where it is called “Extended Page Tables” (EPT)) or AMD (where it is called “Nested Page Tables” (NPT) or “Rapid Virtualization Indexing” (RVI)).  These technologies are an extension to the traditional TLB that allow us to use the hardware to handle multiple TLBs – one for each virtual machine.  We added support for this hardware in Windows Server 2008 R2.  If you run Windows Server 2008 R2 on a system with SLAT capabilities – you will not have any problems running 3D graphics at all.

    Intel started shipping this technology in the Nehalem (or Core i7) processor line.  AMD has been shipping this for a while now – ever since generation 3 of the AMD Quad-core family.  Unfortunately neither have shipped this technology in the laptop processors yet – though Intel has indicated that they are planning to soon.
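
As a rough sketch of what "second level" means, assuming the usual two-stage description of EPT / NPT (the guest's own page table maps guest virtual pages to guest physical pages, and a second table owned by the hypervisor maps guest physical pages to real host pages): the function name and the table contents below are made up, and real hardware walks multi-level tables rather than flat dictionaries.

```python
PAGE_SIZE = 4096  # bytes per page


def translate_with_slat(guest_virtual_address, guest_page_table, second_level_table):
    """Two-stage translation: guest virtual -> guest physical -> host physical."""
    page, offset = divmod(guest_virtual_address, PAGE_SIZE)
    # Stage 1: the guest operating system's own page table.
    guest_physical_page = guest_page_table[page]
    # Stage 2: the table maintained by the hypervisor, walked by the hardware.
    host_physical_page = second_level_table[guest_physical_page]
    return host_physical_page * PAGE_SIZE + offset


guest_page_table = {0: 5, 1: 2}      # hypothetical guest mappings
second_level_table = {5: 40, 2: 17}  # hypothetical hypervisor mappings

host_address = translate_with_slat(0x1010, guest_page_table, second_level_table)
print(hex(host_address))
```

Because the hardware handles the second stage itself, the hypervisor no longer has to intercept the guest's TLB flushes to keep a software copy of the translations up to date, which is why the problem goes away on SLAT-capable systems.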

Hopefully this has answered all of your questions satisfactorily.  If you have any further questions – please feel free to ask away.  I would also encourage you, if you have a video card that appears to work well with Hyper-V and 3D graphics, to post the details in the comments so that others can benefit from your good fortune!

Cheers,
Ben

  • It's a pity to see you rule out high-end graphics servers for Hyper-V. Meanwhile VirtualBox and VMware can support this kind of feature and can be used on high-end graphics servers (such as Nvidia Tesla servers or Supermicro solutions) to provide 3D power for rendering tasks (without consuming CPU time), physics simulations, etc.

  • We aren't discarding.   We are trying to find a solution.   Problem is that VMware Workstation is a user process that only supports dual cores.   A Hyper-V service would be much more stable, responsive, and convenient to demo and serve from.

  • I'm just beside myself: I can't get Hyper-V running on a Core i7 laptop with the native 3D graphics.

  • BizTalker: This article is not about crash issues running Hyper-V with high-end graphics, it is about *performance issues*. Can you post a crash dump so MS and NVIDIA can debug the issues?

  • Note that the Core i7 processors in question have SLAT, so if the crash issues are solved, there should be no performance issues.

  • "Unfortunately neither have shipped this technology in the laptop processors yet"

    Actually the high-end mobile quad-core Core i7s with EPT were already shipping by the date of the blog post.

  • Hi Ben,

    I think these problems are mostly gone for me on the Windows Server 2008 R2 SP1 Beta but there are some quirks that confuse me, for instance the CTRL+ALT+DEL redraw and full screen YouTube are still slow but Windows key + Arrow is very fast now, and generally everything seems improved. Could you shed any light on whether these changes are expected and why it seems to be better now? By the way, I have an SMP processor with an NVIDIA GeForce 8400GS graphics card. It's a Dell XPS M1330.

    Cheers,

    Tristan

  • Hi Ben.  Thanks for the article.  I know I'm coming to the party a little late but ..

    I have a new Dell Studio 14 with an i5-520M CPU.  The system came with Intel's HD graphics.  If I enable Hyper-V before I install the HD graphics driver then life is good.  However once I install the driver I get the dreaded stop 119 code.

    I could live with the SVGA support on the host machine but for the fact that I can't get my laptop to project onto an external monitor until I install the dreaded HD driver.  Sadly I can't live without that!

    Any suggestions?

    Thanks again.

  • For everyone who has been seeing problems with core i7 laptops - please try out the Windows Server 2008 R2 SP1 beta - as this should address the problem there.

    Cheers,

    Ben

  • Uninstalling and/or disabling Hyper-V does not get rid of the slow RDP. I don't need Hyper-V anymore; any ideas how to get rid of this problem without SVGA and/or reinstalling?

  • Jasper -

    If Hyper-V is uninstalled, then something else must be causing the problems you are seeing.

    Cheers,

    Ben

  • On my Toshiba A505-S6033 (with an Nvidia GeForce 310M and an i7 processor) and with Hyper-V installed, everything seems to be working fine (I did have to install KB975530 because I had some BSOD problems at first, but after that, performance is as good as when the laptop had Windows 7).

    Is there an explanation for this?  Why is my laptop not affected by KB961661? Is it because of this "SLAT" thing?

    Oh, and BTW, I do use Aero on Windows 2008 R2, and it works perfectly (and it feels very fast).

  • I got a Gateway DX4840-02m that came with an Intel video card, but needed more power so I bought a MSI Nvidia GT 240, that worked perfectly with Windows 7 x64, but after I installed Windows 2008 R2, windows would not start: I would get a BSOD saying STOP 0x00000116, and that the problem was at nvlddmkm.sys.

    Tried everything, older drivers, newest driver, full OS reinstall, nothing worked... and then I decided to try the SP1 RC... and... it worked! I would love to know which feature or hotfix in the SP1 fixed the BSOD. Maybe what fixed it is RemoteFX ?

  • I have a Dell T110 server running hyper-V that I need to set up dual monitors on.  What PCI-E video card do you recommend?

  • Sir, I have a Dell N5010 laptop. I installed Windows Server 2008 R2 on it, and Hyper-V too.

    Please, does anyone know where I can get drivers for Windows Server 2008 R2?

    Please help me, friends.
