Yeah, that title sounds a little crazy, but let me explain. The phenomenon I am referring to is that once you have identified the solution to a seemingly complex problem, the problem is revealed to be simple. In my case the problem was really poor performance on the console of a very powerful box. I am currently using an 8-processor, 16GB machine running Windows Server 2008 x64 as my primary “workstation”.

The main problem was that it had pretty severe video performance issues. It is also worth noting that this is one of several of these machines we have scattered around and nobody else seemed to be having these problems.

Here is a quick summary of the weird behaviors I was seeing:

  1. One of the three monitors attached to this machine would go black and then repaint every time I mapped or unmapped a network drive. No kidding.
  2. Unlocking the workstation, even immediately after locking it, resulted in a 50-120 second delay while it repainted my screen and generally sat there looking lame.
  3. My VX-600 webcam, which I use to chat with my 3-year-old son when I am going to be at work late at night, was totally unusable. Any time I started the camera the whole machine slowed to a complete crawl.
  4. Scrolling anything in windows in the applications I use like IE, Word, Excel, WinDbg, and Visual Studio was jerky and very slow.
  5. Dragging a window from one monitor to the others was jerky and sometimes would just stall the machine for 10-20 seconds.
  6. Enabling the Aero desktop made the whole thing unusable.

I searched everywhere for problems with Windows Server 2008 and performance problems on the physical console. Why the physical console? Because when I use Remote Desktop to connect to this machine, it runs just fine.

Based on all of these symptoms I figured it was the fault of the stock video card I had in the machine, which uses a cheap NVIDIA Quadro GPU, so I went out and got a new video card, making for a matched set of GeForce 8600 SE cards in this machine. This configuration now matches my neighbor’s machine, which doesn’t have these problems. In the end this change helped somewhat, but performance was still pretty poor. I tried updating my video drivers; no help there either. Time to try some other stuff.

The next thing I did was switch my processor scheduling setting from “Background services” to “Programs”. This seemed to produce a slight improvement but not enough to make the machine usable from the console.

Next I shut down my Hyper-V VMs. I considered that perhaps they were chewing up resources that my machine desperately needed, despite the fact that I had 7GB-10GB of RAM free and CPU utilization was averaging 5%. That seemed to have no effect at all but since they just made it take longer each time I rebooted and I wasn’t using them right now I left them turned off. 

Since it wasn’t any of the obvious stuff, next I went to the trusty ole Reliability and Performance Monitor. I created a new counter set to track the Memory, Processor, Processes and Threads objects. Surely if I have a performance problem on the physical console it will show up in one of those.

I started the log, locked my machine, waited 2 minutes and then unlocked it. It took about 55 seconds to repaint all the screens and get control of the apps. When I cracked open the log I could see that there were a few of my helper apps like Communicator, SnagIt and UltraMon taking a pretty big chunk of CPU for most of that unlock time.
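For anyone who would rather rank the offenders than eyeball the graph, here is a minimal sketch of that step. It assumes the counter log has been exported to CSV and that the column headers look like `\\MACHINE\Process(name)\% Processor Time`; that header shape and the function name `top_cpu_processes` are assumptions for illustration, not something taken from my actual log.

```python
import csv
import io
import re
from collections import defaultdict

# Assumed header shape: \\MACHINE\Process(instance)\% Processor Time
PROCESS_CPU = re.compile(r"\\Process\((?P<name>[^)]+)\)\\% Processor Time$")


def top_cpu_processes(csv_text, limit=5):
    """Return (process, mean % Processor Time) pairs, busiest first."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)

    # Map column index -> process name, skipping the aggregate instances.
    columns = {}
    for index, title in enumerate(header):
        match = PROCESS_CPU.search(title)
        if match and match.group("name") not in ("_Total", "Idle"):
            columns[index] = match.group("name")

    samples = defaultdict(list)
    for row in reader:
        for index, name in columns.items():
            cell = row[index].strip() if index < len(row) else ""
            try:
                samples[name].append(float(cell))
            except ValueError:
                continue  # blank or non-numeric sample, ignore it

    averages = {name: sum(vals) / len(vals) for name, vals in samples.items() if vals}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)[:limit]
```

Feeding it only the rows captured during the unlock window would give a short list of the processes that were busiest while the screens repainted.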

I used MSConfig.exe to disable all of the autorun things I had loaded and rebooted. Performance was better. At this point I assumed that the performance I was seeing was as good as it was going to get, so I set about determining which of the autorun apps was killing what little performance I could get out of this machine.

I proceeded to enable the apps one at a time, reboot, and test the performance of locking and unlocking the machine. In the end I removed the webcam software, as it was the only thing left to enable once everything else was loaded, and the machine still ran better than it had before.
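As an aside, enabling apps one at a time is a linear search, and every step costs a reboot. A minimal sketch of a bisection follows, assuming exactly one startup item is at fault and that it misbehaves on its own. The `is_slow` callback is hypothetical: it stands in for "enable only these items, reboot, lock, unlock, and decide whether that was slow". The final check matters, because if no startup item is actually to blame the search would otherwise still hand back a name.

```python
from typing import Callable, Optional, Sequence


def find_culprit(
    items: Sequence[str],
    is_slow: Callable[[Sequence[str]], bool],
) -> Optional[str]:
    """Bisect startup items to find the single one that causes the slowdown.

    is_slow(enabled) is a hypothetical manual test: enable only `enabled`,
    reboot, time an unlock, and return True if the machine was slow.
    Returns None when no single item reproduces the problem.
    """
    candidates = list(items)
    if not candidates:
        return None

    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        # If the first half alone reproduces the problem, the culprit is in it;
        # otherwise it must be in the remaining items.
        candidates = half if is_slow(half) else candidates[len(half):]

    # Confirm the last remaining item really is slow by itself.
    suspect = candidates[0]
    return suspect if is_slow([suspect]) else None
```

With a handful of startup items this only saves a reboot or two, but the number of reboots grows with the logarithm of the list length instead of the length itself.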

After using the machine for an hour or so I locked it and came back ten minutes later. When I unlocked the machine it took more than a minute for me to get control of the desktop and even then it was pretty sluggish. Back to square one.

During all of this I had been talking with my peers to see what other ideas they might have. One of them suggested that maybe I had a desktop heap problem. Since I have run out of desktop heap in the past and I know what that looks like, I didn’t really think that was it, but I was desperate by now, so I doubled the amount of desktop heap for the interactive session to 40MB. No help there either, same behavior.

At this point I had pretty much resigned myself to using this machine in its current state until I had time to pave it and reinstall everything. A few minutes later one of my peers who has the exact same configuration was showing me how his machine performed. It was awesome! Unlocking his machine was pretty much instantaneous! What in the @#$%&*%$# is wrong with mine?

It was then that he saved me by commenting that at one point he had installed Hyper-V and had some performance problems, not as bad as mine, but he had some, and he fixed them by uninstalling Hyper-V…

Now I’m thinking “Hyper-V can’t be my problem, I don’t even have any VMs running”…and then it dawned on me. Hyper-V uses a hypervisor that wedges itself in between the OS and the hardware, and it stays there whether or not any VMs are running. What could possibly go wrong there?

Needless to say, 5 minutes later, after I had uninstalled Hyper-V and rebooted, my machine was as blisteringly fast as my co-worker’s!

Now that I knew what the solution was, I did some quick searches for problems with Hyper-V video performance. I immediately found KB 961661. For reference, here is the cause section of the KB article:

This issue occurs when a device driver or other kernel mode component makes frequent memory allocations by using the PAGE_WRITECOMBINE protection flag set while the hypervisor is running. When the kernel memory manager allocates memory by using the WRITECOMBINE attribute, the kernel memory manager must flush the Translation Lookaside Buffer (TLB) and the cache for the specific page. However, when the Hyper-V role is enabled, the TLB is virtualized by the hypervisor. Therefore, every TLB flush sends an intercept into the hypervisor. This intercept instructs the hypervisor to flush the virtual TLB. This is an expensive operation that introduces a fixed overhead cost to virtualization. Usually, this is an infrequent event in supported virtualization scenarios. However, some video graphics drivers may cause this operation to occur very frequently during certain operations. This significantly magnifies the overhead in the hypervisor.
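To make the "fixed overhead, magnified by frequency" point concrete, here is a back-of-the-envelope model. It is a sketch under a loud assumption: every write-combined allocation triggers one flush, and with the hypervisor running each flush pays an extra fixed intercept cost on top of the native cost. The function name and parameters are made up for illustration, and the costs have to come from your own measurements; the KB article does not give any figures.

```python
def flush_time_seconds(allocs_per_second, duration_s, flush_cost_us, intercept_cost_us=0.0):
    """Estimate time spent on TLB flushes over `duration_s` seconds.

    Assumes one flush per allocation and a fixed per-flush cost in
    microseconds. intercept_cost_us is the extra cost per flush when the
    hypervisor is running (0.0 models the machine without Hyper-V).
    """
    flushes = allocs_per_second * duration_s
    per_flush_us = flush_cost_us + intercept_cost_us
    return flushes * per_flush_us / 1_000_000


def hypervisor_penalty_seconds(allocs_per_second, duration_s, intercept_cost_us):
    """Extra time attributable to the hypervisor intercepts alone."""
    return allocs_per_second * duration_s * intercept_cost_us / 1_000_000
```

The penalty is linear in the allocation rate, which fits the story: a driver that allocates write-combined memory only occasionally barely notices the hypervisor, while a video driver that does it constantly during repaints pays the intercept cost over and over.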

This is why I say, when you know the solution the problem is a lot easier to find.