Now that Hyper-V has been on the market for over nine months, a common question that has come my way is “what should I monitor?” This question gets asked for a couple of reasons, such as: How do I know if my machine is overloaded? How can I figure out what resources were used so I know what to bill? These questions usually amount to measuring networking, storage, and CPU use. Another common question is where to monitor Hyper-V. I’ll cover these questions and more.

Q: Where do I monitor Hyper-V?

Let’s start with the question of where to monitor. In order to answer this question you have to understand a little of the Hyper-V architecture. Hyper-V has three main components – the virtstack, devices, and hypervisor. Windows Server 2008 is what boots the system and launches the virtstack and hypervisor. The virtstack is responsible for handling emulated devices, managing VMs, servicing I/O, and more. The hypervisor is responsible for scheduling Virtual Processors, managing interrupts, servicing timers, and controlling other chip-level functions. It does not understand devices or I/O (i.e. there are no hypervisor drivers). The devices are part of the root and are also installed in guests as part of the Integration Services.

Since the root has a full view of the system and controls the VMs, it is also responsible for providing monitoring information via WMI and Performance Counters. You can see a full list of the performance counters provided from the root at http://blogs.msdn.com/tvoellm/archive/tags/Hyper-V+Performance+Counters/default.aspx . See the WMI reference on MSDN at http://msdn.microsoft.com/en-us/library/cc136992(VS.85).aspx . A common question is whether you can monitor a Virtual Machine from within the running guest. From within the guest there is only access to the performance counters provided by the OS that is running in the guest. All of those should work as expected; however, keep in mind that utilization counters (like % Processor Time) are relative to the amount of virtual processor used, not the physical processor. So it might appear that the utilization counters are over-reporting the actual physical resource usage. Rate counters (like network packets / sec) don’t have this problem. For more on guest counter skew see http://blogs.msdn.com/tvoellm/archive/2008/03/20/hyper-v-clocks-lie.aspx

We could have provided some VM-specific data inside the guest and might in the future. However, our current thinking is not to do this. If you are running a Windows OS or have a tool that understands the WMI protocol, you can use the remote management APIs to query the root (see msdn.microsoft.com for WMI documentation on how to do this). You can also do this from outside the physical server, using tools like System Center Virtual Machine Manager.

Now that you know to monitor Hyper-V and all virtual machines from the root partition let’s look at what you should monitor.

Q: What should I monitor?

What you monitor on a regular basis really depends on what you are trying to do. I’ll cover the major resources and what to typically monitor. This will be a good starting point for you.

Here are the top-level performance counter sets to monitor; I’ll go into more detail on each:

  • Overall health:
    • Hyper-V Virtual Machine Health Summary
    • Hyper-V Hypervisor
  • Processor:
    • Processor
    • Hyper-V Hypervisor Logical Processor
    • Hyper-V Hypervisor Root Virtual Processor
    • Hyper-V Hypervisor Virtual Processor
  • Memory:
    • Memory
    • Hyper-V Hypervisor Partition
    • Hyper-V Hypervisor Root Partition
    • Hyper-V VM Vid Partition
  • Networking:
    • Network Interface
    • Hyper-V Virtual Switch
    • Hyper-V Legacy Network Adapter
    • Hyper-V Virtual Network Adapter
  • Storage:
    • Physical Disk
    • Hyper-V Virtual Storage Device
    • Hyper-V Virtual IDE Controller

Now let’s go into each counter set and how to use them to monitor the system. As time allows I have been documenting each performance counter set on my blog.

You should take note of the bold italics because they represent the names of counter sets and counters.

Overall health:

There are really two counter sets I use to get an overall understanding of the system. The first is the “Hyper-V Virtual Machine Health Summary”, which has only two counters: “Health Ok” and “Health Critical”. If anything is Critical it means some resource (most likely disk) has been exhausted or some other unrecoverable error has occurred. If you see “Health Critical” on your server you should take action to figure out what has happened.

The second counter set is the “Hyper-V Hypervisor” counter set, which is detailed here - http://blogs.msdn.com/tvoellm/archive/2008/05/09/hyper-v-performance-counters-part-two-of-many-hyper-v-hypervisor-counter-set.aspx . I like to use this counter set to understand how many Logical Processors the system recognizes (“Logical Processors”), the number of virtual machines running (“Partitions” minus 1, because the root counts as a partition), and the total number of Virtual Processors (“Virtual Processors”). The logical processors (LPs) are important because they are where all the work is done. They are a representation of the physical processor (a core, or a CPU thread as with HT and SMT). The virtual processors (VPs) tell you something about the guests and also whether the system is overloaded. You should make sure the VP to LP ratio does not exceed eight to one because we don’t currently support going beyond this limit. Hyper-V does not set a hard cap, so you can exceed it. Just understand that you are in a largely untested configuration and might see guest failures beyond 8:1. Some other limits to be aware of are that WS08 Hyper-V supports only 24 Logical Processors and Windows Server 2008 R2 Hyper-V has a stated limit of 32 LPs as of the Windows Server 2008 R2 Beta. We will likely push this limit up.
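The ratio check described above is easy to automate once the three counter values have been read. Here is a minimal Python sketch; the HypervisorSnapshot container and the function names are hypothetical, and it assumes you have already sampled the “Hyper-V Hypervisor” counters by whatever means you prefer. It uses the “Virtual Processors” value as reported, without trying to separate root VPs from guest VPs.

```python
from dataclasses import dataclass

# Supported VP:LP limit quoted in the text; Hyper-V itself does not enforce it.
MAX_SUPPORTED_VP_PER_LP = 8


@dataclass
class HypervisorSnapshot:
    """One sample of the "Hyper-V Hypervisor" counter set (hypothetical container)."""
    logical_processors: int
    virtual_processors: int
    partitions: int


def running_vm_count(snapshot: HypervisorSnapshot) -> int:
    # The root is itself a partition, so subtract one to get the guest count.
    return max(snapshot.partitions - 1, 0)


def vp_to_lp_ratio(snapshot: HypervisorSnapshot) -> float:
    if snapshot.logical_processors <= 0:
        raise ValueError("logical_processors must be positive")
    return snapshot.virtual_processors / snapshot.logical_processors


def exceeds_supported_ratio(snapshot: HypervisorSnapshot) -> bool:
    # True when the host is past the 8:1 guidance and in untested territory.
    return vp_to_lp_ratio(snapshot) > MAX_SUPPORTED_VP_PER_LP


if __name__ == "__main__":
    sample = HypervisorSnapshot(logical_processors=8, virtual_processors=40, partitions=11)
    print(running_vm_count(sample), vp_to_lp_ratio(sample), exceeds_supported_ratio(sample))
```

A host with 8 LPs and 40 VPs sits at 5:1, comfortably inside the limit; 4 LPs with 36 VPs would be 9:1 and get flagged.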

The last counter in the “Hyper-V Hypervisor” counter set that I use is “Total Pages”. This counter gives an indication of how much metadata memory the Hypervisor is using to manage the virtual machines. Unfortunately this counter does not capture all the overhead, because another component called the Virtualization Infrastructure Driver (VID) also has overhead to manage partitions, and in WS08 the “Hyper-V VM Vid Partition” counter set does not work. The good news is that this counter set does work in Windows Server 2008 R2.

Processor:

Once you have an idea of the overall system capabilities and configuration through the “Hyper-V Hypervisor” counter set, you will want to monitor the processors on the system. The most important counter set to monitor is the “Hyper-V Hypervisor Logical Processor”. This counter set allows you to determine how much of the physical processors is being used. The virtual processor counter sets only show a slice of the “Hyper-V Hypervisor Logical Processor”.

  • Hyper-V Hypervisor Logical Processor
  • Hyper-V Hypervisor Root Virtual Processor
  • Hyper-V Hypervisor Virtual Processor

The Hyper-V Hypervisor Logical Processor counter set is detailed here - http://blogs.msdn.com/tvoellm/archive/2008/05/09/hyper-v-performance-counters-part-three-of-many-hyper-v-logical-processors-counter-set.aspx . The most useful counters in this counter set are the following:

  • %Guest Run Time
  • %Hypervisor Run Time
  • %Idle Run Time
  • %Total Run Time

I generally only look at the _Total instance. There is one logical processor that carries more load than the rest, and that is LP0. This LP is where all interrupts in the system are directed, and if there is too much load you can see this LP hit 100%, which likely means IO is a bottleneck in the system. There are some technologies in Windows Server 2008 R2 that help reduce the load for networking: VMQ, Chimney, and RSS. There is no RSS support in guest VMs.

The “Hyper-V Hypervisor Root Virtual Processor” and “Hyper-V Hypervisor Virtual Processor” counter sets are just slices of the LP counters and can help you understand how much total CPU the root and guests are using on the system. These counters are detailed here - http://blogs.msdn.com/tvoellm/archive/2008/05/12/hyper-v-performance-counters-part-four-of-many-hyper-v-hypervisor-virtual-processor-and-hyper-v-hypervisor-root-virtual-processor-counter-set.aspx . There are no real limits one should expect for these counters; however, I generally expect to see the “% Hypervisor Time” be below 25%. Any higher and this could indicate you are not running with Integration Services installed. You should always make sure you have Integration Services installed for the best performance. See this link for more detail - http://blogs.msdn.com/tvoellm/archive/2008/01/02/hyper-v-integration-components-and-enlightenments.aspx .

You should also monitor the “Processor” counter set. This counter set is only for the root CPU and does suffer from skew, as detailed here - http://blogs.msdn.com/tvoellm/archive/2008/03/20/hyper-v-clocks-lie.aspx . Even with the skew this counter set is useful because it gives you an idea of how busy the root is. Remember the root is involved in all IO. This means that when the root CPUs are saturated your whole system is likely saturated. In general you want to see the root CPU below 10% utilization; over 50% might indicate an issue.
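If you want to turn the root CPU guidance into an alert, a small classifier is enough. The following Python sketch is only an illustration: the function name and the three labels are made up, and the 10% and 50% cut-offs are the rough guidance from the paragraph above rather than hard limits.

```python
# Rule-of-thumb thresholds from the text; treat them as guidance, not hard limits.
ROOT_CPU_TARGET_PERCENT = 10.0
ROOT_CPU_CONCERN_PERCENT = 50.0


def classify_root_cpu(percent_processor_time: float) -> str:
    """Label a root "Processor" utilization sample (hypothetical helper)."""
    if percent_processor_time < 0:
        raise ValueError("utilization cannot be negative")
    if percent_processor_time < ROOT_CPU_TARGET_PERCENT:
        return "ok"            # where you generally want the root to sit
    if percent_processor_time <= ROOT_CPU_CONCERN_PERCENT:
        return "elevated"      # worth watching, not necessarily a problem
    return "investigate"       # over 50% might indicate an issue


if __name__ == "__main__":
    for sample in (4.0, 22.5, 71.0):
        print(sample, classify_root_cpu(sample))
```

Because of the clock skew mentioned above, it is probably wiser to apply this to an average over several samples than to a single reading.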

Memory:

A common question I get is – “how much memory is a VM using?” There is no simple answer because different layers account for memory. For example, the “Hyper-V Hypervisor [Root] Partition” counters show how much memory the Hypervisor is managing and using on behalf of a VM, which includes the guest address space but not all the memory in the worker process and VID partition. The “Hyper-V VM Vid Partition” counters account for the guest address space and any additional memory the VID needs to manage the VM. Nothing accounts for the memory in the Worker Process that is paired with the Hypervisor and VID Partition except the root’s process memory counters, and I know of no way to figure out what the pairing is. The root also uses memory to service IO on behalf of a VM, and there is no accounting of this memory other than in the “Memory” counters, which are not VM specific.

The following are the counter sets one should monitor in general:

  • Hyper-V Hypervisor Partition
  • Hyper-V Hypervisor Root Partition
  • Hyper-V VM Vid Partition
  • Memory

The Hyper-V Hypervisor [Root] Partition counters are interesting because the “1G GPA Pages” and “2M GPA Pages” counters indicate whether or not a VM is using large pages, which improve overall VM performance. Large pages are only used on systems that have vTLB hardware support. See more on vTLB support here – http://blogs.msdn.com/tvoellm/archive/2009/04/06/why-does-my-desktop-box-slowdown-when-i-install-hyper-v.aspx . The partition counter set also indicates in the “Deposited Pages” counter how much memory the hypervisor is using for managing the VM. To figure out most of the memory used for the VM in megabytes you can use the following formula: almost_total_vm_memory = (“1G GPA Pages” * 1024) + (“2M GPA Pages” * 2) + ((“4K GPA Pages” + “Deposited Pages”) / 256). The last counter that is interesting in the partition counter set is the “Virtual Processors” counter, which lets you know how many Virtual Processors a VM is configured to use.
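The formula above translates directly into code. This is a minimal Python sketch with a hypothetical function name; the four arguments are assumed to be raw values read from one instance of the partition counter set.

```python
def almost_total_vm_memory_mb(gpa_1g_pages: int,
                              gpa_2m_pages: int,
                              gpa_4k_pages: int,
                              deposited_pages: int) -> float:
    """Approximate memory used for a VM, in megabytes.

    Each 1G page is 1024 MB and each 2M page is 2 MB. 4K pages and
    deposited pages are both 4 KB, so 256 of them make up 1 MB.
    """
    mb_from_1g = gpa_1g_pages * 1024
    mb_from_2m = gpa_2m_pages * 2
    mb_from_4k = (gpa_4k_pages + deposited_pages) / 256
    return mb_from_1g + mb_from_2m + mb_from_4k


if __name__ == "__main__":
    # Illustrative values only: 1000 large (2M) pages plus some 4K and deposited pages.
    print(almost_total_vm_memory_mb(0, 1000, 6144, 2048))
```

As the name says, this is "almost" the total: worker process memory and root memory used to service IO are not included.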

The Hyper-V VM Vid Partition counter set has two interesting counters. “Physical Pages Allocated” is the total number of guest pages and VID pages needed to manage the VM. “Remote Physical Pages” lets you know on NUMA-based systems whether a VM is spanning multiple nodes. You really want to avoid this whenever possible. You can require a VM to start on a particular node or not at all by using the API at http://blogs.msdn.com/tvoellm/archive/2008/09/28/Looking-for-that-last-once-of-performance_3F00_-Then-try-affinitizing-your-VM-to-a-NUMA-node-.aspx . Another way is to stop and restart the VM; if possible, Hyper-V will allocate all memory on a single NUMA node.

The Memory counter set allows you to monitor how much memory is being consumed in the root. The root is responsible for managing all memory in Hyper-V. When a VM starts you will see "Available Bytes" go down by at least the amount of memory given to the guest plus around another 16 - 64MB for the guest’s metadata structures.

My recommendation is to monitor the following counters:

  • Available Bytes - This will give you an idea of how much memory is remaining for guests. There is a reserve of 256MBytes or 512MBytes that the root will always leave outside of guest memory. The exact amount varies by Hyper-V release. So if you find a time when a VM won’t start, it may be that there are too few available bytes to satisfy the reserve.
  • Pages / Sec - This is a measure of memory pressure since it tracks hard faults. Those are page faults that require a disk access. Usually the number spikes when there are too few available bytes on the system and processes are competing with each other for physical RAM.
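The numbers above can be combined into a rough pre-flight check before starting another VM. The Python sketch below is an estimate, not how Hyper-V actually decides: the function names are hypothetical, and the defaults simply take the pessimistic end of the figures quoted above (64MB of per-guest overhead and a 512MB root reserve).

```python
MB = 1024 * 1024


def headroom_after_start_mb(available_bytes: int,
                            guest_memory_mb: int,
                            overhead_mb: int = 64,
                            reserve_mb: int = 512) -> float:
    """Estimate the memory left (in MB) after starting a guest.

    available_bytes is the root's "Available Bytes" counter. overhead_mb and
    reserve_mb are assumptions taken from the upper end of the ranges in the text.
    """
    return available_bytes / MB - guest_memory_mb - overhead_mb - reserve_mb


def vm_likely_to_start(available_bytes: int, guest_memory_mb: int) -> bool:
    # A negative result means the start would eat into the root reserve.
    return headroom_after_start_mb(available_bytes, guest_memory_mb) >= 0


if __name__ == "__main__":
    available = 8192 * MB  # 8 GB currently available in the root
    print(vm_likely_to_start(available, 4096))
    print(vm_likely_to_start(available, 7680))
```

With 8 GB available, a 4 GB guest leaves plenty of headroom, while a 7.5 GB guest would dip into the reserve under these assumptions.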

Networking:

The network counters are useful for monitoring the overall networking performance on the system. The most important thing to monitor is generally the total throughput counters, to make sure the NICs are not getting saturated. Once the NICs are saturated your overall system performance will be capped, because no more web requests, remote storage requests, queries, etc. can be received than what is currently being handled. The first counter set, “Network Interface”, gives the overall performance of the physical devices, whereas the other counter sets listed below represent the activity of the virtual switches and network adapters in the VMs.

  • Network Interface
  • Hyper-V Virtual Switch
  • Hyper-V Legacy Network Adapter
  • Hyper-V Virtual Network Adapter

For the Network Interface the following are the top-level counters to monitor:

  • Bytes Total / Sec
  • Offloaded Connections
  • Packets / Sec
  • Packets Outbound Errors
  • Packets Receive Errors

However, if I’m going to monitor networking I generally look at all the counters. Mostly you want to make sure the network is not saturating and that the error counts are low. If the error counts grow rapidly you might have too much load on the system or some problem in end-to-end connectivity (including hardware).

The “Hyper-V Virtual Switch” counters are good to monitor because, depending on how you have configured your network, some or all of the traffic might only exist on the virtual switch. For example, guest-to-guest packets don’t have to leave the machine to be routed, and in fact on an internal network switch there is no physical adapter connected. In a future blog post I’ll detail what each counter means. The most useful ones are:

  • Bytes/Sec
  • Packets/Sec

The “Hyper-V Virtual Network Adapter” and the “Hyper-V Legacy Network Adapter” counter sets allow you to see how much ingress and egress a VM is doing. The instances in these counter sets are named with the friendly name of the VM plus the name of the network adapter, followed by two GUIDs. The GUIDs are the internal IDs of the VM and adapter, which are important when querying via WMI.

There are two counter sets because there are two types of virtual network card you can assign to a VM. If you assign a Legacy Network Adapter then the counter set you should use is the “Hyper-V Legacy Network Adapter”. In general you should not use the legacy adapter type because it is not enlightened, creates a lot of CPU load in the root, and is generally slower than the Network Adapter. The challenge is that you need the Legacy Network Adapter to get a VM working before installing Integration Services. Once your VM is working with Integration Services you should use the Network Adapter and the “Hyper-V Virtual Network Adapter” counter set. Keep in mind that Windows Server 2008 and Windows Server 2008 R2 both have Integration Services pre-installed.

The “Hyper-V Legacy Network Adapter” counters to monitor are:

  • Bytes Dropped
  • Bytes Sent / Sec
  • Bytes Received / Sec

In the “Hyper-V Virtual Network Adapter” counter set you should monitor:

  • Bytes / Sec
  • Packets / Sec

Storage:

The storage counters are useful for monitoring the overall disk performance on the system as well as for each VM. The first counter set, “Physical Disk”, will give overall storage performance on the system. The next two are strictly for VMs.

  • Physical Disk
  • Hyper-V Virtual Storage Device
  • Hyper-V Virtual IDE Controller

Inside the “Physical Disk” counter set I tend to monitor only three things. The first is “Current Disk Queue Length”, which gives one an idea of how busy the drives are. The “Current Disk Queue Length” should be around two per drive. If you have a RAID 10 volume with 4+4 (a total of 8 drives), then a queue length of 16 is reasonable. A queue length of 32 might indicate this disk is saturated and is the bottleneck in the system. The next counter I monitor is “Disk Bytes / Sec”. I generally expect to see about 10MB/sec per drive, which is a fairly safe number for most drives. Some can do better and some worse. So for the RAID 10 4+4 a throughput of around 80MB/sec is reasonable for sequential workloads, whereas 10MB/sec is not. For random workloads I look at the last counter, “Disk Transfers / Sec”, and expect to see about 100 IOs per second (IOPS) per drive. Once again there are drives that do much better, like 180 IOPS, and some that do worse, like laptop drives, which are around 60 IOPS. For the RAID 10 4+4 around 800 IOPS for 8 KB reads and writes is reasonable. Generally writes are a bit lower.
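The per-drive rules of thumb above scale linearly with drive count, which makes them easy to compute for any volume. The Python sketch below is illustrative only: the DiskRuleOfThumb type and function names are hypothetical, and the "double the expected queue" saturation test is my reading of the 16 versus 32 example rather than a documented threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiskRuleOfThumb:
    """Per-drive expectations from the text; adjust for your own hardware."""
    queue_length_per_drive: float = 2.0
    mb_per_sec_per_drive: float = 10.0
    iops_per_drive: float = 100.0


def expected_for_volume(drive_count: int,
                        rule: DiskRuleOfThumb = DiskRuleOfThumb()) -> dict:
    """Scale the per-drive numbers up to a whole volume."""
    if drive_count <= 0:
        raise ValueError("drive_count must be positive")
    return {
        "queue_length": drive_count * rule.queue_length_per_drive,
        "sequential_mb_per_sec": drive_count * rule.mb_per_sec_per_drive,
        "random_iops": drive_count * rule.iops_per_drive,
    }


def queue_looks_saturated(current_queue_length: float,
                          drive_count: int,
                          rule: DiskRuleOfThumb = DiskRuleOfThumb()) -> bool:
    # Assumption: roughly double the expected queue length suggests saturation.
    expected = expected_for_volume(drive_count, rule)["queue_length"]
    return current_queue_length >= 2 * expected


if __name__ == "__main__":
    print(expected_for_volume(8))           # the RAID 10 4+4 example
    print(queue_looks_saturated(32, 8))
```

For faster or slower drives, pass a different DiskRuleOfThumb, for example one with iops_per_drive=180 or iops_per_drive=60.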

There are two Hyper-V storage counter sets because of how storage works in Hyper-V. In Hyper-V we provide two virtual storage buses for VMs. One is IDE and one is SCSI. The Virtual IDE counters show up in the “Hyper-V Virtual IDE Controller” counter set unless Integration Services are loaded, in which case you will see the activity for both virtual IDE and SCSI in the “Hyper-V Virtual Storage Device” counter set. If you don’t have Integration Services installed then only the “Hyper-V Virtual IDE Controller” will show the VM disk activity. If you want to read more on Hyper-V storage check out this link – http://blogs.msdn.com/tvoellm/archive/2007/10/13/what-windows-server-virtualization-aka-viridian-storage-is-best-for-you.aspx .

I tend to monitor all the counters on both the “Hyper-V Virtual IDE Controller” and “Hyper-V Virtual Storage Device” because each has only a basic set.

For the “Hyper-V Virtual IDE Controller” there are:

  • Read Bytes / Sec
  • Write Bytes / Sec
  • Read Sectors / Sec
  • Write Sectors / Sec

And for “Hyper-V Virtual Storage Device” there are:

  • Error Count
  • Flush Count
  • Read Bytes / Sec
  • Write Bytes / Sec
  • Read Count
  • Write Count

The Error count should always be zero for the virtual storage device.

Conclusion:

There are many counters in Hyper-V which provide useful information and will help you understand what the system is doing. The table below summarizes the counters above in a perfmon-like format. Note that you should not always collect all instances in a counter set. Some counter sets have too many counters to collect for all instances. If you attempt to collect all counters for all instances you might see periods in the data that are empty. This means the system is not keeping up with the number of counters requested.

In the table below you will see (_Total) or (*). (_Total) means only the total should be collected. (*) means collect all instances. \* means collect all the counters in the counter set. \<name> means only collect that counter. This is the notation perfmon uses in its collection files.

\Hyper-V Virtual Machine Health Summary\*

\Hyper-V Hypervisor\*

\Processor(_Total)\*

\Hyper-V Hypervisor Logical Processor(*)\%Guest Run Time

\Hyper-V Hypervisor Logical Processor(*)\%Hypervisor Run Time

\Hyper-V Hypervisor Logical Processor(*)\%Idle Run Time

\Hyper-V Hypervisor Logical Processor(*)\%Total Run Time

\Hyper-V Hypervisor Root Virtual Processor(*)\%Guest Run Time

\Hyper-V Hypervisor Root Virtual Processor(*)\%Hypervisor Run Time

\Hyper-V Hypervisor Root Virtual Processor(*)\%Idle Run Time

\Hyper-V Hypervisor Root Virtual Processor(*)\%Total Run Time

\Hyper-V Hypervisor Virtual Processor(_Total)\*

\Memory\Pages/sec

\Memory\Available Bytes

\Hyper-V Hypervisor Partition(*)\2M GPA Pages

\Hyper-V Hypervisor Partition(*)\Deposited Pages

\Hyper-V Hypervisor Partition(*)\Virtual Processors

\Hyper-V Hypervisor Root Partition(*)\*

\Hyper-V VM Vid Partition(*)\Physical Pages Allocated

\Hyper-V VM Vid Partition(*)\Remote Physical Pages

\Network Interface(*)\*

\Hyper-V Virtual Switch(*)\*

\Hyper-V Legacy Network Adapter(*)\*

\Hyper-V Virtual Network Adapter(*)\*

\Physical Disk(*)\Current Disk Queue Length

\Physical Disk(*)\Disk Bytes/sec

\Physical Disk(*)\Disk Transfers/sec

\Hyper-V Virtual Storage Device(*)\*

\Hyper-V Virtual IDE Controller(*)\*
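
If you keep a list like this in a text file, it can help to sanity check each entry before handing it to a collector. Below is a minimal Python sketch that splits a path in the notation described above into counter set, instance, and counter. The names are hypothetical, and the pattern is a simplification: it assumes instance names contain no nested parentheses and it does not handle a leading \\machine prefix.

```python
import re
from typing import NamedTuple, Optional

# \<counter set>(<instance>)\<counter>, where the (<instance>) part is optional.
_COUNTER_PATH = re.compile(
    r"^\\(?P<counter_set>[^\\(]+?)\s*"
    r"(?:\((?P<instance>[^)]*)\))?"
    r"\\(?P<counter>.+)$"
)


class CounterPath(NamedTuple):
    counter_set: str
    instance: Optional[str]
    counter: str


def parse_counter_path(path: str) -> CounterPath:
    match = _COUNTER_PATH.match(path.strip())
    if match is None:
        raise ValueError(f"not a counter path: {path!r}")
    return CounterPath(match["counter_set"], match["instance"], match["counter"].strip())


def describe(path: str) -> str:
    """Spell out what a path asks the collector to gather."""
    parsed = parse_counter_path(path)
    if parsed.instance is None:
        which = "the single instance"
    elif parsed.instance == "*":
        which = "all instances"
    elif parsed.instance == "_Total":
        which = "only the total"
    else:
        which = f"instance {parsed.instance}"
    what = "all counters" if parsed.counter == "*" else f"the {parsed.counter!r} counter"
    return f"{parsed.counter_set}: {what} for {which}"


if __name__ == "__main__":
    for line in (r"\Processor(_Total)\*",
                 r"\Memory\Available Bytes",
                 r"\Hyper-V Virtual Switch(*)\*"):
        print(describe(line))
```

Entries that expand to every counter for every instance are the ones most likely to cause the empty periods mentioned earlier, so those are worth reviewing first.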