Failover Clustering and Network Load Balancing Team Blog
Do you have a large number of virtualized workloads in your cluster? Have you been looking for a solution that allows you to detect if any of the virtualized workloads in your cluster are behaving abnormally? Would you like the cluster service to take recovery actions when these workloads are in an unhealthy state? In Windows Server 2012, there is a great new feature in Failover Clustering called “VM Monitoring”, which does exactly that: it allows you to monitor the health state of applications that are running within a virtual machine and reports that state to the host so that the host can take recovery actions. You can monitor any Windows service (such as SQL Server or IIS) in your virtual machine, or any ETW event occurring in your virtual machine. When the condition you are monitoring is triggered, the Cluster Service logs an event in the error channel on the host and takes recovery actions.
In this blog, I will provide a step-by-step guide to configuring VM Monitoring using the Failover Cluster Manager in Windows Server 2012.
Note: There are multiple ways to configure VM Monitoring. In this blog, I will cover the most common method. In a future blog, I will cover the many different flexible options for configuring VM Monitoring.
Before you can configure monitoring from the Failover Cluster Manager on a management console, the following prerequisite steps are required:
1) Configure the guest operating system running inside the virtual machine
a) The guest operating system running inside the virtual machine must be running Windows Server 2012
b) Ensure that the guest OS is a member of the same domain as the host, or of a domain that has a trust relationship with the host domain.
2) Grant the cluster administrator permissions to manage the guest
a) The administrator running Failover Cluster Manager must be a member of the local administrators group in the guest
3) Enable the “Virtual Machine Monitoring” firewall rule on the guest
a) Open the Windows Firewall console
b) Select “Allow an app or feature through Windows Firewall”
c) Click on “change settings” and enable the “Virtual Machine Monitoring” rule.
You can also enable the “Virtual Machine Monitoring” firewall rule using the Windows PowerShell® cmdlet Set-NetFirewallRule:
Set-NetFirewallRule -DisplayGroup "Virtual Machine Monitoring" -Enabled True
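To confirm that the rule group is enabled, you can list it with Get-NetFirewallRule. This is a minimal sketch that assumes it is run inside the guest from an elevated Windows PowerShell session:

```powershell
# Run inside the guest, in an elevated PowerShell session.
# Lists every rule in the "Virtual Machine Monitoring" group with its state.
Get-NetFirewallRule -DisplayGroup "Virtual Machine Monitoring" |
    Select-Object DisplayName, Direction, Enabled, Profile
```

Each rule in the group should report Enabled as True after the Set-NetFirewallRule command above has been run.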
VM Monitoring can be easily configured using the Failover Cluster Manager through the following steps:
1) Right click on the Virtual Machine role on which you want to configure monitoring
2) Select “More Actions” and then the “Configure Monitoring” option
3) You will then see a list of services that can be configured for monitoring using the Failover Cluster Manager.
You will only see services listed that run in their own process, e.g. SQL Server or Exchange. The IIS and Print Spooler services are exempt from this rule. You can, however, set up monitoring for any NT service, with no restrictions, using the Windows PowerShell® cmdlet Add-ClusterVMMonitoredItem:
Add-ClusterVMMonitoredItem -VirtualMachine TestVM -Service spooler
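The same cmdlet family can also monitor an event rather than a service, and can list or remove monitored items. The sketch below assumes it is run on a cluster node and reuses the example VM name TestVM. The event log, source, and ID are illustrative values only, so substitute the event you actually want to watch:

```powershell
# Run on a cluster node (or a machine with the Failover Clustering tools).
# "TestVM" is the example VM name used above.

# Monitor an event instead of a service. The log, source, and ID here are
# illustrative values; replace them with the event you care about.
Add-ClusterVMMonitoredItem -VirtualMachine TestVM `
    -EventLog "Microsoft-Windows-FailoverClustering-Manager/Admin" `
    -EventSource "Microsoft-Windows-FailoverClustering-Manager" `
    -EventId 4708

# List everything currently monitored for the VM.
Get-ClusterVMMonitoredItem -VirtualMachine TestVM

# Stop monitoring the Print Spooler service that was added earlier.
Remove-ClusterVMMonitoredItem -VirtualMachine TestVM -Service spooler
```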
When a monitored service encounters an unexpected failure, the sequence of recovery actions is determined by the “Recovery actions on failure” settings for the service. These recovery actions can be viewed and configured using Service Control Manager inside the guest. For example, a service can be configured so that on the first and second service failures the Service Control Manager restarts the service, and on the third failure it takes no action and defers recovery to the cluster service running in the host.
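The recovery actions configured for a service can also be inspected from the command line inside the guest. This is a small sketch that assumes the Print Spooler service from the earlier example; note that in Windows PowerShell you need to type sc.exe, because sc on its own is an alias for Set-Content:

```powershell
# Run inside the guest.
# Shows the failure-count reset period and the configured failure actions
# for the Print Spooler service.
sc.exe qfailure spooler
```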
The cluster service monitors the status of clustered virtual machines through periodic health checks. When the cluster service determines that a virtual machine is in a “critical” state, i.e. an application or service inside the virtual machine is in an unhealthy state, the cluster service takes the following recovery actions:
1) Event ID 1250 is logged on the host
a. This event can be monitored with tools such as System Center Operations Manager to trigger further customized actions
2) The virtual machine status in Failover Cluster Manager will indicate that the virtual machine is in an “Application Critical” state. The same status can be read with Windows PowerShell®:
Get-ClusterResource "TestVM" | fl StatusInformation
3) Recovery action is taken on the virtual machine in “Application Critical” state
a. The virtual machine is first restarted on the same node
Note: The restart of the virtual machine is forced but graceful
b. On the second failure, the virtual machine is restarted and failed over to another node in the cluster.
Note: The decision on whether to fail over or restart on the same node is configurable and is determined by the failover properties for the virtual machine.
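The failover and restart settings mentioned in the note, and the event logged in step 1, can be inspected from a cluster node. The sketch below makes two assumptions: that TestVM is both the role (group) name and the virtual machine resource name, as in the command above (on many clusters the resource is named "Virtual Machine <name>" instead), and that event ID 1250 is written to the System log on the host:

```powershell
# Run on a cluster node. "TestVM" is the example name used above.

# Group-level failover policy: how many failovers are allowed per period.
Get-ClusterGroup "TestVM" |
    Format-List Name, State, FailoverThreshold, FailoverPeriod

# Resource-level restart policy for the virtual machine resource.
Get-ClusterResource "TestVM" |
    Format-List Name, State, RestartAction, RestartThreshold, RestartPeriod, RetryPeriodOnFailure

# Most recent occurrences of event ID 1250 on this host (assumed to be in
# the System log). SilentlyContinue suppresses the error raised when no
# matching events exist.
Get-WinEvent -FilterHashtable @{ LogName = 'System'; Id = 1250 } `
    -MaxEvents 10 -ErrorAction SilentlyContinue
```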
That’s the VM Monitoring feature in Windows Server 2012 in a nutshell!
Subhasish Bhattacharya Program Manager Clustering & High Availability Microsoft
Can you use VM Monitoring without a cluster? I could see scenarios where that would be desirable when using Hyper-V Replica instead (for example, triggering failover to an offsite replica).
VM Monitoring is a feature of Failover Clustering and is not available without it. However, remember that Hyper-V Replica is fully integrated with clustering as well. So you can definitely configure a highly available VM on a cluster that uses VM Monitoring for application health monitoring and also uses Hyper-V Replica for disaster recovery.
In our case, the Management OS on the cluster nodes and the Guest OSs in the VMs are in different VLANs and can communicate only via a hardware firewall. Which ports have to be opened from the MgmtOS to the VMs?
The following are the firewall rules that need to be enabled to configure VM Monitoring *from the host*. Note that you do not need to enable any firewall rules if you configure VM Monitoring directly in the guest. To do that, install the Failover Clustering management tools in the guest and then use the Add-ClusterVMMonitoredItem PowerShell cmdlet in the guest:
Echo Request - ICMPv4-In: Inbound rule. Echo Request messages are sent as ping requests to other nodes. Protocol: ICMPv4 (protocol number 1). Local ports: all ports.
Echo Request - ICMPv6-In: Inbound rule. Echo Request messages are sent as ping requests to other nodes. Protocol: ICMPv6 (protocol number 58). Local ports: all ports.
NB-Session-In: Inbound rule to allow NetBIOS Session Service connections [TCP 139]. Protocol: TCP (protocol number 6). Local ports: 139.
RPC: Inbound rule for the Task Scheduler service to be remotely managed via RPC/TCP. Protocol: TCP (protocol number 6). Local ports: RPC dynamic ports.
DCOM-In: Inbound rule to allow DCOM traffic for remote Windows Management Instrumentation [TCP 135]. Protocol: TCP (protocol number 6). Local ports: 135.
Thanks for the quick response and the information.
Configuring the services to be monitored worked directly in the Guest OS with the following cmdlet:
Add-ClusterVMMonitoredItem -Service <service1>,<service2>
However, viewing the configured services from the MgmtOS using Failover Cluster Manager appears to need a network connection. The "Monitored Services:" field for the clustered VM in the FCM states "Unable to determine monitored services: The network path was not found." This changes as soon as the network connection between the MgmtOS and the VM is enabled, which we would like to avoid for security reasons.
From the presentation WSV411 (Teched NA 2012) I understood the Hyper-V Bus would be used for the monitoring.
Am I missing something?
Failover Cluster Manager needs the firewall ports to be open in order to display the services being monitored. However, the actual health state is sent from the guest to the host through the Hyper-V bus. So in the case of a failure you would see the cluster service take action, and your VM would be placed in the App Critical state until remediation (this would be reflected in the UI). In Failover Cluster Manager the VM state is displayed as Online when there are no application failures. It changes to App Critical when any of the monitored applications fails.
As a test you can terminate the monitored service and check if the host does take action (it should).
The test of terminating the processes behind the monitored services via Task Manager in the GuestOS worked: the cluster service restarted the GuestOS after multiple failures of the services, just as configured in services.msc.
The PS cmdlets Get-ClusterGroup <VM> and Get-ClusterResource "Virtual Machine <VM>" report the state "Pending(Application in VM Critical)" during the restart of the GuestOS. Evidently, these cmdlets use the Hyper-V bus.
The PS cmdlet Get-ClusterVMMonitoredItem -VirtualMachine <VM> appears to need a network connection, as does FCM to display the status of the VM. This behaviour seems a bit awkward to me; maybe it will be fixed in one of the next patches.
Thanks for the explanation.
Thanks for the feedback Zoran - we can certainly look into incorporating this into a future release. Just out of curiosity, what workloads are you using VM Monitoring to monitor?
We use it to monitor ERP systems with 3rd party database software. Guest clustering is an obvious alternative, but slightly better availability does not justify more complex application installation and administration.