Trouble Connecting to Cluster Nodes? Check WMI!

A frequent cluster network connection issue occurs when the cluster cannot use WMI.  WMI (Windows Management Instrumentation) is an interface through which Windows components provide information and notifications to each other, often between remote computers.  Failover Clustering and System Center Virtual Machine Manager (SCVMM) often use WMI to communicate between cluster nodes, so if there is an issue contacting a cluster node, WMI may be the culprit.  We use WMI in most of our wizards, such as the ‘Create Cluster Wizard’, ‘Validate a Configuration Wizard’, and ‘Add Node Wizard’, so any of the following messages and warnings could be due to WMI issues:

·         "RPC Server Unavailable" error.

·         Access is Denied.

·         The computer ‘Node1’ could not be reached.

·         Failed to retrieve the maximum number of nodes for ‘{0}’.

·         The computer ‘Node1.contoso.com’ does not have the Failover Clustering feature installed.  Use Server Manager to install the feature on this computer.

o   Note: first confirm you have installed the Failover Clustering feature on this node

 

 

Troubleshooting Steps

Follow this series of troubleshooting steps to restore connectivity to your cluster.

 

1) Ensure it is not a DNS Issue

The reason you cannot contact the other servers may be a DNS issue.  Before troubleshooting WMI, try connecting to that cluster, node or server using each of the following when the cluster wizard prompts you for a name:

a)      Network Name for the cluster or node

a.       Example: MyNode

b)      FQDN for the cluster or node

a.       Example: MyNode.contoso.com

c)       IP Address for the cluster or node

a.       Example: 10.10.10.123

d)      Some wizard pages have a ‘browse’ button which allows you to find other clusters in the domain through Active Directory
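
As a rough sketch of the same check outside the wizards, the PowerShell below asks the local resolver about each form of the name.  The function name Test-NameResolution and the sample names are assumptions for illustration, not part of any cluster tooling.  Note that an IP address literal is returned without a DNS lookup, so a "resolved" IP address only confirms that the address is well-formed, not that the node is reachable.

```powershell
# Hypothetical helper (not a built-in cmdlet): reports whether a name resolves.
# Uses the .NET resolver so it also runs on PowerShell 2.0 (Server 2008 R2).
function Test-NameResolution {
    param([Parameter(Mandatory = $true)][string]$Name)

    try {
        $addresses = [System.Net.Dns]::GetHostAddresses($Name)
        $detail    = ($addresses | ForEach-Object { $_.IPAddressToString }) -join ', '
        $resolved  = $true
    }
    catch {
        # Resolution failed; keep the error text for the report.
        $detail   = $_.Exception.Message
        $resolved = $false
    }

    New-Object PSObject -Property @{
        Name     = $Name
        Resolved = $resolved
        Detail   = $detail
    }
}

# Example with the placeholder names from the list above:
# 'MyNode', 'MyNode.contoso.com', '10.10.10.123' |
#     ForEach-Object { Test-NameResolution -Name $_ } |
#     Format-Table Name, Resolved, Detail -AutoSize
```

If the short name fails but the FQDN resolves, the problem is more likely the DNS suffix search list than WMI.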

 

 

2) Check that WMI is Running on the Node

Windows Server Failover Clustering supports PowerShell (2008 R2 or higher), and earlier versions also come with a lightweight WMI client (WBEMTest).  Using either PowerShell or WBEMTest, you can confirm that WMI is up and running.  Although you can use WMI remotely, it is better to test directly on the server to rule out other networking or firewall issues affecting the connection.

 

WMI Service

First check that the ‘Windows Management Instrumentation’ Service has started on each node by opening the Services console on that node.  Also check that its Startup Type is set to Automatic.
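
If you prefer the command line to the Services console, the same two facts can be read roughly as sketched below.  Get-WmiServiceInfo is a hypothetical helper name.  It deliberately avoids WMI itself: Get-Service and sc.exe both talk to the Service Control Manager, so they still answer when WMI is broken.  The sketch assumes the service name Winmgmt and that sc.exe prints an unlocalized START_TYPE line.

```powershell
# Hypothetical helper: reports the state and startup type of the WMI service
# on the local node without going through WMI.
function Get-WmiServiceInfo {
    # Current state (Running, Stopped, ...) from the Service Control Manager.
    $service = Get-Service -Name Winmgmt

    # sc.exe qc prints the configured startup type on its START_TYPE line,
    # for example "START_TYPE : 2 AUTO_START".
    $startLine = sc.exe qc Winmgmt | Select-String -Pattern 'START_TYPE'

    New-Object PSObject -Property @{
        Name      = $service.Name
        Status    = $service.Status
        StartType = "$($startLine.Line)".Trim()
    }
}

# Example (run on each node):
# Get-WmiServiceInfo | Format-List Name, Status, StartType
#
# To correct the settings from an elevated session:
# Set-Service -Name Winmgmt -StartupType Automatic
# Start-Service -Name Winmgmt
```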

 

 

Next we will check that Failover Clustering WMI (MSCluster) is running.  These tests apply only after the cluster has been created, since they check cluster-specific WMI functionality.

WBEMTest (directly on the server)

·         Launch CMD

·         CMD > WBEMTest

·         The Windows Management Instrumentation Tester will launch

·         Select Connect

·         Namespace: Root\MSCluster

·         Select Connect

o   If you see more options available, it means you are connected and WMI is working

§  Feel free to try a query to confirm, such as selecting ‘Query’ and entering: SELECT * FROM MSCluster_Resource

o   If you see an error, there is a WMI issue

PowerShell (locally, or remotely from another node within the same cluster; 2008 R2 or higher only)

·         Launch Elevated PowerShell

·         PS > Get-WmiObject MSCluster_ResourceGroup -ComputerName MyNode -Namespace "ROOT\MSCluster"

o   If you see a lot of information displayed, WMI is running

o   If you see an error, there is a WMI or firewall issue
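
To run the PowerShell check against several nodes without reading raw error output, it can be wrapped roughly as follows.  This is a sketch, not the cluster team's tooling: Test-ClusterWmi is a hypothetical name, and the optional -PacketPrivacy switch simply passes -Authentication PacketPrivacy (PowerShell 2.0 or later), which some environments reportedly need to avoid ‘Access Denied’.

```powershell
# Hypothetical helper: queries the cluster WMI namespace on one node and
# reports success or the error text instead of throwing.
function Test-ClusterWmi {
    param(
        [Parameter(Mandatory = $true)][string]$ComputerName,
        [switch]$PacketPrivacy
    )

    $query = @{
        Class        = 'MSCluster_ResourceGroup'
        Namespace    = 'ROOT\MSCluster'
        ComputerName = $ComputerName
        ErrorAction  = 'Stop'   # make WMI failures catchable
    }
    if ($PacketPrivacy) { $query.Authentication = 'PacketPrivacy' }

    try {
        $groups  = @(Get-WmiObject @query)
        $working = $true
        $detail  = "$($groups.Count) resource group(s) returned"
    }
    catch {
        # Typical causes: WMI not running, firewall blocking DCOM, access denied.
        $working = $false
        $detail  = $_.Exception.Message
    }

    New-Object PSObject -Property @{
        ComputerName = $ComputerName
        Working      = $working
        Detail       = $detail
    }
}

# Example (placeholder node names):
# 'Node1', 'Node2' | ForEach-Object { Test-ClusterWmi -ComputerName $_ } |
#     Format-Table ComputerName, Working, Detail -AutoSize
```

A node that works locally but fails when queried from another node points at the firewall or permissions rather than at WMI itself.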

 

 

3) Check your Firewall Settings

When a cluster is created, we automatically open all the firewall settings you need.  However, enterprise security policies can make changes over time, so it is worth checking that the firewall on each server allows cluster communication.  WMI requires a DCOM connection between the nodes, so you need to ensure that the ‘Remote Administration’ setting is enabled on every cluster node.  This can be done through the Windows Firewall GUI or by running the elevated command: CMD > netsh firewall set service RemoteAdmin enable.  You will see a variety of errors or warnings if your firewall is not properly configured.  For more information about how WMI uses the firewall and troubleshooting firewall issues, visit: http://msdn.microsoft.com/en-us/library/aa389286(VS.85).aspx.
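
On Windows Server 2008 and later, the netsh firewall context is deprecated in favor of netsh advfirewall.  A minimal sketch of the equivalent commands follows.  The wrapper name Enable-RemoteWmiFirewallRules is hypothetical, and the rule group names are the English ones; they may differ on localized or newer systems, in which case netsh reports that no rules match.

```powershell
# Hypothetical wrapper: enables the built-in firewall rule groups that remote
# WMI relies on. Run from an elevated session on every cluster node.
function Enable-RemoteWmiFirewallRules {
    # Advanced-firewall equivalent of "netsh firewall set service RemoteAdmin enable".
    netsh advfirewall firewall set rule group="remote administration" new enable=yes

    # The WMI-specific rule group (DCOM, WMI and async callback rules).
    netsh advfirewall firewall set rule group="windows management instrumentation (wmi)" new enable=yes
}

# Example:
# Enable-RemoteWmiFirewallRules
```

If a domain Group Policy manages the firewall, local changes like these can be overridden at the next policy refresh, so check the policy as well.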

 

 

4) Reboot the Node

This can often fix intermittent issues.  Follow best practices when rebooting the server, such as live migrating VMs and gracefully failing over other services and applications to reduce downtime.  Only do this if the other troubleshooting attempts described above have failed.

 

 

5) Rebuild a Corrupt WMI Repository

If you continue to see errors after checking that WMI is running, the firewall is properly configured and rebooting, it is possible that your WMI repository has become corrupt so the cluster can no longer read from it.  The following steps will enable you to rebuild your repository so that the other nodes can read from it again.  Rebuilding the repository should be your last troubleshooting step, not your first.

 

·         In the Services console, manually stop the WMI service to ensure that dependent services are stopped

·         Start WMI service again

·         Launch an elevated CMD or PowerShell

·         CMD/PS > winmgmt /ResetRepository
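
Because a full reset is the biggest hammer, it can help to escalate in stages.  The sketch below is one possible ordering, not an official procedure: verify the repository, recompile only the cluster provider's MOF file, let WMI salvage the repository, and reset only as a last resort.  Invoke-WmiRepositoryStep is a hypothetical name, and the MOF path assumes a default Windows installation.

```powershell
# Hypothetical helper: runs exactly one repository step per call so that
# nothing destructive happens by accident. Requires an elevated session.
function Invoke-WmiRepositoryStep {
    param(
        [Parameter(Mandatory = $true)]
        [ValidateSet('Verify', 'RecompileClusterProvider', 'Salvage', 'Reset')]
        [string]$Step
    )

    switch ($Step) {
        # Read-only consistency check of the repository.
        'Verify' { winmgmt /verifyrepository }

        # Re-registers only the cluster WMI provider classes.
        'RecompileClusterProvider' { mofcomp "$env:SystemRoot\System32\wbem\ClusWMI.mof" }

        # Rebuilds the repository if inconsistent, merging readable content.
        'Salvage' { winmgmt /salvagerepository }

        # Last resort: returns the repository to its initial state.
        'Reset' { winmgmt /resetrepository }
    }
}

# Example escalation, re-testing the cluster connection after each step:
# Invoke-WmiRepositoryStep -Step Verify
# Invoke-WmiRepositoryStep -Step RecompileClusterProvider
```

Taking the steps one at a time keeps the reset for the case where verification actually reports an inconsistency.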

 

 

6) Patch WMI for Performance Improvements

Your initial connection problems should now be fixed.  If you continue to experience intermittent connection issues caused by WMI, it could be due to the performance of your servers.  We have released a hotfix for 2008 R2 which improves the speed at which WMI queries return, and it is optimized for the most common WMI calls which SCVMM makes.  Get it here: http://support.microsoft.com/kb/974930.

 

 

Good luck in resolving your cluster connection issues with WMI!

 

Thanks,

Symon Perriman
Program Manager II
Clustering & High-Availability

Microsoft

 

Comments
  • For what it's worth, I have seen a few environments where the -Authentication parameter needs to be set to 'PacketPrivacy' in order to make a WMI connection to a Windows 2008 cluster in PowerShell.  Otherwise, an 'Access Denied' is what you will see.  Also, the -Authentication parameter is a PowerShell V2 feature and does not apply to PowerShell V1.

    Get-WMIObject -Authentication PacketPrivacy mscluster_resourcegroup -computer MyNode -namespace "ROOT\MSCluster"

  • Great tip, thanks Boe!  

    -Symon

  • {Hopefully this is not a duplicate - it should be deleted, if it is}

    There is another, perhaps more likely cause of the error "Failed to retrieve the maximum number of nodes for ‘{0}’."

    Specifically you will get this error if the cluster you are creating is 64-bit and the client is 32-bit (and possibly vice-versa) - so it's probably more likely if the cluster is Server 2008 R2 Core, because you (generally) don't create the cluster using the console tools.

    Instead, use Failover Cluster Manager on a platform with the same architecture as the server cluster - Windows 7 x64 or Server 2008 R2, for Server R2 Core.

  • To avoid the 64-bit/32-bit problem, just install the WoW64-FailoverCluster feature if you want to manage the cluster from a 32-bit computer.

  • MS support is telling me that rebuilding a WMI DB on a clustered 2008 R2 system is a bad idea, which I find hard to believe. Can anyone here expound upon that?

  • Rebuilding the entire WMI repository is a big hammer...  running 'winmgmt /ResetRepository' should be your last resort troubleshooting step.

    As a slightly smaller hammer, you might want to just reset the cluster WMI provider first.  This can be done by running:     'mofcomp C:\Windows\system32\wbem\ClusWMI.mof'

  • Hi.

    Am also receiving the same: "Operation has failed. An error occurred connecting the cluster <name>; Provider load failure"

    when trying to connect to the Windows 2012 cluster name, and also IP address. I am experiencing this on 2 of my 10 nodes. They are all running Hyper-V with Production VMs onboard. Guest VMs are running fine however.

    Moved all VM guests away, set node in Maintenance mode, and....

    1) Rebooted the one node which cannot connect - same result afterwards

    2) Confirmed all Server services on all 10 nodes are running - same result when trying to connect to cluster

    3) Have performed netsh winsock reset catalog, and rebooted - same result afterwards

    4) Have performed netsh int ip reset reset.log, and rebooted (which removed all IP address configuration from my NICs/Teams) - same result afterwards (after fixing the IPs)

    5) Confirmed WMI all ok using wbemtest suggestions, and checking/restarting WMI services and all dependant services - same result afterwards when trying to connect to cluster

    6) winmgmt /verifyrepository shows WMI is consistent

    So, I am no closer to connecting to the cluster name from this node. There seems to be hesitancy from other community members around rebuilding the WMI repository on cluster nodes. Any other ideas would be greatly appreciated...

    Thanks

    Darren
