Failover Clustering and Network Load Balancing Team Blog
The first post in this series looked at two changes that have been implemented in Windows Server 2008 R2 Failover Clustering to improve group distribution and resource load balancing amongst nodes, so that each node is more likely to receive an equal number of groups after certain events (such as a move, cluster start, or failover). This post will discuss some of the new resource group management features in Windows Server 2008 R2 Failover Clustering, including Auto Start, Persistent Mode and Group Wait Delay.
The following new settings have been added:
· Auto Start
o Determines if a group will start automatically when starting a cluster or recovering after a failure
o Failover Cluster Manager: Auto start
o PowerShell & Cluster API: Priority
· Persistent Mode
o When enabled, this remembers the last node on which the administrator brought a group online, or to which the administrator moved it. The group will be hosted on this “default” node on the next cluster cold start.
o Failover Cluster Manager: Enable Persistent Mode
o PowerShell & Cluster API: DefaultOwner
· Group Wait Delay
o Specifies the amount of time groups will wait for their default or preferred owner node to come up during cluster cold start, before the groups are moved to another node.
o PowerShell & Cluster API: ClusterGroupWaitDelay
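As a rough sketch, the three settings can be inspected from PowerShell using the property names listed above. This assumes the FailoverClusters module is installed and that the session runs on a cluster node; the table and list layout is just one way to display the values.

```powershell
Import-Module FailoverClusters

# Per-group settings: Priority (Auto Start) and DefaultOwner (Persistent Mode)
Get-ClusterGroup | Format-Table Name, OwnerNode, State, Priority, DefaultOwner -AutoSize

# Cluster-wide setting: ClusterGroupWaitDelay, in seconds
Get-Cluster | Format-List Name, ClusterGroupWaitDelay
```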
The Auto Start setting is intended to delay lower-priority roles from recovering after failures, so that higher-priority groups can take the system resources they need to come online successfully. This may be particularly useful in Hyper-V clusters, where administrators may want to keep lower-priority virtual machines offline, to give higher-priority virtual machines a better chance to come online faster when the cluster is started or after failure recovery. Note that administrator action is required after each and every failure of groups not marked with “Auto Start” to bring them back online.
By default, cluster roles have Auto Start enabled. To disable Auto Start for a group (making it low priority), do the following in Failover Cluster Manager:
1. In the tree view, select the group under Services and applications
2. In the Actions pane, select Properties
3. Deselect the “Auto start” checkbox
When Auto Start is disabled, after failure of any resource in the group, that group will remain in a failed state and not come online. As a result, low-priority groups will not fail over due to resource failures. On group failover due to node failure, the group will remain offline regardless of its previous state. When the cluster cold starts, the group will stay offline as well. The administrator will need to either manually bring the group online or move it to bring it back online.
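The manual recovery described above might look like the following from PowerShell. This is a minimal sketch: the group name "LowPriorityVM" and the node name "Node2" are hypothetical placeholders, not names from this post.

```powershell
Import-Module FailoverClusters

# Bring the group online on its current owner node
# ("LowPriorityVM" is a hypothetical group name)
Start-ClusterGroup -Name "LowPriorityVM"

# Alternatively, move the group to another node
# ("Node2" is a hypothetical node name)
Move-ClusterGroup -Name "LowPriorityVM" -Node "Node2"
```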
If the administrator manually moves a group, whether its resources come online depends on their persistent state. The persistent state of a resource simply reflects the last resource state set by the administrator (either online or offline), and the cluster will maintain that state when possible. The persistent state is automatically set when the administrator manually onlines or offlines a group or resource. So if the administrator never brought a group online, or previously offlined the group without bringing it back online, the group will stay offline after a manual move. However, if the administrator brought the group online at some point, but the group became failed or offline due to resource or node failure, manually moving the group will also bring it online.
If one or more preferred owners are set and failback is enabled for the group, the group will come online as normal after failback, as if it had been manually moved. A group may already be online before failback, and failback is an administrator-defined policy that automatically moves groups to a more preferred node when one becomes available, so failback is more useful when a group is kept online.
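For reference, preferred owners and failback can also be configured from PowerShell. The sketch below assumes a group named "FileServer1" and nodes named "Node1" and "Node2", all of which are hypothetical.

```powershell
Import-Module FailoverClusters

# Set the preferred owners for the group, in order of preference
# ("FileServer1", "Node1" and "Node2" are hypothetical names)
Set-ClusterOwnerNode -Group "FileServer1" -Owners Node1,Node2

# Allow failback for the group (AutoFailbackType: 0 = prevent, 1 = allow)
(Get-ClusterGroup "FileServer1").AutoFailbackType = 1

# Review the resulting owner list
Get-ClusterOwnerNode -Group "FileServer1"
```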
Persistent Mode is intended to allow groups to come online on the node to which an administrator last moved them. By default, cluster roles have this setting disabled, except for Hyper-V virtual machine cluster roles, which have it enabled by default. This setting is useful when the cluster is shut down and later started: because the groups were likely spread across the nodes before the cluster was taken offline, returning them to those nodes distributes the resources better and lets them come online faster. Otherwise, all the resources will attempt to restart on the first nodes that achieve quorum and compete for resources. This only applies to a group if it did not fail over after being placed by the administrator. If a group has failed over since the last administrator placement, it is brought online on the node which the administrator last moved it to.
To enable persistent mode for a group, do the following in Failover Cluster Manager:
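1. In the tree view, select the group under Services and applications
2. In the Actions pane, select Properties
3. Select the “Enable persistent mode” checkbox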
On a cluster cold start, any group with persistent mode enabled will wait for and come online on its “default node”, which is the last node the administrator moved the group to, or onlined the group on. If the administrator last moved the group to a “best possible” node, the cluster will manage group placement and the default node is cleared. Note that if the group’s preferred owners are defined, the preferred owners take precedence over the default owner. The default owner behaves as if it were the last preferred owner—the group would be placed there if it could not be placed on any of its preferred owners.
This mode has no effect on group failover or other moves.
To prevent groups from waiting indefinitely for their preferred or default owner nodes, the group wait delay can be configured. However, with a larger number of groups, it takes longer for nodes to join, and the group wait delay may need to be increased, which is discussed in the next section.
The Group Wait Delay setting applies only during a cluster cold start, and is intended to allow administrators to balance the amount of time groups stay in the Pending state while waiting for their preferred or default owner nodes to come online. An increased wait delay causes groups to take longer to come online, but also increases the likelihood that groups will actually be hosted on their preferred or default owner nodes. The default value for the Group Wait Delay is 30 seconds.
This setting applies only to groups that have preferred or default owners specified. Such a group can come online on any of its preferred or default owner nodes that come up within the group wait delay.
This is the cluster-wide property ClusterGroupWaitDelay, which can be configured using PowerShell:
PS> (Get-Cluster <clustername>).ClusterGroupWaitDelay = <time in seconds>
For example, to change this setting on a cluster named “Cluster1” to 300 seconds, use the command:
PS> (Get-Cluster Cluster1).ClusterGroupWaitDelay = 300
It is also important to keep in mind that the cluster service start time can be affected by hardware, system resources, the number of groups, and other factors, so you may need to make further adjustments.
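When making such adjustments, it may help to record the value before and after the change. A minimal sketch, reusing the “Cluster1” name and the 300-second value from the example above:

```powershell
Import-Module FailoverClusters

$cluster = Get-Cluster Cluster1

# Show the current delay before changing it
"Current delay: $($cluster.ClusterGroupWaitDelay) seconds"

# Apply the new delay (in seconds)
$cluster.ClusterGroupWaitDelay = 300

# Re-read the property from the cluster to confirm the change
"New delay: $((Get-Cluster Cluster1).ClusterGroupWaitDelay) seconds"
```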
Howard Sun
Software Development Engineer in Test
Clustering & High-Availability
Microsoft