
New Validation Tests in 2008 R2 Failover Clustering

Hi Cluster Fans,

 

Our Validate a Configuration Wizard was so popular that we’ve improved it in Windows Server 2008 R2.  Validate is the tool that verifies that your entire configuration is suitable for Failover Clustering.  It tests the servers, networks and storage, runs a series of failover tests, and saves an inventory of all the configuration information in reports.  It can be run before, during or after deployment, including as a troubleshooting tool.

 

 

Cluster Support

Running ‘Validate’ and making sure that no tests fail is one of only two requirements to have a supported Failover Cluster in Windows Server 2008 / R2.  The other is that each component has a Windows Server 2008 / R2 Logo.

 

More information about the Validate a Configuration Wizard: http://technet.microsoft.com/en-us/library/cc732035(WS.10).aspx

More information about the Logo Program: http://www.microsoft.com/whdc/winlogo/default.mspx

 

 

New Tests for Windows Server 2008 R2

There are 5 categories of tests in Windows Server 2008 R2.  The entire Cluster Configuration category is new, but this set of tests will only be run on clusters which have already been created.  It lists useful information about the current configuration which helps troubleshooters easily understand how it is deployed.  The image below is an example of some of the new, prescriptive information from Validate tests.

 

 

 

The Inventory and Storage categories run the same tests, with some minor changes.  The Network and System Configuration categories have a few additions.

 

In this image, the highlighted tests are new.

 

 

 

Cluster Configuration

       List * – Provides an overview of the Core Group, Networks, Resources, Storage, and Services and Applications.  These tests give useful information about how the resources are configured and include graphical dependency reports.

       Validate Quorum Configuration – Checks whether the quorum mode in use is the recommended one.  As in the “Configure Cluster Quorum Wizard”, the recommendation depends on the number of nodes and the availability of storage.  The test also checks the recommended values for the quorum arbitration time.

       Validate Resource Status - Validates that cluster resources are online, and lists the cluster resources that are running in separate resource monitors.  If a resource is running in a separate resource monitor, it is usually because the resource failed and the Cluster service began running it in a separate resource monitor (to make it less likely to affect other resources if it fails again).

       Validate Service Principal Name - Issues a warning if the Service Principal Name (SPN) cannot be found for a Kerberos-enabled network name.  A client uses the SPN to verify the identity of the computer to which it is connecting.

       Validate Volume Consistency - If any volumes are flagged as inconsistent ("dirty"), provides a reminder that running chkdsk is recommended.
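
To make the quorum check above more concrete, here is a minimal sketch in Python of the kind of rule it describes.  It assumes the general Windows Server 2008 R2 guidance (Node Majority for an odd number of nodes, an extra disk or file share vote for an even number); the function names and the exact rule set are illustrative assumptions, not the actual logic of the Validate test.

```python
def recommended_quorum_mode(node_count: int, shared_storage_available: bool) -> str:
    """Return a quorum mode for a cluster (assumed rule set, for illustration only)."""
    if node_count < 1:
        raise ValueError("a cluster needs at least one node")
    # Odd number of nodes: the node votes alone can never tie.
    if node_count % 2 == 1:
        return "Node Majority"
    # Even number of nodes: one extra vote is needed as a tie-breaker.
    if shared_storage_available:
        return "Node and Disk Majority"
    return "Node and File Share Majority"


def is_recommended(current_mode: str, node_count: int, shared_storage_available: bool) -> bool:
    """True if the mode currently in use matches the assumed recommendation."""
    return current_mode == recommended_quorum_mode(node_count, shared_storage_available)
```

For example, `is_recommended("Node Majority", 4, True)` returns False in this sketch, which is the kind of situation a quorum validation would be expected to flag.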

 

Network

       List Network Binding Order - Lists the order in which networks are bound to the adapters on each node.

       Validate Multiple Subnet Properties - For a multi-subnet cluster, retrieves the settings for all Network Name resources and checks whether the private properties HostRecordTTL, RegisterAllProvidersIP and PublishPTRRecords, along with the related DNS settings, are configured appropriately for clusters using multiple subnets.

 

System Configuration

·         Validate Cluster Service and Driver Settings – Validates startup settings used by services and drivers, including the Cluster service, CSVFilter.sys, NetFT.sys, and Clusdisk.sys.

·         Validate Memory Dump Settings - Validates that none of the nodes currently requires a reboot (as part of a software update) and that each node is configured to capture a memory dump if it stops running.

·         Validate System Drive Variable - Validates that all nodes have the same value for the system drive environment variable, such as C:\

·         Validate Operating System Installation Options – Checks that all nodes are using the same installation option, either Core or Full.  All nodes are required to run the same installation option because not all roles and features are supported on Core, so workloads would not be able to fail over to Core nodes if the role or feature is not available there.

o   This replaced the ‘Validate Operating Systems’ test, which was no longer necessary since x86 architecture is no longer supported in Windows Server 2008 R2, and we now check that all nodes are x64 or ia64 when adding them to the list of servers to Validate.
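
Several of the System Configuration tests above come down to the same idea: collect one value per node and confirm that every node reports the same thing.  The Python sketch below illustrates that comparison.  The node names and values are hypothetical, and this is not how Validate itself is implemented.

```python
from collections import defaultdict


def find_mismatches(node_values: dict) -> dict:
    """Group node names by the value each node reports.

    Returns an empty dict when all nodes agree, otherwise a dict that maps
    each distinct value to the list of nodes reporting it.
    """
    groups = defaultdict(list)
    for node, value in node_values.items():
        groups[value].append(node)
    return dict(groups) if len(groups) > 1 else {}


# Hypothetical inventory: installation option reported by each node.
install_options = {"node1": "Full", "node2": "Full", "node3": "Core"}
mismatches = find_mismatches(install_options)
```

The same helper could be applied to the system drive variable, for example `find_mismatches({"node1": "C:\\", "node2": "C:\\"})`, which returns an empty dict because the nodes agree.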

 

 

For more information about all of the Validation tests, visit: http://technet.microsoft.com/en-us/library/cc726064.aspx

 

We will continue to improve ‘Validate’ in our future products, so send us feedback about which new tests you would like to see.  You can send feedback by clicking on the ‘Email’ link in the upper right corner of this page.

 

Thanks,

Symon Perriman
Program Manager II
Clustering & High-Availability
Microsoft

Posted by msclustm | 0 Comments

New Cluster Docs for Cluster Shared Volumes (CSV) & Migration

Hi Cluster Fans,

 

We have recently added some Windows Server 2008 R2 Failover Clustering content on the web.  Here are some of our recent publications with special thanks to Jan Keller and our technical writers.

 

Cluster Shared Volumes (CSV)

 

·         Using Cluster Shared Volumes in a Failover Cluster in Windows Server 2008 R2:

http://technet.microsoft.com/en-us/library/ff182346(WS.10).aspx

 

·         Requirements for Using Cluster Shared Volumes in a Failover Cluster in Windows Server 2008 R2:

http://technet.microsoft.com/en-us/library/ff182358(WS.10).aspx

 

·         Designating a Preferred Network for Cluster Shared Volumes Communication:

http://technet.microsoft.com/en-us/library/ff182335(WS.10).aspx

 

·         Recommendations for Using Cluster Shared Volumes in a Failover Cluster in Windows Server 2008 R2:

http://technet.microsoft.com/en-us/library/ff182320(WS.10).aspx

 

·         Backing Up Cluster Shared Volumes in a Failover Cluster in Windows Server 2008 R2:

http://technet.microsoft.com/en-us/library/ff182356(WS.10).aspx

 

·         Cluster Shared Volumes Events and Errors:

http://technet.microsoft.com/en-us/library/ee830307(WS.10).aspx

 

o   Failover Clustering Events and Errors:

http://technet.microsoft.com/en-us/library/dd353290(WS.10).aspx

 

Migration

 

·         Migrating Clustered Services and Applications to Windows Server 2008 R2 Step-by-Step Guide:

http://technet.microsoft.com/en-us/library/ff182314(WS.10).aspx

 

·         Migration Paths for Migrating to a Failover Cluster Running Windows Server 2008 R2:

http://technet.microsoft.com/en-us/library/ee791924(WS.10).aspx

 

·         In-Place Migration for a Two-Node Cluster:

http://technet.microsoft.com/en-us/library/ff182312(WS.10).aspx

 

·         Cluster Migrations Involving New Storage: Mount Points:

http://technet.microsoft.com/en-us/library/ff182345(WS.10).aspx

 

·         Migrating DHCP to a Cluster Running Windows Server 2008 R2 Step-by-Step Guide:

http://technet.microsoft.com/en-us/library/ee460952(WS.10).aspx

 

 

Thanks,

Symon Perriman

Program Manager II
Clustering & High-Availability

Microsoft

R2 Print Cluster? Get this Hotfix! KB 976571

Hi Cluster Fans,

 

Many of you use Failover Clustering to provide high-availability to your print servers to ensure that the print spooler resource stays up and running.  We have recently released a Rollup Hotfix specific to print clustering which contains several fixes to improve the stability of the overall print system and fix issues with migration of print servers using PrintBRM.  Microsoft recommends that you install this Hotfix on all of your 2008 R2 print cluster nodes.

 

 

If you are running a Windows Server 2008 R2 print cluster GET THIS HOTFIX!  http://support.microsoft.com/kb/976571

 

 

How will print cluster issues be triaged by Microsoft?

 

 If you run into any issues and need to call Microsoft’s support line, they will follow this triage process:

1)      Does your complete solution pass the ‘Validate a Configuration’ tests?

·         No: Fix the errors which are reported to bring your cluster to a supported configuration and try to reproduce the problem again.

·         Yes: Proceed to Step 2

·         More information about Validate: http://technet.microsoft.com/en-us/library/cc772055.aspx

2)      Does your print cluster have the rollup Hotfix, KB 976571?

·         No: Install this Hotfix on all of your cluster nodes and try to reproduce the problem again.

·         Yes: Microsoft will triage the specific issue you are reporting.  This may include recommending Driver Isolation and removing unnecessary 3rd party print components, like language monitors and print processors.  It may also include updating required 3rd party print components, as is recommended for the HP Universal Print Driver when the properties page takes a long time to load: http://blogs.technet.com/yongrhee/archive/2009/09/14/windows-2008-r2-cluster-the-print-queue-propertes-of-a-hp-printer-may-take-a-long-time-to-open.aspx

·         More information about installing Hotfixes: http://blogs.msdn.com/clustering/archive/2009/06/12/9731520.aspx (follow this process even though it refers to Service Packs)
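
The two-step triage process above can be summarized as a small decision function.  This Python sketch only restates the steps listed in this post; the function name and return strings are made up for illustration.

```python
def triage_next_step(passes_validate: bool, has_kb976571: bool) -> str:
    """Return the next action in the print cluster triage flow described above."""
    # Step 1: the complete solution must pass 'Validate a Configuration'.
    if not passes_validate:
        return "Fix the errors reported by Validate, then try to reproduce the problem."
    # Step 2: the rollup Hotfix must be installed on all cluster nodes.
    if not has_kb976571:
        return "Install KB 976571 on all cluster nodes, then try to reproduce the problem."
    # Both prerequisites are met: the specific issue is triaged.
    return "Microsoft triages the specific issue."
```

Note that the order matters: a cluster that fails Validate is sent back to step 1 even if the Hotfix is already installed.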

 

 

What is included in this Hotfix?

 

This Hotfix, KB 976571, contains several fixes specific to print clustering.  The single Hotfix package is available as a free download and fixes all of the issues described below.

 

Issue 1: Shared Printer Disappears

 

·         A shared printer is installed on a print server failover cluster that is running Windows Server 2008 R2.

 

·         The print processor of the related driver has multiple dependent files.

 

·         The print server fails over to another node.

 

In this scenario, the shared printer may be missing on the new node after the failover operation.  This is because only one dependent file is copied to the destination node in every failover operation. The printer is missing after the failover operation because some dependent files are not available.

 

Issue 2: Spooler Stops Responding

 

·         A shared printer is installed on a print server failover cluster that is running Windows Server 2008 R2.

 

·         Some clients open connections to this shared printer.

 

·         The print server fails over to another node.

 

In this scenario, the spooler service may stop responding after the print server resource fails over to another node, so all print jobs fail.  This is because the print server has entered an infinite loop.

 

Issue 3: Restoring Printers using PrintBRM


When you use the Print Back-up Restore Migrate utility (PrintBRM.exe) to restore printers on a print server failover cluster that is running Windows Server 2008 R2, some printers cannot be restored, and you receive the following error message: 1081 invalid printer name.  In order to add the print processors to the print server failover cluster, the print spooler resource must first go offline and then come online.  This issue occurs because the printer is restored before the print spooler resource comes back online.

 

 

 

These fixes will also be included in Windows Server 2008 R2 SP1.

 

Thanks,

Symon Perriman
Program Manager II
Clustering & High-Availability
Microsoft

PowerShell for Failover Clustering: Creating Highly Available Workloads

Hi Clustering PowerShell Scripters,

 

One of the things we’ve provided in Failover Clustering PowerShell is a set of cmdlets that make it easy to create highly available workloads in a cluster.

 

PS C:\Windows\system32> Get-Command -Module FailoverClusters | ?{ $_.Name -like "Add-Cluster*Role" }

CommandType     Name                                                Definition

-----------     ----                                                ----------

CMDlet          Add-ClusterFileServerRole                           Add-ClusterFileServerRole [[-Name] <String>] [-S...

CMDlet          Add-ClusterGenericApplicationRole                   Add-ClusterGenericApplicationRole [[-Name] <Stri...

CMDlet          Add-ClusterGenericScriptRole                        Add-ClusterGenericScriptRole [[-Name] <String>] ...

CMDlet          Add-ClusterGenericServiceRole                       Add-ClusterGenericServiceRole [[-Name] <String>]...

CMDlet          Add-ClusterPrintServerRole                          Add-ClusterPrintServerRole [[-Name] <String>] [-...

CMDlet          Add-ClusterServerRole                               Add-ClusterServerRole [[-Name] <String>] [-Stora...

CMDlet          Add-ClusterVirtualMachineRole                       Add-ClusterVirtualMachineRole [[-Name] <String>]...

 

Each of the above CMDlets takes care of:

·         Creating the cluster group

·         Moving disk resource(s) into the group

·         Creating resources which may include creating the correct IP resources depending on your cluster network configuration (number of IPs and DHCP vs. static IPs)

·         Setting properties for each resource created

·         Setting dependencies between resources

·         Bringing the resources online

 

With that said, you’ll notice that we have not provided a cmdlet for every clustered workload that can be created through the High Availability Wizard in the Failover Cluster Manager GUI.

 

 

 

In this blog, I will show you how easy it is to create other workloads using PowerShell without having to perform all of the steps above by hand.  The secret lies in the Add-ClusterServerRole cmdlet.

To illustrate this, I’ll use the Microsoft Distributed Transaction Coordinator (MSDTC) role as an example.  That might be useful if you’re trying to automate your SQL Server installs.  Note that while this post focuses on MSDTC, you can use the same concepts for other workloads.

 

For comparison purposes, I created a DTC workload named ahmedbc4Dtc on my cluster using the Failover Cluster Manager GUI. As seen here, that populated the group with the right resources and the correct dependencies between the resources.

 

PS C:\Windows\system32> Get-ClusterGroup ahmedbc4Dtc

Name                                    OwnerNode

----                                    ---------

ahmedbc4Dtc                             ahmedbc4-n2

 

PS C:\Windows\system32> Get-ClusterGroup ahmedbc4Dtc | Get-ClusterResource | ft -auto

Name                                                State  Group       ResourceType

----                                                -----  -----       ------------

ahmedbc4Dtc                                         Online ahmedbc4Dtc Network Name

Cluster Disk 6                                      Online ahmedbc4Dtc Physical Disk

IP Address 157.55.88.0 (2)                          Online ahmedbc4Dtc IP Address

IP Address 2001:4898:0:fff:200:5efe:157.55.88.0 (2) Online ahmedbc4Dtc IPv6 Tunnel Address

IP Address 2001:4898:f0:1000:: (2)                  Online ahmedbc4Dtc IPv6 Address

MSDTC-ahmedbc4Dtc                                   Online ahmedbc4Dtc Distributed Transaction Coordinator

 

PS C:\Windows\system32> Get-ClusterGroup ahmedbc4Dtc | Get-ClusterResource | Get-ClusterResourceDependency | ft -auto

Resource                                            DependencyExpression

--------                                            --------------------

ahmedbc4Dtc                                         [IP Address 157.55.88.0 (2)] or [IP Address 2001:4898:f0:1000:: ...

Cluster Disk 6

IP Address 157.55.88.0 (2)

IP Address 2001:4898:0:fff:200:5efe:157.55.88.0 (2) ([IP Address 157.55.88.0 (2)])

IP Address 2001:4898:f0:1000:: (2)

MSDTC-ahmedbc4Dtc                                   ([ahmedbc4Dtc]) and ([Cluster Disk 6])

 

Now, how can I create something similar with PowerShell?

 

The easiest way is to create a base server role with the Add-ClusterServerRole cmdlet.  I named it ahmedbc4Dtc1.  This cmdlet does the heavy lifting for you, including creating the right number of IP resources and deciding whether they are DHCP or statically configured, based on your cluster networking configuration.  It also moves the disk resource into the group for you and sets the right dependencies between the network name resource and the IP resources it creates.

 

PS C:\Windows\system32> Add-ClusterServerRole -Name ahmedbc4Dtc1 -Storage "Cluster Disk 7"

Name                                    OwnerNode                                                                 State

----                                    ---------                                                                 -----

ahmedbc4Dtc1                            ahmedbc4-n2                                                              Online

 

PS C:\Windows\system32> Get-ClusterGroup ahmedbc4Dtc1 | Get-ClusterResource | ft -auto

Name                                                State  Group        ResourceType

----                                                -----  -----        ------------

ahmedbc4Dtc1                                        Online ahmedbc4Dtc1 Network Name

Cluster Disk 7                                      Online ahmedbc4Dtc1 Physical Disk

IP Address 157.55.88.0 (3)                          Online ahmedbc4Dtc1 IP Address

IP Address 2001:4898:0:fff:200:5efe:157.55.88.0 (3) Online ahmedbc4Dtc1 IPv6 Tunnel Address

IP Address 2001:4898:f0:1000:: (3)                  Online ahmedbc4Dtc1 IPv6 Address

 

PS C:\Windows\system32> Get-ClusterGroup ahmedbc4Dtc1 | Get-ClusterResource | Get-ClusterResourceDependency | ft -auto

Resource                                            DependencyExpression

--------                                            --------------------

ahmedbc4Dtc1                                        [IP Address 157.55.88.0 (3)] or [IP Address 2001:4898:f0:1000:: ...

Cluster Disk 7

IP Address 157.55.88.0 (3)

IP Address 2001:4898:0:fff:200:5efe:157.55.88.0 (3) ([IP Address 157.55.88.0 (3)])

IP Address 2001:4898:f0:1000:: (3)

 

Notice it is missing the DTC resource.  So, just add that to the group and add the right dependency.  In this case, the DTC resource depends on a disk resource and network name resource as you can see above with the group I created earlier for comparison purposes through the GUI.

 

PS C:\Windows\system32> Get-ClusterGroup ahmedbc4Dtc1 | Add-ClusterResource -Name MSDTC-ahmedbc4Dtc1 -ResourceType "Distributed Transaction Coordinator"

Name                          State                         Group                         ResourceType

----                          -----                         -----                         ------------

MSDTC-ahmedbc4Dtc1            Offline                       ahmedbc4Dtc1                  Distributed Transaction Co...

 

PS C:\Windows\system32> Add-ClusterResourceDependency MSDTC-ahmedbc4Dtc1 ahmedbc4Dtc1

Name                          State                         Group                         ResourceType

----                          -----                         -----                         ------------

MSDTC-ahmedbc4Dtc1            Offline                       ahmedbc4Dtc1                  Distributed Transaction Co...

 

PS C:\Windows\system32> Add-ClusterResourceDependency MSDTC-ahmedbc4Dtc1 "Cluster Disk 7"

Name                          State                         Group                         ResourceType

----                          -----                         -----                         ------------

MSDTC-ahmedbc4Dtc1            Offline                       ahmedbc4Dtc1                  Distributed Transaction Co...

 

PS C:\Windows\system32> Get-ClusterGroup ahmedbc4Dtc1 | Get-ClusterResource | Get-ClusterResourceDependency | ft -auto

Resource                                            DependencyExpression

--------                                            --------------------

ahmedbc4Dtc1                                        [IP Address 157.55.88.0 (3)] or [IP Address 2001:4898:f0:1000:: ...

Cluster Disk 7

IP Address 157.55.88.0 (3)

IP Address 2001:4898:0:fff:200:5efe:157.55.88.0 (3) ([IP Address 157.55.88.0 (3)])

IP Address 2001:4898:f0:1000:: (3)

MSDTC-ahmedbc4Dtc1                                  ([ahmedbc4Dtc1]) and ([Cluster Disk 7])

 

And now I’ll bring the group online.

 

PS C:\Windows\system32> Start-ClusterGroup ahmedbc4Dtc1

Name                                    OwnerNode                                            State

----                                    ---------                                            -----

ahmedbc4Dtc1                            ahmedbc4-n2                                          Online

 

 

That’s all for the group. But, one thing you’ll notice in the Failover Cluster Manager is the difference between the group the GUI created and the one I just created with PowerShell.  Notice the nice icon for the DTC group (icon beside the group name and type on the tabular view in the first diagram) and the “Manage MSDTC” action link (in the second diagram).

 

 

 

The reason this is not the same with the group I created via PowerShell is that I didn’t set the group type for the group properly when I added the DTC resource to the group.

 

PS C:\Windows\system32> gwmi -Namespace root/MSCluster -Class MSCluster_ResourceGroup | ?{ $_.Name -eq "ahmedbc4Dtc" } | fl Name,GroupType

Name      : ahmedbc4Dtc

GroupType : 103

 

PS C:\Windows\system32> gwmi -Namespace root/MSCluster -Class MSCluster_ResourceGroup | ?{ $_.Name -eq "ahmedbc4Dtc1" } | fl Name,GroupType

Name      : ahmedbc4Dtc1

GroupType : 9999

 

To “fix” this up, set the group type via WMI.

 

PS C:\Windows\system32> ( gwmi -Namespace root/MSCluster -Class MSCluster_ResourceGroup | ?{ $_.Name -eq "ahmedbc4Dtc1" } ).SetGroupType(103)

 

Now, you’re set.
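
If you want to repeat the walkthrough above for a different group name or disk, the whole sequence can be generated from a template.  The Python sketch below only assembles the PowerShell command lines used in this post as strings.  The helper name is hypothetical, the default group type 103 is simply the value observed above on the GUI-created DTC group, and names containing quotes or other special characters are not handled.

```python
def dtc_role_commands(group: str, disk: str, group_type: int = 103) -> list:
    """Return the PowerShell lines from this walkthrough for one DTC group."""
    resource = "MSDTC-" + group  # naming pattern taken from the example above
    return [
        # 1. Base server role: group, network name, IP resources, disk.
        f'Add-ClusterServerRole -Name {group} -Storage "{disk}"',
        # 2. Add the DTC resource to the group.
        f'Get-ClusterGroup {group} | Add-ClusterResource -Name {resource} '
        f'-ResourceType "Distributed Transaction Coordinator"',
        # 3. The DTC resource depends on the network name and the disk.
        f'Add-ClusterResourceDependency {resource} {group}',
        f'Add-ClusterResourceDependency {resource} "{disk}"',
        # 4. Bring the group online.
        f'Start-ClusterGroup {group}',
        # 5. Set the group type via WMI so the GUI treats it as a DTC group.
        f'(gwmi -Namespace root/MSCluster -Class MSCluster_ResourceGroup | '
        f'?{{ $_.Name -eq "{group}" }}).SetGroupType({group_type})',
    ]


if __name__ == "__main__":
    for line in dtc_role_commands("ahmedbc4Dtc1", "Cluster Disk 7"):
        print(line)
```

The printed lines can be reviewed and then pasted into a PowerShell session on a cluster node where the FailoverClusters module is loaded.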

 

 

 

Happy scripting!

 

Regards,

Ahmed Bisht

Senior Program Manager

Clustering & High-Availability

Microsoft

 

Network Load Balancing in R2: Using ETW Tracing

Hi,

 

We are going to talk about a new feature in Windows Server 2008 R2 for Network Load Balancing (NLB): ETW tracing.  With this added functionality, we have provided a mechanism for tracing why NLB has decided to drop or accept a given network packet.

 

This blog provides the following information on the new ETW tracing for NLB delivered in Windows Server 2008 R2.

 

·         Overview of ETW Tracing

·         How to Setup ETW Tracing

·         How to Enable, Disable and View the Traces

·         How to Uninstall ETW Tracing Manifest

·         Examples of Tracing Output

 

Details on how to interpret the results and use them for advanced debugging purposes will be covered in future blog posts.

 

Overview of ETW Tracing

 

ETW is best described by the following MSDN article:

 

Event Tracing for Windows (ETW) is a general-purpose, high-speed tracing facility provided by the operating system. Using a buffering and logging mechanism implemented in the kernel, ETW provides a tracing mechanism for events raised by both user-mode applications and kernel-mode device drivers. Additionally, ETW gives you the ability to enable and disable logging dynamically, making it easy to perform detailed tracing in production environments without requiring reboots or application restarts.

 

NLB leverages this infrastructure to give you more detailed information about why packets are accepted or rejected by NLB.  While ETW is designed and implemented with performance in mind, be aware that these logs consume storage space.  For example, a server handling 100 connections per second could generate many MB of data in less than a minute because of the level of detail being recorded, so keep this in mind if you plan to run ETW tracing for extended periods of time.  The command line section below shows how to find out where the ETW log file is located, so that you can delete it when you are done with debugging.

 

How to Setup ETW Tracing

 

You can find a manifest file here: http://blogs.msdn.com/clustering/pages/9944942.aspx.  The text on that page should be saved as networkloadbalancing-core-diagnostic.events.man and copied to your NLB cluster nodes.  

 

Important: This is an unsupported script.  Please use it at your own risk.  Microsoft’s Customer Support Services (CSS/PSS) will not support issues associated with this script.

 

Then run the following command from the directory where you copied the manifest file:

 

> wevtutil im networkloadbalancing-core-diagnostic.events.man

 

Note that this needs to be done from an elevated console window.  The above command only registers the NLB manifest; no traces are collected yet.  The following sections describe how to start collecting them.

 

How to Enable, Disable and View the Traces

 

On the NLB cluster node you can collect traces through the UI or the command line.

 

UI (Event Viewer – eventvwr.msc)

 

·         Enable Analytics and Debugging Logs (one time)

·         Make sure you’ve installed the manifest

·         Click “View” menu and select “Show Analytic and Debug Log”

 

 

 

·         To start tracing:

·         Navigate to the channel: Event Viewer\Applications and Services Logs\Microsoft\Windows\NLB\Diagnostics.  Right click on the channel and select “Enable Log”.

 

 

 

·         Run your scenario

 

·         To stop and view collected events

o   Navigate to the channel. Right click on the channel and select Disable Log. You will now see events show up in the list.

 

 

 

·         At this point you should see the NLB ETW tracing in the Diagnostics pane in the middle of the screen.

 

 

 

 

 

Command Line (Event Viewer - wevtutil.exe)

 

·         To see provider information:

 

           >  wevtutil gl Microsoft-Windows-NLB/Diagnostic

 

 

 

This tells us that the ETW tracing file that is being generated is stored at:

%SystemRoot%\System32\Winevt\Logs\Microsoft-Windows-NLB%4Diagnostic.etl
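
Notice how the “/” in the channel name shows up as “%4” in the file name.  A tiny Python sketch of that mapping, assuming only the pattern visible in the path above (the helper name is made up, and other channel types may use a different file extension):

```python
def channel_log_file(channel: str) -> str:
    """Map a channel name to its log file name, following the pattern shown above."""
    # '/' cannot appear in a file name, so it is written as '%4'.
    return channel.replace("/", "%4") + ".etl"
```

With this sketch, `channel_log_file("Microsoft-Windows-NLB/Diagnostic")` gives the file name shown in the path above.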

 

·         To see statistics:

> wevtutil gli Microsoft-Windows-NLB/Diagnostic

 

·         To start tracing, use:

> wevtutil sl Microsoft-Windows-NLB/Diagnostic /e:true /q

 

·         To stop tracing, use:

> wevtutil sl Microsoft-Windows-NLB/Diagnostic /e:false /q

 

·         To view the events as a text file, first stop tracing and then use:

> wevtutil qe Microsoft-Windows-NLB/Diagnostic /f:text > events.txt
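
If you collect traces often, the start, stop and export commands above are easy to wrap in a script.  The Python sketch below only builds the argument lists for the same wevtutil invocations shown in this section; the helper names are hypothetical, and nothing is executed here.

```python
CHANNEL = "Microsoft-Windows-NLB/Diagnostic"


def enable_args(enabled: bool, channel: str = CHANNEL) -> list:
    """Arguments for 'wevtutil sl' to enable or disable the channel."""
    flag = "true" if enabled else "false"
    return ["wevtutil", "sl", channel, "/e:" + flag, "/q"]


def export_args(channel: str = CHANNEL) -> list:
    """Arguments for 'wevtutil qe' to dump the collected events as text."""
    return ["wevtutil", "qe", channel, "/f:text"]


def trace_session_commands(channel: str = CHANNEL) -> list:
    """Start tracing, stop tracing, then export, in the order used above."""
    return [enable_args(True, channel), enable_args(False, channel), export_args(channel)]
```

On an NLB node, each list could be passed to `subprocess.run` from an elevated prompt, with the scenario run between the start and stop steps and the output of the export step redirected to a file such as events.txt.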

 

How to Uninstall the NLB Tracing Manifest

 

From an elevated console window, run:

 

> wevtutil um networkloadbalancing-core-diagnostic.events.man

 

 

 

 

 

Example of Tracing Output

 

 

Node1: Node 1 accepted the connection

1.  Log Name:      Microsoft-Windows-NLB/Diagnostic

2.  Source:        Microsoft-Windows-NLB-Diagnostic

3.  Date:          10/30/2009 2:36:50 PM

4.  Event ID:      1

5.  Task Category: Filtering Receive Accept

6.  Level:         Information

7.  Keywords:      Accept,Receive,Filtering,NLB

8.  User:          N/A

9.  Computer:      G10C3N8X64N2.ctdev.nttest.microsoft.com

10.         Description:

NLB cluster on interface {10000000-0000-0006-7b00-310030003000} received traffic from 10.30.4.198:63691 destined to 10.30.4.157:5001 [protocol: TCP (0x0), flags: 0x2]. This cluster node will accept this traffic (reason: Unconditional Ownership). Source port 63691, destination port 5001, and protocol TCP have been used for the accept/drop decision.

 

In the above tracing from Node1, we see that the connection we defined in our user scenario is being accepted (line 5).  The event ID “1” (line 4) indicates that this event pertains to an “Accepted Connection”.  The reason given in the description shows that this connection was accepted because the packet was “unconditionally owned” by the current node.  We will see more reasons in future blog posts regarding debugging NLB with ETW tracing.

 

 

Node2: Node 2 rejected the connection

1.  Log Name:      Microsoft-Windows-NLB/Diagnostic

2.  Source:        Microsoft-Windows-NLB-Diagnostic

3.  Date:          10/30/2009 2:36:50 PM

4.  Event ID:      2

5.  Task Category: Filtering Receive Drop

6.  Level:         Information

7.  Keywords:      Drop,Receive,Filtering,NLB

8.  User:          N/A

9.  Computer:      G10C3N8X64N1.ctdev.nttest.microsoft.com

10.         Description:

NLB cluster on interface {10000000-0000-0006-7b00-300036004200} received traffic from 10.30.4.198:63691 destined to 10.30.4.157:5001 [protocol: TCP (0x0), flags: 0x2]. This cluster node will drop this traffic (reason: Owned Elsewhere). Source port 63691, destination port 5001, and protocol TCP have been used for the accept/drop decision.

 

Similarly, Node2 has rejected this packet, and the reason given in the description shows that this packet was “Owned Elsewhere” (by Node1, as shown above).  The event ID “2” (line 4) can be used in Event Viewer to filter for only dropped packets.
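
When events are exported to text, the accept/drop decision and its reason can be pulled out of the Description field with a regular expression.  The Python sketch below is written against the two sample descriptions above only; the pattern and field names are assumptions, and other event types may be worded differently.

```python
import re

# Pattern modelled on the two sample descriptions in this post.
DESCRIPTION_PATTERN = re.compile(
    r"received traffic from (?P<src>\S+):(?P<src_port>\d+) "
    r"destined to (?P<dst>\S+):(?P<dst_port>\d+) "
    r"\[protocol: (?P<protocol>\w+).*?\]\. "
    r"This cluster node will (?P<decision>accept|drop) this traffic "
    r"\(reason: (?P<reason>[^)]+)\)"
)


def parse_description(text: str):
    """Return a dict of the fields in an NLB trace description, or None if no match."""
    match = DESCRIPTION_PATTERN.search(text)
    return match.groupdict() if match else None


sample = (
    "NLB cluster on interface {10000000-0000-0006-7b00-310030003000} received "
    "traffic from 10.30.4.198:63691 destined to 10.30.4.157:5001 "
    "[protocol: TCP (0x0), flags: 0x2]. This cluster node will accept this "
    "traffic (reason: Unconditional Ownership). Source port 63691, destination "
    "port 5001, and protocol TCP have been used for the accept/drop decision."
)
fields = parse_description(sample)
```

For the Node1 sample, `fields["decision"]` is "accept" and `fields["reason"]` is "Unconditional Ownership"; running the same function on the Node2 description yields "drop" and "Owned Elsewhere".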

 

 

Thanks,

Rohan Mutagi & Ahmed Bisht

Clustering and High-Availability Team

Microsoft

Back up your CSV disks with DPM 2010 Beta!

Happy Holidays Cluster Fans!

 

The System Center Data Protection Manager (DPM) 2010 Beta is now available for download and testing. 

 

This supports backing up your Cluster Shared Volume (CSV) disks, in addition to the following Clustering and Hyper-V features:

·         Supports Windows Server from 2003 through 2008 R2 including Hyper-V Server 2008 and 2008 R2

·         Protection of Live Migration-enabled servers running on CSV in Hyper-V R2

·         Flexibility to protect virtual machines from Windows guests or from the hypervisor host

·         Host-based backups will now enable single-item restores from within the VHD

·         Ability to restore virtual machines to an alternative host

·         Enhanced disaster recovery options for long-distance data protection and business continuity initiatives

 

Try the Beta now at: http://www.microsoft.com/systemcenter/dataprotectionmanager/en/us/2010beta-overview.aspx.

Documentation of how to use DPM 2010 with CSV is included with the download package.

 

We still strongly recommend using a hardware snapshot provider for protecting CSV workloads.  The rationale for this recommendation is described here: http://blogs.technet.com/asim_mitra/archive/2009/12/11/snapshot-provider-considerations-while-backing-up-a-csv-cluster.aspx

 

If you have any issues, comments or feedback, you can report them to Microsoft’s Customer Service and Support (CSS), your Technical Account Manager (TAM) or discuss it in the DPM Newsgroup.

 

Thanks,

Symon Perriman
Program Manager II
Clustering & High-Availability
Microsoft

 


Dynamic Disks with Windows Server Failover Clustering

Hi,

Since Failover Clustering requires shared storage, we get a lot of questions about which storage types we support.  Windows Server Failover Clustering has a very flexible storage model that allows a wide variety of storage and volume management solutions from 3rd parties to integrate and extend the functionality of clustering.  One question I commonly get asked is about Dynamic Disk support on Windows Server Failover Clusters, so I thought I would take a moment to address this.

 

Are Dynamic Disks supported on Failover Clusters? 

Yes, they are.  However, support is not provided natively in-box in Windows for Failover Clusters.  An add-on product from Symantec called Storage Foundation for Windows is required to enable support for Dynamic Disks on Windows Server Failover Clusters.  This is also true for the recently released Windows Server 2008 R2.  You can learn more about the Storage Foundation for Windows product here:
http://www.symantec.com/business/storage-foundation-for-windows

This KB article also discusses support for Dynamic Disks on Windows Server Failover Clusters: http://support.microsoft.com/kb/237853

 

Now let me ask you…why do you want to use Dynamic Disks? 

Dynamic Disks do provide a number of different features, so we like to understand why you use them.

I commonly hear two answers when I ask this question:

·         “I need to have large partitions”

·         “I need to be able to dynamically grow partitions”

 Well, did you know that you actually don’t need Dynamic Disks to accomplish those? 

Large Partitions

While Basic disks that use an MBR partition table only support partitions of up to 2 TB, GUID partition table (GPT) disks enable partitions that are greater than 2 TB and are fully supported on Failover Clusters using Windows Server 2008 and 2008 R2.  If you happen to still be using Windows Server 2003, you can add support for GPT-based disks with a post-Service Pack 2 hotfix, available at:  http://support.microsoft.com/kb/919117.

 If you want to learn more about the advantages of GPT disks, here is a good FAQ:  http://www.microsoft.com/whdc/device/storage/GPT_FAQ.mspx . 

So, you can create large volumes with Basic disks, and there is no need for Dynamic Disks. 
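As a quick check of where the 2 TB figure comes from, here is a small Python sketch. It assumes the conventional MBR layout, in which a partition's sector count is stored in a 32-bit field, and 512-byte sectors. Neither detail is spelled out in this post, and the function name is purely illustrative.

```python
def mbr_max_partition_bytes(sector_size: int = 512) -> int:
    """Approximate largest partition an MBR entry can describe.

    Assumption: the sector count lives in a 32-bit field, so about
    2**32 sectors are addressable.
    """
    max_sectors = 2 ** 32
    return max_sectors * sector_size


limit = mbr_max_partition_bytes()
print(limit)             # size in bytes
print(limit / 2 ** 40)   # size in TiB
```

With larger sectors the same arithmetic gives a larger limit, but GPT removes the constraint without depending on sector size.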

 

Dynamically Growing Partitions

With Windows Server 2003, 2008 and 2008 R2 you can dynamically increase the size of a partition on a Failover Cluster.  In Windows Server 2003 this needed to be done via the command line with DiskPart, as described in this KB article: http://support.microsoft.com/kb/304736.

In Windows Server 2008 and 2008 R2, there is a simple right-click option in the Disk Management (DiskMgmt.msc) snap-in to “Extend Volume”.  Another new option in Windows Server 2008 R2 is that you can now not only extend a volume, but you can also “Shrink Volume”. 

So, you can dynamically grow or shrink volumes with Basic disks, and there is no need for Dynamic Disks.

 

So, is there really a need for Dynamic Disks?

There are fewer reasons why you might need Dynamic Disks these days, since much of this functionality is now possible with Basic disks.  What are the reasons why you might actually need them?  That is fair to discuss as well and I commonly hear two answers:

·         “I want to use Software RAID”

·          “I need to be able to span a single volume over multiple LUNs”

 

Software RAID & Spanning Volumes

Failover Clustering requires external storage (Fibre Channel, iSCSI or SAS), so most customers choose to go with the Hardware RAID they already have in the storage array instead of using Software RAID.  Spanning volumes is really a matter of how you do your SAN management when increasing capacity.  Most storage arrays these days support dynamically expanding the size of a LUN, and, as I said earlier, with Basic disks you can dynamically increase the size of the volume to match the new, larger LUN.  However, some people prefer to concatenate LUNs and span a single volume over multiple LUNs; when they want to add capacity, they just create a new LUN and span the volume over it.  For IT departments that are segmented, I sometimes hear that it is ‘easier’ for SAN admins to just create a new LUN, as opposed to tracking down the right LUN and expanding it.  I have no right or wrong answer for you here; it’s a matter of how you manage your SANs.

  

I hope this helps in understanding that Dynamic Disks are supported with Windows Server Failover Clustering through Storage Foundation for Windows, the add-on product from Symantec.  Microsoft has continually strived to build functionality into Windows that provides ease of use and convenience for server administrators.  This is the case with the ability to dynamically expand and shrink volumes and create large volumes using GPT formatted disks.  As mentioned, Dynamic Disks remain available through a great add-on product that extends the functionality of Failover Clustering, and I hope the above information allows you to make informed choices on what is right for you and your customers.

 

Thanks,

Elden Christensen
Senior Program Manager Lead
Clustering & High-Availability
Microsoft

Copying VHDs onto a CSV Disk? Use the Coordinator Node!

Hi Cluster Experts,

 

At TechReady Europe 2009 and other events, we’re seeing an overwhelming interest in Windows Server Failover Clustering (WSFC) from customers, especially around our new Cluster Shared Volumes (CSV) innovation introduced in R2.

  

With Cluster Shared Volumes (CSV), one of the nodes in the cluster is responsible for synchronizing access to files on a shared volume.  This is the node which currently owns the cluster ‘Physical Disk’ resource associated with that LUN, and it is referred to as the coordinator node.  Each LUN can have its own coordinator, and all nodes are equal, so any node could be a coordinator.  When a VM is deployed and running on a CSV volume, almost all of Hyper-V’s access to the VHD files associated with that VM goes directly to the disk and the coordinator node is not involved.  This gives VMs fast, direct access and great performance for the VM and the applications running within it.  So it really doesn’t matter which node is the coordinator or where VMs are running.

 

One exception to that happens when you are copying VHD files to CSV volumes as you create and deploy your VMs.  As you copy/create the VHD files on a CSV volume, those writes to the disk are extending write operations and, as a result, they are redirected over the network to the coordinator node.  This can make the file copy take longer.  The moral of the story is that when you are going to do a file copy to a CSV volume, to get the greatest performance it is best to do the copy on the coordinator node if you are doing a local copy.  If you are doing a remote copy over the network, it’s best to have the coordinator node be the target of the copy.  You can view the current owner or even move ownership of Physical Disk resources from one node to another with the Failover Cluster Manager snap-in (CluAdmin.msc) or PowerShell.  So you can make whatever node you are on the coordinator.

 

 

Or, use PowerShell with the Get-ClusterSharedVolume CMDlet: 

 

 

To give an example, on my 2-node cluster, I tried making a copy of a VHD file (6.4 GB) from a local path to my CSV volume where I want the VHD file to live. What you will see is that depending on which node I initiated the file copy from, the time it took to copy the file varied.

 

Here is what I did: I made a copy of the same file onto a local folder on each cluster node, then I tried making a copy of that file to the same CSV volume (with different destination file names) on both cluster nodes.  But, I noticed that on one node, it took me just over 2.5 minutes (first screen shot), while on another node it took me just under 3.5 minutes (second screen shot).

 

 

Although both copy operations seem to be “local” (I’m copying the same file from C:\My_VHDs to C:\ClusterStorage\Volume2 on both nodes), the copy took more time to complete on Node2 (3.5 minutes) than the same operation on Node1 (2.5 minutes) because Node1 was the coordinator node for the destination CSV volume.  On that node, the writes are all local writes because it is the coordinator itself.  From the other node, Node2, the writes are actually redirected over the network to Node1 (because they are extending writes to the file).
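To put those timings in perspective, the following Python sketch converts them into average throughput. The inputs are the rounded figures quoted above (6.4 GB, roughly 2.5 and 3.5 minutes), and it assumes 1 GB = 1024 MB, so treat the results as approximate.

```python
def throughput_mb_per_s(size_gb: float, minutes: float) -> float:
    """Average copy throughput in MB/s (assumes 1 GB = 1024 MB)."""
    return size_gb * 1024 / (minutes * 60)


size_gb = 6.4
coordinator = throughput_mb_per_s(size_gb, 2.5)      # Node1: local writes
non_coordinator = throughput_mb_per_s(size_gb, 3.5)  # Node2: redirected writes

print(f"coordinator:     {coordinator:.1f} MB/s")
print(f"non-coordinator: {non_coordinator:.1f} MB/s")
print(f"extra time:      {(3.5 / 2.5 - 1) * 100:.0f}%")
```

The exact ratio will depend on the cluster network and storage, but the direction of the difference is what matters here.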

 

So, what does that really mean?

 

The bottom line here is that if you’re trying to make copies of files to CSV and you want to get this copy to complete in the fastest possible time, make sure you do the copy on the coordinator node.

 

If you want to learn more about CSV, there is a lot of material out there that you can refer to:

·         http://technet.microsoft.com/en-us/library/dd759255.aspx

·         http://technet.microsoft.com/en-us/library/dd630633(WS.10).aspx

·         http://blogs.msdn.com/clustering/archive/2009/03/02/9453288.aspx

·         http://blogs.msdn.com/clustering/archive/2009/02/19/9433146.aspx

 

 

Regards,

Ahmed Bisht

Senior Program Manager

Clustering & High-Availability

Microsoft

 

Installing Network Load Balancing (NLB) on Windows Server 2008 R2

Hi, 

 

This post will describe several ways to install Network Load Balancing (NLB) using the command line interface.  In Windows Server 2008 R2 installing Windows Features using scripts at the command line has changed a little since Windows Server 2008.  There are four ways to install the feature:

 

1.       Server Manager UI

2.       Command Prompt using dism.exe

3.       PowerShell using CMDlets for ServerManager

4.       Unattended installs

 

As the Server Manager UI and unattended installs have not changed in Windows Server 2008 R2, this blog describes installing NLB using dism.exe and PowerShell, which are both new in Windows Server 2008 R2.

 

Command Prompt Using DISM

To install NLB on Windows Server 2008 R2 Full Installation

 

1.       Open an elevated command prompt and run > dism /online /enable-feature /featurename:NetworkLoadBalancingFullServer

 

2.       If you want to install the management tools only, run > dism /online /enable-feature /featurename:NetworkLoadBalancingManagementClient

 

Install NLB on Windows Server 2008 R2 Core Installation

 

1.       Open an elevated command prompt and run > dism /online /Enable-feature /featurename:NetworkLoadBalancingHeadlessServer

 

Note that you may need to launch a new command prompt with administrative privileges using the RunAs command.
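If you script these installs across several servers, it can help to keep the feature names in one place. The Python sketch below only builds the command lines from the feature names listed in this post and does not run anything. The dictionary keys ("full", "tools-only", "core") are hypothetical labels chosen for this example.

```python
from typing import List

# Feature names as given in this post; the keys are labels made up
# for this sketch.
NLB_FEATURES = {
    "full": "NetworkLoadBalancingFullServer",
    "tools-only": "NetworkLoadBalancingManagementClient",
    "core": "NetworkLoadBalancingHeadlessServer",
}


def dism_enable_command(install_type: str) -> List[str]:
    """Build (but do not run) the dism command line for one install type."""
    feature = NLB_FEATURES[install_type]
    return ["dism", "/online", "/enable-feature", f"/featurename:{feature}"]


for kind in NLB_FEATURES:
    print(" ".join(dism_enable_command(kind)))
```

The returned list could be handed to `subprocess.run` from an elevated session, but test that on a non-production machine first.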

 

PowerShell

 

By default, Server Core does not have management tools such as Nlbmgr.exe or PowerShell CMDlets installed, thus you may want to manage NLB from a remote server.  If you want to manage NLB on Server Core locally you must install PowerShell which will include cmdlets for NLB.  To install PowerShell in Server Core run the following commands.

 

1.       From a command prompt, run > dism /online /Enable-feature /featurename:NetFx2-ServerCore  

 

Note that you may need to launch a new command prompt with administrative privileges using the RunAs command.

 

2.       Then run > dism /online /Enable-feature /featurename:MicrosoftWindowsPowerShell

 

3.       If NLB is not installed, install NLB as described in the section above, ‘Install NLB on Windows Server 2008 R2 Core Installation’

 

 

Thanks,

Gary Jackman

Software Test Engineer

Clustering & High-Availability

Microsoft

PowerShell for NLB: Common Scenarios

Hi,

 

This is the second blog in our series of posts on PowerShell for Network Load Balancing (NLB).  The first post introduces you to the CMDlets: http://blogs.msdn.com/clustering/archive/2009/10/28/9913877.aspx

 

Most of the NLB CMDlets have the following common parameters.

 

 -InterfaceName

Specifies the interface to which NLB is bound

 -NodeName

Specifies the name of the cluster node that you want to manage

 

Most CMDlets require reference to a Cluster object.   To get a Cluster object you can run Get-NLBCluster and pass the output object to the desired CMDlet or use the -interfaceName parameter. 

 

We will discuss running CMDlets and using the output as input of another CMDlet in future posts.

 

Creating a New Cluster

New-NLBCluster

A new NLB cluster can be created using the New-NLBCluster CMDlet. This is a synchronous command, meaning that it will only return after completing the operation.  You can also use this CMDlet to create a new cluster on remote nodes.  To achieve this, the managing system must have Windows Server 2008 R2 installed and the cluster node must be running Windows Server 2008 or higher.

 

New-NLBCluster has the following parameters of interest.

 

 -InterfaceName

Specifies the interface to which NLB is bound

 -ClusterPrimaryIP

The cluster’s primary IP address. More IP addresses can be added via Add-NLBClusterVIP

 -HostName

We can create a cluster on a remote machine by passing the machine name here

 -ClusterName

Specifies the name of the new cluster (optional)

 -DedicatedIP

This will add a dedicated IP address to the stack that can be used to reach this machine directly

 -OperationMode

The cluster operation mode can be one of the following: unicast, multicast, igmpmulticast

 

 

Example

 

 

 

Adding Nodes to a Cluster

Add-NLBClusterNode

Once a cluster has been created, we may want to add more nodes to the cluster. This can be achieved via the Add-NLBClusterNode CMDlet.

Parameters of interest:

 -InterfaceName

Specifies the interface to which NLB is bound

 -HostName

We can create a cluster on a remote machine by passing the machine name here

 -NewNodeName

The name of the new node that needs to be added to the cluster

 -NewNodeInterface

Interface on which we want to bind NLB on the new node

 

Example

 

 

Managing Port Rules

Set-NLBClusterPortRule

After creating a new NLB cluster you may want to modify the port rules before adding any nodes.  To do so you will want to use the Set-NLBClusterPortRule CMDlet.

 

Set-NLBClusterPortRule will modify existing port rules.  For example, when creating a new cluster, the default port rule is added.  If you want to customize the port rule you can either delete the existing port rule or modify the existing port rule.  Modifying the existing port rule is the best approach because you run only one command rather than two commands.

 

Set-NLBClusterPortRule has the following parameters that I believe are the most useful.   As always, for detailed help on this please run Get-Help Set-NLBClusterPortRule.

 

 -NewStartPort

Specifies the new start port for the cluster port rule. The acceptable range is between 0 and 65535

 -NewEndPort

Specifies the new end port for the cluster port rule. The acceptable range is between 0 and 65535

 -NewAffinity

Specifies the new affinity for the cluster port rule. There are three possible values for port rule affinity: none, single, and network

 -NewIP

Specifies the new IP address for the cluster port rule

 -NewTimeout

Specifies the new timeout in minutes for the cluster port rule. The acceptable range is between 0 and 240

 -InterfaceName

Specifies the interface to which NLB is bound

 -Port

Specifies a port number within the port rule to set

 

Example

This shows how to change the port rule:

 

 

 

The previous example assumes that only one port rule exists prior to modifying the port rule.  If multiple port rules exist prior to running the command and you try to modify the StartPort or EndPort, you will get an error if the resulting port ranges (as specified by the start and end ports) overlap.
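The overlap condition behind that error can be sketched in a few lines of Python. This is only a model of the idea, not NLB's actual validation (which may take the IP address and protocol of each rule into account as well), and the port numbers are made-up examples.

```python
from typing import List, Tuple

PortRange = Tuple[int, int]  # (start_port, end_port), both inclusive


def ranges_overlap(a: PortRange, b: PortRange) -> bool:
    """True if two inclusive port ranges share at least one port."""
    return a[0] <= b[1] and b[0] <= a[1]


def conflicts(new_rule: PortRange, other_rules: List[PortRange]) -> List[PortRange]:
    """Return the existing rules that the new range would collide with."""
    return [rule for rule in other_rules if ranges_overlap(new_rule, rule)]


# Hypothetical case: one rule stays at 0-79 while another rule is
# changed first to 80-443 (no clash), then to 50-443 (clash).
print(conflicts((80, 443), [(0, 79)]))
print(conflicts((50, 443), [(0, 79)]))
```

Running a check like this before changing a start or end port shows which existing rule would get in the way.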

 

Example

If you want to modify the port range, you should use the -port parameter:

 

 

 

You may have noticed that the example shows changing affinity instead of the port range.   I did this to set up for the next example where I change the affinity to single affinity on both port rules. 

 

 

 


Managing Cluster Nodes

Set-NLBClusterNode

To manage NLB node properties such as host priority, initial host state or persisted suspend state, you need to use Set-NLBClusterNode.

 

 -HostPriority

Specifies the host priority or host ID for the cluster node. The value should be between 1 and 32

 -InitialHostState

Specifies the initial host state for the cluster node. The value is either started, stopped, or suspended

 

 By default Set-NLBClusterNode manages only one node at a time.  For example, when running a command from one of the nodes the local host is the node that is managed.

 

 

  

If you want to run a command that executes on all nodes, you can first run Get-NLBClusterNode and pipe the output to Set-NLBClusterNode.

 

 

  

To view all node properties you can run the following Get-NLBClusterNode and pipe the output through Format-List CMDlet.

                                                          

  


 

Controlling Cluster Nodes

 Start-NLBClusterNode & Stop-NLBClusterNode

To control the state (such as stop or start) of the cluster or a node, there is a CMDlet for the respective action or "verb" and the respective object.  For example, to stop a cluster node you could run Stop-NLBClusterNode, while the Start-NLBClusterNode CMDlet will start the specific cluster node.

 

The CMDlet I want to discuss here is the Stop-NLBClusterNode command, specifically the parameter, -Timeout.  This new parameter lets you control the time you want to wait before forcing the Stop operation on the node. Now you don’t have to wait for Drain to complete, before doing a stop. You can simply run this command with a timeout value, like in the example below.

 

In creating the CMDlets we combined stop and drainstop into a single CMDlet for each object: Stop-NLBCluster and Stop-NLBClusterNode.

 

 -Drain

Drains existing traffic before stopping the cluster node

 -Timeout

Specifies the number of minutes to wait for the drain operation before stopping the cluster node

 

Example

This example will do the following:

1.       Drain all the connections on the Cluster

2.       If there are no outstanding connections, stop the cluster immediately

3.       If not all connections have drained within 10 minutes, force stop the node, breaking all existing connections to that particular node.
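The three steps above amount to "drain, but give up after a timeout". The Python sketch below is a toy model of that decision logic only; it is not how NLB implements draining, and the per-minute connection counts are hypothetical sample data.

```python
from typing import Iterable


def drain_then_stop(connections_per_minute: Iterable[int], timeout_minutes: int) -> str:
    """Toy model of 'drain, then stop' with a timeout.

    connections_per_minute yields the outstanding connection count seen
    at minute 0, 1, 2, ... (hypothetical data). If the samples run out
    before the timeout, the node is treated as force-stopped.
    """
    for minute, outstanding in enumerate(connections_per_minute):
        if outstanding == 0:
            return f"stopped after draining ({minute} min)"
        if minute >= timeout_minutes:
            break
    return f"force-stopped at timeout ({timeout_minutes} min)"


print(drain_then_stop([0], timeout_minutes=10))           # nothing to drain
print(drain_then_stop([4, 2, 0], timeout_minutes=10))     # drains in time
print(drain_then_stop([7] * 15, timeout_minutes=10))      # never drains
```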

 

 

 

 

Debugging NLB with PowerShell

Get-NLBClusterDriverInfo

The NLB team has added an awesome CMDlet, Get-NLBClusterDriverInfo.  This CMDlet is a replacement for the nlb.exe binary that you may have used. It is a loaded CMDlet with lots of options. Note that this CMDlet does not provide any remoting capabilities, so it does not take a hostname as an input parameter.

 

1.       Getting the Cluster configuration: When this CMDlet is run without any arguments, it returns the basic cluster configuration on the current machine.

 

 

 

2.       We can determine if a given connection will be handled by the current node using the -filter argument.  This argument requires the following additional arguments to be set:

 -ClientIP

IP Address of the client in question

 -ClientPort

If known, the client source port. This can be set to 0, if unknown

 -ServerPort

The destination port of the server. For example, HTTP is typically on port 80

 -ServerIP

The server's IP address. For incoming connections, this means the VIP

 

In the following example, we are checking to see if a TCP connection coming from client: 1.1.1.1 will be accepted by the NLB server on Port 80, whose VIP is 1.1.1.2

 

 

 

Stay tuned for more NLB PowerShell information!

 

 

Thanks,

Rohan Mutagi & Gary Jackman
Clustering & High-Availability Test Team
Microsoft

PowerShell Help Online & Management Pack Updates for Failover Clustering & NLB

Hi Cluster Fans,

 

We have added Windows Server 2008 R2 PowerShell help on TechNet and have updated our Management Packs for System Center Operations Manager (SCOM) to support 2008 R2, for both Failover Clustering and Network Load Balancing.

 

PowerShell Help Online 

With PowerShell help online you can see the same information as the inbox Get-Help CMDlet in an easier to browse format, with more examples and information added over time.  The website can also be launched in your default web browser directly from PowerShell, assuming your machine has a web browser.  This is done by adding -Online to the Get-Help CMDlet.

 

Failover Clustering

The main site for PowerShell for Failover Clustering is at: http://technet.microsoft.com/en-us/library/ee461009.aspx

 

Here’s an example for Failover Clustering:  PS > Get-Help Test-Cluster -Online

 

Network Load Balancing

The main site for PowerShell for Network Load Balancing is at: http://technet.microsoft.com/en-us/library/ee817138.aspx

 

Here’s an example for Network Load Balancing:  PS > Get-Help New-NLBCluster -Online

 

SCOM Management Pack Updates

We’ve updated our Management Packs for System Center Operations Manager (SCOM) for both Failover Clustering & NLB to add new features and support Windows Server 2008 R2.

 

Failover Clustering

The SCOM Failover Clustering Management Pack provides both proactive and reactive monitoring of your Windows Server 2003, Windows Server 2008 and Windows Server 2008 R2 cluster deployments. It monitors Cluster services components—such as nodes, networks, and resource groups—to report issues that can cause downtime or poor performance.

 

The main site for the Management Pack for Failover Clustering is at: http://www.microsoft.com/downloads/details.aspx?FamilyId=AC7F42F5-33E9-453D-A923-171C8E1E8E55

 

Some of the improvements include:

·         Support for discovery and monitoring of Windows Server 2008 R2 clusters and functionality such as Cluster Shared Volumes

·         MP scalability improvements (the MP supports monitoring of 300 resource groups per cluster)

·         Noise reduction, for example clustered resources are no longer discovered and monitored by default (resource groups are monitored by default)

·         Configuration or hardware issues that interfere with starting the Cluster service

·         Alerts about connectivity problems that affect communication between cluster nodes or between a node and a domain controller

·         Active Directory Domain Services (AD DS) settings that affect the cluster; for example, permissions needed by the computer account that is used by the cluster

·         Configuration issues with the network infrastructure needed by the cluster; for example, issues with Domain Name System (DNS)

·         Issues with the availability of a cluster resource, such as a clustered file share

·         Issues with the cluster storage

 

Network Load Balancing

The SCOM Network Load Balancing (NLB) Management Pack provides discoveries, monitors, alerts, and warnings to help the operator understand the state of NLB clusters and NLB servers running Windows Server 2008 and Windows Server 2008 R2. The Management Pack can provide early warnings that an operator can use to proactively monitor the state of the NLB servers in the computing environment.

 

The main site for the Management Pack for Network Load Balancing is at: http://www.microsoft.com/downloads/details.aspx?FamilyID=dc17e093-bdd7-4cb3-9981-853776ed90be

 

Some of the improvements include:

·         Support for discovery and monitoring of Windows Server 2008 R2 NLB clusters

·         Monitor the NLB Node status

·         Based on the status of individual cluster nodes, determine the overall state of the cluster.

·         Where an integration management pack exists, determine the health state of a cluster node by looking at the health state of the load balanced application, such as IIS

·         Alert on errors and warnings that are reported by the NLB driver, such as an incorrectly configured NLB cluster

·         Ability to take the node out of the NLB cluster if the underlying load-balanced application becomes unhealthy, and add the node back to the cluster when the application becomes healthy again

·         Noise reduction on some alerts

 

Enjoy these improvements to your clustering experience!

 

Thanks,

Symon Perriman

Program Manager II

Clustering & High-Availability

Microsoft

Failover Clustering Performance Counters – Part 4 – Command Line

Hi Cluster Fans,

 

Most of you are familiar with the Performance Monitor that allows you to work with performance counters.  Details have been described in the three previous posts in the series: Part 1, Part 2 and Part 3.

 

Using the command line, there are several functions which you may find useful.  For information about how to do this using PowerShell for Failover Clustering, please visit this earlier blog post: http://blogs.msdn.com/clustering/archive/2009/07/22/9844473.aspx.

 

Typeperf.exe allows you to enumerate, monitor and collect performance counters from the command line. The command below shows how to enumerate all clustering performance counters:

 

>typeperf.exe -q | findstr Cluster

 

\Cluster Shared Volumes(*)\Metadata IO Delta

\Cluster Shared Volumes(*)\Metadata IO

\Cluster Shared Volumes(*)\Redirected Read Bytes Delta

\Cluster Shared Volumes(*)\Redirected Read Bytes

\Cluster Shared Volumes(*)\Redirected Reads Delta

\Cluster Shared Volumes(*)\Redirected Reads

\Cluster Shared Volumes(*)\Redirected Write Bytes Delta

\Cluster Shared Volumes(*)\Redirected Write Bytes

\Cluster Shared Volumes(*)\Redirected Writes Delta

\Cluster Shared Volumes(*)\Redirected Writes

\Cluster Shared Volumes(*)\IO Read Bytes Delta

\Cluster Shared Volumes(*)\IO Read Bytes

\Cluster Shared Volumes(*)\IO Reads Delta

\Cluster Shared Volumes(*)\IO Reads

\Cluster Shared Volumes(*)\IO Write Bytes Delta

\Cluster Shared Volumes(*)\IO Write Bytes

\Cluster Shared Volumes(*)\IO Writes Delta

\Cluster Shared Volumes(*)\IO Writes

\Cluster Resource Control Manager\Groups Online

\Cluster Resource Control Manager\RHS Restarts

\Cluster Resource Control Manager\RHS Processes

\Cluster Global Update Manager Messages\Update Messages Delta

\Cluster Global Update Manager Messages\Update Messages

\Cluster Global Update Manager Messages\Database Update Messages Delta

\Cluster Global Update Manager Messages\Database Update Messages

\Cluster API Calls\Batch API Calls Delta

\Cluster API Calls\Network Interface API Calls Delta

\Cluster API Calls\Network API Calls Delta

\Cluster API Calls\Cluster API Calls Delta

\Cluster API Calls\Key API Calls Delta

\Cluster API Calls\Resource API Calls Delta

\Cluster API Calls\Group API Calls Delta

\Cluster API Calls\Node API Calls Delta

\Cluster API Calls\Notification API Calls Delta

\Cluster Checkpoint Manager\Crypto Checkpoints Restored Delta

\Cluster Checkpoint Manager\Crypto Checkpoints Restored

\Cluster Checkpoint Manager\Crypto Checkpoints Saved Delta

\Cluster Checkpoint Manager\Crypto Checkpoints Saved

\Cluster Checkpoint Manager\Registry Checkpoints Restored Delta

\Cluster Checkpoint Manager\Registry Checkpoints Restored

\Cluster Checkpoint Manager\Registry Checkpoints Saved Delta

\Cluster Checkpoint Manager\Registry Checkpoints Saved

\Cluster Network Messages(*)\Bytes Received Delta

\Cluster Network Messages(*)\Bytes Received

\Cluster Network Messages(*)\Bytes Sent Delta

\Cluster Network Messages(*)\Bytes Sent

\Cluster Network Messages(*)\Messages Received Delta

\Cluster Network Messages(*)\Messages Received

\Cluster Network Messages(*)\Messages Sent Delta

\Cluster Network Messages(*)\Messages Sent

\Cluster Network Reconnections(*)\Reconnect Count

\Cluster Network Reconnections(*)\Unacknowledged Message Queue Length Delta

\Cluster Network Reconnections(*)\Unacknowledged Message Queue Length

\Cluster Network Reconnections(*)\Normal Message Queue Length Delta

\Cluster Network Reconnections(*)\Normal Message Queue Length

\Cluster Network Reconnections(*)\Urgent Message Queue Length Delta

\Cluster Network Reconnections(*)\Urgent Message Queue Length

\Cluster Database\Flushes Delta

\Cluster Database\Flushes

\Cluster API Handles\Batch Handles Delta

\Cluster API Handles\Batch Handles

\Cluster API Handles\Network Interface Handles Delta

\Cluster API Handles\Network Interface Handles

\Cluster API Handles\Network Handles Delta

\Cluster API Handles\Network Handles

\Cluster API Handles\Cluster Handles Delta

\Cluster API Handles\Cluster Handles

\Cluster API Handles\Key Handles Delta

\Cluster API Handles\Key Handles

\Cluster API Handles\Resource Handles Delta

\Cluster API Handles\Resource Handles

\Cluster API Handles\Group Handles Delta

\Cluster API Handles\Group Handles

\Cluster API Handles\Node Handles Delta

\Cluster API Handles\Node Handles

\Cluster API Handles\Notification Handles Delta

\Cluster API Handles\Notification Handles

\Cluster Multicast Request-Response Messages\Messages Outstanding

\Cluster Multicast Request-Response Messages\Messages Sent Delta

\Cluster Multicast Request-Response Messages\Messages Sent

\Cluster Resources(*)\Resource Type Controls Delta

\Cluster Resources(*)\Resource Type Controls

\Cluster Resources(*)\Resource Controls Delta

\Cluster Resources(*)\Resource Controls

\Cluster Resources(*)\Resource Failure Deadlock

\Cluster Resources(*)\Resource Failure Access Violation

\Cluster Resources(*)\Resource Failure

\Cluster Resources(*)\Resources Online
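A list this long is easier to work with once it is grouped by counter object. The Python sketch below parses counter paths of the form shown above and counts the counters per object. The regular expression is an assumption based only on the paths in this listing, and the three sample paths are copied from it.

```python
import re
from collections import Counter

# \Object(instance)\Counter, where "(instance)" is optional.
COUNTER_PATH = re.compile(
    r"^\\(?P<object>[^\\(]+)(?:\((?P<instance>[^)]*)\))?\\(?P<counter>.+)$"
)


def parse_counter_path(path: str) -> dict:
    """Split one counter path into object, instance and counter name."""
    match = COUNTER_PATH.match(path.strip())
    if match is None:
        raise ValueError(f"not a counter path: {path!r}")
    return match.groupdict()


sample = [
    r"\Cluster Shared Volumes(*)\Redirected Writes",
    r"\Cluster Shared Volumes(*)\IO Writes",
    r"\Cluster Database\Flushes",
]
per_object = Counter(parse_counter_path(p)["object"] for p in sample)
print(per_object)
```

Feeding it the full listing (for example, saved to a text file) gives a quick overview of which cluster components expose the most counters.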

 

The next command shows how you can use typeperf to watch counters from the command line.

>typeperf.exe "\Cluster API Calls\Resource API Calls Delta"

 

"(PDH-CSV 4.0)","\\VPCLUS01\Cluster API Calls\Resource API Calls Delta"

"06/03/2009 17:34:43.647","94.000000"

"06/03/2009 17:34:44.661","89.000000"

"06/03/2009 17:34:45.675","104.000000"

"06/03/2009 17:34:46.689","85.000000"

"06/03/2009 17:34:47.703","91.000000"

"06/03/2009 17:34:48.717","103.000000"

"06/03/2009 17:34:49.731","85.000000"

"06/03/2009 17:34:50.745","92.000000"

"06/03/2009 17:34:51.759","92.000000"

"06/03/2009 17:34:52.773","101.000000"

"06/03/2009 17:34:53.787","87.000000"

"06/03/2009 17:34:54.801","84.000000"

"06/03/2009 17:34:55.815","87.000000"

"06/03/2009 17:34:56.829","101.000000"

"06/03/2009 17:34:57.843","84.000000"

"06/03/2009 17:34:58.857","88.000000"

"06/03/2009 17:34:59.871","96.000000"

"06/03/2009 17:35:00.885","93.000000"

"06/03/2009 17:35:01.898","87.000000"

"06/03/2009 17:35:02.912","88.000000"

"06/03/2009 17:35:03.926","96.000000"

"06/03/2009 17:35:04.940","89.000000"

 

If you run “typeperf /?” you will find many other useful features you might like.
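The typeperf output shown earlier is plain CSV, so it is easy to post-process once captured. The Python sketch below parses the first few sample lines from this post and summarizes them. It assumes a single counter per capture and skips any line that does not have exactly two fields, such as status messages.

```python
import csv
import io
from statistics import mean

# First lines of the typeperf output shown in this post.
sample_output = r'''"(PDH-CSV 4.0)","\\VPCLUS01\Cluster API Calls\Resource API Calls Delta"
"06/03/2009 17:34:43.647","94.000000"
"06/03/2009 17:34:44.661","89.000000"
"06/03/2009 17:34:45.675","104.000000"
'''


def read_samples(text: str):
    """Return (counter_name, [values]) from single-counter PDH-CSV text."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    header = rows[0]
    data = [row for row in rows[1:] if len(row) == 2]
    values = [float(value) for _timestamp, value in data]
    return header[1], values


name, values = read_samples(sample_output)
print(name)
print(f"samples={len(values)} mean={mean(values):.1f} max={max(values):.0f}")
```

A real capture may contain blank values for missed samples, which `float` would reject, so a production version would need to handle those.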

 

The other tool I’ve found handy is logman.exe. This is a very powerful utility and one of its features allows you to manipulate Data Collector Sets.  You have probably already seen this feature in the Performance Monitor, but in perfmon you have to do everything manually.  Logman allows you to create/start/stop/delete the sets from the command line.  You can learn from the help a variety of things you can do with this tool, so instead of repeating the help here I want to share with you a batch file that will help you to manipulate the Data Collector Sets for the Failover Cluster counters. Enjoy it!

 

The script is available here: http://blogs.msdn.com/clustering/pages/9915526.aspx.  Note that this script is a sample only and you should test this before using it in a production environment.  This is not an officially supported script from Microsoft, so please use this at your own risk.

 

The Performance Monitor team has a post about the logman that can be used to manipulate perfmon logging sessions: http://blogs.technet.com/askperf/archive/2008/05/13/two-minute-drill-logman-exe.aspx.

 

To find more information about this script, you can run it with the /? parameter and see the following output:

> ClustPerf.cmd /?

clusperf.cmd [-c] [-stop] [-start] [-d] [-show] [-q] [-n {session name}] [-o {path to the log files}] [-f {counters filter}] [{list of nodes}]

 

-c           - create new session.

               aliases: create.

-start       - starts session.

-stop        - stops session.

-d           - deletes session.

               aliases: delete.

-show        - prints session details.

               aliases: q, query.

-q           - same as show.

               aliases: show, query.

-n {name}    - Session name.

               aliases: name.

               default: WSFCCluster.

-o {path}    - path to the trace files.

               aliases: out.

               default: c:\WSFCCluster.

-f {str}     - filter for the counters we want to monitor.

               aliases: flt, filter.

               default: Cluster.

 

You can specify multiple commands and they will be executed in the following order

   1.stop

   2.delete

   3.create

   4.start

   5.show

 

Note that if you specify create then it already implies stop and delete.

Note that if you specify delete then it already implies stop.

 

You can provide list of nodes using clusnodes or myclusnodes environment variables
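The ordering and "implies" rules in that help text can be modeled in a few lines. This Python sketch is written from the help output alone, not from the script itself, so treat it as an illustration of the rules rather than a description of how ClustPerf.cmd is implemented.

```python
EXECUTION_ORDER = ["stop", "delete", "create", "start", "show"]
IMPLIES = {"create": {"stop", "delete"}, "delete": {"stop"}}


def plan(requested):
    """Expand implied commands and return them in execution order."""
    selected = set(requested)
    for command in requested:
        selected |= IMPLIES.get(command, set())
    unknown = selected - set(EXECUTION_ORDER)
    if unknown:
        raise ValueError(f"unknown commands: {sorted(unknown)}")
    return [c for c in EXECUTION_ORDER if c in selected]


print(plan(["start", "create"]))
print(plan(["show", "delete"]))
```

So asking for create and start together effectively recreates the session from scratch and then starts it, whatever order the switches are given in.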

 

 

Thanks,
Vladimir Petter
Senior Software Development Engineer
Clustering & High-Availability
Microsoft

Microsoft Site Recovery Solutions Featuring Windows Server 2008 R2 Failover Clustering

Hi cluster fans,

 

This week Microsoft is launching an end-to-end solution to help customers deploy integrated Site Recovery Solutions (http://blogs.technet.com/virtplanet/).  This launch shows how Microsoft and partners like  HP, EMC, HDS, NetApp, DataCore, DoubleTake and SteelEye are leveraging mature and proven technologies like clustering in combination with new virtualization technologies like Hyper-V to provide rich, high value solutions. 

 

Main article: http://blogs.technet.com/virtplanet/

 

[11/17 Update] The webcast is now available for viewing at http://searchwindowsserver.bitpipe.com/detail/RES/1256150149_996.html?asrc=CL_PRM_Microsoft.

 

For more information, please check the following links:

·         Cluster Resource Dlls: http://msdn.microsoft.com/en-us/library/aa372239(VS.85).aspx

·         Cluster Resources & how to write them: http://msdn.microsoft.com/en-us/library/aa372152(VS.85).aspx

·         Generic Application Installation: http://technet.microsoft.com/en-us/library/cc782179(WS.10).aspx & http://blogs.msdn.com/clustering/archive/2009/04/10/9542115.aspx

·         Generic Script Installation: http://technet.microsoft.com/en-us/library/cc736970(WS.10).aspx & http://blogs.msdn.com/clustering/archive/2009/09/28/9900574.aspx

·         Generic Service Installation: http://technet.microsoft.com/en-us/library/cc758806(WS.10).aspx & http://blogs.msdn.com/clustering/archive/2009/06/09/9712609.aspx

 

Thanks,

Jim Schwartz
Solutions Marketing Director

Virtualization

Microsoft

PowerShell for NLB: Part 1: Getting Started

Hi NLB Fans,

 

NLB provides users with various methods to manage clusters.  In Windows Server 2008, there are 3 ways to manage an NLB cluster:

 

1.       Network Load Balancing Manager GUI (nlbmgr.exe)

2.       NLB command line tool (Nlb.exe)

3.       NLB WMI Provider (root\MicrosoftNLB namespace)

 

In Windows Server 2008 R2, the NLB team has introduced a PowerShell interface for configuring, managing and debugging NLB.  This awesome new feature makes it very easy to administer systems in an automated way.

 

In this blog post we will explore NLB's support for PowerShell.  We will elaborate on the original post, PowerShell for NLB, providing more details on the naming mechanism, samples and CMDlet discovery.

 

This blog post contains the following sections:

 

·         PowerShell Naming convention

·         Exploring NLB CMDlets

o   Using Get-Command

o   Using command Auto-completion

o   Using Argument auto completion

o   Getting examples to use

 

Future blog posts in this series will discuss:

·         NLB common scenarios

·         Basics of Debugging NLB with PowerShell

 

NLB PowerShell follows the PowerShell CMDlet guidelines in naming and execution of the NLB CMDlets. Here we will explore the general naming conventions that will make it easy to further understand and explore NLB CMDlets.

 

PowerShell Naming Convention

A CMDlet name is made up of two parts, a verb and a noun, joined by a hyphen.  An NLB example would be:

 

PS > Get-NlbCluster

  

The ‘Get’ example above is split into 2 parts, the verb (Get) and the noun (NlbCluster), and these 2 words are separated by a hyphen.  As a rule of thumb, the verb defines the action to be performed on the noun.  In the above example, we want to "Get" all instances of "NlbCluster".
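To make the Verb-Noun convention concrete, here is a minimal Python sketch that splits a CMDlet name at its first hyphen.  The function name split_cmdlet is hypothetical and is not part of PowerShell or the NLB module.

```python
def split_cmdlet(name):
    """Split a CMDlet name such as 'Get-NlbCluster' into its (verb, noun) pair."""
    verb, separator, noun = name.partition("-")
    if not separator or not verb or not noun:
        raise ValueError(f"not a Verb-Noun name: {name!r}")
    return verb, noun


if __name__ == "__main__":
    verb, noun = split_cmdlet("Get-NlbCluster")
    print(f"verb={verb} noun={noun}")
```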

 

To view all the NLB CMDlets, run PS > Get-Command -Module NetworkLoadBalancingClusters

 

 

 

A list of all the NLB supported verbs can be seen below:

 

 

 

A list of all the NLB supported nouns can be seen below:

 

 

 

Exploring NLB CMDlets

PowerShell makes it quite easy to use CMDlets, even if you have no prior knowledge of the NLB CMDlets.  PowerShell provides two main features that help with exploring/learning CMDlets.

 

Get-Command

You can use Get-Command to explore the CMDlets that are available.  This CMDlet, combined with knowledge of the Verb-Noun pairing, is a powerful way to get to the CMDlet of interest.

 

Quick Syntax for this command

> Get-command -module NetworkLoadBalancingClusters [-Noun | -Verb <String>]

> Get-command <CommandFilter> -commandtype <commandtype>

 

Example usage

Let's say we want to delete a node from the current cluster.  We know our end goal, but don’t know how to achieve it via PowerShell.  Using the above syntax we can try to reach our goal.  The action we want to perform is "delete", and the noun that we want to act on is "NLB Cluster Node".

 

First we try to find all commands that start with the "delete" verb and are of type CMDlet, by running > Get-Command delete-* -commandType cmdlet, but we do not find any results.

 

Instead of "delete" let’s try "Remove".  Below we see that we found the CMDlet we are looking for.

 

 

 

We could have approached this in a different way.  We could have searched for the noun "Node" and filtered further on the exact verb.

 

 

 

As we can see with the above examples, we can intuitively guess the Verb-Noun pair for the NLB operation we want to perform, and use the Get-Command CMDlet to get the exact CMDlet.
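The search strategy above (guess a verb, fall back to a synonym, or search by noun and narrow by verb) can be modelled in Python.  This sketch only imitates the wildcard matching against a small, hand-picked subset of NLB CMDlet names; it does not call PowerShell, and find_cmdlets and the CMDLETS list are made up for illustration.

```python
from fnmatch import fnmatchcase

# Illustrative subset of NLB CMDlet names, not the full module listing.
CMDLETS = [
    "Get-NlbCluster",
    "New-NlbCluster",
    "Add-NlbClusterNode",
    "Remove-NlbClusterNode",
    "Start-NlbCluster",
    "Start-NlbClusterNode",
]


def find_cmdlets(pattern="*", verb=None, noun=None, names=CMDLETS):
    """Return names matching a wildcard pattern, optionally narrowed by verb and noun."""
    matches = []
    for name in names:
        name_verb, _, name_noun = name.partition("-")
        # Compare in lower case, since CMDlet names are not case sensitive.
        if not fnmatchcase(name.lower(), pattern.lower()):
            continue
        if verb is not None and name_verb.lower() != verb.lower():
            continue
        if noun is not None and not fnmatchcase(name_noun.lower(), noun.lower()):
            continue
        matches.append(name)
    return matches


if __name__ == "__main__":
    print(find_cmdlets("delete-*"))                 # first guess for the verb
    print(find_cmdlets("remove-*"))                 # synonym for the verb
    print(find_cmdlets(noun="*Node", verb="Remove"))  # noun first, then verb
```

Both routes end at the same name: the "delete" guess matches nothing, while the "Remove" verb or the "*Node" noun filter narrows the list to the CMDlet of interest.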

 

The list below shows the usage of Get-Command to list out all the supported NLB CMDlets:

 

 

 

Command Auto-Completion

Another way to find out what CMDlets exist is to use the command auto-completion key <TAB>.

PowerShell completes a partially typed CMDlet name when you press <TAB>.

 

Examples

1.       Open a PowerShell window with the NLB module loaded.

2.       Type Add-NLBCluster<Tab>

 

This will automatically complete the above CMDlet, and display "Add-NLBClusterNode" on the screen.

 

This is another handy way to see which CMDlets are supported.  Another example would be:

Start-NLB<TAB> would display Start-NLBCluster

Hitting <TAB> again would display Start-NLBClusterNode

 

Argument Auto-Completion

Now that we know how to find the CMDlet of interest, let’s see how we further use this information to formulate the exact command that we need to execute.  PowerShell supports automatic expansion of the command arguments.  Once you have typed in a CMDlet you can type a hyphen (-) and hit <TAB> key to automatically expand the available arguments for the given CMDlet.

 

Examples

1.       Open a PowerShell Window with the NLB Module loaded

2.       Enter > Get-NLBCluster -<TAB>

3.       You will see that the “HostName” parameter will be auto-completed

4.       Hit <TAB> again and you will see the text “InterfaceName” appear by the text prompt.

 

Using <TAB> you can cycle through all the available arguments that the given CMDlet supports.  If you went past an argument while hitting <TAB>, you can go back to it using the <SHIFT+TAB> key sequence.

 

This auto-completion can be further “filtered” by typing the first few characters of the argument you are interested in.  For example, if you want to look for the parameter “InterfaceName”, you can try the following:

1.       Open a PowerShell Window with the NLB Module loaded

2.       Type “Get-NLBCluster -i“ <TAB>

3.       This will directly show you all the parameters that begin with the letter “I”, in this case “InterfaceName”
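The cycling and filtering behavior described above can be modelled with a short Python sketch.  It is a simplified illustration, not how PowerShell implements completion; the TabCompleter class is hypothetical, and the parameter list contains only the two Get-NlbCluster parameters mentioned in this post.

```python
class TabCompleter:
    """Cycle through candidates that start with a typed prefix."""

    def __init__(self, candidates):
        self.candidates = list(candidates)
        self.matches = []
        self.index = None

    def start(self, prefix):
        """Begin a new completion for the characters typed after the hyphen."""
        lowered = prefix.lower()
        self.matches = [c for c in self.candidates if c.lower().startswith(lowered)]
        self.index = None

    def _step(self, delta):
        if not self.matches:
            return None
        if self.index is None:
            # First key press: TAB lands on the first match, SHIFT+TAB on the last.
            self.index = 0 if delta > 0 else len(self.matches) - 1
        else:
            self.index = (self.index + delta) % len(self.matches)
        return self.matches[self.index]

    def tab(self):
        """Move forward, as the <TAB> key does."""
        return self._step(+1)

    def shift_tab(self):
        """Move backward, as the <SHIFT+TAB> key sequence does."""
        return self._step(-1)


if __name__ == "__main__":
    completer = TabCompleter(["HostName", "InterfaceName"])
    completer.start("")      # user typed only the hyphen
    print(completer.tab(), completer.tab(), completer.shift_tab())
    completer.start("i")     # user typed -i
    print(completer.tab())
```

Typing a prefix before pressing <TAB> simply shrinks the list being cycled, which is why "-i" jumps straight to InterfaceName in this model.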

 

 

Get-Help

As you may know from the ‘Help Documentation’ section of the earlier NLB blog post, the Get-Help CMDlet is incredibly powerful.

 

The final thing that I would like to bring up in this section is the use of the –example argument for the help.  As the name suggests, you can quickly see the examples of a given CMDlet via the “-example” argument.

 

Example

 

 

 

Another awesome support option is the “-Online” option.  This will launch the web browser with online content that is up-to-date with the latest information regarding the CMDlet (of course, this may not work on a Server Core installation, which does not include Internet Explorer).

Example:

> Get-Help -Online New-NlbCluster


 

Rohan Mutagi & Gary Jackman
Clustering & High-Availability Test Team
Microsoft

Failover Clustering Performance Counters – Part 3 – Examples

Hi cluster fans,

This third post in our series about Failover Clustering Performance Counters will give some practical examples of how to use this new Windows Server 2008 R2 feature to help troubleshoot your cluster.

In Part 1 of this blog series we discussed Performance Counters and their interaction with the Network, Multicast Request Reply, Global Update Manager and Database clustering components.  In Part 2 we looked at monitoring some additional cluster components:  the Checkpoint Manager, Resource Control Manager, Resource Types, APIs and Cluster Shared Volumes.  Stay tuned for the fourth part of this series where we will discuss implementing Performance Counters using PowerShell.

Example 1: Cluster Handle Leaks

 

On my cluster I’ve observed that the Resource handles keep going up, which is an indication of a potential handle leak. I am sure that no external clients can be connected to the node, so it must be something running on the node. When I look at the ClusterAPI Performance Monitor I observe 500 resource handles:

 

 

All calls to the cluster API are coming from the ClusterAPI and in Windows Server 2008 R2 we have made tracing in this component available to all customers. To enable this tracing you need to run Event Viewer (or eventvwr from the command line).  In the Event Viewer go to the View menu and check “Show Analytics and Debug Logs”.

  

 

In the tree, navigate to “Applications and Services Log\Microsoft\FailoverClustering-Client\Diagnostic”.  Right-click Diagnostic and click Enable Log.  You may see a notification reminding you when the data will be collected, which can be ignored.

 

 

Now wait for the handle count to increase again, then right-click Diagnostic and select Disable Log.  In the right pane you can now see a list of the collected traces.  Select an event, switch to the Details view and find which process this event came from.

 

 

Now since we know the Process ID we can go to the Task Manager and find this process.  In this case it happens to be PowerShell.exe, which was running a script enumerating the resources from time to time.

 

If you are familiar with PowerShell you probably know that it is .NET based, and if you are familiar with .NET you probably know that it uses garbage collection to lazily collect freed memory.  In this case, PowerShell was just taking a while to kick off garbage collection, but once garbage collection begins, all the opened handles no longer used by the process are collected and the number of handles on the server decreases.

In the end it was not a handle leak, but hopefully it gives you some ideas on how you can approach this class of issues.
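The distinction drawn in this example, between handles that only ever grow and handles that fall back once garbage collection runs, can be expressed as a rough Python heuristic over periodic samples of the resource handle counter.  This is a sketch under simple assumptions, not a diagnostic tool; the function name and the sample values are made up.

```python
def looks_like_leak(samples):
    """Return True when the handle count never drops and ends higher than it started.

    A count that falls back at some point (for example after garbage
    collection in the client process) is treated as delayed cleanup.
    """
    if len(samples) < 2:
        return False
    never_drops = all(later >= earlier for earlier, later in zip(samples, samples[1:]))
    return never_drops and samples[-1] > samples[0]


if __name__ == "__main__":
    # Hypothetical samples of the resource handle counter over time.
    print(looks_like_leak([100, 200, 350, 500]))   # keeps growing
    print(looks_like_leak([100, 300, 500, 120]))   # drops after cleanup
```

A short observation window can make delayed cleanup look like a leak, so the samples need to span enough time for the client to release its handles.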

Things become harder if the client is running remotely.  In this case you might first use NetMon (http://search.microsoft.com/Results.aspx?qsc0=0&q=netmon&mkt=en-US&FORM=QBME1&l=1) or Process Monitor (http://technet.microsoft.com/en-us/sysinternals/bb896645.aspx) to find where traffic to this node is coming from and then examine the clients to see if they run any apps that might cause the cluster handle leak.

Example 2: Overload of Cluster API Calls

 

 

On this particular 4 node cluster with 800 resource groups we observed CPU utilization by the Cluster Service (clussvc.exe) at 95%.  The puzzling part was that all the resources were offline and there were no known clients connected to this cluster, so we’re going to use performance counters to see if we can find out which component is causing the large CPU consumption.

Looking at the Cluster API calls we have observed that the cluster is getting hit with about 130 Resource API calls per second and about 70 Group API calls per second.

Looking at the Cluster Multicast Request Response (MRR) Messages we have observed that Messages Sent Delta is around 90 MRR messages per second.

Examining the Cluster Global Update Manager Messages showed that there are no GUM updates going on.  Most likely, all the activity is coming from API calls that hit resources not hosted on this node, so the node forwards each request to the owning node using MRR.

Looking at the Cluster Network Messages confirms that there is lots of traffic passing between the nodes (see Bytes Received Delta and Bytes Sent Delta on the picture above).

This leaves two unanswered questions: What calls are being made, and who is the caller?

We will now run Process Monitor.  In Process Monitor we set a filter to show only registry and network events for clussvc.exe.  After a minute we stop collecting traces and look at the Network Summary and Registry Summary.  The Network Summary shows that there is no traffic besides the traffic between the nodes, so the caller has to be an application running on one of the nodes.  The Registry Summary shows that something repeatedly opens group keys in the cluster database, so it looks like the caller is trying to enumerate all groups on the cluster.

We then started Event Viewer and collected ClusAPI logs as described in the previous example.  This immediately pointed us to a process that was making most of the API calls.  By stopping this process we confirmed that CPU consumption went down.
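Once the traces are collected, finding the process making most of the calls is a counting exercise.  The Python sketch below assumes the events have already been exported into a list of records with a process_id field; that record shape, the function name and the sample process IDs are assumptions for illustration, not the actual event schema.

```python
from collections import Counter


def top_callers(events, limit=3):
    """Count events per process ID and return the busiest callers first."""
    counts = Counter(event["process_id"] for event in events)
    return counts.most_common(limit)


if __name__ == "__main__":
    # Hypothetical exported trace records; real events carry many more fields.
    events = (
        [{"process_id": 4120}] * 5
        + [{"process_id": 788}] * 2
        + [{"process_id": 1036}]
    )
    for process_id, calls in top_callers(events):
        print(process_id, calls)
```

The process ID at the top of the list is the one to look up in Task Manager, as in the first example.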

 

I hope you find this information helpful in troubleshooting issues using performance counters on your Windows Server 2008 R2 Failover Cluster.

 

Thanks,
Vladimir Petter
Senior Software Development Engineer
Clustering & High-Availability
Microsoft
