Failover Clustering and Network Load Balancing Team Blog
I am Rohan Mutagi. My job at Microsoft is to do something that everyone likes: criticize others :). Specifically, I criticize other people’s code. Yes, I am a tester, and my role is to find bugs in Network Load Balancing (NLB). Over the next few months I will be blogging about the changes that NLB went through in Windows Server 2008 R2. In this post, I will focus on NLB Extended Affinity (TCP).
To understand how NLB does load balancing, please refer to this TechNet article about various forms of affinity and their impact on load balancing decisions.
Extended Affinity is an extension of the Single and Network affinity modes. NLB does not rely on any network protocol’s state to make its load balancing decisions. As a result, NLB works with a wide variety of protocols, documented and undocumented, stateless (HTTP, UDP, etc.) and stateful (RDP, SSL, etc.). This makes NLB more flexible to deploy and easier to manage, because the load balancer does not have to be configured for every protocol it handles. However, some applications would benefit from being able to explicitly associate a connection with a server.
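To make protocol-independent load balancing concrete, here is a toy Python model. It assumes, for illustration only, that each port rule spreads traffic over a fixed set of hash buckets and that the affinity mode decides which header fields feed the hash: None uses the source IP and port, Single uses the source IP, and Network uses the client's class C (/24) network. The bucket count, the SHA-256 stand-in hash, and the round-robin bucket split are assumptions, not NLB's actual internals. The point is that only header fields are inspected, never protocol state.

```python
import hashlib
import ipaddress

NUM_BUCKETS = 60  # assumption: a fixed bucket count per port rule


def affinity_key(src_ip: str, src_port: int, affinity: str) -> str:
    """Return the header fields that feed the hash for a given affinity mode."""
    if affinity == "None":
        return f"{src_ip}:{src_port}"  # each connection may land on any node
    if affinity == "Single":
        return src_ip  # all connections from one client IP stay together
    if affinity == "Network":
        net = ipaddress.ip_network(f"{src_ip}/24", strict=False)
        return str(net.network_address)  # the whole /24 stays together
    raise ValueError(f"unknown affinity: {affinity}")


def bucket_for(key: str) -> int:
    # Stand-in hash; NLB's real hash function is different.
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_BUCKETS


def owner_for(bucket: int, nodes: list[str]) -> str:
    # Hypothetical even split of buckets across the converged nodes.
    return nodes[bucket % len(nodes)]


nodes = ["NODE1", "NODE2"]
for port in (50001, 50002):  # two connections from the same hypothetical client
    key = affinity_key("10.0.0.7", port, "Single")
    print(port, owner_for(bucket_for(key), nodes))
```

With Single affinity both connections print the same node, because the source port never enters the hash.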
One example is an online retailer that runs IIS and uses shopping carts. When customers shop at the store, they save their intended purchases in a shopping cart that is stored on one of the cluster nodes. To keep those products in the cart, the customer must stay connected to that same node. However, a configuration change to the cluster (such as adding a new VIP or node) causes cluster convergence, which may then direct the customer to a different node, and the purchases saved in the cart are lost. The customer may become frustrated, and the retailer may lose money.
Another example is SSL, where a single SSL session can consist of multiple TCP connections. In normal operation with Single affinity, NLB guarantees that all connections from the same source IP reach the same server, including the multiple TCP connections of one SSL session. However, during the convergence that follows a configuration change, NLB might hand different connections of the same SSL session to different servers. As a result, the SSL session is broken.
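The problem in both examples comes from bucket ownership changing during convergence. Using the same kind of toy model (a hypothetical round-robin split over an assumed 60 buckets; NLB's real redistribution differs), this sketch counts how many buckets change owner when a third node joins a two-node cluster. Any client whose bucket moves would be sent to a different node.

```python
NUM_BUCKETS = 60  # assumption: a fixed bucket count per port rule


def assign(nodes: list[str]) -> dict[int, str]:
    """Hypothetical even split: bucket b goes to nodes[b % len(nodes)]."""
    return {b: nodes[b % len(nodes)] for b in range(NUM_BUCKETS)}


before = assign(["NODE1", "NODE2"])
after = assign(["NODE1", "NODE2", "NODE3"])  # a node joins and the cluster converges

moved = [b for b in range(NUM_BUCKETS) if before[b] != after[b]]
print(f"{len(moved)} of {NUM_BUCKETS} buckets changed owner")
```

This naive modulo split moves 40 of 60 buckets, which exaggerates the effect; a smarter scheme would move fewer. Still, any scheme that gives NODE3 a share of the load must take that share away from NODE1 and NODE2, so some clients always move.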
With Extended Affinity, NLB can keep a client connection associated with the same NLB server across reconvergence. This association holds until no new traffic has been seen on that connection for the timeout that the administrator specified for the port rule.
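The rule in the paragraph above (an association survives reconvergence and lasts until the connection has been idle for the port rule's timeout) can be modelled as a table of sticky entries whose idle timer is restarted by every packet. This is a hypothetical sketch of the semantics, not NLB's data structure; the class, method, and connection key are invented for illustration, and times are in minutes.

```python
from dataclasses import dataclass, field


@dataclass
class StickyTable:
    """Toy model of Extended Affinity: connection -> owning node, with idle expiry."""
    timeout_min: float  # port rule timeout in minutes; 0 means Extended Affinity is off
    entries: dict = field(default_factory=dict)  # conn -> (node, last_seen_min)

    def route(self, conn: tuple, bucket_owner: str, now: float) -> str:
        """Pick the node for a packet on `conn` arriving at time `now` (minutes)."""
        entry = self.entries.get(conn)
        if entry is not None and now - entry[1] < self.timeout_min:
            node = entry[0]  # live association: stay with the original node
        else:
            node = bucket_owner  # none, or idle too long: follow the current bucket owner
        if self.timeout_min > 0:
            self.entries[conn] = (node, now)  # every packet restarts the idle timer
        return node


table = StickyTable(timeout_min=10)
conn = ("client-ip", "vip")  # hypothetical connection key
print(table.route(conn, "NODE1", now=0))   # NODE1 owns the bucket initially
print(table.route(conn, "NODE3", now=8))   # reconverged to NODE3, still sticky: NODE1
print(table.route(conn, "NODE3", now=15))  # 7 idle minutes: NODE1
print(table.route(conn, "NODE3", now=30))  # 15 idle minutes: NODE3
```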
Consider the following example:
1. We have a 2-node NLB cluster (VIP: 22.214.171.124).
2. A web browser client (18.104.22.168) connects to the NLB VIP (22.214.171.124) over SSL.
3. That connection is handled by the IIS server on NODE1.
4. The client requests a web page that contains a web form.
5. The client spends 20 minutes filling in the form, which, once submitted, needs to be stored on NODE1.
6. Meanwhile, the administrator adds a new node (NODE3) to the NLB cluster.
7. The connection (18.104.22.168 -> 22.214.171.124) is now owned by NODE3.
8. The client submits the web form.
Without Extended Affinity:
9. Because ownership of the connection (18.104.22.168 -> 22.214.171.124) has moved to NODE3, the server rejects the packet from the client.
10. The browser tries to re-establish the SSL connection, and this time reaches a different server (NODE3).
11. The new server rejects the form data that the browser provides, because this client has no authentication on this node (NODE3). The data that the client filled in is lost.
With Extended Affinity, steps 1 through 8 are the same, but the outcome differs:
9. The server notices that stickiness is enabled for that connection (18.104.22.168 -> 22.214.171.124) and routes it to the connection's correct owner (NODE1), despite the configuration change that moved connection ownership to the new node (NODE3).
10. The browser successfully communicates with the server, and the transaction completes.
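One consequence of the idle-timeout rule is worth spelling out for this walk-through. In step 5 the client spends 20 minutes filling in the form; if no traffic flows on the connection during that time, the association survives only if the port rule's timeout is longer than 20 minutes. The function below is a hypothetical sketch of the decision at step 9, assuming the connection was idle for the whole 20 minutes.

```python
def owner_at_submit(timeout_min: float, idle_min: float,
                    original: str = "NODE1", new_owner: str = "NODE3") -> str:
    """Which node handles the form submission in step 9 of the walk-through.

    Assumes the connection carried no traffic for `idle_min` minutes, so the
    sticky association survives only if the idle time is below the timeout.
    """
    sticky_alive = timeout_min > 0 and idle_min < timeout_min
    return original if sticky_alive else new_owner


for timeout in (0, 10, 30):  # 0 means Extended Affinity is disabled
    print(timeout, owner_at_submit(timeout, idle_min=20))
```

Under this model, only the 30-minute timeout keeps the submission on NODE1, so the timeout should exceed the longest idle period you expect from clients.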
The following sections detail how to use Extended Affinity in your Windows Server 2008 R2 NLB Cluster.
To configure Extended Affinity in NLB Manager:
1. Right-click the cluster and select "Cluster Properties".
2. In the Cluster Properties dialog box, click the "Port Rules" tab.
3. Select the appropriate port rule and click Edit.
4. Select the appropriate affinity (Single or Network), set "Timeout" to the required value, and click OK.
5. The Port Rules tab should now show the new "Timeout" value (10 minutes in this example).
Using PowerShell, you can set the timeout for a port rule (such as the default port rule) with the Set-NlbClusterPortRule cmdlet. For more information about using PowerShell with NLB, visit: http://blogs.msdn.com/clustering/archive/2008/12/26/9253786.aspx.
The Get-NlbClusterPortRule cmdlet displays all the port rules configured on the NLB cluster on the local machine. The "Timeout" value shows the currently configured Extended Affinity timeout; a value of 0 means that Extended Affinity is not enabled for that port rule. In our example cluster, the timeout for all three port rules is set to 0, so Extended Affinity is not enabled on any of them.
Now let’s enable Extended Affinity for the second port rule using PowerShell.
1. Get the required port rule using the same Get-NlbClusterPortRule cmdlet, but this time add a filter to find the port rule that is configured on port 443 and bound to the cluster on the network interface Test-4.
2. Apply Extended Affinity to this port rule by using Set-NlbClusterPortRule to modify its timeout value.
Get-NlbClusterPortRule -Port <YourPortNumberHere> -InterfaceName <NetworkInterfaceName> | Set-NlbClusterPortRule -NewTimeout <NewTimeoutValueInMinutes>
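To make the pipeline's logic explicit, here is a Python model of what the two steps do to an in-memory list of port rules. It does not call NLB. The PortRule fields, the three sample rules, and the matching rule (a port matches when it falls inside the rule's start..end range) are assumptions for illustration; only the port 443, the interface name Test-4, and "timeout 0 means disabled" come from this post.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PortRule:
    """Hypothetical in-memory stand-in for one NLB port rule."""
    start_port: int
    end_port: int
    interface: str
    timeout_min: int = 0  # 0 means Extended Affinity is disabled

    @property
    def extended_affinity(self) -> bool:
        return self.timeout_min > 0


def select_rules(rules, port, interface):
    """Step 1: filter by port and interface (assumed: port must fall in the rule's range)."""
    return [r for r in rules
            if r.start_port <= port <= r.end_port and r.interface == interface]


def set_timeout(rules, targets, new_timeout):
    """Step 2: return the rule list with the selected rules' timeout replaced."""
    return [replace(r, timeout_min=new_timeout) if r in targets else r for r in rules]


rules = [PortRule(80, 80, "Test-4"), PortRule(443, 443, "Test-4"), PortRule(443, 443, "Test-5")]
rules = set_timeout(rules, select_rules(rules, 443, "Test-4"), 10)
for r in rules:
    print(r.start_port, r.interface, r.timeout_min, r.extended_affinity)
```

Filtering on both port and interface matters in this model: the third rule also covers port 443 but is left untouched because it is bound to a different interface.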
That concludes the overview of the new Extended Affinity feature for NLB in Windows Server 2008 R2. Thanks for reading this blog post. If you have any questions, feel free to contact us by clicking the ‘Email’ link in the upper-right corner of the page.
Rohan Mutagi
Software Development Engineer in Test
Clustering & High-Availability
Microsoft
In the example you gave, what happens if NODE1 is taken out of the NLB cluster or is otherwise unavailable before the Extended Affinity timeout occurs? Will another node take over, or will the other nodes say to themselves, "I'm not responsible for that IP address, so I will disregard the packets"? Said another way, does Extended Affinity screw up high availability if a failure occurs before the timeout?
Q: In the example you gave, what happens if NODE1 is taken out of the NLB cluster or is otherwise unavailable before the Extended Affinity timeout occurs?
A: Another node will take over and start accepting connections; high availability takes precedence. This can lead to some interesting scenarios when two nodes end up in a split-brain state: both take over all buckets and accept connections from the same client. Later, when they can see each other again and attempt to converge, they might detect that both hold state for the same client IP. To resolve this, we use a deterministic algorithm: on such a conflict, one of the nodes deterministically gives up its ownership of that client, and the other keeps handling it.
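The answer above describes two behaviours: high availability wins when the sticky owner is gone, and after a split brain a deterministic rule leaves exactly one node holding a client. The post does not say which rule NLB uses, so the tie-break below (the lowest host ID keeps the client) is purely a hypothetical stand-in, as are the function names and sample addresses. What matters is that every node evaluates the same function on the same inputs and therefore reaches the same answer.

```python
def route(client_ip: str, sticky: dict[str, str], live_nodes: set[str],
          bucket_owner: str) -> str:
    """Prefer the sticky owner, but fail over to the bucket owner if it is gone."""
    owner = sticky.get(client_ip)
    return owner if owner in live_nodes else bucket_owner


def resolve_conflict(claimants: dict[int, str]) -> str:
    """Hypothetical tie-break after a split brain heals.

    `claimants` maps a host ID to the node name for every node that holds state
    for the same client. Each node evaluates this same rule on the same data,
    so they all agree on a single winner (here: the lowest host ID).
    """
    return claimants[min(claimants)]


sticky = {"10.0.0.7": "NODE1"}
print(route("10.0.0.7", sticky, {"NODE1", "NODE2"}, "NODE2"))  # NODE1 alive: sticky wins
print(route("10.0.0.7", sticky, {"NODE2"}, "NODE2"))           # NODE1 gone: fail over
print(resolve_conflict({2: "NODE2", 1: "NODE1"}))              # one deterministic winner
```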