Today I want to talk about one of the strangest setup errors I have seen when installing SQL Server in a cluster. In this particular case I was helping a customer install a new SQL Server 2008 R2 instance in a three-node cluster. The Windows cluster was already running two instances, each on a different cluster node, and the customer wanted to install this third instance on the third cluster node. The existing instances on the other cluster nodes had been running for a while with no issues.

The customer was trying to install the instance on the third cluster node, but setup was consistently failing with the following error:

SQL Server Setup has encountered the following error: The IP Address ‘10.246.18.118’ is already in use. To continue, specify a different IP address. Error code 0x84B40000.

The initial question, “is there any other machine with that IP address on the network?”, was quickly answered: neither PING nor NSLOOKUP showed any other host owning that IP address. As usual with setup problems, I looked into the SQL Server setup log files. The “Summary.txt” file had the same error reported by the GUI:

Exception type: Microsoft.SqlServer.Chainer.Infrastructure.InputSettingValidationException
    Message:
        The IP Address '10.246.18.118' is already in use. To continue, specify a different IP address
.

The “Detail.txt” setup log file had more information. We were able to see that indeed the IP address 10.246.18.118 did not exist on the network during the initial setup phase:

2010-11-12 14:53:09 Slp: IP Addresses have been specified so no defaults will be generated.
2010-11-12 14:53:38 Slp: SendARP didn't return a MAC address for IP address '10.246.18.118'.  The message was 'The network name cannot be found.'.  This indicates the address is valid to create.

[…]

2010-11-12 14:59:33 Slp: SendARP didn't return a MAC address for IP address '10.246.18.118'.  The message was 'The network name cannot be found.'.  This indicates the address is valid to create.

… but all of a sudden, the ARP request succeeded in finding a valid host with that same IP address, causing the setup to halt:

2010-11-12 15:00:25 Slp: SendARP for IP Address '10.246.18.118' succeeded.  The found MAC address is '00:00:5e:00:01:65'.  The IP address is already in use.  Pick another IP address to continue.
2010-11-12 15:00:28 Slp: SendARP didn't return a MAC address for IP address '10.246.16.118'.  The message was 'The network name cannot be found.'.  This indicates the address is valid to create.
2010-11-12 15:00:28 Slp: Hosting object: Microsoft.SqlServer.Configuration.ClusterConfiguration.ClusterIPAddressPrivateConfigObject failed validation
2010-11-12 15:00:28 Slp: Validation for setting 'FAILOVERCLUSTERIPADDRESSES' failed. Error message: The IP Address '10.246.18.118' is already in use. To continue, specify a different IP address.
2010-11-12 15:00:28 Slp: Error: Action "Microsoft.SqlServer.Configuration.SetupExtension.ValidateFeatureSettingsAction" threw an exception during execution.
2010-11-12 15:00:28 Slp: Microsoft.SqlServer.Setup.Chainer.Workflow.ActionExecutionException: The IP Address '10.246.18.118' is already in use. To continue, specify a different IP address. ---> Microsoft.SqlServer.Chainer.Infrastructure.InputSettingValidationException: The IP Address '10.246.18.118' is already in use. To continue, specify a different IP address. ---> Microsoft.SqlServer.Chainer.Infrastructure.InputSettingValidationException: The IP Address '10.246.18.118' is already in use. To continue, specify a different IP address.
2010-11-12 15:00:28 Slp:    --- End of inner exception stack trace ---

[…]

2010-11-12 15:05:26 Slp: Error result: -2068578304
2010-11-12 15:05:26 Slp: Result facility code: 1204
2010-11-12 15:05:26 Slp: Result error code: 0
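
The three result lines above are the same value as the 0x84B40000 shown in the GUI, just split into pieces. The following is a minimal Python sketch of that arithmetic, assuming the usual HRESULT-style layout (failure bit at the top, an 11-bit facility, a 16-bit code). The function name `decode_setup_result` is made up for illustration and is not part of SQL Server setup.

```python
def decode_setup_result(signed_result):
    """Split a signed 32-bit setup result into HRESULT-style fields.

    Assumption: bit 31 is the failure flag, bits 16-26 are the facility
    and bits 0-15 are the error code.
    """
    # Reinterpret the signed value from the log as an unsigned 32-bit number.
    unsigned = signed_result & 0xFFFFFFFF
    return {
        "hex": f"0x{unsigned:08X}",
        "failure": bool(unsigned >> 31),
        "facility": (unsigned >> 16) & 0x7FF,
        "code": unsigned & 0xFFFF,
    }


# "Error result" value taken from the Detail.txt excerpt above.
decoded = decode_setup_result(-2068578304)
print(decoded)
```

With the value from the log this gives 0x84B40000, facility 1204 and error code 0, which matches both the GUI message and the last lines of “Detail.txt”.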

In SQL Server 2008 a cluster installation is divided into two main phases: the first copies the instance files onto the target machine and registers all the components, while the second creates the clustered resources. In our case setup was failing at the very end, where the cluster resources are created. As you can see, the “Detail.txt” file also pointed to the MAC address of the offending host.
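
When “Detail.txt” is long, it can help to pull out only the SendARP lines and see at which point an address flipped from free to in use. The snippet below is a rough sketch that assumes the log lines look exactly like the excerpts quoted in this post. The names `SENDARP_RE`, `sendarp_timeline` and `SAMPLE_LINES` are my own, and the sample lines are copied from the excerpts above.

```python
import re

# Matches the two SendARP message shapes seen in the Detail.txt excerpts.
SENDARP_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) Slp: SendARP "
    r"(?:didn't return a MAC address for IP address '(?P<free_ip>[\d.]+)'"
    r"|for IP Address '(?P<used_ip>[\d.]+)' succeeded\.\s+"
    r"The found MAC address is '(?P<mac>[0-9A-Fa-f:]+)')"
)


def sendarp_timeline(lines):
    """Return (timestamp, ip, mac) tuples; mac is None when no host answered."""
    events = []
    for line in lines:
        match = SENDARP_RE.match(line)
        if not match:
            continue  # not a SendARP line
        if match.group("used_ip"):
            events.append((match.group("ts"), match.group("used_ip"), match.group("mac")))
        else:
            events.append((match.group("ts"), match.group("free_ip"), None))
    return events


SAMPLE_LINES = [
    "2010-11-12 14:53:09 Slp: IP Addresses have been specified so no defaults will be generated.",
    "2010-11-12 14:53:38 Slp: SendARP didn't return a MAC address for IP address '10.246.18.118'.  The message was 'The network name cannot be found.'.  This indicates the address is valid to create.",
    "2010-11-12 15:00:25 Slp: SendARP for IP Address '10.246.18.118' succeeded.  The found MAC address is '00:00:5e:00:01:65'.  The IP address is already in use.  Pick another IP address to continue.",
]

events = sendarp_timeline(SAMPLE_LINES)
# Keep only the checks where some host answered the ARP request.
conflicts = [event for event in events if event[2] is not None]
print(conflicts)
```

Against the real file you would pass the lines of “Detail.txt” instead of `SAMPLE_LINES`; the first entry in `conflicts` is the moment setup decided the address was taken.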

As you may know, ARP is the network protocol that resolves IP addresses to physical (MAC) addresses. The resolved mappings are kept in memory in the ARP cache table, which you can inspect from the command prompt with ARP -a. After uninstalling, once more, the components left behind by the failed setup, we ran a quick ARP check modeled on what setup was doing and found the following information:

[Screenshot: ARP -a output from the customer's cluster]

The IP addresses 10.246.16.118 and 10.246.18.118 were the two virtual IP addresses that we were trying to configure on the cluster. I am far from being an expert in networking, but seeing an IP address from the x.x.18.x/24 network listed under the x.x.16.x/24 interface was strange enough to make me suspect a network resolution problem. The IP address and MAC address were in fact the same ones that were causing the setup to fail. We tried PING and NSLOOKUP against the same IP address again, but nothing came back.
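
The check we did by eye, looking for an address listed under an interface on a different subnet, can be sketched in a few lines of Python. This is only an illustration. It assumes the usual Windows `arp -a` text layout and a /24 mask on every interface, and the interface addresses and most entries in `SAMPLE` are made up; only 10.246.18.118 and its MAC address come from the setup log.

```python
import ipaddress
import re

# "Interface: <ip> --- <index>" header lines printed by Windows `arp -a`.
INTERFACE_RE = re.compile(r"^Interface:\s+(\d+\.\d+\.\d+\.\d+)")
# Entry lines: IP address, dash-separated MAC address, entry type.
ENTRY_RE = re.compile(r"^\s*(\d+\.\d+\.\d+\.\d+)\s+([0-9A-Fa-f-]{17})\s+(\w+)")
BROADCAST = ipaddress.ip_address("255.255.255.255")


def find_foreign_entries(arp_output, prefix_len=24):
    """List ARP entries whose IP is outside the subnet of their interface."""
    foreign = []
    network = None
    for line in arp_output.splitlines():
        header = INTERFACE_RE.match(line)
        if header:
            # Derive the interface subnet from its address and the assumed mask.
            network = ipaddress.ip_network(f"{header.group(1)}/{prefix_len}", strict=False)
            continue
        entry = ENTRY_RE.match(line)
        if entry is None or network is None:
            continue
        ip, mac, kind = entry.groups()
        addr = ipaddress.ip_address(ip)
        if addr.is_multicast or addr == BROADCAST:
            continue  # multicast and broadcast entries are expected on every interface
        if addr not in network:
            foreign.append((str(network), ip, mac.lower(), kind))
    return foreign


# Hypothetical output: only 10.246.18.118 and its MAC come from the setup log.
SAMPLE = """
Interface: 10.246.16.11 --- 0xb
  Internet Address      Physical Address      Type
  10.246.16.1           00-11-22-33-44-55     dynamic
  10.246.18.118         00-00-5e-00-01-65     dynamic
  224.0.0.22            01-00-5e-00-00-16     static

Interface: 10.246.18.11 --- 0xc
  Internet Address      Physical Address      Type
  10.246.18.1           00-11-22-33-44-66     dynamic
"""

foreign = find_foreign_entries(SAMPLE)
for row in foreign:
    print(row)
```

On a real node you would feed the function the captured text of ARP -a rather than `SAMPLE`, and adjust `prefix_len` if the interfaces do not use a /24 mask.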

I double-checked with one of the SQL Server 2008 R2 clusters in my lab and found that every network address was correctly shown under its corresponding network interface:

[Screenshot: ARP -a output from the lab cluster]

Strangely enough, the “bogus” ARP entry on the customer’s cluster was dynamic, so we expected it to be removed from the ARP cache table after several seconds, but this was not happening. I was not sure what was announcing the IP address via ARP on the network. I can only think of a problem with the NIC teaming, but we ran out of time and were not able to test this hypothesis. After removing the offending ARP entry with ARP -d, we ran a new setup, which finished successfully this time. If you have had a similar experience with this error, or have any idea where that ARP entry could come from, please let me know.