“The domain group cannot be validated for the service SQL Server” SQL Server 2005 cluster SP3 setup error

Here again I am trying to shed some light on another SQL Server 2005 setup error that you may encounter during your Service Pack or Hotfix deployment.

As you may know, one of the differences between installing a SQL Server instance on a standalone server vs. a cluster server is the service accounts required to run the different SQL Server services. While we are free to use local Windows accounts to run a non-clustered SQL Server instance, we are required to use a Windows Domain account to run a clustered SQL Server instance. This is better explained in the following KB article (KB910708, You must specify the domain groups for the clustered SQL Server services when you install a SQL Server failover cluster):

When you install a stand-alone instance of SQL Server, the SQL Server Installation Wizard creates a set of local groups. The wizard also adds service accounts to the groups. When you install a SQL Server failover cluster, SQL Server requires domain accounts to start the services. The domain accounts must be added to a domain group.

So it is not only important to have domain accounts in place to run the different SQL Server services but also to have a Domain group where these accounts are placed. It is also highly recommended to do not use this group to store other application’s accounts.

In this case the domain group and accounts were already in place but the installation of SQL Server 2005 Service Pack 3 (SP3) was failing with the following error message on the Summary.txt setup log file:

Product : Database Services (MSSQLSERVER)
Product Version (Previous): 4035
Product Version (Final) :
Status : Failure
Log File : C:\Program Files\Microsoft SQL Server\90\Setup Bootstrap\LOG\Hotfix\SQL9_Hotfix_KB955706_sqlrun_sql.msp.log
Error Number : 28130
Error Description : MSP Error: 28130 A domain group is missing for one or more services. To install SQL Server 2005 as a failover cluster, domain groups must be specified for all the clustered services being installed .To proceed, enter the missing domain group information.
The domain group cannot be validated for the service SQL Server.

What was wrong then? To better understand this issue we need to take into account what information for these groups is stored on Windows registry. The setup program writes this information in the registry during the initial SQL Server installation, here you can find a description of these keys, found under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL.X\Setup:

12Nov09A

As you can see, there are three different registry keys called AGTGroup, FTSGroup and SQLGroup that store the SID information for the SQL Server Agent, Fulltext Search and SQL Server groups in Active Directory respectively. These entries are used by the SQL Server setup during subsequent Hotfix or Service Pack updates.

In our case we could verify that these three registry keys were missing in both cluster nodes.

Ok, so I just need to add the missing registry keys for the Domain group but, how can I get the SQL Server group SID? You can use different techniques to get this information but in my case I found getsid.exe utility to be extremely easy to use. You can get getsid.exe on the Windows Support Tools; after downloading the tool you just need to type the following instruction on any server or PC that belongs to the same domain:

getsid.exe \\localhost CONTOSO\SQLGroup \\localhost CONSOTSO\SQLGroup

Where CONTOSO refers to the Windows Domain name and SQLGroup refers to the group used to store the SQL Server service accounts. You should get an output similar to the following one:

getsid \\localhost CONTOSO\SQLGROUP \\localhost CONTOSO\SQLGROUP
The SID for account CONTOSO\SQLGROUP matches account CONTOSO\SQLGROUP
The SID for account CONTOSO\SQLGROUP is S-1-5-21-3321254763-463695806-1538882281-2787112
The SID for account CONTOSO\SQLGROUP is S-1-5-21-3321254763-463695806-1538882281-2787112

Sysinternals PsGetSid will be also of help if you cannot make getsid running for some reason. After you get this information, you can add the missing registry keys into HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL.X\Setup as follows:

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL.1\Setup]
"AGTGroup"="S-1-5-21-2777104479-2157580501-946951792-1110"
"SQLGroup"="S-1-5-21-2777104479-2157580501-946951792-1110"
"FTSGroup"="S-1-5-21-2777104479-2157580501-946951792-1110"
...

Important: You probably do not want to end up with your cluster down just because you incorrectly tampered the registry to fix a setup problem right? Please, take extra-ordinary caution when modifying the registry and always make sure you have a backup of it in place. For more information please, refer to KB310516.

Note: If your server is a 64-bits server you do not need to take into account any registry below HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node.

Another Note: The registry information for these keys is not replicated during cluster failover so you will need to add the same information on each one of the clustered nodes.

Disclaimer Remainder: Please, be aware that according to Microsoft Knowledge Base article KB925976 the recommended workaround to fix this issue is to reinstall the clustered instance of SQL Server 2005. The information described here is provided with no warranty and it may not work on your environment.