“The domain group cannot be validated for the service SQL Server” SQL Server 2005 cluster SP3 setup error
19 November 09 03:35 PM | jorgepc | 0 Comments   

Here again I am trying to shed some light on another SQL Server 2005 setup error that you may encounter during your Service Pack or Hotfix deployment.

As you may know, one of the differences between installing a SQL Server instance on a standalone server vs. a cluster server is the service accounts required to run the different SQL Server services. While we are free to use local Windows accounts to run a non-clustered SQL Server instance, we are required to use a Windows Domain account to run a clustered SQL Server instance. This is better explained in the following KB article (KB910708, You must specify the domain groups for the clustered SQL Server services when you install a SQL Server failover cluster):

When you install a stand-alone instance of SQL Server, the SQL Server Installation Wizard creates a set of local groups. The wizard also adds service accounts to the groups. When you install a SQL Server failover cluster, SQL Server requires domain accounts to start the services. The domain accounts must be added to a domain group.

So it is not only important to have domain accounts in place to run the different SQL Server services but also to have a Domain group where these accounts are placed. It is also highly recommended to do not use this group to store other application’s accounts.

In this case the domain group and accounts were already in place but the installation of SQL Server 2005 Service Pack 3 (SP3) was failing with the following error message on the Summary.txt setup log file:

Product : Database Services (MSSQLSERVER)
Product Version (Previous): 4035
Product Version (Final) :
Status : Failure
Log File : C:\Program Files\Microsoft SQL Server\90\Setup Bootstrap\LOG\Hotfix\SQL9_Hotfix_KB955706_sqlrun_sql.msp.log
Error Number : 28130
Error Description : MSP Error: 28130 A domain group is missing for one or more services. To install SQL Server 2005 as a failover cluster, domain groups must be specified for all the clustered services being installed .To proceed, enter the missing domain group information.
The domain group cannot be validated for the service SQL Server.

What was wrong then? To better understand this issue we need to take into account what information for these groups is stored on Windows registry. The setup program writes this information in the registry during the initial SQL Server installation, here you can find a description of these keys, found under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL.X\Setup:

12Nov09A

As you can see, there are three different registry keys called AGTGroup, FTSGroup and SQLGroup that store the SID information for the SQL Server Agent, Fulltext Search and SQL Server groups in Active Directory respectively. These entries are used by the SQL Server setup during subsequent Hotfix or Service Pack updates.

In our case we could verify that these three registry keys were missing in both cluster nodes.

Ok, so I just need to add the missing registry keys for the Domain group but, how can I get the SQL Server group SID? You can use different techniques to get this information but in my case I found getsid.exe utility to be extremely easy to use. You can get getsid.exe on the Windows Support Tools; after downloading the tool you just need to type the following instruction on any server or PC that belongs to the same domain:

getsid.exe \\localhost CONTOSO\SQLGroup \\localhost CONSOTSO\SQLGroup

Where CONTOSO refers to the Windows Domain name and SQLGroup refers to the group used to store the SQL Server service accounts. You should get an output similar to the following one:

getsid \\localhost CONTOSO\SQLGROUP \\localhost CONTOSO\SQLGROUP
The SID for account CONTOSO\SQLGROUP matches account CONTOSO\SQLGROUP
The SID for account CONTOSO\SQLGROUP is S-1-5-21-3321254763-463695806-1538882281-2787112
The SID for account CONTOSO\SQLGROUP is S-1-5-21-3321254763-463695806-1538882281-2787112

Sysinternals PsGetSid will be also of help if you cannot make getsid running for some reason. After you get this information, you can add the missing registry keys into HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL.X\Setup as follows:

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL.1\Setup]
"AGTGroup"="S-1-5-21-2777104479-2157580501-946951792-1110"
"SQLGroup"="S-1-5-21-2777104479-2157580501-946951792-1110"
"FTSGroup"="S-1-5-21-2777104479-2157580501-946951792-1110"
...

Important: You probably do not want to end up with your cluster down just because you incorrectly tampered the registry to fix a setup problem right? Please, take extra-ordinary caution when modifying the registry and always make sure you have a backup of it in place. For more information please, refer to KB310516.

Note: If your server is a 64-bits server you do not need to take into account any registry below HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node.

Another Note: The registry information for these keys is not replicated during cluster failover so you will need to add the same information on each one of the clustered nodes.

Disclaimer Remainder: Please, be aware that according to Microsoft Knowledge Base article KB925976 the recommended workaround to fix this issue is to reinstall the clustered instance of SQL Server 2005. The information described here is provided with no warranty and it may not work on your environment.

Filed under: ,
“Failed to find a cluster group that owned shared disk” SQL Server 2005 setup error
12 November 09 03:07 PM | jorgepc | 0 Comments   

I think it could be interesting to share this SQL Server 2005 setup error since the message written in the setup logs was not clearly exposing the root-cause of the issue.

In this case my customer was trying to update to Service Pack 3 (SP3) a SQL Server 2005 clustered instance. The error that we could see on the Hotfix.log SP3 setup log:

10/20/2009 12:05:18.655 MSP Error: 29527 The setup has encountered an unexpected error in datastore. The action is RestoreSetupParams. The error is : Source File Name: datastore\cachedpropertycollection.cpp
Compiler Timestamp: Mon Nov 17 17:05:55 2008
Function Name: CachedPropertyCollection::findProperty
Source Line Number: 130
----------------------------------------------------------
Failed to find property "OwningGroup" {"VirtualServerInfo", "", "CONTOSOSQL"} in cache
Source File Name: datastore\clusterinfocollector.cpp
Compiler Timestamp: Mon Nov 17 17:05:55 2008
Function Name: ClusterInfoCollector::collectClusterVSInfo
Source Line Number: 888
----------------------------------------------------------
Failed to detect VS info due to datastore exception.
Source File Name: datastore\clustergroupsproperties.cpp
Compiler Timestamp: Mon Nov 17 17:05:55 2008
Function Name: ClusterGroupScope.SharedDisks
Source Line Number: 56
----------------------------------------------------------
Failed to fin
10/20/2009 12:05:18.843 Attempting to continue the 32 bit ngen queue
10/20/2009 12:05:18.905 Attempting to continue the 64 bit ngen queue

Please, pay attention to the button of the log highlighted in blue color; as you can see the latest paragraph seems to be incomplete. You can find this situation with some error messages in setup log files and the best option it to look up the same error on the Application Event Log, where we will usually see the complete message; in our case this was the error:

Failed to find property "OwningGroup" {"VirtualServerInfo", "", "CENSVXISQL"} in cache
Source File Name: datastore\clusterinfocollector.cpp
Compiler Timestamp: Mon Nov 17 17:05:55 2008
Function Name: ClusterInfoCollector::collectClusterVSInfo
Source Line Number: 888
----------------------------------------------------------
Failed to detect VS info due to datastore exception.
Source File Name: datastore\clustergroupsproperties.cpp
Compiler Timestamp: Mon Nov 17 17:05:55 2008
Function Name: ClusterGroupScope.SharedDisks
Source Line Number: 56
----------------------------------------------------------
Failed to find a cluster group that owned shared disk: D:

Clearly, the next step was to check what was wrong with server’s D: drive. In our case this drive was a clustered shared disk, not a local one and it was part of the SQL Server 2005 cluster group. At first sight, everything was fine with this resource, but looking in more detail into the cluster configuration we could see what was wrong:

Resource:
---------
Disk D: { Physical Disk }
    +(1)-----Depends On-> SQL Server Fulltext  { Generic Service }

 

For some reason, the SQL Server Fulltext Search service (FTS) was configured as a dependency for "Disk D:" resource. This is not correct, as you can verify on one of my previous post and in Microsoft Knowledge Base article KB835185, Failover cluster resource dependencies in SQL Server:

The SQL IP address resource and the physical disk resources do not depend on any resources

After eliminating this dependency on cluster resource "Disk D:" we were able to successfully install SQL Server 2005 SP3.

Filed under: ,
Lesson Learnt while Configuring Security on a Job Running through Linked Server
19 October 09 03:24 PM | jorgepc | 0 Comments   

This case taught me a very useful lesson about how impersonation works when executing a SQL Server Job using a linked server. If this scenario is new for you too, I hope you enjoy learning as much as I did.

In this case my customer was trying to run a stored procedure using a linked server from a local SQL Server 2005 instance to a remote SQL Server 2000. The stored procedure ran just fine when executed from SQL Server Management Studio (SSMS) but failed when ran as a SQL Server Agent Job.

These kind of mistakes are common in SQL Server when the SQL Server Agent account does not have enough permissions to execute the required Job. For example, when running a T-SQL statement that performs some kind of action on a remote server’s share, the credentials used to logon on the remote share are the ones of the user that opens SSMS; in other words, the user we have used to logon in our Windows session. However, when the statement is executed within a SQL Server Job, the credentials being used are the ones for the SQL Server Agent account, which is the account in charge of running jobs. This can be found on the following link:

[…] if you create a job that requires a drive letter or a universal naming convention (UNC) path, the job may run under your Microsoft Windows NT® 4.0 or Windows® 2000 user account while testing it. When SQL Server Agent then runs the job, it may not have the necessary permissions, drive letter configurations, or access to the required drive to execute the job.

There is additional information that needs to be taken into account when running a statement in SQL Server Management Studio vs. running the T-SQL code within a SQL Server Agent Job, this is described in Knowledge Base article KB269074:

The security context in which the job is run is determined by the owner of the job […] If the job is owned by an account (either a SQL Server login or a Windows NT authenticated login) that is a member of the Sysadmin role, the SQL Agent job runs under the context of the account used to start the SQL Agent service.

When the job is not owned by an account with sysadmin privileges, we can find a problem similar to the following one when running the job:

Remote access not allowed for Windows NT user activated by SETUSER. [SQLSTATE 42000] (Error 7410). The step failed.

There is a Knowledge Base article that explains how to workaround this error (see KB811031). In our case, the error was quite different though:

Error: 7416, Severity: 16, State: 1
Access to the remote server is denied because no login-mapping exists

I did a basic review to understand what was the configuration of both the SQL Server Agent Job and Linked Server, in our case this was what we had:

  • Job owner was the sa SQL Server account
  • Job was made up of a single step with a simple stored procedure execution that took care of copying several rows from a remote table (2000) into a local table (2005)
  • A Windows Domain account was used to run SQL Server Agent service in both the SQL Server 2000 and 2005
  • Both servers was located on the same Windows Domain and the Agent service account was the same.
  • In both SQL Server 2000 and 2005 there was a SQL Server account called "elearning" associated to an "elearning" login. "elearning” login had “datareader" and "public" roles

Concerning the linked server security configuration, my customer only had the "elearning" login on the security properties for the Linked Server, any other login attempt was denied:

19102009B

 

In this case we discovered that adding the SQL Server Agent account to the list of authorized users in the linked server security properties allowed us to run the job successfully. This makes sense because the SQL Server Account credentials are the ones used to execute the job in this case however, this lead to a security problem since a non-authorized user on the remote server could use the linked server connection to perform actions he has no rights to perform by default. In other words, having the SQL Server Agent account as authorized on the linked server allows the use of that linked server connection by any job where the owner is part of the sysadmin group.

My first thought was to use a SQL Server Agent Proxy account to work around the issue but T-SQL statements cannot make use of Proxies:

Job steps that execute Transact-SQL do not use SQL Server Agent proxies. Transact-SQL job steps run in the security context of the owner of the job.

In our case the job owner was sa account so the Job was being run under the credentials of SQL Server Agent account however, the linked server only granted remote access to the "elearning" login. In addition to this, when configuring the advanced Job step properties using SSMS, the GUI does not allow to specify an alternate SQL Server login, only allows to specify an alternate SQL Server user:

19102009A

However, the EXECUTE AS documentation clearly states that using both SLQ Server user accounts and logins are a valid approach:

SQL Server 2005 supports the ability to impersonate another principal either explicitly by using the stand-alone EXECUTE AS statement […]. The stand-alone EXECUTE AS statement can be used to impersonate server-level principals, or logins, by using the EXECUTE AS LOGIN statement. The stand-alone EXECUTE AS statement can also be used to impersonate database level principals, or users, by using the EXECUTE AS USER statement.

The solution for the customer’s requirements was as easy as using the EXECUTE AS LOGIN instruction in the T-SQL code for the stored procedure:

19102009C

How the SQL Server Agent and linked server security work is not difficult once you make sense of all the pieces in the puzzle. Knowing where these pieces fit will help you to better asset your SQL Server’s security.

Filed under: ,
SQL Server 2005 Service Pack does not Update all the Installed Components
07 September 09 02:58 PM | jorgepc | 0 Comments   

Some weeks ago I received a call from one of our customers concerning the SP3 installation on SQL Server 2005. The customer was choosing to install the Service Pack from Microsoft Update (as you probably know, Microsoft Update is a superset of Windows Update that takes care of updating not only Windows OS components but other Microsoft software as Office or SQL Server) and the installation completed successfully as far as he can see however, Windows Update site was still reporting SP3 for SQL Server as a required update. No matter how many times he tried to install SP3, the installation from Microsoft Update always completed successfully but always offered SP3 as required afterwards.

In our case the original SQL Server build was on SP2 (build 2005.90.3042) so it was clear that the error was not on the Microsoft Update detection engine. I asked the customer to download the SP3 from the Microsoft Download site and check what were the results in this case, installing the SP this way we have more control on each one of the steps performed by the setup. The customer reported that again the SP installation completed without errors but the SQL Server version was still build 3042.

We looked into the Service Pack summary.txt installation file, located by default under the C:\Program Files\Microsoft SQL Server\90\Setup Bootstrap\Log directory, and found the following information:

Products Detected Language Level Patch Level Platform Edition
Setup Support Files ENU 9.3.4035 x86
SQL Server Native Client ENU 9.00.4035.00 x86
MSXML 6.0 Parser ENU x86
SQLXML4 ENU 9.00.4035.00 x86
Microsoft SQL Server VSS Writer ENU 9.00.4035.00 x86

As you may have noticed there is a missing component on this list: Database Services. The SQL Server number that we can see in SQL Server Management Studio or by querying the built-in function @@version corresponds with the version of this component (the same version shown by the sqlservr.exe executable under the SQL Server program directory). The SP3 (build 4035) setup was therefore reporting a correct installation for all the shared SQL Server components, that were the only components detected in our case were.

Running a Process Monitor while executing the SP3 setup process we discovered several ACCESS DENIED processes that I could not reproduce on my test environment, most of this events were related to the Wmiprvse.exe component. We decided to check the different WMI (Windows Management Instrumentation) repositories status with the help of WMI Diagnostics Utility WMIDiag; as expected, WMIDiag reported several errors with the WMI service configuration:

34153 10:34:38 (0) ** WMIDiag v2.0 started on dinsdag 21 juli 2009 at 10:25.
34154 10:34:38 (0) **
34155 10:34:38 (0) ** Copyright (c) Microsoft Corporation. All rights reserved - January 2007.
[...]
35235 10:34:39 (0) ** DCOM security warning(s) detected: ........... 0.
35236 10:34:39 (0) ** DCOM security error(s) detected: ............. 2.
35237 10:34:39 (0) ** WMI security warning(s) detected: ............ 0.
35238 10:34:39 (0) ** WMI security error(s) detected: .............. 52.
35239 10:34:39 (0) **
35240 10:34:39 (1) !! ERROR: Overall DCOM security status: ......... ERROR!
35241 10:34:39 (1) !! ERROR: Overall WMI security status: .......... ERROR!
[...]
35401 10:34:39 (0) ** ERROR: WMIDiag detected issues that could prevent WMI to work properly!.  Check 'C:\DOCUMENTS AND SETTINGS\ADMINISTRATOR\LOCAL SETTINGS\TEMP\WMIDIAG-V2.0_2003_.SRV.RTM.32_NF04_2009.07.21_10.24.40.LOG' for details.
35402 10:34:39 (0) **
35403 10:34:39 (0) ** WMIDiag v2.0 ended on dinsdag 21 juli 2009 at 10:34 (W:181 E:57 S:1).

At this point we were pretty sure the error was in the WMI service configuration. Thanks to the help of Carlos Carrolo from the Microsoft Platform Support Team we were able to trace down the problem to a known issue in SQL Server:

KB941823, Some or all SQL Server 2005 services are not listed in SQL Server Configuration Manager, or you receive a "No SQL Server 2005 components were found" error message when you perform operations in SQL Server 2005 Surface Area Configuration

The fact that none of the SQL Server service components was listed under the SQL Server Configuration Manager console confirmed our finding (this is described as one of the Symptoms on the KB).

This Knowledge Base article did not mention our setup scenario, but the workaround provided worked in our case. This KB goes very specific explaining the root-cause of the issue: during the very beginning of the Service Pack setup process, the existing SQL Server services are discovered via WMI queries by the means of Wmiprvse.exe process; without the proper permissions, the process will fail to discover the installed components. This is further discussed at the end of this Knowledge Base article:

[…] These tools iterate through the services collection to obtain the information about the SQL Server 2005 services. When these tools iterate through the services collection, these tools generate the following Windows Management Instrumentation (WMI) queries:
SELECT * FROM RegServices
SELECT * FROM SqlService
When these tools generate the WMI queries, the SQL Server Web-Based Enterprise Management (WBEM) provider (Sqlmgmprovider.dll) is loaded into the Wmiprvse.exe process. Then, the SQL Server WBEM provider extracts and processes the information about the services of every instance of SQL Server 2005. […] The Wmiprvse.exe process in which the Sqlmgmprovider.dll file is loaded runs under the security context of the NETWORK SERVICE account. […]

It seemed that at some point after the SQL Server SP2 installation, the server was updated with a new Security Policy that broke the Discretionary Access Control List (DACL) for SQL Server leading to a broken WMI detection mechanism.

Filed under: ,
Open Windows Firewall Ports for SQL Server the Easy Way
25 August 09 11:17 AM | jorgepc | 0 Comments   

After installing a new SQL Server 2008 instance on my Windows 7 laptop I find myself unable to logon locally into the instance by using SQL Server Management Studio (SSMS). I know that due to the changes introduced in Windows Server 2008 and Windows Vista Firewall this was possibly due to an open port missing.

The Configuring the Windows Firewall to Allow SQL Server Access Books Online entry contains the information to open the required ports in Windows Firewall for each one of the SQL Server services. However, I recently discover a much easier way to open the required Windows Firewall ports for SQL Server on the following Knowledge Base Article (KB968872):

25082009A

This article includes the Microsoft "Fix it" troubleshooter that allows to quickly fix the related problem using an easy-to-follow setup. The "Fix it" button is designed for Windows Server 2008 but the script that is executed by the setup is included on the same KB article and can be executed on Windows Vista or Windows 7 as well:

@echo =========  SQL Server Ports  ===================
@echo Enabling SQLServer default instance port 1433
netsh firewall set portopening TCP 1433 "SQLServer"
@echo Enabling Dedicated Admin Connection port 1434
netsh firewall set portopening TCP 1434 "SQL Admin Connection"
@echo Enabling conventional SQL Server Service Broker port 4022 
netsh firewall set portopening TCP 4022 "SQL Service Broker"
@echo Enabling Transact-SQL Debugger/RPC port 135
netsh firewall set portopening TCP 135 "SQL Debugger/RPC"
@echo =========  Analysis Services Ports  ==============
@echo Enabling SSAS Default Instance port 2383
netsh firewall set portopening TCP 2383 "Analysis Services"
@echo Enabling SQL Server Browser Service port 2382
netsh firewall set portopening TCP 2382 "SQL Browser"
@echo =========  Misc Applications  ==============
@echo Enabling HTTP port 80
netsh firewall set portopening TCP 80 "HTTP"
@echo Enabling SSL port 443
netsh firewall set portopening TCP 443 "SSL"
@echo Enabling port for SQL Server Browser Service's 'Browse' Button
netsh firewall set portopening UDP 1434 "SQL Browser"
@echo Allowing multicast broadcast response on UDP (Browser Service Enumerations OK)
netsh firewall set multicastbroadcastresponse ENABLE

 

Executing this on Windows 7 will show several warnings since the netsh firewall command in deprecated on this Windows version (netsh advfirewall firewall is the recommended method) but the script will perform its function nevertheless.

Kudos for the Microsoft "Fix it" team!

Filed under:
Do Not Use Cluster Server Recovery Utility on 64-bits Windows Server
13 July 09 12:06 PM | jorgepc | 0 Comments   

You probably have heard about Cluster Server Recovery Utility before, a nice Windows utility that alleviate some of the burden associated with Cluster changes (restoring checkpoint files, changing disk signatures, storage migration, etc.).

The Microsoft download page clearly states this utility is not suited for Windows Server 64-bits systems but I have recently found several SQL Server issues that are caused by the use of the utility on 64-bits SQL Server clustered installations.

Using this tool on 64-bits SQL Server clustered installations could potentially lead to hard-to-fix and time-consuming troubleshooting scenarios. For this reason I think it is worth the time to highlight here the importance of not using this tool on 64-bits.

What errors could I find when using this tool on 64-bits cluster?

Some of the SQL Server errors that can result from the use of Cluster Server Recovery Utility under 64-bits are:

SQLBrowser service was unable to establish SQL instance and connectivity discovery (Application Event Log)

2009-05-13 08:11:13.04 Server Error: 17182, Severity: 16, State: 1.
2009-05-13 08:11:13.04 Server TDSSNIClient initialization failed with error 0x2, status code 0x35.
(SQL Server Errorlog)

Please, keep in mind these errors could have a different root-cause, not necessarily Cluster Server Recovery.

How do I know if I have used Cluster Server Recovery Utility on my 64-bits cluster?

You will see that one ore more entries like this are created under the Wow6432Node registry entry:

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Microsoft SQL Server\MSSQL.x]

Oh-oh, I think I have use this on my 64-bit Cluster, what should I do?

Fixing these errors requires several registry changes and it is not a straightforward process. Please, contact Microsoft PSS for help on this case.

Filed under:
“SQLBrowser start request failed” Error Installing SQL Server 2008
28 May 09 10:51 AM | jorgepc | 0 Comments   

I worked last day on an very tricky SQL Server 2008 setup problem and think it is worth the time sharing here what I discovered. In this case my customer was trying to install SQL Server 2008 on a Windows XP SP3 computer to support a locally installed ERP software package.

The SQL Server 2008 setup was working fine at the beginning but failing in the middle of the process with the following error message:

SQLBrowser start request failed. Click Retry to retry the failed action, or click Cancel to cancel this action and continue setup

Clicking on the Retry button did retry the start operation but failed again with the same error so we had no other option but cancel the setup process.

Checking the Windows System Event log did not provide any hint about the root-cause of the problem so I decide to check for the SQL Server 2008 setup log that can be found in C:\Program Files\Microsoft SQL Server\100\Setup Bootstrap\Log folder; there are different files included in this folder and each one of them provide information about each one of the components installed (SSIS, AS, Engine, Client Tools, etc.). The Detail.txt log file, as the name implies, includes very detailed information of the the overall setup process and it is usually the first thing I look into when dealing with setup issues; here is what we found in our case:

2009-05-13 17:25:13 SQLBrowser: The last attempted operation: Starting the SQL Server Browser
service 'SQLBrowser', and waiting for up to '900' seconds for the process to complete.
2009-05-13 17:25:13 Slp: Prompting user if they want to retry this action due to the following failure:
2009-05-13 17:25:13 Slp: --------------------------------------
2009-05-13 17:25:13 Slp: The following is an exception stack listing the exceptions in outermost to innermost order
2009-05-13 17:25:13 Slp: Inner exceptions are being indented
2009-05-13 17:25:13 Slp:
2009-05-13 17:25:13 Slp: Exception type: Microsoft.SqlServer.Configuration.Sco.ScoException 2009-05-13 17:25:13 Slp: Message:
2009-05-13 17:25:13 Slp: Service ‘SQLBrowser’ start request failed.
2009-05-13 17:25:13 Slp: Data:
2009-05-13 17:25:13 Slp: Feature = SQL_Browser_Redist_SqlBrowser_Cpu32
2009-05-13 17:25:13 Slp: Timing = Startup
2009-05-13 17:25:13 Slp: DisableRetry = true

So I could see that indeed the SQLBrowser service was being created but could not be started for some reason. I decided then to check the Summary.txt setup log file for a more comprehensive list of what components were and were not installed successfully; this file can be also located under the \LOG folder and contains a brief summary of the actions performed by the SQL Server setup. Interestingly enough, the error on this summary setup log file was not pointing to the Browser service but to MSXML 6.0 (Microsoft XML engine):

Detailed results:
Feature:                       Database Services
Status:                        Failure                    
MSI status:                    Failure
  MSI error code:                0x5EBE5729
  MSI log file location:         C:\Program Files\Microsoft SQL Server\100\Setup Bootstrap\Log\20090514_170659\Msxml6_Cpu32_1.log

So I looked up into this setup log file (Msxml6_Cpu32_1.log):

MSI (s) (30:58) [17:22:37:661]: Note: 1: 1708
MSI (s) (30:58) [17:22:37:661]: Product: MSXML 6.0 Parser (KB933579) – Installation error.

MSI (s) (30:58) [17:22:37:661]: Windows Installer installed the product. Product Name: MSXML 6.0 Parser (KB933579). Product Version: 6.10.1200.0. Product Language: 3082. Installation success or error status: 1603.

MSI (s) (30:58) [17:22:37:661]: Cleaning up uninstalled install packages, if any exist
MSI (s) (30:58) [17:22:37:661]: MainEngineThread is returning 1603
MSI (s) (30:68) [17:22:37:770]: No System Restore sequence number for this installation.

As you can see, SQL Server Browser was not the only component failing during the setup, the MSXML 6.0 engine was failing too. Comparing time from both Detail.txt and Msxml6_Cpu32_1.log files I could see the later was newer; in other words, the MSXML setup error was happening before the SQL Server Browser error. I went back to the Detail.txt file to confirm this point:

2009-05-13 17:22:36 Slp: Running Action: Install_Msxml6_Cpu32_Action
2009-05-13 17:22:36 Slp: Target package: "D:\x86\setup\x86\msxml6.msi"
2009-05-13 17:22:37 Slp: InstallPackage: MsiInstallProduct returned the result code 1603.
2009-05-13 17:22:38 Slp: Sco: Attempting to write hklm registry key SOFTWARE\Microsoft\Microsoft SQL Server to file C:\Program Files\Microsoft SQL Server\100\Setup Bootstrap\Log\20090514_170659\Registry_SOFTWARE_Microsoft_Microsoft SQL Server.reg_
2009-05-13 17:22:38 Slp: Sco: Attempting to write hklm registry key SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall to file C:\Program Files\Microsoft SQL Server\100\Setup Bootstrap\Log\20090514_170659\Registry_SOFTWARE_Microsoft_Windows_CurrentVersion_Uninstall.reg_
2009-05-13 17:22:38 Slp: Sco: Attempting to write hklm registry key SOFTWARE\Microsoft\MSSQLServer to file C:\Program Files\Microsoft SQL Server\100\Setup Bootstrap\Log\20090514_170659\Registry_SOFTWARE_Microsoft_MSSQLServer.reg_
2009-05-13 17:22:43 Slp:
2009-05-13 17:22:43 Slp: Watson bucket for Msi based failure has been created
2009-05-13 17:22:43 Slp: No retry-able MSI return code detected.
2009-05-13 17:22:43 Slp: Checkpoint: INSTALL_MSXML6_CPU32_ACTION
2009-05-13 17:22:43 Slp: Completed Action: Install_Msxml6_Cpu32_Action, returned False
2009-05-13 17:22:43 Slp: Error: Action "Install_Msxml6_Cpu32_Action" failed during execution.

Checking into Add/Remove Programs in Control Panel I could see the MSXML 6.0 engine listed as an installed. This is one of the shared component installed by SQL Server 2008 but in my case the customer was unsure if this component was previously installed on the Windows XP computer. I decided to remove the MSXML 6.0 engine from the PC using the Add/Remove Programs Control Panel applet and try the SQL Server 2008 setup again. This time the installation completed with no errors.

In this case the initial SQL Server Browser error was misleading because it was the first error that was shown by the Setup GUI. Only by looking inside the SQL Server setup log files we were able to discover the MSXML 6.0 installation issue the took place before.

Another possible way of tracing down this error is by looking into the Watson Bucket log file that exists on the same setup “Log” directory:

In our case this was the content of this file:

Watson bucket data:
  Bucket param 1: 10.0.1600.22
  Bucket param 2: 6.10.1200.0
  Bucket param 3: msxml6.msi
  Bucket param 4: 0x2D2816FE
  Bucket param 5: 0x5EBE5729
  Bucket param 6: Install_Msxml6_Cpu32_Action
  Bucket param 7:
  Bucket param 8:
  Bucket param 9:
  Bucket param 10:

I have reviewed the SQL Server 2008 SP1 notes and this issue has been solved there so another way to avoid this problem would be by installing a slipstreamed SQL Server 2008 media, slipstreaming is a new SQL Server functionality that allows the database administrator to integrate Service Packs into the installation media.

Filed under:
How to Quickly Review your SQL Server Cluster Configuration on Windows Server 2003
21 January 09 06:52 PM | jorgepc | 0 Comments   

Working at Microsoft Customer Support Services (CSS), we are frequently asked by our customers to check their SQL Server cluster configuration.  Event though the task of installing and configuring a SQL Server cluster have been simplified every new Windows Server version, it is not strange to find clusters with non-optimal configurations. Part of this problem relies on the amount of settings the IT professional has to take into account when setting up a new cluster. Fortunately, Windows Server 2008 has introduced significant changes into the cluster setup, configuration and maintenance with the aim of simplifying most of the hard work for the IT professional.

The goal of this post is to provide a quick guide for checking for some of the most common mistakes when configuring SQL Server on a Windows Server 2003 failover cluster. For this task we will use the CLUSMPS.EXE utility included in MPSReports.

MPSReports to the Rescue

Microsoft Product Support’s Reporting Tools (aka MPSReports) is probably one of the most popular tools among Microsoft Support professionals. This tool is used in most of the support incidents to gather SQL Server and general Windows logs and settings, including the failover cluster ones.  When executing MPSReports (for SQL Server you will have to download and run MPSRPT_SQL.exe executable file) we are actually executing a group of scripts and executables behind the scenes; one of these executables, CLUSMPS.EXE, will dump the SQL Server cluster configuration to a text file in the form of SERVERNAME_CLUSTER_MPS_INFORMATION.txt.

The setup of MPSReports is pretty straightforward but is always recommended you check the ReadMe.txt file to have a better understanding of what test are performed and what log files are generated. You can find this ReadMe.txt here and additional information about this tool here.

 

After running this tool we will find the cluster configuration file into the resultant CAB file in the MPSReports folder (default path for this folder is %WINDIR%\MPSReports\). We can optionally execute CLUSMPS.EXE directly by opening a Command Prompt windows on the active cluster node and running the tool from %WINDIR%\MPSReports\SQLServer\BIN\x86 path. The following picture describes the command-line options available:

 

It is important to note that neither CLUSMPS.EXE nor the rest of the tests performed by MPSReports will change the Windows or SQL Server configuration in any way.

General Cluster Configuration

The first section of the resultant SERVERNAME_CLUSTER_MPS_INFORMATION.txt file provides a general overview of what nodes are part of the cluster and what cluster groups and resources are located on these nodes. This is an at-a-glance view of the information we can find by using Windows Cluster Administrator (cluadmin.exe) tool.

This section includes Nodes Information, Group and Resource Information, which will be useful if you need to dig deep for cluster events into cluster.log, (see my previous post for more information on this) and Resource to Guid Information.

Network Configuration

The Network Configuration section includes Cluster Networks Information, Cluster Networks Priority, Network Interfaces, and Network Interfaces Binding Order. It is important to do not overlook these settings. In most of the cases not configuring network settings correctly will cause your SQL Server instance to do not perform optimally.

Cluster Networks Information will show a list of the networks recognized by Microsoft Cluster Services (MCS). On a typical cluster configuration, each one of the cluster nodes will include at least two Network Interface Cards (NICs): One is used for internal cluster communications between nodes (commonly referred as "Heartbeat” network) and the other is used for external communications (“public” network).

Pay attention to the Network Role column, by default you will see that both the Heartbeat and Public network are configured to support All Communications (“All Comm”) but it is a best practice to modify the Heartbeat configuration so it listens to Internal Communications only. To do so open the Heartbeat network properties on the Cluster Administrator utility (cluadmin.exe) and select the option Internal Cluster Communications Only (Private Network):

 

Cluster Networks Priority shows the priority of each one of this networks. As a best practice, ensure that the private network is listed with the highest priority. This setting can be changed by accessing the cluster properties on Cluster Administrator utility:

 

The Network Interfaces Binding Order determines how each network protocol is bound to each services that make use of it. As a general rule, the most-used protocols should be listed first. The recommended binding order for a Windows cluster is as follows:

          • External public network
          • Internal private network (Heartbeat)
          • Other connections, if any

Network bindings can be configured using the Advanced > Advanced Settings option in Control Panel > Network and Dial-up Connections.

The following Knowledge Base article provides specific guidelines to configure network components on a Windows cluster: Recommended private “Heartbeat” configuration on a cluster server, Server Clusters: Network Configuration Best Practices for Windows 2000 and Windows Server 2003.

You can also find best practices to configure the different network components on a cluster on the following TechNet link: Network Configuration Best Practices for Windows 2000 and Windows Server 2003.

Cluster Resources Dependencies

On each one of the cluster groups, it is frequent to find resources that depend on others to function properly. These dependencies are depicted on the Dependency Tree List and Dependency List sections of the SERVERNAME_CLUSTER_MPS_INFORMATION.txt file. We usually do not need to take SQL Server cluster dependencies into account, these are automatically configured at setup time and should not be changed from defaults but it is worth the time to check our current dependencies tree against the recommended settings. You can find this information and other special considerations on Knowledge Base article KB835185.

Here you can find default dependencies for the SQL Server 2000 failover cluster components:

SQL Server (SHILOH) { SQL Server }
    +(1)-----Depends On-> SQL Network Name(SQL2000)  { Network Name }
        +(2)-----Depends On-> SQL IP Address1(SQL2000)  { IP Address }
    +(1)-----Depends On-> Disk S:  { Physical Disk }


SQL Server Agent (SHILOH) { SQL Server Agent }
    +(1)-----Depends On-> SQL Server (SHILOH)  { SQL Server }
        +(2)-----Depends On-> SQL Network Name(SQL2000)  { Network Name }
            +(3)-----Depends On-> SQL IP Address1(SQL2000)  { IP Address }
        +(2)-----Depends On-> Disk S:  { Physical Disk }


SQL Server Fulltext (SHILOH) { Microsoft Search Service Instance }
    +(1)-----Depends On-> SQL Server (SHILOH)  { SQL Server }
        +(2)-----Depends On-> SQL Network Name(SQL2000)  { Network Name }
            +(3)-----Depends On-> SQL IP Address1(SQL2000)  { IP Address }
        +(2)-----Depends On-> Disk S:  { Physical Disk }

And here are the default dependencies for the SQL Server 2005 failover cluster components:

SQL Server (YUKON) { SQL Server }
    +(1)-----Depends On-> Disk Y:  { Physical Disk }
    +(1)-----Depends On-> SQL Network Name (SQL2005)  { Network Name }
        +(2)-----Depends On-> SQL IP Address 1 (SQL2005)  { IP Address }


SQL Server Agent (YUKON) { SQL Server Agent }
    +(1)-----Depends On-> SQL Server (YUKON)  { SQL Server }
        +(2)-----Depends On-> Disk Y:  { Physical Disk }
        +(2)-----Depends On-> SQL Network Name (SQL2005)  { Network Name }
            +(3)-----Depends On-> SQL IP Address 1 (SQL2005)  { IP Address }


SQL Server Fulltext (YUKON) { Generic Service }
    +(1)-----Depends On-> Disk Y:  { Physical Disk }

Cluster Service Account Rights

The Account Privileges section includes a list of the different right required by the Microsoft Cluster Service account as well as any missing permissions. The following example illustrates an scenario where two of the permissions are missing for this account:

              _____________________________________________
             |/////////////////////////////////////////////|
             |//         Current Effective Rights        //|
             |/////////////////////////////////////////////|

Act as part of the operating system.
Back up files and directories.
Restore files and directories.
Adjust memory quotas for a process.
Increase scheduling priority.
Log on as Service.
Debug programs.
Manage auditing and security log.
Access this computer from the network.

              _____________________________________________
             |/////////////////////////////////////////////|
             |//     Missing Current Effective Rights    //|
             |/////////////////////////////////////////////|

Load and unload device drivers.
Impersonate a client after authentication.

The required permissions should be configured on each one of the cluster nodes. Failure in configuring proper rights for the Cluster Services account will cause you SQL Server cluster to show permissions error message or to fail completely during startup or failover operations. For a list of what permissions are required by this account in Windows Server 2003 please, refer to the Knowledge Base article KB269229.

Other Information

If you plan to make use of Distributed Transaction Services within SQL Server you should plan way ahead how to install the Microsoft Distributed Transaction Coordinator (MSDTC) cluster resource. The Knowledge Base article How to configure Microsoft Distributed Transaction Coordinator on a Windows Server 2003 cluster is the best place to start. In case you need to revert or modify an existing MSDTC clustered installation so it can be used by SQL Server, refer to this link.

For a more in-depth analysis of the general SQL Server cluster does and don’ts I highly recommended to check the following Microsoft Knowledge Base article: Clustered SQL Server do’s, don’ts, and basic warnings.

The following TechNet link contains a quick reference guide to install and configuring new Microsoft cluster in Windows Server 2003: Quick Start Guide for Server Clusters. For specific SQL Server guidelines refer to SQL Server Books Online.

During this post I have tried to summarize the key points to look for when checking SQL Server cluster configuration on Windows Server 2003. This is not a complete checklist and working over the different links provided in the post will give you a more complete information. The new Cluster Validation Wizard introduced in Windows Server 2008 cluster simplifies and reduced significantly the overhead associated with cluster configuration in previous Windows versions. If you haven’t had the opportunity to work with Server 2008 cluster, I encourage you to setup your lab and get familiar with this new cluster model.

Filed under:
Import Export Wizard is not Copy Database Wizard
19 January 09 11:33 AM | jorgepc | 0 Comments   

During a recent work with one of our customers I did notice that it is not always clear what tools are available and required to copy SQL Server database objects, understanding “object” as a whole database or only part of it (table, view, user, etc.)

As part of this work, my customer was trying to copy several databases between two different SQL Server instances but after several tests he was finding that although the tables were correctly copied, other objects as views and primary keys were not correctly transferred. It come to my attention that he was using the SQL Server Import Export Wizard to transfer the databases between one instance and another. Import Export Wizard can be launched from within SQL Server Management Studio (SSMS) or directly from a command prompt window typing DTSWizard.exe (further information about how to start Import Export Wizard can be found on this MSDN link)

 

As indicated on the startup page on the Import Export Wizard, this tool can be used to copy data between different data sources and we have the option to recreate the destination database and tables, however this is not the right tool to use if we want to copy the whole database (not only tables, but all other objects) from one instance to another. The following Books Online link tries to clarify this difference:

The purpose of the SQL Server Import and Export Wizard is to copy data from a source to a destination. The wizard can also create a destination database and destination tables for you. However, if you have to copy multiple databases or tables, or other kinds of database objects, you should use the Copy Database Wizard instead

Copy Database Wizard will allow you to copy entire databases from one SQL Server instance into another (or the same, if you desire to do so) retaining the full database structure and objects. You can use two different methods to copy the databases and it is possible to keep the source database online while the copy is taking place, the following image shows these options:

 

You can launch Copy Database Wizard directly from SSMS right-clicking on any database > Tasks > Copy database. You will see that this option (red) is side-by-side with the Import Export Wizard (green):

 

Copy Database Wizard support the copy of databases between SQL Server 2000 and SQL Server 2005 providing an effective and simple way of upgrading databases between these two versions. For this wizard to work, SQL Server 2005 Service Pack 2 should be installed not only on the server where we are executing the tool but also on the destination server. For more information about this option please, refer to SQL Server Books Online.

In summary, if you want to copy entire databases between different SQL Server instances, Copy Database Wizard is your tool. If you want to export data or import data from/into SQL Server or other data sources or just want to copy some tables, use Import Export Wizard. If you need to copy data and perform modifications during this process with a fine-grain control you will find a better solution by building and Integration Services package in Business Intelligence Development Studio.

Filed under:
"Unable to create computer account on DC" error installing SQL Server 2005 cluster on Windows Server 2008
01 October 08 09:38 PM | jorgepc | 1 Comments   

Windows Server 2008 introduces significant changes to the Windows Cluster Service. The new Windows Cluster security model is one of the most important of these changes and will require seasoned SQL Server administrator to update her/his old-and-trusted knowledge.

Some days ago I was facing a problem trying to install a SQL Server 2005 cluster on Windows Server 2008. The SQL Server installation seemed to complete successfully on both cluster nodes however, the SQL Server service was not starting as a clustered service. As soon as the installation completes, the clustered database resource was failing over the second cluster node and failing completely right after.

The setup log file Summary.txt (located by default under C:\Program Files\Microsoft SQL Server\90\Setup Bootstrap\LOG folder) showed no installation problems and the SQL Server ERRORLOG was not showing any useful information about the error:

2008-07-09 14:33:10.81 Server Microsoft SQL Server 2005 - 9.00.1399.06 (X64)
Oct 14 2005 00:35:21
Copyright (c) 1988-2005 Microsoft Corporation
Enterprise Edition (64-bit) on Windows NT 6.0 (Build 6001: Service Pack 1)

[...]

2008-07-09 14:33:39.39 spid5s SQL Server is terminating in response to a 'stop' request from Service Control Manager. This is an informational message only. No user action is required.

We did check that al the installation requirements for SQL Server 2005 were satisfied on the environment; the new "Validate Cluster Configuration Wizard" report recently run on the cluster was not showing any error either. Since many cluster installation errors in SQL Server are caused by incorrect settings at the Windows cluster level, I decided to look into the cluster.log file for any possible hint. You can find guidelines to start working on this cluster.log file, as well as a procedure to generate this file in Windows Server 2008 on my previous post.

Reviewing the cluster.log proved to be a good idea:

10:29:14.021 INFO [RES] Network Name <SQL Network Name (C2K5NET1)>: Failed to find a computer account for C2K5NET1. Attempting to create one on DC \\dccorp.contoso.lab.
000003f4.00001108::2008/07/10-10:29:14.021 INFO [RES] Network Name <SQL Network Name (C2K5NET1)>: Trying NetUserAdd() to create computer account C2K5NET1 on DC \\dccorp.contoso.lab in default Computers container
000003f4.00001108::2008/07/10-10:29:14.037 ERR [RES] Network Name <SQL Network Name (C2K5NET1)>: Unable to create computer account C2K5NET1 on DC \\dccorp.contoso.lab, in default Computers container, status 5
000003f4.00001108::2008/07/10-10:29:14.053 ERR [RHS] Online for resource SQL Network Name (C2K5NET1) failed.

It was clear that "something" was trying to create the computer account for the SQL Server Network Name on the DC and was failing on the attempt.

A look into the Application Event log showed also the same error (XML data is excluded):

Log Name: System
Source: Microsoft-Windows-FailoverClustering
Date: 09-Jul-08 10:47:21 AM
Event ID: 1194
Task Category: (19)
Level: Error
Keywords:
User: SYSTEM
Computer: CNODE2.contoso.lab
Description:
The description for Event ID 1194 from source Microsoft-Windows-FailoverClustering cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.

If the event originated on another computer, the display information had to be saved with the event.

The following information was included with the event:

SQL Network Name (C2K5NET1)
contoso.lab
Unable to create computer account
Access is denied.

CCLUSTER$

The Application Event log was indeed of great help since it included and Event ID (1194). A quick search into the Microsoft Support Web site showed detailed information for this event. According to this information, each clustered service or application on Windows Server 2008 is associated with a computer account in Active Directory (with the exception on a Hyper-V virtual machine), this account, called Virtual Computer Object (VCO) is created during installation by the High Availability wizard. If this VCO cannot be created for any reason, the clustered service won't work.

This VCO is created by the cluster account, known as Cluster Name Object (CNO, a different computer account with the same name as cluster), which is created during the initial Create Cluster wizard. These relationships are better explained in the following diagram:

 Source: Microsoft TechNet Library

If the "application account" does not have the "Create account" permission into Active Directory, the VCO (computer account for service or application) will not be created. In our case the VCO was non-existent in the Directory, a quick look into the CNO security properties showed the "Create account" right was missing. We did add the "Create account" permission and provisioned the VCO manually as described in TechNet Library. After these changes the SQL Server clustered service started successfully, we didn't need to re-install or change any other thing at cluster level.

You can find more information about the new security model introduced by Windows Server 2008 cluster in the following Knowledge Base article:

Description of the failover cluster security model in Windows Server 2008, http://support.microsoft.com/default.aspx?scid=kb;EN-US;947049

 

Filed under: ,
Using Windows Cluster log to troubleshoot SQL Server cluster errors
27 July 08 05:17 PM | jorgepc | 0 Comments   

There are times where all the SQL Server knowledge in the world won't help us to find a solution to our cluster problem. The following post will show you how to use the Microsoft Windows cluster log file to find the root-cause of a common SQL Server issue (it does not matter what SQL Server version we use for the purpose of this post).

In this case the customer called to Microsoft PSS due to a problem in which a SQL Server 2000 clustered instance was not starting up for an unknown reason. Looking at the customer Cluster Administrator, this was the situation of the SQL Server 2000 cluster group, called "SHILOH":

 

We tried to diagnose what was wrong using the SQL Server ERRORLOG file, but we could not find any useful information there. After a few seconds online, the SQL Server service was stopped by Service Control Manager for no apparent reason:

2008-07-27 17:28:04.51 server    Copyright (C) 1988-2002 Microsoft Corporation.
2008-07-27 17:28:04.51 server    All rights reserved.
2008-07-27 17:28:04.51 server    Server Process ID is 304.
2008-07-27 17:28:04.51 server    Logging SQL Server messages in file 'S:\Program Files\Microsoft SQL Server\MSSQL$SHILOH\log\ERRORLOG'.
2008-07-27 17:28:04.53 server    SQL Server is starting at priority class 'normal'(1 CPU detected).
2008-07-27 17:28:04.55 server    SQL Server configured for thread mode processing.
2008-07-27 17:28:04.56 server    Using dynamic lock allocation. [2500] Lock Blocks, [5000] Lock Owner Blocks.
2008-07-27 17:28:04.57 server    Attempting to initialize Distributed Transaction Coordinator.
2008-07-27 17:28:18.76 spid3     Starting up database 'master'.
2008-07-27 17:28:19.30 server    Using 'SSNETLIB.DLL' version '8.0.2039'.
2008-07-27 17:28:19.32 server    SQL server listening on 192.168.0.60: 1131.
2008-07-27 17:28:19.32 spid3     Server name is 'SHILOH\SHILOH'.
2008-07-27 17:28:19.32 spid5     Starting up database 'model'.
2008-07-27 17:28:19.33 spid8     Starting up database 'msdb'.
2008-07-27 17:28:19.33 spid9     Starting up database 'pubs'.
2008-07-27 17:28:19.34 server    SQL server listening on TCP, Shared Memory, Named Pipes.
2008-07-27 17:28:19.34 server    SQL Server is ready for client connections
2008-07-27 17:28:19.34 spid10    Starting up database 'Northwind'.
2008-07-27 17:28:20.30 spid5     Clearing tempdb database.
2008-07-27 17:28:21.49 spid5     Starting up database 'tempdb'.
2008-07-27 17:28:21.87 spid3     Recovery complete.
2008-07-27 17:28:21.87 spid3     SQL global counter collection task is created.
2008-07-27 17:28:42.29 spid3     SQL Server is terminating due to 'stop' request from Service Control Manager.

 

Application Event log was full of the following error, but the cryptic message description wasn't of any help to us:

Event Type:    Error
Event Source:    MSSQL$SHILOH
Event Category:    (3)
Event ID:    19019
Date:        7/27/2008
Time:        5:28:19 PM
User:        N/A
Computer:    JORGEPC-CN1
Description:
The description for Event ID ( 19019 ) in Source ( MSSQL$SHILOH ) cannot be found. The local computer may not have the necessary registry information or message DLL files to display messages from a remote computer. You may be able to use the /AUXSOURCE= flag to retrieve this description; see Help and Support for details. The following information is part of the event: [sqsrvres] ODBC sqldriverconnect failed
.
Data:
0000: 4b 4a 00 40 01 00 00 00   KJ.@....
0008: 07 00 00 00 53 00 48 00   ....S.H.
0010: 49 00 4c 00 4f 00 48 00   I.L.O.H.
0018: 00 00 00 00 00 00         ......
 

 

System Event log did show a very generic error about Cluster service failing to bring SQL Server service online:

Event Type:    Error
Event Source:    ClusSvc
Event Category:    Failover Mgr
Event ID:    1069
Date:        7/27/2008
Time:        5:27:38 PM
User:        N/A
Computer:    JORGEPC-CN2
Description:
Cluster resource 'SQL Server (SHILOH)' in Resource Group 'SHILOH' failed.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

 

It was clear that something related to the Cluster service was causing the SQL Server service to go offline.

Windows Server 2003 includes two useful cluster log files where all cluster activity is logged on: cluster.log and cluster.oml. These two files can be located into the C:\Windows\Cluster folder. cluster.log is the file where all the cluster activity is collected in a very verbose way, while cluster.oml is where we can find mappings between globally unique identifiers (GUIDs) for each cluster resource and resource names (a kind of Rosetta stone to understand cluster.log file). You can find more information about these and other cluster log files at the Microsoft TechNet article How a Server Cluster Works.

At first sight cluster.log can look very intimidating but we just need to take the specific time where the SQL Server event happened and search for that time on the cluster log. Be aware that cluster.log files are stored in GMT time while Windows Event Logs are stored at local time, depending on your time zone you will have to add/subtract one or more hours. In my case I am located on GMT+2 so I know I I have to look for events logged on cluster.log two hours in advance.

I knew the first Application Event log took place around 17:28 so I did look for anything abnormal around 15h28m:

0000067c.00000870::2008/07/27-15:28:07.524 INFO [Qfs] QfsCloseHandle 424, status 0
0000067c.00000870::2008/07/27-15:28:07.524 INFO [Qfs] QfsDeleteFile C:\DOCUME~1\CLUADMIN\LOCALS~1\Temp\CLSA8.tmp, status 0
00000718.00000ad4::2008/07/27-15:28:19.668 ERR  SQL Server <SQL Server (SHILOH)>: [sqsrvres] ODBC sqldriverconnect failed
00000718.00000ad4::2008/07/27-15:28:19.668 ERR  SQL Server <SQL Server (SHILOH)>: [sqsrvres] checkODBCConnectError: sqlstate = 28000; native error = 4818; message = [Microsoft][SQL Native Client][SQL Server]Login failed for user 'CONTOSO\CLUADMIN'.
00000718.00000ad4::2008/07/27-15:28:19.668 ERR  SQL Server <SQL Server (SHILOH)>: [sqsrvres] ODBC sqldriverconnect failed
00000718.00000ad4::2008/07/27-15:28:19.668 ERR  SQL Server <SQL Server (SHILOH)>: [sqsrvres] checkODBCConnectError: sqlstate = 08S01; native error = e9; message = [Microsoft][SQL Native Client]Named Pipes Provider: No process is on the other end of the pipe.

00000718.00000ad4::2008/07/27-15:28:19.668 ERR  SQL Server <SQL Server (SHILOH)>: [sqsrvres] ODBC sqldriverconnect failed
00000718.00000ad4::2008/07/27-15:28:19.668 ERR  SQL Server <SQL Server (SHILOH)>: [sqsrvres] checkODBCConnectError: sqlstate = 08S01; native error = e9; message = [Microsoft][SQL Native Client]Communication link failure
00000718.00000ad4::2008/07/27-15:28:19.668 ERR  SQL Server <SQL Server (SHILOH)>: [sqsrvres] ODBC sqldriverconnect failed
00000718.00000ad4::2008/07/27-15:28:19.668 ERR  SQL Server <SQL Server (SHILOH)>: [sqsrvres] checkODBCConnectError: sqlstate = 08S01; native error = e9; message = [Microsoft][SQL Native Client]Named Pipes Provider: No process is on the other end of the pipe.

00000718.00000ad4::2008/07/27-15:28:19.668 ERR  SQL Server <SQL Server (SHILOH)>: [sqsrvres] ODBC sqldriverconnect failed
00000718.00000ad4::2008/07/27-15:28:19.668 ERR  SQL Server <SQL Server (SHILOH)>: [sqsrvres] checkODBCConnectError: sqlstate = 08S01; native error = e9; message = [Microsoft][SQL Native Client]Communication link failure
00000718.00000ad4::2008/07/27-15:28:19.678 ERR  SQL Server <SQL Server (SHILOH)>: [sqsrvres] ODBC sqldriverconnect failed
00000718.00000ad4::2008/07/27-15:28:19.678 ERR  SQL Server <SQL Server (SHILOH)>: [sqsrvres] checkODBCConnectError: sqlstate = 08S01; native error = e9; message = [Microsoft][SQL Native Client]Named Pipes Provider: No process is on the other end of the pipe.

There were plenty of errors (code ERR) on the cluster.log file, but usually the first ones are the more interesting to look for:

(SHILOH)>: [sqsrvres] checkODBCConnectError: sqlstate = 28000; native error = 4818; message = [Microsoft][SQL Native Client][SQL Server]Login failed for user 'CONTOSO\CLUADMIN'.

In our case CONTOSO\CLUADMIN was the Cluster service account use by Windows Cluster. As you probably know, this account needs to have access to SQL Server in order to perform the IsAlive and LooksAlive poll intervals to detect a resource failure.

In this case the problem was that the BUILTIN\Administrators SQL Server account has been removed as part of a Company security guideline. In a default installation, removing this group from the SQL Server logins effectively removes the ability for the Cluster service to log into SQL Server and perform the IsAlive and LooksAlive checks. This is a fairly well known issue discussed on Knowledge Base article KB291255.

In other cases you will see events in cluster.log related to some "cryptic" resources, like the following ones:

0000067c.00000f5c::2008/07/27-15:27:40.070 INFO [FM] FmpOnlineResourceList: trying to bring resource c7018774-2673-4350-ad88-196e4aca3f95 online
0000067c.00000f5c::2008/07/27-15:27:40.070 INFO [FM] OnlineResource: c7018774-2673-4350-ad88-196e4aca3f95 depends on 582210d0-c29d-40a5-af8a-bbbc08c3e08d. Bring online first.

cluster.log does not usually show the resource name but the resource GUI (like "resource ce12eb0d-a9d0-4e4a-8531-f3d70cad8c6c"). By using the cluster.oml file we can translate the GUID string to a specific cluster resource:

00000644.000006b0::2008/05/15-11:41:40.621 OBRENAME "Resource" "c7018774-2673-4350-ad88-196e4aca3f95" "SQL Server Fulltext (SHILOH)"

00000664.000006b8::2008/05/15-11:36:29.428 OBRENAME "Resource" "582210d0-c29d-40a5-af8a-bbbc08c3e08d" "SQL Server (SHILOH)"

In this cluster.log sample section Cluster service is reporting that it needs to start SQL Server service first in order to bring Full Text Search service online.

Additional Information

To make your sysadmin life easier, Microsoft has released a utility to simplify cluster.log analysis called Cluster Diagnostics and Verification Tool (ClusDiag.exe). This tool can be used to read in a more user-friendly way cluster log files from Windows Server 2003 and 2000 clusters. Here you can see a screen capture of this tool in action:

 

In Windows Server 2008 cluster.log file is not present in Cluster folder anymore. Even though you can still access cluster.log file on a Windows Server 2008 cluster; please refer to the Overview of Failover Clustering with Windows Server 2008 document in this Microsoft web page for more information about the changes incorporated into Windows Server 2008 cluster. I have extracted here the relevant part of the text:

The text-file-based cluster log is also gone. Event trace logging (.etl) is now enabled via Event Tracing for Windows (ETW). Default log sizes vary and can be modified with cluster installs logs (Operational and ClusterLog). There is new functionality built into the command line. The cluster.exe tool allows you to dump the trace log into a text file. This file looks similar to the cluster log used in previous versions of failover clustering. Use the cluster.exe Log /Generate command to see this log.

Filed under:
Enabling Certificate for SSL on a SQL Server 2005 Clustered Installation
19 February 08 11:27 AM | jorgepc | 0 Comments   

This week I has been working on a case about the installation of a certificate on a SQL Server cluster. My customer wanted to use certificates to encrypt client connections via Secure Sockets Layer (SSL), a method supported by SQL Server 2005 that allows to secure communications between clients and database servers (you can read more information about this functionality on Books Online). Although my customer was currently working with SSL on SQL Server standalone servers he was finding problems installing the certificate on a new cluster. Although setting up the certificate and encryption in a standalone installation is pretty straightforward (see KB316898) cluster installation differs, and requires extra configuration steps.

Obtaining the certificate

Since I was not familiar with the certificate installation on a clustered SQL Server instance, I started to read the available documentation in SQL Server Books Online, and more specifically, the how-to provided in the Enable Encrypted Connections to the Database Engine (SQL Server Configuration Manager) article and the Microsoft Knowledge Base article KB316898, How to enable SSL encryption for an instance of SQL Server by using Microsoft Management Console (this one only intended for standalone installations).

First I needed a certificate to do the testing. I started installing an stand-alone root certification authority (CA) on my lab domain following the guidelines describe here. I did also installed the convenient web enrollment feature, which allows a client to request a certificate from a CA using a web browser (if you do not want to deal with all the CA stuff, read "A Simpler Method to Obtain Certificates for Testing" at the end of this post).

I did request the certificate from one of the SQL Server nodes. According to the Books Online article, the certificate should be installed using the SQL Service account and it has to be placed on the local computer > personal certificates store:

Certificates are stored locally for the users on the computer. To install a certificate for use by SQL Server, you must be running SQL Server Configuration Manager under the same user account as the SQL Server service unless the service is running as LocalSystem, NetworkService, or LocalService, in which case you may use an administrative account.

This is because the certificate contains a private key that needs to be accessed by SQL Server service account for the SSL communication to work. If we use any other user account (say, local administrator) SQL Server service account will not be able to access the private key and the authentication will fail. The certificate can be optionally installed on the current user certificate store, but installing it on the computer certificate store makes the certificate available for any user.

I did logon in the server using the SQL Service account, since this is a cluster, this has to be a domain account. I did open the web enrollment site (by default under http://CAName/certsvr, where CAName is the Certification Authority's hostname) and requested a new Server Authentication Certificate. The most important thing at this point is to specify the fully qualify DNS name of the clustered instance in the "Name" property for this certificate; the SSL encryption will not work if we specify the name of the cluster node:

If you want to use encryption with a failover cluster, you must install the server certificate with the fully qualified DNS name of the failover clustered instance on all nodes in the failover cluster. For example, if you have a two-node cluster, with nodes named test1.your company.com and test2. your company.com and a failover clustered instance of SQL Server named fcisql, you must obtain a certificate for fcisql.your company.com and install the certificate on both nodes.

In my case, the SQL Network Name cluster resource was called "SQL-2005" so I did specify sql-2005.contoso.lab as Name property:

Certficate Request using Web Enrolment

 

While requesting the certificate, make sure that the option "Mark keys as exportable" is enabled so the private key is available when exporting and importing the certificate in the rest of the cluster nodes. I did also check the "Store certificate in the local computer certificate store" so I do not have to import the certificate later. After submitting the certificate request I did back to my CA and approved the request. Finally I returned to the web enrollment site to install the certificate.

At this time the server authentication certificate was installed but I still needed to install the root CA certificate; the Books Online article linked earlier explains why this is needed:

If the instance of SQL Server is running on a computer that has been assigned a certificate from a public certification authority, identity of the computer and the instance of SQL Server is vouched for by the chain of certificates that lead to the trusted root authority. Such server validation requires that the computer on which the client application is running be configured to trust the root authority of the certificate that is used by the server.

If the root CA certificate is not installed, the SQL virtual server certificate cannot be verified against a a trusted certification authority, this can be quickly checked accessing the installed certificate properties:

Certificate Properties

 

I did obtain a CA root certificate from my test Certification Authority and installed it on the Trusted Root Certification Authority store for the local computer.

The certificate must meet some requirements to be used for SSL (the requirements can be found on the Certificate Requirements section of this Books Online article). To make sure this certificate fulfilled al the requirements I did export the certificate to a local directory and check its details using certutil.exe command-line utility (more info about this utility can be found here). Additionally, I did also export the SSL certificate from the first cluster node (we will need to import this certificate on each of the cluster nodes later).

Make the Certificate Works for SSL

The certificate used by SQL Server to encrypt connections is specified in the following registry key:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL.x\MSSQLServer\SuperSocketNetLib\Certificate

This key contains a property of the certificate known as thumbprint that identifies each certificate in the server. I did find out that in a SQL Server clustered installation this key is not correctly updated with the corresponding certificate thumbprint. In my case, after importing the certificate, the registry key did show a null value:

Certificate Registry Key in SQL Server

 

This registry key should contain the thumbprint property of the certificate we want to use so I did copy the thumbprint from the certificate details tab and pasted it into Notepad to remove the extra spaces:

Certificate Thumprint

 

and finally copied the corresponding thumbprint string on the "Certificate" registry key (the "Certificate" registry key requires the certificate thumbprint to be included with no extra spaces):

Editing certificate thumbprint in Notepad

 

Just after the registry change, I performed a cluster failover on the second cluster node and rebooted the first cluster node.

On the second cluster node I imported both the certificate issued for the virtual SQL Server and the root CA certificate. I did perform the previous steps to have the certificates installed and the thumbprint string copied on the corresponding registry key. Finally, I did also rebooted this second cluster node. Once all the cluster nodes were up and running I checked that all of them had the same thumbprint string on the "Certificate" registry key. If your certificate has been correctly configured, you will see the following message logged on the SQL Server ERRORLOG file on the next service startup:

2008-02-18-23:37:25.01 Server     The certificate was successfully loaded for encryption

To enable SSL in SQL, I did change SQL Server network protocol properties for the clustered instance setting the "Force Encryption option" to "Yes". You will need to restart the SQL Server service for the change to take effect, but once it is done all the communication attempts with your database will need to be encrypted:

Protocol Properties Configuration

 

Note that if you try to select the corresponding SSL certificate on the "Certificate" tab of the SQL Server Protocols properties, you will see that the installed certificate does not show up. This behavior is a known issue in a clustered installation. SQL Server configuration manager search by default on the local computer personal certificates store and tries to mach an existing certificate with the fully qualified domain name (FQDN) of the local computer. Since the installed certificate is not associated to the cluster node FQDN but with the virtual SQL Server FQDN, the corresponding certificate is not shown on the GUI. To make sure what certificate is in use for SSL check the corresponding thumbprint string on the abovementioned registry key.

In addition to server, the client has to be also configured to support communication over SSL. To do so I imported the certificate on the client computer using the Certificates snap-in and configured the "Force protocol encryption" to "Yes" under the SQL Native Client Configuration (this option is available via the "Flags" page).

Testing the Connection

From my client PC I did test the connection to the remote SQL Server using the "Encrypt Connection" option available in SQL Server Configuration Manager. To verify that the connection was indeed encrypted I fired up a network analyzer to have a view of the traffic between my computer and the remote SQL Server. The SSL protocol was indeed being use in this connection, as we can see in Network Monitor screen capture:

 

In this post I have tried to provide you with the required procedure to have a certificate installed and configured for SSL in a clustered SQL Server installation. The links included in this post will provide you with a better understanding of this procedure and a much better foundation to implement encrypted client-server communications on your infrastructure.

A Simpler Method to Obtain Certificates for Testing

If you are in trouble trying to find a Certification Authority for your tests or if you do not want to deal with the CA installation and configuration process, you can use the MakeCert.exe certificate creation tool. This Books Online article describes how to quickly setup you certificate requirements for your own testing using MakeCert.exe.

Filed under:
SSIS error DTS_E_CANNOTACQUIRECONNECTIONFROMCONNECTIONMANAGER when connecting to Oracle data source
12 February 08 08:02 PM | jorgepc | 1 Comments   

This time I want to discuss a support case that took me some days to get fixed. This case may be of interest for anyone having problems connecting a SQL Server installation to an Oracle data source using SQL Server Integration Services (SSIS).

In this case my customer was trying to connect to an Oracle Server in order to copy several tables into SQL Server 2005. SQL Server Integration Services is a great tool for this purpose, you can configure how the copy job takes place using SQL Server Business Intelligence Development Studio (aka BIDS) and check from an user-friendly interface how each step is executed. BIDS is in fact a version of Visual Studio 2005 (the executable name is devenv.exe) with the components required to work with the different SQL Server services and components.

My customer was executing BIDS from her own computer without issues but was finding a problem while working with BIDS from the server where SQL Server was installed. The errors she was getting were:

[Connection manager "mib.bigcorp.com.gca"] Error: SSIS Error Code DTS_E_OLEDBERROR. 
An OLE DB error has occurred. Error code: 0x80004005. An OLE DB record is available.
Source: "Microsoft OLE DB Provider for Oracle"  Hresult: 0x80004005
Description: "ORA-06413: Connection not open.".

[OLE DB Source [1]] Error:
SSIS Error Code DTS_E_CANNOTACQUIRECONNECTIONFROMCONNECTIONMANAGER. 
The AcquireConnection method call to the connection manager "mib.bigcorp.com.gca"
failed with error code 0xC0202009. 
There may be error messages posted before this with more information on why the
AcquireConnection method call failed.

Since the customer was using the Microsoft OLE DB provider for Oracle to setup the connection, I did a research on the available documentation in Microsoft Support website using the terms "oracle", "oledb" and "provider". I did find several documents discussing different problems but one of them was of special interest here:

INFO: Limitations of Microsoft Oracle ODBC Driver and OLEDB Provider (KB244661)

SUMMARY: This article describes the limitations of the current version of Microsoft ODBC Driver and OLE DB Provider for Oracle. These components are currently in maintenance mode. No new updates are planned for future releases of these components including their support against versions later than Oracle 8i.

According to this article, it is not recommended to make use of the Microsoft OLE DB Provider for Oracle if the Oracle server is a version beyond 8i (my customer was using version 9i). This does not mean that the Microsoft OLE DB Provider did not work at all (indeed I was able to preview the remote table in the Oracle server and test connectivity successfully from within SSIS using this provider) but the functionality offered will be reduced and there was a good chance some incompatibility problem happened.

I asked the customer to install the Oracle client on the server (different versions of the Oracle client software are available here). This client provides the required OLE DB Provider from Oracle that enables connectivity to an Oracle data source. Only some hours later the customer back to me saying "I have installed the Oracle client, but the only Oracle OLE DB component I am able to select in SSIS is the Microsoft one". Ok, I knew that one from a different support incident I had in the past. SQL Server Integration Services runs as a 32-bit application on a 64-bit server, as you can quickly check using Windows Task Manager:

 

This means that, unless you install the Oracle Client for 32-bit on the 64-bit server, you won't be able to access the Oracle OLE DB Provider. This applies not only for the Oracle OLE DB, but for any other 32-bit provider. You can read a full explanation about why you need the 32-bit components in your 64-bit server here:

The 32-bit SSIS Designer [BIDS] displays only 32-bit providers that are installed on the local computer. To configure a connection manager to use a specific 64-bit provider, you must install the 32-bit version of the provider on the development computer for use at design time. Even though the 32-bit version of the provider is installed, you can still run the package in 64-bit mode, both in the development environment and after deployment. The 32-bit and 64-bit versions of a provider have the same ID. Therefore, the SSIS runtime will select the appropriate version of the provider to use.

But having the 32-bit component is not enough, we need both the 32-bit and 64-bit components installed side-by-side. Wit only the 32-bit Oracle client the package will fail when executing from BIDS, this is because there is only one SSIS service on a server, and it is indeed a 64-bit process when running on a 64-bit server as we can see on this Process Explorer screen capture (process name is MsDtsSrvr.exe):

 

 

Ok, we have now the SSIS package using the Oracle OLE DB provider from Oracle and we have verified that both the 32-bit and 64-bit version of the Oracle client are installed on the Windows server.

However, the package kept failing with the same error.

After several hours tracing down the error with the help of Sysinternals Process Explorer, Process Monitor and the Internet we did find the a possible cause for this error. Apparently some Oracle client versions are affected by a bug (Oracle bug number 3807408) that causes the client networking layer to incorrectly parse program locations that contain parenthesis in the path. You can find a brief discussion of this behavior on this thread in Oracle forums.

As you probably know, on a Windows Server x64 installation, the default program files directory is "C:\Program Files\" but all the 32-bit components are installed in "C:\Program Files (x86)\" directory. Since my customer did not have access to Oracle support at that time, we decide to test if this was the root cause of our problem copying the contents of the "C:\Program Files (x86)\Microsoft Visual Studio 8" folder under a new Program Files folder called "C:\ProgramFilesx86". We started BIDS (devenv.exe) from the new directory and executed the package both from BIDS and SQL Server, this time successfully.

The trick finally did the job and my customer was able to successfully import the required tables into SQL Sever, however, I do not think this is a good solutions (much less "elegant" solution) and I recommended her to contact Oracle support for a version of the Oracle client no affected by this bug.

Filed under:
Conocer el valor de la Edición Enterprise a través de sencillos ejemplos
11 January 08 02:49 PM | jorgepc | 0 Comments   

This is the Spanish version of this other post.

Esta vez sólo deseo anunciar que mi compañero Ignacio Alonso Portillo (Nacho) y yo mismo tenemos planes para comenzar una serie de artículos con ejemplos simples que demuestren, desde un punto de vista práctico, los enormes beneficios proporcionados por aquellas funciones que están sólo disponible en la Edición Enterprise de SQL Server 2005.

Últimamente, hemos escuchado con más frecuencia a los usuarios decir: “¿Por qué habría de elegir la Edición Enterprise de SQL Server 2005? Por lo que recuerdo de versiones anteriores, el único beneficio de la Edición Enterprise, en comparación con la Estándar, era la posibilidad de instalación en clúster y ahora, con SQL Server 2005, la edición Estándar ya incorpora esta funcionalidad; no necesito “gastar” un dinero extra adquiriendo una licencia de la Edición Enterprise.”

Sinceramente, esta afirmación me deja un tanto descolocado. :-P

En primer lugar, no es cierto que el soporte de clúster fuese la única diferencia entre la versión Estándar y la Enterprise en versiones anteriores. Había mucho más que eso. Pero no comentaremos cuestiones sobre SQL Server 2000 ahora, más de 7 años después de que fuese publicado. Esto no te aportaría demasiado valor. En su lugar, preferimos centrarnos inicialmente en SQL Server 2005 y posiblemente extender el estudio a SQL server 2008 más adelante.

En 2005 hay tanto valor añadido en la Edición Enterprise que preferimos creer que aquellos usuarios que están de acuerdo con el enunciado anterior lo están porque no han recibido la información suficiente acerca del valor del producto, cuáles son sus implicaciones y como pueden ellos beneficiarse realmente de las mismas.

Por esa razón, nos sentimos en la obligación de mostrarte todos estos beneficios, usando ejemplos sencillos y claros, con el ánimo de ayudarte a tomar mejores decisiones en el momento de elegir la edición de SQL Server más apropiada para satisfacer las necesidades de tu negocio.

Permanece atento y no pierdas ninguna de estas entregas si quieres tomar las mejores decisiones. ;-)

Filed under:
Understanding the value of the Enterprise Edition with simple examples
01 January 08 09:19 AM | jorgepc | 1 Comments   

This time I just wanted to announce that my colleague Ignacio Alonso Portillo (aka "Nacho") and myself have plans to start posting some simple examples to demonstrate, from a practical standpoint, the huge benefits provided by all the features which are only available with the Enterprise Edition of SQL Server 2005.

Recently, we have been, more and more, hearing SQL Server users saying: "Why would I want to choose the Enterprise Edition in SQL Server 2005? It sounds to me that from previous versions of the product, the only benefit I got from the Enterprise edition, compared to the Standard, was that it was cluster aware, and now in SQL Server 2005, the Standard edition also supports failover clustering, therefore I don't need to "waste" any extra money in buying an Enterprise edition license."

To be honest, that assertion knocks me out. :-P

First of all, it isn't true that failover clustering support was the only difference between Standard and Enterprise editions in previous versions. There was much more than that. But we won't touch the SQL Server 2000 topic nowadays, after over seven years since it was released. That wouldn't be of too much value for you. Instead, we prefer to focus in SQL Server 2005 initially, and will possibly extend it to cover SQL Server 2008 as well.

In 2005, there is so much extra value in the Enterprise Edition, that we prefer to believe that those users who agree with the statement above, is just because they haven't really been explained what all that value is, which form it adopts within the product, and how they can actually benefit from it.

For that reason, we feel it's our obligation to show you all these benefits, using simple and clear examples, with the aim of helping you to take better decisions when it comes to choosing the most appropriate edition of SQL Server to satisfy your business requirements.

So, keep tuned and don't miss any of those deliveries if you want to make the best decisions. ;-)

Filed under:
More Posts Next page »

Search

This Blog

Syndication

Page view tracker