• SQL Server Cast

    "Unable to create computer account on DC" error installing SQL Server 2005 cluster on Windows Server 2008

    • 1 Comments

    Windows Server 2008 introduces significant changes to the Windows Cluster Service. The new Windows Cluster security model is one of the most important of these changes and will require seasoned SQL Server administrator to update her/his old-and-trusted knowledge.

    Some days ago I was facing a problem trying to install a SQL Server 2005 cluster on Windows Server 2008. The SQL Server installation seemed to complete successfully on both cluster nodes however, the SQL Server service was not starting as a clustered service. As soon as the installation completes, the clustered database resource was failing over the second cluster node and failing completely right after.

    The setup log file Summary.txt (located by default under C:\Program Files\Microsoft SQL Server\90\Setup Bootstrap\LOG folder) showed no installation problems and the SQL Server ERRORLOG was not showing any useful information about the error:

    2008-07-09 14:33:10.81 Server Microsoft SQL Server 2005 - 9.00.1399.06 (X64)
    Oct 14 2005 00:35:21
    Copyright (c) 1988-2005 Microsoft Corporation
    Enterprise Edition (64-bit) on Windows NT 6.0 (Build 6001: Service Pack 1)

    [...]

    2008-07-09 14:33:39.39 spid5s SQL Server is terminating in response to a 'stop' request from Service Control Manager. This is an informational message only. No user action is required.

    We did check that al the installation requirements for SQL Server 2005 were satisfied on the environment; the new "Validate Cluster Configuration Wizard" report recently run on the cluster was not showing any error either. Since many cluster installation errors in SQL Server are caused by incorrect settings at the Windows cluster level, I decided to look into the cluster.log file for any possible hint. You can find guidelines to start working on this cluster.log file, as well as a procedure to generate this file in Windows Server 2008 on my previous post.

    Reviewing the cluster.log proved to be a good idea:

    10:29:14.021 INFO [RES] Network Name <SQL Network Name (C2K5NET1)>: Failed to find a computer account for C2K5NET1. Attempting to create one on DC \\dccorp.contoso.lab.
    000003f4.00001108::2008/07/10-10:29:14.021 INFO [RES] Network Name <SQL Network Name (C2K5NET1)>: Trying NetUserAdd() to create computer account C2K5NET1 on DC \\dccorp.contoso.lab in default Computers container
    000003f4.00001108::2008/07/10-10:29:14.037 ERR [RES] Network Name <SQL Network Name (C2K5NET1)>: Unable to create computer account C2K5NET1 on DC \\dccorp.contoso.lab, in default Computers container, status 5
    000003f4.00001108::2008/07/10-10:29:14.053 ERR [RHS] Online for resource SQL Network Name (C2K5NET1) failed.

    It was clear that "something" was trying to create the computer account for the SQL Server Network Name on the DC and was failing on the attempt.

    A look into the Application Event log showed also the same error (XML data is excluded):

    Log Name: System
    Source: Microsoft-Windows-FailoverClustering
    Date: 09-Jul-08 10:47:21 AM
    Event ID: 1194
    Task Category: (19)
    Level: Error
    Keywords:
    User: SYSTEM
    Computer: CNODE2.contoso.lab
    Description:
    The description for Event ID 1194 from source Microsoft-Windows-FailoverClustering cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.

    If the event originated on another computer, the display information had to be saved with the event.

    The following information was included with the event:

    SQL Network Name (C2K5NET1)
    contoso.lab
    Unable to create computer account
    Access is denied.

    CCLUSTER$

    The Application Event log was indeed of great help since it included and Event ID (1194). A quick search into the Microsoft Support Web site showed detailed information for this event. According to this information, each clustered service or application on Windows Server 2008 is associated with a computer account in Active Directory (with the exception on a Hyper-V virtual machine), this account, called Virtual Computer Object (VCO) is created during installation by the High Availability wizard. If this VCO cannot be created for any reason, the clustered service won't work.

    This VCO is created by the cluster account, known as Cluster Name Object (CNO, a different computer account with the same name as cluster), which is created during the initial Create Cluster wizard. These relationships are better explained in the following diagram:

     Source: Microsoft TechNet Library

    If the "application account" does not have the "Create account" permission into Active Directory, the VCO (computer account for service or application) will not be created. In our case the VCO was non-existent in the Directory, a quick look into the CNO security properties showed the "Create account" right was missing. We did add the "Create account" permission and provisioned the VCO manually as described in TechNet Library. After these changes the SQL Server clustered service started successfully, we didn't need to re-install or change any other thing at cluster level.

    You can find more information about the new security model introduced by Windows Server 2008 cluster in the following Knowledge Base article:

    Description of the failover cluster security model in Windows Server 2008, http://support.microsoft.com/default.aspx?scid=kb;EN-US;947049

     

  • SQL Server Cast

    Using Windows Cluster log to troubleshoot SQL Server cluster errors

    • 1 Comments

    There are times where all the SQL Server knowledge in the world won't help us to find a solution to our cluster problem. The following post will show you how to use the Microsoft Windows cluster log file to find the root-cause of a common SQL Server issue (it does not matter what SQL Server version we use for the purpose of this post).

    In this case the customer called to Microsoft PSS due to a problem in which a SQL Server 2000 clustered instance was not starting up for an unknown reason. Looking at the customer Cluster Administrator, this was the situation of the SQL Server 2000 cluster group, called "SHILOH":

     

    We tried to diagnose what was wrong using the SQL Server ERRORLOG file, but we could not find any useful information there. After a few seconds online, the SQL Server service was stopped by Service Control Manager for no apparent reason:

    2008-07-27 17:28:04.51 server    Copyright (C) 1988-2002 Microsoft Corporation.
    2008-07-27 17:28:04.51 server    All rights reserved.
    2008-07-27 17:28:04.51 server    Server Process ID is 304.
    2008-07-27 17:28:04.51 server    Logging SQL Server messages in file 'S:\Program Files\Microsoft SQL Server\MSSQL$SHILOH\log\ERRORLOG'.
    2008-07-27 17:28:04.53 server    SQL Server is starting at priority class 'normal'(1 CPU detected).
    2008-07-27 17:28:04.55 server    SQL Server configured for thread mode processing.
    2008-07-27 17:28:04.56 server    Using dynamic lock allocation. [2500] Lock Blocks, [5000] Lock Owner Blocks.
    2008-07-27 17:28:04.57 server    Attempting to initialize Distributed Transaction Coordinator.
    2008-07-27 17:28:18.76 spid3     Starting up database 'master'.
    2008-07-27 17:28:19.30 server    Using 'SSNETLIB.DLL' version '8.0.2039'.
    2008-07-27 17:28:19.32 server    SQL server listening on 192.168.0.60: 1131.
    2008-07-27 17:28:19.32 spid3     Server name is 'SHILOH\SHILOH'.
    2008-07-27 17:28:19.32 spid5     Starting up database 'model'.
    2008-07-27 17:28:19.33 spid8     Starting up database 'msdb'.
    2008-07-27 17:28:19.33 spid9     Starting up database 'pubs'.
    2008-07-27 17:28:19.34 server    SQL server listening on TCP, Shared Memory, Named Pipes.
    2008-07-27 17:28:19.34 server    SQL Server is ready for client connections
    2008-07-27 17:28:19.34 spid10    Starting up database 'Northwind'.
    2008-07-27 17:28:20.30 spid5     Clearing tempdb database.
    2008-07-27 17:28:21.49 spid5     Starting up database 'tempdb'.
    2008-07-27 17:28:21.87 spid3     Recovery complete.
    2008-07-27 17:28:21.87 spid3     SQL global counter collection task is created.
    2008-07-27 17:28:42.29 spid3     SQL Server is terminating due to 'stop' request from Service Control Manager.

     

    Application Event log was full of the following error, but the cryptic message description wasn't of any help to us:

    Event Type:    Error
    Event Source:    MSSQL$SHILOH
    Event Category:    (3)
    Event ID:    19019
    Date:        7/27/2008
    Time:        5:28:19 PM
    User:        N/A
    Computer:    JORGEPC-CN1
    Description:
    The description for Event ID ( 19019 ) in Source ( MSSQL$SHILOH ) cannot be found. The local computer may not have the necessary registry information or message DLL files to display messages from a remote computer. You may be able to use the /AUXSOURCE= flag to retrieve this description; see Help and Support for details. The following information is part of the event: [sqsrvres] ODBC sqldriverconnect failed
    .
    Data:
    0000: 4b 4a 00 40 01 00 00 00   KJ.@....
    0008: 07 00 00 00 53 00 48 00   ....S.H.
    0010: 49 00 4c 00 4f 00 48 00   I.L.O.H.
    0018: 00 00 00 00 00 00         ......
     

     

    System Event log did show a very generic error about Cluster service failing to bring SQL Server service online:

    Event Type:    Error
    Event Source:    ClusSvc
    Event Category:    Failover Mgr
    Event ID:    1069
    Date:        7/27/2008
    Time:        5:27:38 PM
    User:        N/A
    Computer:    JORGEPC-CN2
    Description:
    Cluster resource 'SQL Server (SHILOH)' in Resource Group 'SHILOH' failed.

    For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

     

    It was clear that something related to the Cluster service was causing the SQL Server service to go offline.

    Windows Server 2003 includes two useful cluster log files where all cluster activity is logged on: cluster.log and cluster.oml. These two files can be located into the C:\Windows\Cluster folder. cluster.log is the file where all the cluster activity is collected in a very verbose way, while cluster.oml is where we can find mappings between globally unique identifiers (GUIDs) for each cluster resource and resource names (a kind of Rosetta stone to understand cluster.log file). You can find more information about these and other cluster log files at the Microsoft TechNet article How a Server Cluster Works.

    At first sight cluster.log can look very intimidating but we just need to take the specific time where the SQL Server event happened and search for that time on the cluster log. Be aware that cluster.log files are stored in GMT time while Windows Event Logs are stored at local time, depending on your time zone you will have to add/subtract one or more hours. In my case I am located on GMT+2 so I know I I have to look for events logged on cluster.log two hours in advance.

    I knew the first Application Event log took place around 17:28 so I did look for anything abnormal around 15h28m:

    0000067c.00000870::2008/07/27-15:28:07.524 INFO [Qfs] QfsCloseHandle 424, status 0
    0000067c.00000870::2008/07/27-15:28:07.524 INFO [Qfs] QfsDeleteFile C:\DOCUME~1\CLUADMIN\LOCALS~1\Temp\CLSA8.tmp, status 0
    00000718.00000ad4::2008/07/27-15:28:19.668 ERR  SQL Server <SQL Server (SHILOH)>: [sqsrvres] ODBC sqldriverconnect failed
    00000718.00000ad4::2008/07/27-15:28:19.668 ERR  SQL Server <SQL Server (SHILOH)>: [sqsrvres] checkODBCConnectError: sqlstate = 28000; native error = 4818; message = [Microsoft][SQL Native Client][SQL Server]Login failed for user 'CONTOSO\CLUADMIN'.
    00000718.00000ad4::2008/07/27-15:28:19.668 ERR  SQL Server <SQL Server (SHILOH)>: [sqsrvres] ODBC sqldriverconnect failed
    00000718.00000ad4::2008/07/27-15:28:19.668 ERR  SQL Server <SQL Server (SHILOH)>: [sqsrvres] checkODBCConnectError: sqlstate = 08S01; native error = e9; message = [Microsoft][SQL Native Client]Named Pipes Provider: No process is on the other end of the pipe.

    00000718.00000ad4::2008/07/27-15:28:19.668 ERR  SQL Server <SQL Server (SHILOH)>: [sqsrvres] ODBC sqldriverconnect failed
    00000718.00000ad4::2008/07/27-15:28:19.668 ERR  SQL Server <SQL Server (SHILOH)>: [sqsrvres] checkODBCConnectError: sqlstate = 08S01; native error = e9; message = [Microsoft][SQL Native Client]Communication link failure
    00000718.00000ad4::2008/07/27-15:28:19.668 ERR  SQL Server <SQL Server (SHILOH)>: [sqsrvres] ODBC sqldriverconnect failed
    00000718.00000ad4::2008/07/27-15:28:19.668 ERR  SQL Server <SQL Server (SHILOH)>: [sqsrvres] checkODBCConnectError: sqlstate = 08S01; native error = e9; message = [Microsoft][SQL Native Client]Named Pipes Provider: No process is on the other end of the pipe.

    00000718.00000ad4::2008/07/27-15:28:19.668 ERR  SQL Server <SQL Server (SHILOH)>: [sqsrvres] ODBC sqldriverconnect failed
    00000718.00000ad4::2008/07/27-15:28:19.668 ERR  SQL Server <SQL Server (SHILOH)>: [sqsrvres] checkODBCConnectError: sqlstate = 08S01; native error = e9; message = [Microsoft][SQL Native Client]Communication link failure
    00000718.00000ad4::2008/07/27-15:28:19.678 ERR  SQL Server <SQL Server (SHILOH)>: [sqsrvres] ODBC sqldriverconnect failed
    00000718.00000ad4::2008/07/27-15:28:19.678 ERR  SQL Server <SQL Server (SHILOH)>: [sqsrvres] checkODBCConnectError: sqlstate = 08S01; native error = e9; message = [Microsoft][SQL Native Client]Named Pipes Provider: No process is on the other end of the pipe.

    There were plenty of errors (code ERR) on the cluster.log file, but usually the first ones are the more interesting to look for:

    (SHILOH)>: [sqsrvres] checkODBCConnectError: sqlstate = 28000; native error = 4818; message = [Microsoft][SQL Native Client][SQL Server]Login failed for user 'CONTOSO\CLUADMIN'.

    In our case CONTOSO\CLUADMIN was the Cluster service account use by Windows Cluster. As you probably know, this account needs to have access to SQL Server in order to perform the IsAlive and LooksAlive poll intervals to detect a resource failure.

    In this case the problem was that the BUILTIN\Administrators SQL Server account has been removed as part of a Company security guideline. In a default installation, removing this group from the SQL Server logins effectively removes the ability for the Cluster service to log into SQL Server and perform the IsAlive and LooksAlive checks. This is a fairly well known issue discussed on Knowledge Base article KB291255.

    In other cases you will see events in cluster.log related to some "cryptic" resources, like the following ones:

    0000067c.00000f5c::2008/07/27-15:27:40.070 INFO [FM] FmpOnlineResourceList: trying to bring resource c7018774-2673-4350-ad88-196e4aca3f95 online
    0000067c.00000f5c::2008/07/27-15:27:40.070 INFO [FM] OnlineResource: c7018774-2673-4350-ad88-196e4aca3f95 depends on 582210d0-c29d-40a5-af8a-bbbc08c3e08d. Bring online first.

    cluster.log does not usually show the resource name but the resource GUI (like "resource ce12eb0d-a9d0-4e4a-8531-f3d70cad8c6c"). By using the cluster.oml file we can translate the GUID string to a specific cluster resource:

    00000644.000006b0::2008/05/15-11:41:40.621 OBRENAME "Resource" "c7018774-2673-4350-ad88-196e4aca3f95" "SQL Server Fulltext (SHILOH)"

    00000664.000006b8::2008/05/15-11:36:29.428 OBRENAME "Resource" "582210d0-c29d-40a5-af8a-bbbc08c3e08d" "SQL Server (SHILOH)"

    In this cluster.log sample section Cluster service is reporting that it needs to start SQL Server service first in order to bring Full Text Search service online.

    Additional Information

    To make your sysadmin life easier, Microsoft has released a utility to simplify cluster.log analysis called Cluster Diagnostics and Verification Tool (ClusDiag.exe). This tool can be used to read in a more user-friendly way cluster log files from Windows Server 2003 and 2000 clusters. Here you can see a screen capture of this tool in action:

     

    In Windows Server 2008 cluster.log file is not present in Cluster folder anymore. Even though you can still access cluster.log file on a Windows Server 2008 cluster; please refer to the Overview of Failover Clustering with Windows Server 2008 document in this Microsoft web page for more information about the changes incorporated into Windows Server 2008 cluster. I have extracted here the relevant part of the text:

    The text-file-based cluster log is also gone. Event trace logging (.etl) is now enabled via Event Tracing for Windows (ETW). Default log sizes vary and can be modified with cluster installs logs (Operational and ClusterLog). There is new functionality built into the command line. The cluster.exe tool allows you to dump the trace log into a text file. This file looks similar to the cluster log used in previous versions of failover clustering. Use the cluster.exe Log /Generate command to see this log.

  • SQL Server Cast

    Enabling Certificate for SSL on a SQL Server 2005 Clustered Installation

    This week I has been working on a case about the installation of a certificate on a SQL Server cluster. My customer wanted to use certificates to encrypt client connections via Secure Sockets Layer (SSL), a method supported by SQL Server 2005 that allows to secure communications between clients and database servers (you can read more information about this functionality on Books Online). Although my customer was currently working with SSL on SQL Server standalone servers he was finding problems installing the certificate on a new cluster. Although setting up the certificate and encryption in a standalone installation is pretty straightforward (see KB316898) cluster installation differs, and requires extra configuration steps.

    Obtaining the certificate

    Since I was not familiar with the certificate installation on a clustered SQL Server instance, I started to read the available documentation in SQL Server Books Online, and more specifically, the how-to provided in the Enable Encrypted Connections to the Database Engine (SQL Server Configuration Manager) article and the Microsoft Knowledge Base article KB316898, How to enable SSL encryption for an instance of SQL Server by using Microsoft Management Console (this one only intended for standalone installations).

    First I needed a certificate to do the testing. I started installing an stand-alone root certification authority (CA) on my lab domain following the guidelines describe here. I did also installed the convenient web enrollment feature, which allows a client to request a certificate from a CA using a web browser (if you do not want to deal with all the CA stuff, read "A Simpler Method to Obtain Certificates for Testing" at the end of this post).

    I did request the certificate from one of the SQL Server nodes. According to the Books Online article, the certificate should be installed using the SQL Service account and it has to be placed on the local computer > personal certificates store:

    Certificates are stored locally for the users on the computer. To install a certificate for use by SQL Server, you must be running SQL Server Configuration Manager under the same user account as the SQL Server service unless the service is running as LocalSystem, NetworkService, or LocalService, in which case you may use an administrative account.

    This is because the certificate contains a private key that needs to be accessed by SQL Server service account for the SSL communication to work. If we use any other user account (say, local administrator) SQL Server service account will not be able to access the private key and the authentication will fail. The certificate can be optionally installed on the current user certificate store, but installing it on the computer certificate store makes the certificate available for any user.

    I did logon in the server using the SQL Service account, since this is a cluster, this has to be a domain account. I did open the web enrollment site (by default under http://CAName/certsvr, where CAName is the Certification Authority's hostname) and requested a new Server Authentication Certificate. The most important thing at this point is to specify the fully qualify DNS name of the clustered instance in the "Name" property for this certificate; the SSL encryption will not work if we specify the name of the cluster node:

    If you want to use encryption with a failover cluster, you must install the server certificate with the fully qualified DNS name of the failover clustered instance on all nodes in the failover cluster. For example, if you have a two-node cluster, with nodes named test1.your company.com and test2. your company.com and a failover clustered instance of SQL Server named fcisql, you must obtain a certificate for fcisql.your company.com and install the certificate on both nodes.

    In my case, the SQL Network Name cluster resource was called "SQL-2005" so I did specify sql-2005.contoso.lab as Name property:

    Certficate Request using Web Enrolment

     

    While requesting the certificate, make sure that the option "Mark keys as exportable" is enabled so the private key is available when exporting and importing the certificate in the rest of the cluster nodes. I did also check the "Store certificate in the local computer certificate store" so I do not have to import the certificate later. After submitting the certificate request I did back to my CA and approved the request. Finally I returned to the web enrollment site to install the certificate.

    At this time the server authentication certificate was installed but I still needed to install the root CA certificate; the Books Online article linked earlier explains why this is needed:

    If the instance of SQL Server is running on a computer that has been assigned a certificate from a public certification authority, identity of the computer and the instance of SQL Server is vouched for by the chain of certificates that lead to the trusted root authority. Such server validation requires that the computer on which the client application is running be configured to trust the root authority of the certificate that is used by the server.

    If the root CA certificate is not installed, the SQL virtual server certificate cannot be verified against a a trusted certification authority, this can be quickly checked accessing the installed certificate properties:

    Certificate Properties

     

    I did obtain a CA root certificate from my test Certification Authority and installed it on the Trusted Root Certification Authority store for the local computer.

    The certificate must meet some requirements to be used for SSL (the requirements can be found on the Certificate Requirements section of this Books Online article). To make sure this certificate fulfilled al the requirements I did export the certificate to a local directory and check its details using certutil.exe command-line utility (more info about this utility can be found here). Additionally, I did also export the SSL certificate from the first cluster node (we will need to import this certificate on each of the cluster nodes later).

    Make the Certificate Works for SSL

    The certificate used by SQL Server to encrypt connections is specified in the following registry key:

    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL.x\MSSQLServer\SuperSocketNetLib\Certificate

    This key contains a property of the certificate known as thumbprint that identifies each certificate in the server. I did find out that in a SQL Server clustered installation this key is not correctly updated with the corresponding certificate thumbprint. In my case, after importing the certificate, the registry key did show a null value:

    Certificate Registry Key in SQL Server

     

    This registry key should contain the thumbprint property of the certificate we want to use so I did copy the thumbprint from the certificate details tab and pasted it into Notepad to remove the extra spaces:

    Certificate Thumprint

     

    and finally copied the corresponding thumbprint string on the "Certificate" registry key (the "Certificate" registry key requires the certificate thumbprint to be included with no extra spaces):

    Editing certificate thumbprint in Notepad

     

    Just after the registry change, I performed a cluster failover on the second cluster node and rebooted the first cluster node.

    On the second cluster node I imported both the certificate issued for the virtual SQL Server and the root CA certificate. I did perform the previous steps to have the certificates installed and the thumbprint string copied on the corresponding registry key. Finally, I did also rebooted this second cluster node. Once all the cluster nodes were up and running I checked that all of them had the same thumbprint string on the "Certificate" registry key. If your certificate has been correctly configured, you will see the following message logged on the SQL Server ERRORLOG file on the next service startup:

    2008-02-18-23:37:25.01 Server     The certificate was successfully loaded for encryption

    To enable SSL in SQL, I did change SQL Server network protocol properties for the clustered instance setting the "Force Encryption option" to "Yes". You will need to restart the SQL Server service for the change to take effect, but once it is done all the communication attempts with your database will need to be encrypted:

    Protocol Properties Configuration

     

    Note that if you try to select the corresponding SSL certificate on the "Certificate" tab of the SQL Server Protocols properties, you will see that the installed certificate does not show up. This behavior is a known issue in a clustered installation. SQL Server configuration manager search by default on the local computer personal certificates store and tries to mach an existing certificate with the fully qualified domain name (FQDN) of the local computer. Since the installed certificate is not associated to the cluster node FQDN but with the virtual SQL Server FQDN, the corresponding certificate is not shown on the GUI. To make sure what certificate is in use for SSL check the corresponding thumbprint string on the abovementioned registry key.

    In addition to server, the client has to be also configured to support communication over SSL. To do so I imported the certificate on the client computer using the Certificates snap-in and configured the "Force protocol encryption" to "Yes" under the SQL Native Client Configuration (this option is available via the "Flags" page).

    Testing the Connection

    From my client PC I did test the connection to the remote SQL Server using the "Encrypt Connection" option available in SQL Server Configuration Manager. To verify that the connection was indeed encrypted I fired up a network analyzer to have a view of the traffic between my computer and the remote SQL Server. The SSL protocol was indeed being use in this connection, as we can see in Network Monitor screen capture:

     

    In this post I have tried to provide you with the required procedure to have a certificate installed and configured for SSL in a clustered SQL Server installation. The links included in this post will provide you with a better understanding of this procedure and a much better foundation to implement encrypted client-server communications on your infrastructure.

    A Simpler Method to Obtain Certificates for Testing

    If you are in trouble trying to find a Certification Authority for your tests or if you do not want to deal with the CA installation and configuration process, you can use the MakeCert.exe certificate creation tool. This Books Online article describes how to quickly setup you certificate requirements for your own testing using MakeCert.exe.

  • SQL Server Cast

    SSIS error DTS_E_CANNOTACQUIRECONNECTIONFROMCONNECTIONMANAGER when connecting to Oracle data source

    • 3 Comments

    This time I want to discuss a support case that took me some days to get fixed. This case may be of interest for anyone having problems connecting a SQL Server installation to an Oracle data source using SQL Server Integration Services (SSIS).

    In this case my customer was trying to connect to an Oracle Server in order to copy several tables into SQL Server 2005. SQL Server Integration Services is a great tool for this purpose, you can configure how the copy job takes place using SQL Server Business Intelligence Development Studio (aka BIDS) and check from an user-friendly interface how each step is executed. BIDS is in fact a version of Visual Studio 2005 (the executable name is devenv.exe) with the components required to work with the different SQL Server services and components.

    My customer was executing BIDS from her own computer without issues but was finding a problem while working with BIDS from the server where SQL Server was installed. The errors she was getting were:

    [Connection manager "mib.bigcorp.com.gca"] Error: SSIS Error Code DTS_E_OLEDBERROR. 
    An OLE DB error has occurred. Error code: 0x80004005. An OLE DB record is available.
    Source: "Microsoft OLE DB Provider for Oracle"  Hresult: 0x80004005
    Description: "ORA-06413: Connection not open.".

    [OLE DB Source [1]] Error:
    SSIS Error Code DTS_E_CANNOTACQUIRECONNECTIONFROMCONNECTIONMANAGER. 
    The AcquireConnection method call to the connection manager "mib.bigcorp.com.gca"
    failed with error code 0xC0202009. 
    There may be error messages posted before this with more information on why the
    AcquireConnection method call failed.

    Since the customer was using the Microsoft OLE DB provider for Oracle to setup the connection, I did a research on the available documentation in Microsoft Support website using the terms "oracle", "oledb" and "provider". I did find several documents discussing different problems but one of them was of special interest here:

    INFO: Limitations of Microsoft Oracle ODBC Driver and OLEDB Provider (KB244661)

    SUMMARY: This article describes the limitations of the current version of Microsoft ODBC Driver and OLE DB Provider for Oracle. These components are currently in maintenance mode. No new updates are planned for future releases of these components including their support against versions later than Oracle 8i.

    According to this article, it is not recommended to make use of the Microsoft OLE DB Provider for Oracle if the Oracle server is a version beyond 8i (my customer was using version 9i). This does not mean that the Microsoft OLE DB Provider did not work at all (indeed I was able to preview the remote table in the Oracle server and test connectivity successfully from within SSIS using this provider) but the functionality offered will be reduced and there was a good chance some incompatibility problem happened.

    I asked the customer to install the Oracle client on the server (different versions of the Oracle client software are available here). This client provides the required OLE DB Provider from Oracle that enables connectivity to an Oracle data source. Only some hours later the customer back to me saying "I have installed the Oracle client, but the only Oracle OLE DB component I am able to select in SSIS is the Microsoft one". Ok, I knew that one from a different support incident I had in the past. SQL Server Integration Services runs as a 32-bit application on a 64-bit server, as you can quickly check using Windows Task Manager:

     

    This means that, unless you install the Oracle Client for 32-bit on the 64-bit server, you won't be able to access the Oracle OLE DB Provider. This applies not only for the Oracle OLE DB, but for any other 32-bit provider. You can read a full explanation about why you need the 32-bit components in your 64-bit server here:

    The 32-bit SSIS Designer [BIDS] displays only 32-bit providers that are installed on the local computer. To configure a connection manager to use a specific 64-bit provider, you must install the 32-bit version of the provider on the development computer for use at design time. Even though the 32-bit version of the provider is installed, you can still run the package in 64-bit mode, both in the development environment and after deployment. The 32-bit and 64-bit versions of a provider have the same ID. Therefore, the SSIS runtime will select the appropriate version of the provider to use.

    But having the 32-bit component is not enough, we need both the 32-bit and 64-bit components installed side-by-side. Wit only the 32-bit Oracle client the package will fail when executing from BIDS, this is because there is only one SSIS service on a server, and it is indeed a 64-bit process when running on a 64-bit server as we can see on this Process Explorer screen capture (process name is MsDtsSrvr.exe):

     

     

    Ok, we have now the SSIS package using the Oracle OLE DB provider from Oracle and we have verified that both the 32-bit and 64-bit version of the Oracle client are installed on the Windows server.

    However, the package kept failing with the same error.

    After several hours tracing down the error with the help of Sysinternals Process Explorer, Process Monitor and the Internet we did find the a possible cause for this error. Apparently some Oracle client versions are affected by a bug (Oracle bug number 3807408) that causes the client networking layer to incorrectly parse program locations that contain parenthesis in the path. You can find a brief discussion of this behavior on this thread in Oracle forums.

    As you probably know, on a Windows Server x64 installation, the default program files directory is "C:\Program Files\" but all the 32-bit components are installed in "C:\Program Files (x86)\" directory. Since my customer did not have access to Oracle support at that time, we decide to test if this was the root cause of our problem copying the contents of the "C:\Program Files (x86)\Microsoft Visual Studio 8" folder under a new Program Files folder called "C:\ProgramFilesx86". We started BIDS (devenv.exe) from the new directory and executed the package both from BIDS and SQL Server, this time successfully.

    The trick finally did the job and my customer was able to successfully import the required tables into SQL Sever, however, I do not think this is a good solutions (much less "elegant" solution) and I recommended her to contact Oracle support for a version of the Oracle client no affected by this bug.

  • SQL Server Cast

    Conocer el valor de la Edición Enterprise a través de sencillos ejemplos

    This is the Spanish version of this other post.

    Esta vez sólo deseo anunciar que mi compañero Ignacio Alonso Portillo (Nacho) y yo mismo tenemos planes para comenzar una serie de artículos con ejemplos simples que demuestren, desde un punto de vista práctico, los enormes beneficios proporcionados por aquellas funciones que están sólo disponible en la Edición Enterprise de SQL Server 2005.

    Últimamente, hemos escuchado con más frecuencia a los usuarios decir: “¿Por qué habría de elegir la Edición Enterprise de SQL Server 2005? Por lo que recuerdo de versiones anteriores, el único beneficio de la Edición Enterprise, en comparación con la Estándar, era la posibilidad de instalación en clúster y ahora, con SQL Server 2005, la edición Estándar ya incorpora esta funcionalidad; no necesito “gastar” un dinero extra adquiriendo una licencia de la Edición Enterprise.”

    Sinceramente, esta afirmación me deja un tanto descolocado. :-P

    En primer lugar, no es cierto que el soporte de clúster fuese la única diferencia entre la versión Estándar y la Enterprise en versiones anteriores. Había mucho más que eso. Pero no comentaremos cuestiones sobre SQL Server 2000 ahora, más de 7 años después de que fuese publicado. Esto no te aportaría demasiado valor. En su lugar, preferimos centrarnos inicialmente en SQL Server 2005 y posiblemente extender el estudio a SQL server 2008 más adelante.

    En 2005 hay tanto valor añadido en la Edición Enterprise que preferimos creer que aquellos usuarios que están de acuerdo con el enunciado anterior lo están porque no han recibido la información suficiente acerca del valor del producto, cuáles son sus implicaciones y como pueden ellos beneficiarse realmente de las mismas.

    Por esa razón, nos sentimos en la obligación de mostrarte todos estos beneficios, usando ejemplos sencillos y claros, con el ánimo de ayudarte a tomar mejores decisiones en el momento de elegir la edición de SQL Server más apropiada para satisfacer las necesidades de tu negocio.

    Permanece atento y no pierdas ninguna de estas entregas si quieres tomar las mejores decisiones. ;-)

  • SQL Server Cast

    Understanding the value of the Enterprise Edition with simple examples

    • 1 Comments

    This time I just wanted to announce that my colleague Ignacio Alonso Portillo (aka "Nacho") and myself have plans to start posting some simple examples to demonstrate, from a practical standpoint, the huge benefits provided by all the features which are only available with the Enterprise Edition of SQL Server 2005.

    Recently, we have been, more and more, hearing SQL Server users saying: "Why would I want to choose the Enterprise Edition in SQL Server 2005? It sounds to me that from previous versions of the product, the only benefit I got from the Enterprise edition, compared to the Standard, was that it was cluster aware, and now in SQL Server 2005, the Standard edition also supports failover clustering, therefore I don't need to "waste" any extra money in buying an Enterprise edition license."

    To be honest, that assertion knocks me out. :-P

    First of all, it isn't true that failover clustering support was the only difference between Standard and Enterprise editions in previous versions. There was much more than that. But we won't touch the SQL Server 2000 topic nowadays, after over seven years since it was released. That wouldn't be of too much value for you. Instead, we prefer to focus in SQL Server 2005 initially, and will possibly extend it to cover SQL Server 2008 as well.

    In 2005, there is so much extra value in the Enterprise Edition, that we prefer to believe that those users who agree with the statement above, is just because they haven't really been explained what all that value is, which form it adopts within the product, and how they can actually benefit from it.

    For that reason, we feel it's our obligation to show you all these benefits, using simple and clear examples, with the aim of helping you to take better decisions when it comes to choosing the most appropriate edition of SQL Server to satisfy your business requirements.

    So, keep tuned and don't miss any of those deliveries if you want to make the best decisions. ;-)

  • SQL Server Cast

    CA eTrust Antivirus reports infection by JS/Snz.A on sqlcast blog!

    • 1 Comments

    Update (1st. January): CA has released a new eTrust definitions update that corrects the error (build 5421), apparently this was a false positive; thanks to a reader for reporting on this. No word from CA so far (issue number 16683491 01).

    I have discovered the following virus detection message appearing on Windows computers using CA eTrust antivirus software:

     

     

    This problem seems to appear using the latest eTrust antivirus update (version 31.3, build 5417).

    I have made a brand new installation of my PC with Windows Vista and eTrust only, updating both the OS and antivirus software with the latest updates. I have found that the virus detection pop-up appears whenever I browse to this blog web address.

    A quick search on the Web talks about false positives being reported by CA eTrust antivirus software on some javascript code but I have not been able to find a trusted source of information about this so far. CA does not currently report any information about this virus detection on the support web site.

    I have run two different online antivirus scanners, Windows Live OneCare and Trend Micro HouseCall and none have reported an infection.

    I have contacted CA Technical Support requesting more information about this issue. I will update the blog with a new post as soon as I have more information.

  • SQL Server Cast

    SQL Server Configuration Manager behavior in Clustered installation

    According to Books Online reference "SQL Server Configuration Manager is a tool to manage the services associated with SQL Server, to configure the network protocols used by SQL Server, and to mange the network connectivity configuration".

    One of the greatest things about SQL Server Configuration Manager (SSCM for the sake of brevity) is that you can manage different SQL Server settings in one tool, as opposed to SQL Server 2000 where different tools are needed to perform the same action. The following table summarizes the different configuration options in SQL Server 2000 and 2005:

    Setting

    SQL Server 2000

    SQL Server 2005

    Client Networking

    Client Network Utility

    SSCM

    Server Networking

    Server Network Utility

    SSCM

    Services

    Service Manager

    SSCM

    Startup parameters

    Enterprise Manager

    SSCM

    However if you have made use of SSCM in a clustered installation you have probably noticed something weird...

    Managing Network Components

    Let's start our test with a SQL Server 2005 cluster with two cluster nodes: NEWYORK and BOSTON sharing a single SQL Server 2005 virtual instance called "SQL-2005":

    Suppose we want to enable Named Pipes network protocol on "SQL-2005" virtual instance using SSCM, we will see the following result in NEWYORK node after performing the change and restarting virtual SQL Server instance:

    Oddly enough, if we check now Named Pipes configuration on BOSTON cluster node we will see this protocol as disabled, even after restarting SQL Server service:

    This information can be confusing but this is the expected behavior for SSCM in a clustered instance. The changed network parameter is stored on cluster Quorum and is only refreshed by the rest of cluster nodes during the virtual instance failover. This does not mean we need to move the SQL Server cluster group to each owner, we just need to remember the information will not persist on other cluster nodes until the service is moved to the rest of possible owners, and therefore, the configuration shown by local SSCM will not be accurate.

    Managing Services

    Let's now look at how managing clustered SQL Server services looks in SSCM. On the following image we can find Cluster Administrator and SSCM side by side running on "NEWYORK" node:

    As you can see SQL Server instance "YUKON" is shown as Online in both Cluster Administrator and SSCM. Let's move now SQL Server 2005 Cluster Group resources from NEWYORK node to BOSTON node. After refreshing SQL Server services under SQL Server Configuration Manager we will see the services as Stopped:

    This is happening because SSCM relies on Windows Service Control Manager (SCM) to show the services status and from a local (non-cluster) point of view, SQL Server service is stopped on NEWYORK and running on BOSTON.

    If we want to know the status of SQL Server service from a cluster point of view (non-local) we will need to open the SSCM Microsoft management console using the SQL Server Network Name as parameter:

    SQLServerManager.msc /computer:SQL-2005

    This can be performed from any cluster node and will open a new console where SQL Server services status appear as it is actually shown in Cluster Administrator. This can also be used to check for the virtual server network configuration settings at cluster level.

    Obviously, this will only inform of the status of the services under the specified virtual server after the /computer switch ("SQL-2005" in our example) but will not provide accurate information for other possible instances. In our example, BOSTON cluster node contains an additional non-clustered instance called "YUKON2" which is currently stopped. If we open SSCM in BOSTON using the abovementioned command, we will see the following information:

    As we can see, while the information shown for virtual instance YUKON is accurate while the services status for local instance YUKON2 is not.

    In this post I have tried to explain the limitations you should be aware of when configuring a SQL Server 2005 cluster using Configuration Manager. The behavior explained here can be easily understood keeping in mind how Windows Cluster registry information is replicated among nodes (more information on KB174070). Remember that only using the virtual SQL Server instance as SSCM startup parameter will guarantee accurate information from this tool.

  • SQL Server Cast

    Reducing tempdb size in a SQL Server Cluster (after ALTER DATABASE execution)

    • 1 Comments

    Some weeks ago a colleague from another company called me for help in fixing a database performance problem. The root cause was located on a possible contention issue on tempdb that was causing database to perform badly from time to time. He did find a match between the times of bad performance and the following messages in the SQL Server ERRORLOG file:

    2007-12-03 09:34:31.07 spid1     SQL Server has encountered 2 occurrence(s) of IO requests taking longer than 15 seconds to complete on file [C:\Program Files\Microsoft SQL Server\MSSQL$YUKON\Data\tempdb.mdf] in database [tempdb] (7).  The OS file handle is 0x00000530.  The offset of the latest long IO is: 0x0000008ef65a00

    2007-12-03 09:34:57.04 spid104   SQL Server has encountered 8186 occurrence(s) of IO requests taking longer than 15 seconds to complete on file [C:\Program Files\Microsoft SQL Server\MSSQL$YUKON\Data\tempdb.mdf] in database [tempdb] (2).  The OS file handle is 0x000005FC.  The offset of the latest long IO is: 0x0000004d242000

    [...]

    They have already moved tempdb SQL Server database to a different system partition as recommended in situations where contention problems exist. They did also changed tempdb size from 250 MB to 2,500 MB, which was the maximum size observed for this database during peak time usage. Although the overall database performance was better now, it was not brilliant and the "IO requests taking longer than 15 seconds" messages continued popping up on ERRORLOG. They wanted now to try splitting tempdb in several files, which is also a recommended procedure in tempdb contention problems.

    Unfortunately the partition where tempdb was moved was only 10 GB in size and he was in the need of splitting tempdb in 8 different files (8 was the number of logical processors on the server). Creating 8 tempdb files of 2,500 MB each was not only impossible in this cased but hard to justify from a logical point of view.

    As you probably know, a database cannot be reduced to a size that is smaller that the current size using a simple ALTER DATABASE ... MODIFY FILE instruction. If you try to change tempdb system database size from a Query window you will see the following error message:

    Msg 5039, Level 16, State 1, Line 1
    MODIFY FILE failed. Specified size is less than current size.

    Surprisingly, when reducing the database size from SQL Server Management Studio, the GUI does not report any error, but the database size is not changed. This seems like a step back from SQL Server 2000 where the same action in Enterprise Manager caused the following error to be raised:

    Error 21335: [SQL-DMO]The new DBFile size must be large than the current size.

    (UPDATE: An in depth explanation of this behavior can be found on this MSDN blog)

    This error message is expected. For data and log files the new size should be larger than the current size. If we want to reduce the database size we have to shrink it and return the freed up space to the operating system (providing the database files can be shrunk). My first though was to point my colleague to Knowledge Base article KB307487 where the shrinking process for tempdb is explained, but while reading the article I discovered the following paragraph:

    "There are limitations for use of the DBCC SHRINKDATABASE command on the tempdb database. The target size for data and log files cannot be smaller than the size specified when the database was created or the last size explicitly set with a file-size changing operation such as ALTER DATABASE with the MODIFY FILE option or the DBCC SHRINKFILE command."

    They indeed had executed and ALTER DATABASE command with the MODIFY FILE option when they changed the original tempdb database size. Shrinking tempdb was not an option in this case. The only option was to execute a new ALTER DATABASE command with a smaller size than the current one, as specified in the "Method 1" section in the KB.

    This requires the SQL Server service to be started in minimal configuration mode with -c and -f startup options, but how to do this in a cluster? For standalone servers we have to execute the following instruction:

    sqlservr -c -f [ -s%InstanceName% ]

    where -s will be a required parameter for a named instance. If we are using a default instance (MSSQLSERVER) -s switch is not needed. If we have installed a SQL Server named instance the instruction to use under a standalone server would be:

    sqlservr -c -f -sMSSQL$YUKON

    where "YUKON" corresponds with the instance name. On a cluster installation, the instance named is compound using the "SQL Server Network Name" cluster resource. In my example the Cluster Administrator shows the SQL Server resources as follows:

    so the instruction to startup the SQL Server service would be:

    sqlservr -c -f -sSQL-2005\YUKON

    This instruction should be executed from a command prompt window (cmd.exe), the SQL Server service will start logging the ERRORLOG information on the console window. Of course, we can also add the -f startup option to the SQL Server service in SQL Server Configuration Manager (2005) or Enterprise Manager (2000) and restart; in this case we do not need to use -c. In SQL Server Configuration Manager we have to go SQL Server properties on Advanced tab, Startup Parameters while on SQL Server Enterprise Manager we go to the instance properties, General tab, Startup Parameters button.

    Note: From a cmd.exe window use the full path to sqlservr.exe adding -c and -f (with hyphen). If you use the NET START command use the SQL Server service name with the /c /f (with slash)

    Once SQL Server service is started, we will be able to change the tempdb database size. You can use sqlcmd command-line utility (or osql if you are running under SQL Server 2000) opening a new cmd.exe windows to perform this change. The following example will configure the initial tempdb database size to 70 MB:

    C:\>sqlcmd -S SQL-2005\YUKON
    1> USE master
    2> GO
    Changed database context to 'master'.
    1> ALTER DATABASE tempdb
    2> MODIFY FILE ( NAME = tempdev, SIZE = 70MB )
    3> GO
    1> exit

    Alternatively you can execute this same ALTER DATABASE instruction from a Query window in SQL Server Management Studio or Enterprise Manager. If you have used the GUI to change the startup options do not forget to remove the -f startup parameter from the SQL Server service option in Configuration Manager or Enterprise Manager once the change is done.

  • SQL Server Cast

    How to check Fragmentation on SQL Server 2005

    • 5 Comments

    I have been asked this question several times: Is there a tool in SQL Server where I can easily check database fragmentation? The answer, as with many other things, would be "Yes" and "No".

    Fragmentation is a "natural" process in the database lifecycle. If the database is frequently updated via INSERT, UPDATE, or DELETE statements we can expect it to become fragmented over the time. If database indexes are fragmented, SQL Server query optimizer will take not-so-optimal decisions when using and index to resolve a query, affecting the overall query performance. SQL Server provides tools to check and fix database fragmentation but we need first to understand how to use these tools. This is something just a little bit more difficult than simply firing up Windows defrag.exe to check for filesystem fragmentation.

    Let's start with some theory so we can better understand what fragmentation is and how it affects database performance. There are two different types of fragmentation in SQL Server: Internal and External. Internal fragmentation is the result of index pages taking up more space than needed. It is like having a book where some of the pages are left blank; we do not know what pages are blank until we read the entire book and same applies for SQL Server, which has to read all the pages in the index wasting extra-time and server resources in the empty pages. External fragmentation occurs when the pages are not contiguous on the index. Following the book analogy, it is like having a book where pages are not ordered in a logical way (page 1, then page 2, then page 3 and so on) causing you to go back and forward to compound the information and make sense of the reading. Heavily used tables that contains fragmented indexes will impact your database performance. If you are still unsure about what external and internal fragmentation means, refer to this article published on SQL Server Magazine by Kalen Delaney (although written for SQL Server 2000, definitions are still valid for SQL Server 2005).

    In our minds we associate fragmentation to something bad that should be avoided at any cost but, is this always the case? Not with internal fragmentation. If your tables are frequently changed via UPDATE and INSERT operations, having a small amount of free space on the index or data pages (having a small amount of internal fragmentation) will cause a new page addition (page split) in order to allocate that new data. This leads ultimately to external fragmentation since the new added data page won't be probably adjacent to the original page. Internal fragmentation, therefore, can be desirable at low levels in order to avoid frequent page split, while external fragmentation, however, should always be avoided. Please understand that by 'low levels' I simply mean 'low levels'. The amount of free space that can be reserved on a index can be controlled using the Fill Factor.

    It is also important to understand that by external fragmentation we do not mean filesystem fragmentation or disk fragmentation.

    SQL Server 2005 introduces a new DMV (Dynamic Management View) to check index fragmentation levels: sys.dm_db_index_physical_stats. Although SQL Server 2005 still supports the SQL Server 2000 DBCC SHOWCONTING command, this feature will be removed on a future version of SQL Server. Here you can check the differences between both instructions when checking for fragmentation on the HumanResources.Employee table in the sample database AdventureWorks:

    • DBCC SWOWCONTING example:

    USE AdventureWorks;
    GO
    DBCC SHOWCONTIG ('HumanResources.Employee')
    GO

    DBCC SHOWCONTIG scanning 'Employee' table...
    Table: 'Employee' (869578136); index ID: 1, database ID: 6
    TABLE level scan performed.
    - Pages Scanned................................: 7
    - Extents Scanned..............................: 3
    - Extent Switches..............................: 2
    - Avg. Pages per Extent........................: 2.3
    - Scan Density [Best Count:Actual Count].......: 33.33% [1:3]
    - Logical Scan Fragmentation ..................: 14.29%
    - Extent Scan Fragmentation ...................: 33.33%
    - Avg. Bytes Free per Page.....................: 172.6
    - Avg. Page Density (full).....................: 97.87%

    • sys.dm_db_index_physical_stats DMV example:

    USE AdventureWorks
    GO
    SELECT object_id, index_id, avg_fragmentation_in_percent, page_count
    FROM sys.dm_db_index_physical_stats(DB_ID('AdventureWorks'), OBJECT_ID('HumanResources.Employee'), NULL, NULL, NULL);

    In this last example I have selected only relevant information to show from the DMV, you will see that DMV can provide much more details about the index structure. In case you wanted to show fragmentation details for all the objects in the AdventureWorks database, the command would be as follows:

    SELECT *
    FROM sys.dm_db_index_physical_stats(DB_ID('AdventureWorks'), NULL, NULL, NULL , NULL);

    Please, refer to SQL Server 2005 Books Online for more information on sys.dm_db_index_physical_stats syntax.

    How do we know if our database is fragmented? We have to pay attention to the avg_fragmentation_in_percent value. A value between 5-30% indicates moderate fragmentation, while any value over 30% indicates high fragmentation (book pages missing any order).

    The avg_page_space_used_in_percent is another value that it is worth to look closely. This value represent the amount of spaced used in the indexes. A value below 75% is usually associated to internal fragmentation (more blank pages on our book than recommended).

    In order to reduce fragmentation we will have to reorganize or rebuild the indexes. Choosing between reorganizing or rebuilding depends on the resultant values. For moderate fragmentation index reorganization will be enough, for heavily fragmented indexes a rebuild process is needed. The following table summarizes when to use each one (refer to this Books Online article for an in-depth coverage of this information):

    Reference Values (in %) Action SQL statement

    avg_fragmentation_in_percent > 5 AND < 30

    Reorganize Index ALTER INDEX REORGANIZE

    avg_fragmentation_in_percent > 30

    Rebuild Index ALTER INDEX REBUILD

    REORGANIZE statement is always executed online while REBUILD index is executed offline by default and can be optionally executed while database is accessed by users with the ONLINE statement, which can be a plus for 24x7 environments. This approach, which is only available in SQL Server 2005, has some limitations; refer to the ALTER INDEX statement in SQL Sever Books Online for more details.

    If you are looking for an easy way to automate these processes the SQL Server Books Online reference for the sys.dm_db_index_physical_stats contains a sample script you can implements within minutes. This script will take care of reorganizing any index where avg_fragmentation_in_percent is below 30% and rebuilding any index where this values is over 30% (you can change this parameters for your specific needs). Add a new SQL Server Execute T-SQL statement task to your weekly or daily maintenance plan containing this script so you can keep you database fragmentation at optimum level.

Page 3 of 3 (30 items) 123