Unable to create Maintenance Plans or DBMail Fails after service pack/Hotfix is applied on SQL Server 2005 fail over cluster instance running on Windows 2008 cluster

Unable to create Maintenance Plans or DBMail Fails after service pack/Hotfix is applied on SQL Server 2005 fail over cluster instance running on Windows 2008 cluster

  • Comments 11

We have seen a couple of cases, wherein we are unable to create maintenance plan or DBMail Fails after applying SP on SQL Server 2005 failover cluster instance running on Windows 2008 Cluster.

Resource Database and other system databases may not be updated when SP and other hotfixes are applied to SQL Server 2005 clustered instance running on Windows 2008 cluster servers. 

The setup will complete successfully and will not report any errors. This may lead to unforeseen errors since the system objects are not upgraded.

An example of such a problem can be not being able to create maintenance plans. You may notice below error while creating a maintenance plan in such scenario,

Invalid column name 'from_msx'.
Invalid column name 'has_targets'. (Microsoft SQL Server, Error: 207)

OR DMail Fails with error

The activated proc [dbo].[sp_sysmail_activate] running on queue msdb.dbo.ExternalMailQueue output the following:  'Error executing extended stored procedure: Invalid Parameter' 

Once the SP setup or any hotfix setup is completed, you should check the version of SQL Server through the following query:

SELECT @@VERSION               -- This should return the upgraded build number

To check the version of Resource Database:

SELECT SERVERPROPERTY('ResourceVersion')          -- This should return the upgraded build number

If the resource database version is not consistent with SQL Server version after SP or hotfix is applied successfully. Please check the SQL Server service pack or hotfix setup logs:

Error from SQL9_Hotfix_KB955706_sqlrun_sql.msp:

 

MSI (s) (B4!B4) [18:34:26:092]: PROPERTY CHANGE: Adding SqlClusterSec property. Its value is '1'.

.

.

.

.

 MSI (s) (B4!1C) [18:35:21:520]: Note: 1: 2262 2: _sqlAction 3: -2147287038 Function=ComponentUpgradeInstall

Skipping Action: ComponentUpgradeInstall (Condition is false)

<EndFunc Name='LaunchFunction' Return='0' GetLastError='203'>

MSI (s) (B4:38) [18:35:21:520]: Doing action: RSComponentUpgrade.3EA9D9BF_D9D2_4023_B2A7_9E2137B2FB1B

Action ended 18:35:21: ComponentUpgradeInstall.3EA9D9BF_D9D2_4023_B2A7_9E2137B2FB1B. Return value 1.

MSI (s) (B4:38) [18:35:21:520]: Transforming table CustomAction.

 

If you see the above messages in the log, then service pack or hotfix install is incomplete. You can check the above log at the following location: C:\Program Files\Microsoft SQL Server\90\Setup Bootstrap\LOG\Hotfix and you can search for the string SqlClusterSec and if its value is 1 as shown above then please refer to cause and resolution section.

 

Cause:
This issue occurs if the node names are in lower case as displayed in computer properties. The datastore of the SQL Server setup presents the node names in the same case, however Windows Installer presents them in the upper case. The setup thinks that we are running it from the passive node and it does not upgrade Resource Database and other system databases.

 

Resolution :

This issue has been fixed in Cumulative Update 9 for SQL Server 2005 SP3. Please refer to the KB below for more information:

http://support.microsoft.com/kb/978308

 

 

Workaround:

1.        Move all groups To Node 1.

2.        Modify the possible owners for the SQL Server Instances so that Only Node 1 is the possible owner.

3.        Evict Node 2.

4.        On Node 2 Change the Computer Name so it is all Upper Case

5.        Join it back into the cluster.

6.       Modify the possible owners for the SQL Server Instances so that Node 1 and Node 2 are now the possible owners.

7.        Move All Groups To Node 2.

8.        Modify the possible owners for the SQL Server Instances so that Only Node 2 is the possible owner.

9.        Evict Node 1.

10.     On Node 1 Change the Computer Name so it is all Upper Case.

11.     Join it back into the cluster.

12.     Modify the possible owners for the SQL Server Instances so that Node 1 and Node 2 are now the possible owners.

13.     Move SQL Back to the correct nodes etc.

14.     Re-install Service Pack/Hotfix so that the system databases are upgraded and to avoid the issue.

 Note:

The above workaround may not work in all the cases. On evicting and changing the node name to UPPER Case, the 'add node' wizard might present the node name in lower case. If you encounter the node name is still coming back in lower or mixed case, add the node from the command prompt.

Below is the command which worked for us:

CLUSTER /Cluster: <ClusterName> /ADDNODE /NODE: <NodeNameInUpperCase>

 

Santosh Kumar Goli
SE, Microsoft SQL Server.

 

Reviewed by

Anurag Sharma
SQL Server Escalation Services

 

Leave a Comment
  • Please add 7 and 4 and type the answer here:
  • Post
  • We have this issue, and we also, today, had some database corruption (allocation errors).

    Microsoft SQL Server 2005 - 9.00.4035.00 (X64)   Nov 24 2008 16:17:31   Copyright (c) 1988-2005 Microsoft Corporation  Enterprise Edition (64-bit) on Windows NT 6.0 (Build 6001: Service Pack 1)

    Resource Version 9.00.1399

    1.  Can these two be related (versioning issue and corruption)--I'm leaning no, but asking anyway?

    2.  This is a production system, and you're telling us this is the only, current, resolution (I realize I could set up an another cluster and move the db, but that takes a lot of time, too)?

    Thanks

    Here is just a bried output from the DBCC CHECKDB:

    Page errors on the GAM, SGAM, or PFS pages prevent allocation integrity checks in database ID 5 pages from (1:2337432) to (1:2345519). See other errors for cause.

    CHECKDB found 32 allocation errors and 0 consistency errors not associated with any single object.

  • Hi Kevin,

    We have not seen any corruption issues on user databases with the resource database not being updated when the SP is applied. I don't think they are related.

    Yes, the only resolution is to either uninstall and reinstall SQL Server after updating the node names to upper case or move databases to different server.

    Thanks,

    Santosh Goli

  • I've got the same exact problem: what if I proceed one node at a time?

    1) un-join first node from the cluster (should this also remove SQL server from the unjoined node?)

    2) Remove SQL server, if not automatically removed by the unjoin procedure

    3) change server name to uppercase

    4) join again to cluster (sql server should install automatically?)

    5) install SQL server

    6) switch the sql server instance to this node

    repeat for every other nodes

    What do you think about this procedure?

    Bye, Dario

  • I just did something else... upgraded the sql server to sp3 and copied brutally the resourcedb files from another isntallation to "data" dir in the cluster. Now all seems to work. Did i something crazy??

    Bye, Dario

  • Hi Dario,

    When SQL Server 2005 service pack or hotfix setup runs it replaces resource database files and then runs upgrade scripts in system databases.

    We could replace the resource database files but we also need to run upgrade scripts. Which we do not recommend to run manually. If the upgrade scripts do not run we might see adverse results. For example DBMail does not work when the upgrade scripts do not run.

    If you have any questions, let me know.

    Thanks,

    Santosh Goli

  • Hi Dario,

    You could proceed removing SQL and unjoining the node one at a time.

    1. You have to run the setup to uninstall SQL from one of the nodes. Steps in the below site:

    http://msdn.microsoft.com/en-us/library/ms191545(SQL.90).aspx

    2. Unjoin the node from the cluster. Rename the node to upper case and then add it to Windows Cluster.

    3. Install SQL again on this node. You can follow the steps in the above mentioned web site.

    4. Once, it is installed successfully, you can failover SQL and follow the same steps from 1 to 3.

    5. Then you can install service packs and hotfixes.

    Thanks,

    Santosh Goli

  • I have also seen the following issue in Replication due to the root issue discussed in this blog (system databases not updated):

    'Procedure or function sp_MSadd_logreader_history has too many arguments specified.'

    The workaround provided here resolved the issue.

    I imagine one could run into other issues where a system SP or Function doesn't have the correct pamaters expected by other parts of the system.

    Thanks,

    Gary

  • You should never uninstall SQL from one Node for this situation. You instructed this person to re-install on the one Node after re-joining the cluster. It is my experience that you should let SQL manage the cluster install. If it installed on both originally then you should uninstall on both before removing from the cluster. This solution will cause you bigger problems than the original issue. If you remove a node from the cluster then you should add it back in with SQL still on it. If not you'd be better off doing a database backup and removing SQL from both but I wouldnt suggest that either. Just resolve the issue using other recommendations not the one above.

  • I've got exactly the same problem. But what happens when changing computernames? We have different ressources on the cluster which partly depends on node name. (SAP, Backup, msdtc) and what's about the default cluster ressource when changing nodenames? Is there a way to remove SQL2k5 Cluster and reinstall it with computernames in lower case? We need to go produktive in one week and we can't rebuild the cluster in this time.

    Thanks,

    Birgit

  • We suggested only to change the case of the node names to UPPER. All the cluster resources including SQL Server 2005 resources are not case sensitive. While the service pack/hotfix setup is running it converts the node name to upper case in Win 2008 and does a case sensitive comparison, which fails if the node names are in small case or mixed case and thinks that the setup is being run on Passive Node. You can change the name of the node to UPPER CASE and it should not cause any issues to other resources also on cluster.

    Thanks,

    Santosh Goli.

  • Under Cause - "This issue occurs if the node names are in lower case as displayed in computer properties" needs a small correction. Its lower as well as Mixed case.

Page 1 of 1 (11 items)