I recently came across this interesting issue with SQL Replication. We were trying to create a new publication, and the new publication wizard would just hang. Upon doing some investigation, we found that we were hitting the connect article mentioned here. However, the connect article mentions that the bug is closed as “won’t fix”, so we had to somehow find a way out of the situation. Let me first describe how we narrowed down into the issue:

  • First, check sysprocesses to see which spid is blocking the new publication wizard (or whatever replication operation you’re trying to perform).
  • If you see a system spid, such as 5 or 7 (all spids less than 50 are system spids, as a general rule), then do a DBCC Opentran and see if the same spid shows up.
  • If you see something like this in the output:
    Transaction information for database 'master'.
  • Oldest active transaction:

        SPID (server process ID): 5s

        UID (user ID) : -1

        Name          : user_transaction

        LSN           : (3286:3576:1)

        Start time    : Aug  2 2012  8:04:46:603AM

        SID           : 0x01

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

then you’re likely hitting the same problem.

  • Another thing you might want to check is the locks held by that spid. I checked them using the sp_lock <spid> command, and found this (notice the last one):
    spid dbid ObjId         IndId     Type    Resource        Mode   Status
  • 5      5      0                      0       DB                                   S         GRANT

    5      10    0                      0       DB                                   S         GRANT

    5      1      60                    0      TAB                                  IX        GRANT

    5      5      1663344990    0      TAB                                  Sch-M GRANT

    5      5      1663344990    0      TAB                                  Sch-M GRANT

    5      5      1663344990    0      TAB                                  Sch-M GRANT

    5      5      1663344990    0      TAB                                  Sch-M GRANT

    5      5      1663344990    0      TAB                                  Sch-M GRANT

    5      5      1663344990    0      TAB                                  Sch-M GRANT

    5      5      1663344990    0      TAB                                  Sch-M GRANT

    5      5      1663344990    0      TAB                                  Sch-M GRANT

    5      5      1663344990    0      TAB                                  Sch-M GRANT

    5      5      1663344990    0      TAB                                  Sch-M GRANT

    5      5      1663344990    0      TAB                                  Sch-M GRANT

    5      5      1663344990    0      TAB                                  Sch-M GRANT

    5      5      1663344990    0      TAB                                  Sch-M GRANT

    5      5      1663344990    0      TAB                                  Sch-M GRANT

    5      5      1663344990    0      TAB                                  Sch-M GRANT

    5      5      1663344990    0      TAB                                  Sch-M GRANT

    5      1      60                    1     KEY (fa00cace1004)          X         GRANT

  • Next, check the SQL Server errorlog, and see if you can spot any messages point towards “script upgrade”. An example would be:
  • 2012-08-02 08:04:06.500 Logon        Error: 18401, Severity: 14, State: 1.

    2012-08-02 08:04:06.500 Logon        Login failed for user 'maverick'. Reason: Server is in script upgrade mode. Only administrator can connect at this time. [CLIENT: 150.232.101.86]

  • Also, see if you can spot messages related to upgrading replication in the errorlog. In my case I found quite a few:
    2012-08-02 08:04:11.780 spid5s       Database 'master' is upgrading script 'repl_upgrade.sql' from level 167774691 to level 167777660.
  • 2012-08-02 08:04:13.010 spid5s       Upgrading distribution settings and system objects in database distribution.

    2012-08-02 08:04:17.590 spid5s       Upgrading publication settings and system objects in database [Cash].

    2012-08-02 08:04:18.270 spid5s       Upgrading publication settings and system objects in database [Sellers].

    2012-08-02 08:04:18.620 spid5s       Upgrading publication settings and system objects in database [Revenue].

  • What this tells us is that there was a patch applied at some point, and it failed while upgrading replication. Now, every time SQL Server starts up, it tries to upgrade the replication. Let’s see if we can find an upgrade failure message as well. For example, you may find something that looks like this:
    2012-08-02 08:04:46.470 spid5s       Upgrading subscription settings and system objects in database [XYZ].
    2012-08-02 08:04:46.600 spid5s       Index cannot be created on object 'MSreplication_subscriptions' because the object is not a user table or view.
    2012-08-02 08:04:46.600 spid5s       Error executing sp_vupgrade_replication.
    2012-08-02 08:04:46.600 spid5s       Saving upgrade script status to 'SOFTWARE\Microsoft\MSSQLServer\Replication\Setup'.
    2012-08-02 08:04:46.600 spid5s       Saved upgrade script status successfully.
    2012-08-02 08:04:46.600 spid5s       Recovery is complete. This is an informational message only. No user action is required.
  • Also notice the spid in the aforementioned failure messages. See it? So, because the replication upgrade fails, this system spid holds the lock on some resource, and as a result, we’re unable to perform any replication related activities.

Troubleshooting

So how do we troubleshoot this? Let me list out the steps:

  • Let’s first focus on the exact error we see in the errorlog, which seems to be the reason behind the replication upgrade failing:
  • 2012-08-02 08:04:46.470 spid5s Upgrading subscription settings and system objects in database [XYZ].
    2012-08-02 08:04:46.600 spid5s Index cannot be created on object 'MSreplication_subscriptions' because the object is not a user table or view.
  • We can clearly see that it has an issue with the MSReplication_Susbcriptions object in the XYZ database. I checked on the object using sp_help, and found that it was a synonym.
  • Next, we dropped the offending synonym, and scripted out the MSReplication_Subscriptions object from one of the other databases that had replication enabled. We ran this script in the XYZ database to create the object.
  • As a test, we ran the sp_vupgrade_replication stored procedure explicitly from SSMS, and it completed fine.
  • Next, we restarted SQL, and saw that the script upgrade had completed successfully this time. Subsequent restarts did not result in SQL Server going into script upgrade mode. This meant that the system spid was no longer holding the lock, and we could now perform replication related activities successfully.

Hope this helps. Comments/feedback are welcome.