I thought it would be helpful to post some Replication Distribution Agent troubleshooting steps to help you not only understand more about Replication components, but to also show how I approach Replication problems.

Problem:

SQL 2005 Distribution Agent was failing with following messages

2009-12-29 00:17:53.917 Agent message code 8144. Procedure or function sp_MSupd_dboAddress has too many arguments specified.
2009-12-29 00:17:53.932 ErrorId = 35, SourceTypeId = 0
ErrorCode = '8144'
ErrorText = 'Procedure or function sp_MSupd_dboAddress has too many arguments specified.'
2009-12-29 00:17:53.948 Adding alert to msdb..sysreplicationalerts: ErrorId = 35,
Transaction Seqno = 000a261100001560013e00000000, Command ID = 1

This error appeared in Replication Monitor.  To capture text of the message we added the following parameters to the Distribution Agent job.  The –output parameter writes Agent log to text file.  This makes it easier to copy and paste actual text of message when searching for solutions.

-output c:\temp\dist.log and -commitbatchsize 1


Background:

Customer had setup a SQL 2005 Transactional Replication publication.  The database had a few very large tables which were generating timeouts when being applied to the subscriber.  To get around this problem, the subscriber was setup using a Backup/Restore from the Publisher.  Steps-by-step is documented in SQL BOL.  Distribution Agent worked for a few minutes moving changes, then failed with error above.

How is works:

In Transactional Replication, the LogReader agent is picking up committed transaction from the transaction log and writing the data to the Distribution database.  The Distribution Agent is picking up those data commands and writing them to the Subscriber.  The data is not stored in TSQL, i.e. “update table set column a = 1”, but instead as parameters to a stored procedure.  The Distribution database uses these parameters to call Replication recreated Insert/Update/Deleted stored procedures on the subscribers.  These SP have format sp_MSups_<schema><tablename> as in sp_MSupd_dboAddress.

Approach:

The error indicates a mis-match in the number of parameters (columns) in the command stored in the Distribution DB compared to the number of columns in the Replication created stored procedure.  The approach is to see which one is wrong!

First step was to compare the SCHEMA of the Published database to the Subscriber.  Since the Subscriber was a backup of the Publisher, I suspected them to be the same, but you never know unless you check it first. We executed the following command on both the Pub and Sub and yes, they were identical.

sp_help Address

Next was to look at actual text stored in the Distribution database.  The text is stored as binary so we need the help of a Replication SP sp_browsereplcmds to decipher.  This command takes a number of parameters to reduce the output.  We’re interested in only the transaction which fails.  Do dump these commands we need the Transaction ID.  Thankfully it is included in the error message.

Transaction Seqno = 000a261100001560013e00000000  >>>>drop off trailing 8 0<<<<

sp_browsereplcmds 
    @xact_seqno_start = '0x000a261100001560013e', 
    @xact_seqno_end = '0x000a261100001560013e'

Output

{CALL [sp_MSupd_dboAddress] (,,,,,,,,,,2009-12-28 02:22:10.000,,,,,2009-12-27 02:22:13.683,,,620190,0x008400)}

Okay, so now I know the table schema are the same and I see the command and parameters being stored in the Distribution database so how do they match up to the Sub stored procedure code? To get that we executed command below on the Subscriber.

sp_helptext sp_MSupd_dboAddress

Output:

CREATE procedure [sp_MSupd_dboAddress]
@c1 int,@c2 int,@c3 char(1),@c4 varchar(40),@c5 varchar(40),@c6 varchar(30),@c7 varchar(30),@c8 varchar(30),@c9 varchar(30),@c10 varchar(10),@c11 datetime,@c12 varchar(7),@c13 varchar(7),@c14 char(1),@c15 varchar(50),@c16 datetime,@c17 varchar(4),@c18 va
rchar(2),@pkc1 int
as
begin
update [dbo].[Address] set
[person_id] = @c2
,[address_type] = @c3
. . . .
,[uc_code] = @c18
where [address_id] = @pkc1
if @@rowcount = 0
    if @@microsoftversion>0x07320000
        exec sp_MSreplraiserror 20598
end

Looking at the code I see the last “expected” value is a PrimaryKey used in the WHERE clause to update 1 row.  However, the sp_MSupd_dboAddress parameter list has a binary value 0x008400 as the last parameter.  Clearly these don’t match, but which is right?

What is the expected behavior?

When troubleshooting these problems I like to learn what is the “expected” behavior.  This way I can compare what I should expect to what I’m seeing when it fails.  My next step was to setup a simple Transactional Replication publication using sample AdventureWorksLT database.  When I scripted my subscriber stored procedure, it included the bitmap parameter along with additional logic within the stored procedure to use this extra parameter. 

Since the “broken” subscriber didn’t have this code, I figured it was source of the mis-match, not the parameter list stored in the Distribution database.

How to correct the problem?

SQL Server includes stored procedure sp_scriptpublicationcustomprocs to re-generate new set of subscriber stored procedures.  You execute this first on the Publisher, then take the OUTPUT AS TEXT, and execute it on the Subscriber.  Make sure you increate the TEXT COLUMN WIDTH to about 5000 in the Query Properties window or the code will get truncated at the default 256 characters.  Yeah, I found that one out the hard way.  Thankfully, the code automatically includes all DROP then CREATEs SPs needed by the Subscriber.

sp_scriptpublicationcustomprocs  '<publication>'

Root Cause?

Since the bitmap column was missing it looked like the SPs were written for SQL 2000 subscriber.  May have been caused by a database_compatibility_level setting.  Not sure as we never found root cause just the solution.