Troubleshooting LogReader Error “repldone log scan occurs before the current start of replication”

Chris Skorlinski
Microsoft SQL Server Escalation Services

Below are cause and recommendation for the SQL Replication LogReader error “The specified LSN (…) for repldone log scan occurs before the current start of replication in the log (…).'.

When the LogReader starts, it verifies the Last Distributed transaction (last transaction in msrepl_transactions) still exists in the transaction log.  This confirms the Published database transaction log and the Distribution database match and that the last replicated transaction did make it to the Distribution database.

2010-04-11 18:23:34.437 Publisher: {call sp_repldone ( 0x00046399000076f80009, 0x00046399000076f80009, 0, 0)}
2010-04-11 18:23:34.515 Status: 0, code: 20011, text: 'The process could not execute 'sp_repldone/sp_replcounters' on 'SQL5'.'.
2010-04-11 18:23:34.625 The process could not execute 'sp_repldone/sp_replcounters' on 'SQL5'.
2010-04-11 18:23:34.703 Status: 0, code: 18768, text: 'The specified LSN (00046399:000076f8:0009) for repldone log scan occurs before the current start of replication in the log (00046399:00007708:001a).'.
2010-04-11 18:23:34.812 Status: 0, code: 22017, text: 'The process could not set the last distributed transaction.'

From the error, the LogReader is expecting to find LSN 76f8, but the oldest LSN in the transactions log is only 7714.  There are a group of transactions missing from the Log from 76f8 until 7708.  Further examination of the transaction log shows the oldest LSN is 7714.  How many transactions are missing, we don’t know.  We’ll only be able to confirm missing transactions from Transaction Log Backups.

DBCC OPENTRAN on the Published database also confirms “missing” LSN values match error.

Replicated Transaction Information:
        Oldest distributed LSN       : (287641:30472:26) >>>00046399:00007708:001a matches last row in msrepl_transactions
        Oldest non-distributed LSN : (287641:30484:1)  >>>00046399:00007714:0001 from oldest LSN in the transaction log.

Why can’t we just start the LogReader at 7714?  Because we don’t know what data was missing.  If an INSERT was skipped, then what happens to an UPDATE for that same rows when it gets to the Subscriber?  We’ll get a “PK row not found error”.

If we suspend all updates to the Publisher, we could use sp_repldone to mark 00046399:00007708:001a as the last Replication Transaction then use the TableDiff utility to match the Publisher and Subscriber.  This becomes much greater challenge if the Publisher is online and being updated 24x7.

In SQL 2005/2008 you can also run sp_replrestart to restart the LogReader and begin moving transactions to the Distribution database.  However, again the LogReader will have skipped transactions missing from the log potentially causing out of sync condition between the Publisher and the Subscriber.

Best solution is to reset replication to ensure the Publisher and Subscriber match.

Recommendation
---------------------
1) Use sp_repldone to mark all transactions as being replicated
    http://blogs.msdn.com/chrissk/archive/2010/01/03/using-sp-repldone-to-mark-all-pending-transactions-as-having-been-replicated.aspx

2) Use removedbreplication to drop all publications from this database.

3) Reconfigure Replication for all publications for this database.