What happens if I select Create a snapshot immediately in the Publication Wizard?

Chris Skorlinski
Microsoft SQL Server Escalation Services

When creating a Transactional Replication publication, what happens if I check Create a snapshot immediately and keep the snapshot available to initialize subscriptions?

 

 

image

 

 

This sets the publication property @property='immediate_sync',@value='TRUE' which instructs Replication to keep all the transactions cached in the Distribution database for the entire Retention Period.  Normally these rows are purged as soon as they are replicated to the subscribers, thereby keeping the distribution database lean and quick.

 

Why keep the rows?  When NOT checked, new subscribers would first need the Publisher to generate a new Snapshot, then apply new transactions.  This was added to allow NEW subscribes who joined within the retention period to NOT need a new Snapshot. Replication would apply the OLD snapshot, then apply the 2 or 3 days of queued transactions.

 

This feature also assumes you’ll keep making NEW snapshots sometime during the Retention Period.  For example, when TRUE, NEW subscriber could subscribe at LSN/Time 8, Snapshot 1 is applied, then LSNs 2-8 are applied.  The Subscriber didn’t have to wait for a Snapshot to be generated.  Then a “fresh” Snapshot should be taken just before the 72 hours (retention period) so NEW subscribers join at LSN/Time 12 can get Snapshot 2 and transactions 10, 11, and 12

 

LSN 1

2

3 4 5 6 7 8 9 10 11 12
Snapshot 1     24 hours     48 hours   Snapshot 2 72 hours    

 

As you can see, this features is used when you expect NEW subscribers to be joining during the Retention period and you don’t want to wait for a NEW snapshot to be generated.  But it does require some coordination between Retention period and Snapshot generation.  If your business needs don’t fall into this scenario you’re best off leaving this option blank.

 

High Volume Servers

 

Distribution Cleanup for highly active servers with 60-100+ millions rows cached, database growth and cleanup will be an ongoing issue.  If we can reduce the number of transactions/commands cached in the distribution database we can improve overall replication performance.  

 

For these environments we recommend disabling “immediate_sync” if not needed for business reasons.  When enabled, this setting maintains the replicated transactions in the Distribution database, just-in-case, a new subscriber is added or an existing subscriber needs to reinitialize.  If new subscribers are not being added or existing not being reinitialized, this feature can be disabled. When disabled, the transactions are deleted as soon as they are replicated to the subscriber thereby reducing replication metadata overhead. 

 

Also consider time to Reinitialize v. time to “Catch up”.  For example, in a high volume Transactional Replication environment moving 50 million rows a day and have a retention period of 72 hours.  Now the Subscriber is “offline” for 2 days and we’ve queued up 100+ million rows in the Distribution database.  How long do you think it will take to “empty” that queue when new transactions are still flowing in?  Then how long will the cleanup job take to purge that data from the distribution tracking tables.  It could take days to catch up.

 

How how long does it take to reinitialize the Subscriber with a backup from the Publisher using “sync with backup” option?  12 hours?  So, if the Subscriber is more then 12 hours behind, it is faster to reinit from backup instead of waiting to drain the queued transactions. Therefore, why keep 36 hours cached data?  Sounds like a 12 hour retention period is all that is needed.

 

This job is responsible for deleting data from MSrepl_transactions and MSrepl_commands tables in the Distribution Database.   Two parameters are passed to the cleanup job. The first one is minimum number of hours to keep transactions and second is to delete if transaction are older.

 

 

    EXEC dbo.sp_MSdistribution_cleanup  @min_distretention = 4, @max_distretention = 72

 

When old data gets purged is determined by "immediate sync" setting of the Publication.

 

--distribution database

select immediate_sync,* from distribution.dbo.MSpublications

 

--published database

select immediate_sync, * from syspublications

 

In the above example:

If "immediate sync" = TRUE, then keep transaction for at least 4 hours EVEN IF they have been replicated to the subscriber but delete any transaction older the 72 hours.  Metadata that is younger than 72 hours, but older than 4 hours will be kept. This is conservative.

 

If "immediate sync" = FALSE, then keep transaction for at least 4 hours but if the transaction has already been replicated (distributed) to the subscriber go ahead and delete EVEN IF not older then 72 hours. Metadata that is younger than 72 hours, but older than 4 hours may be deleted quite aggressively.

 

If the amount of metadata within the retention period is a problem, you could potentially change the immediate_sync flag with sp_changepublication, without reinit required, but there are some differences that you should be aware of. Read BOL for sp_addpublication about immediate_sync.  There are potentially negative side effects for no-sync subptions if they have been marked expired or inactive, and also adding new subscribers and adding new articles will require more manual intervention.

 

sp_changepublication

@publication='repltest',

@property='immediate_sync',@value='FALSE'