Blogs from Suhas

Dirtying my hands in SQL Server

INF: Automatic Identity Range Management with Merge Replication in Microsoft SQL Server 2005 / 2008

INF: Automatic Identity Range Management with Merge Replication in Microsoft SQL Server 2005 / 2008

  • Comments 2

I have been working with SQL Server Replication for over 4 years now; and till now each case that I work on gives me something new to learn. At the same time, every case amazes me by the intricate design that has gone into implementation of SQL Server Replication. With the wonderful GUI design, replication seems so easy to configure and monitor; however, when I start troubleshooting an issue and I look into the metadata to understand the cause of the issue, I am faced with new challenges every day.

One such technology is the Automatic Identity Range Management. Have you ever imagined how SQL Server Replication manages the identity values of user tables? If your user table is a part of Transactional Replication with Updatable Subscriptions or the table is replicated using Merge Replication, how does Replication ensure that different values are inserted at the Publisher and the Subscribers, so that conflicts arising off using same identity values can be avoided? Let’s see...

Let’s create a table that has an identity column:

CREATE TABLE [dbo].[Orders]
(
OrderID INT IDENTITY(1, 1) NOT NULL,
OrderDate DATETIME NOT NULL,
CustomerName VARCHAR(100) NOT NULL,
ProductName VARCHAR(100) NOT NULL,
OrderQuantity INT NOT NULL,
UnitPrice DECIMAL(10, 2) NOT NULL
)

We will now publish this table using Merge Replication. We set the Publisher Identity Range as 10000 and the Subscriber Range as 1000. We will also create a subscription to this publication. For demonstration purposes, our publication will have just this one article. Also, at this point, the table is completely empty. After the initial sync, we ran the following query against both the Published Database and the Subscribed Database:

SELECT IDENT_CURRENT ('dbo.Orders')
 
Against the Published Database, we got 1 as the output, and against the Subscribed Database we got 2002 as the output. This means that the next record inserted at the Publisher will have the identity value 1, while that inserted at the Subscriber will have a identity value of 2002. This leads us to the following questions:
a) On what basis is the identity value at the Publisher and the Subscriber decided?
b) How does replication implement the identity values?
c) What will happen if the current identity value at the Publisher exceeds 2001?
d) How does replication know what identity values have been assigned?

Question: On what basis is the identity value at the Publisher and the Subscriber decided?

When we configure replication, we have to set the Publisher Identity Range and the Subscriber Identity Range. These range values are by default set to 10000 and 1000 respectively. This means that the Publisher will be assigned an identity range of 10000 (primary) + 10000 (secondary) for distribution. Of these, the Publisher will keep 1000 (primary) + 1000 (secondary) identity values for inserts on the Publisher. Among the rest, 1000 (primary) + 1000 (secondary) identity values will be assigned to the subscriber for inserts at the subscriber. The next subscriber will be assigned the next 1000 (primary) + 1000 (secondary) identity values.

Question: How does replication implement the identity values?

SQL Server Replication implements the identity values in the form of constraints that are applied on the corresponding tables. If we examine the Orders table on the Publisher, we will find a constraint by the name of repl_identity_range_<GUID> created. The GUID is based on the value of the artid column for the article in the sysmergearticles system table. The same constraint is created on the Subscriber as well. The expression for this constraint on the Publisher and the Subscriber are as follows:

Publisher:
([OrderID]>=(1) AND [OrderID]<=(1001) OR [OrderID]>(1001) AND [OrderID]<=(2001))
Subscriber:
([OrderID]>(2001) AND [OrderID]<=(3001) OR [OrderID]>(3001) AND [OrderID]<=(4001))

Question: What will happen if the current identity value at the Publisher exceeds 2001?

Once rows are entered into the Publisher Database and the Primary and the Secondary identity ranges are completely utilized, 2 new ranges are assigned to the Publisher. Hence, when the identity value on the Publisher reaches 2001, and we try to insert the 2002th value, the constraint is modified on the Publisher. The expression now becomes:
([OrderID]>(4001) AND [OrderID]<=(5001) OR [OrderID]>(5001) AND [OrderID]<=(6001))

However, the behavior on the subscriber is different. On the subscriber, if both the assigned identity ranges are completely utilized, further inserts will fail, unless synchronized with the Publisher. When the subscriber synchronizes, new identity ranges are assigned to the subscriber. If the primary range is completely utilized, during synchronization, the secondary range is made the primary range, and a new identity range is assigned, which now becomes the secondary identity range.

For SQL Server CE subscribers or subscribers running SQL Server 2000 or an earlier version, only one identity range can be assigned. Here the threshold parameter comes into play. If the identity range usage is above the threshold (default value 80%), a new identity range is assigned. Thus holes in the identity values can be created. For example, on the SQL CE subscriber, if we have the identity range set to ([OrderID]>(4001) AND [OrderID]<=(5001)), and we are currently at 4850, it means that we have utilized more than the threshold value; while synchronization, a new identity range is assigned. If now, the identity range is like ([OrderID]>(5001) AND [OrderID]<=(6001)), it means that the next identity value that will be assigned to the next new row is 5002; thus identity values between 4851 and 5001 are skipped, leaving holes in the identity values. Higher the threshold, smaller is the hole, the less fault-tolerant the system becomes.

Question: How does replication know what identity values have been assigned?

Starting SQL Server 2005, the MSmerge_identity_range system table on the Published Database has information about the current identity ranges in use. It has one row per article for the Publisher Identity Range, one row per article for the Identity Range being used by the Publisher, and one row per article for each Subscriber. The following is a screenshot of the contents of this table on the publisher:

Publisher_MSmerge_identity_range

The MSmerge_identity_range system table on the Subscribed Database, however, has one per row per article and has information about the identity range used by itself for that article.

Similar to previous versions of SQL Server, we also keep the history of all the identity range allocations in the MSmerge_identity_range_allocations system table in the distribution database. The following is a screenshot of the contents of this table:

MSmerge_identity_range_allocations 
(Click on the image to enlarge)

Distributing Identity Ranges to Multiple Subscribers:

Now that we know how identity rages are assigned and kept track of, lets see how the identity ranges will look like when we have multiple subscribers. Let’s assume that database PubDB of the SQL Server Instance Publisher is published and is subscribed by Subscriber1, Subscriber2, Subscriber3 and Subscriber4. The identity range allocations are as follows:

Server Name Primary Range Secondary Range Comments
Publisher 1 to (max value *) - Publisher Range
Publisher 1 to 1001 1001 to 2001 Subscriber Range
Subscriber1 2001 to 3001 3001 to 4001 Subscriber Range
Subscriber2 4001 to 5001 5001 to 6001 Subscriber Range
Subscriber3 6001 to 7001 7001 to 8001 Subscriber Range
Subscriber4 8001 to 9001 9001 to 10001 Subscriber Range

* max value for INT datatype is "2147483647".

Distributing Identity Ranges in a Re-Publishing Scenario:

In a republishing scenario, the re-publisher will also be assigned a Publisher Range, which will be used by it for distributing identity ranges to its subscribers. Lets consider the following scenario: Publisher is the main publisher, with a Publisher Identity Range of 10000 and Subscriber Identity Range of 1000, and its subscribers are Subscriber1, Subscriber2, Subscriber3, and RePublisher. The database on RePublisher is published again with a Publisher Identity Range of 7500 and Subscriber Identity Range of 500. The subscribers of RePublisher are SubSubscriber1 and SubSubscriber2.

The chart of identity range assignments will be as follows:

Server Name Primary Range Secondary Range Comments
Publisher 1 to (max value *) - Publisher Range
Publisher 1 to 1001 1001 to 2001 Subscriber Range
Subscriber1 2001 to 3001 3001 to 4001 Subscriber Range
Subscriber2 4001 to 5001 5001 to 6001 Subscriber Range
Subscriber3 6001 to 7001 7001 to 8001 Subscriber Range
RePublisher 8001 to 9001 9001 to 10001 Subscriber Range
RePublisher 10001 to 17501 17501 to 25001 Publisher Range
SubSubscriber1 10001 to 10501 10501 to 11001 Subscriber Range
SubSubscriber2 11001 to 11501 11501 to 12001 Subscriber Range

* max value for INT datatype is "2147483647".

So, we can clearly see that RePublisher is assigned two Publisher Identity Ranges (primary range and secondary range) and all ranges assigned to its subscribers are from these ranges only.

What if we reinitialize the Subscriptions?

If we reinitialize the subscriptions, each time we reinitialize, new identity ranges are assigned to the corresponding subscribers. This is irrespective of the fact whether the identity ranges already assigned to the subscribers are depleted. This has the potential to create holes in the identity values in the table and can even cause the assignable identity ranges to get depleted. Let’s see using an example.

I have a database that has the Orders Table, similar to the one defined earlier, with a small change. My identity column is now smallint instead of int. I have created the publication using 5000 as the Publisher Range and 2000 as the Subscriber Range. Once I create the subscriptions, I have the following identity values assigned:

Server Name Primary Range Secondary Range Comments
Publisher 1 to 32767 - Publisher Range
Publisher 1 to 2001 2001 to 4001 Subscriber Range
Subscriber1 4001 to 6001 6001 to 8001 Subscriber Range
Subscriber2 8001 to 10001 10001 to 12001 Subscriber Range
Subscriber3 12001 to 14001 14001 to 16001 Subscriber Range
Subscriber4 16001 to 18001 18001 to 20001 Subscriber Range

Now, I reinitialize all the subscriptions. In this case, the Publisher Identity Range (1 to 2001 & 2001 to 4001) is not changed, however, each subscriber is assigned 2 new ranges. This is irrespective of the fact that I have not inserted any row in any of the databases. So, the Merge Agent starts off with the assignment:

Subscriber1: 20001 to 22001 & 22001 to 24001
Subscriber2: 24001 to 26001 & 26001 to 28001
Subscriber3: 28001 to 30001 & 30001 to 32001
Subscriber4: 32001 to 34001… Oops!!!

The max value for smallint is 32767, and the Merge Agent has tried to assign an identity range that is above the max value of smallint. This allocation will now fail, and the Merge Agent will fail with the following message:

The Publisher failed to allocate a new set of identity ranges for the
subscription. This can occur when a Publisher or a republishing Subscriber
has run out of identity ranges to allocate to its own Subscribers or when
an identity column data type does not support an additional identity range
allocation. If a republishing Subscriber has run out of identity ranges,
synchronize the republishing Subscriber to obtain more identity ranges
before restarting the synchronization. If a Publisher runs out of identit
(Source: MSSQL_REPL, Error number: MSSQL_REPL-2147199417)
Get help: http://help/MSSQL_REPL-2147199417


What can I do to resolve this error?

To resolve this error, you can try to drop the Subscriptions and Publication, and create them once again. If the last inserted row has a low value in the identity column, this recreating everything will give you relief for some time.

However, if the last inserted identity row value is high, and close to the max value for smallint, recreating the Publication and the Subscriptions will be of no good. The only way out in such cases is to change the datatype of the Identity Column to a larger datatype (int or bigint).

Hope this helps my readers to understand the concepts of Automatic Identity Range management in Merge Replication using Microsoft SQL Server 2005 / 2008.

Disclaimer: All information provided here is my personal opinion and is neither verified nor approved by Microsoft before it is published. All information, and code samples, if any, is provided "AS IS" with no warranties and confers no rights.

Comments
  • I dont think we can have republisher range as 7500 when the root publisher range is 10000. I have tested this and found that we can give the publisher range for republisher as max of 50% of root publisher range. Please check and confirm

  • What is the strategy to deal with the situation where the publisher is down for a prolonged period e.g. as part of preventative maintenance or an infrastructure upgrade such that inserts at the subscriber require new identity ranges to be assigned and automatic identity range management is in use. Is it to increase the size of the ranges e.g. to 100,000 and 10,000 rather than the defaults?

Page 1 of 1 (2 items)
Leave a Comment
  • Please add 5 and 5 and type the answer here:
  • Post