When using Microsoft SQL Server 2005 Transactional Replication, the Log Reader Agent reads the log, constructs SQL statements for published articles and posts it to the Distribution database; the Distribution Agent reads the Distribution database, delivers and applies the "package" to the Subscriber. Often the Publisher, Distributor and Subscribers are on fast networks resulting in very low total delivery latencies. However on slower networks such as in a geographically distributed environment, the connection speeds are often of low bandwidth with high network latencies - this obviously has a negative effect on total delivery time. Network latency (easily seen when using the ping command) has a substantial negative performance impact and using the new -SubscriptionStreams can provide drastic performance increases.
By default (and all previous editions of Microsoft SQL Server) the Distribution Agent uses as single stream to deliver a transaction from the distribution database to the subscriber database. In Microsoft SQL Server 2005 this default behavior can be overridden by specifying the -SubscriptionStreams NN parameter, where NN can be 0 (for non-SQL Server Subscribers or Peer-to-Peer Transactional subscriptions) to 64; where 64 would be a pretty insane number of threads especially if there are multiple Distribution Agents concurrently executing on the same server. The parameter refers to the number of connections allowed per Distribution Agent used to apply batches of changes in parallel to a Subscriber. It's important to note that even though transactions are parallelized, transactional consistency is maintained. This is achieved by cleverly hashing and "bucketizing" the primary key and re-assembly before committal at the subscriber. However, as noted in BOL (ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/rpldata9/html/67084a67-43ff-4065-987a-3b16d1841565.htm), if one of the connections fails to execute or commit, all connections will abort the current batch, and the Distribution Agent will use a single stream to retry the failed batches. Before this retry phase completes, there can be temporary transactional inconsistencies at the Subscriber. After the failed batches are successfully committed, the Subscriber is brought back to a state of transactional consistency.
A very large industry-leading online business required data to be replicated from busy OLTP servers via the internet to geographically distributed offices around the world with low latency and be able to recover/catch-up from loss of connection for up to 24 hours. The "last-mile" local connection speeds varied but were generally of poor quality (i.e. high latency) and low speed/bandwidth. The average office (i.e. subscriber) connected at around 512kbps with average latencies of between 200-600 milliseconds. The Publisher (i.e. OLTP server) and remote Distributor are located on a fast local network and the Distributor server is a dual-core four socket processor.
The OLTP publisher was "primed" with hundreds of transactions each consisting of approximately 5 to 10 commands per transaction. The Logreader Agent then moved the replicated transactions into the Distribution database. The Distribution Agent was then started and the duration and throughput measured. The network connection between the Distributor and the subscriber was throttled using Network Throttling software (which interfaced directly with network card). Network bandwidth of 512kbps and 128kbps with latencies of 300 milliseconds was tested.
On a throttled network with 512kbps bandwidth, 300 millisecond latency and a default of only 1 distribution stream; a throughput of 6.86 commands per second was distributed to the subscriber. When increasing the number of streams to 8; a throughput of 40.34 commands per second was attained - 488% improvement. And when further increasing the number of streams to 16; a throughput of 56.11 commands per second was attained - 718% overall improvement and 39% improvement over 8 streams.
When the network bandwidth was reduced to only 128kps but the network latency kept at 300 milliseconds; 1 distribution stream delivered 5.9 commands per second, 8 distribution streams delivered 31.54 and 16 distribution streams delivered 28.01 commands per second. Chart number 1 shows these results.
Although it's important to notice the hugely positive throughput benefit of using SubscriptionStreams, it's important to note the impact of reduced bandwidth versus increased network latency. The results indicate that network latency has greater negative effect than reduced bandwidth - due to the longer response/acknowledge time for each transaction to be delivered across the network. It's equally important to note that for this test, with very low bandwidth (with high network latency), 16 subscription streams did not improve the delivery rate (in fact, there was a slight decrease). This indicates that bandwidth utilization has been exceeded and there is a negative cost adding more managed parallel threads.
As the requirement included the ability to recover/catch-up from a loss of connection for up to 24 hours, Chart number 2 shows the effect of the positive performance improvement measured in hours. Over a slow network, without Microsoft SQL Server 2005 Subscription Streams, it would have taken up to 11 hours to catch-up whereas with the feature less than 2 hours is possible. This assumes 24 hours worth of transactions or 240,000 commands waiting to be distributed due to network connection outage (or other scenarios).
The SubscriptionStreams feature in SQL Server 2005 Transactional Replication can have a highly positive effect on performance, especially over a slower networks; however it must be cautioned that the more threads/connections are opened and if an "error" occurs during reassembly and or committal of the transactions at the subscriber, the number of streams are reduced to 1 and at a later stage increased back to the configured value. The constant toggling between the two states can be costly. It is highly recommended that the optimal number of streams must be carefully tested for each application and environment before rollout.
Even on a high bandwidth network with low network latency (e.g. 1gigabit and 0 millisecond latency), configuring the Distribution Agent to use SubscriptionStreams can provide significant performance benefits - as illustrated in Chart 3. Clearly this is a feature worth testing and possibly deploying.
Great stuff, I can immediately see applications for this in our environment. But I wanted to point out that your description of the results doesn't match up with Chart #1, and I was curious to know which was accurate.
According to the text, 1 stream allowed a throughput of 6.86 cps at 512kbps/300ms, but the chart shows 31.54. If the chart is correct, the difference between 1 stream and 8 for a given bandwidth and latency is significant, but not the near 500% boost that you claim in the text. Either way, something doesn't match up. Can you please clarify?
マイクロソフトの植田です。 今回はレプリケーションに関する話題をご紹介したいと思います。 http://blogs.msdn.com/sqlcat/archive/2007/05/07/sql-server-2005-transactional-replication-benefit-of-using-subscriptionstreams-for-low-bandwidth-high-latency-environments.aspx
Really good article and has just saved my behind! It looked like we would be waiting for 13 hours for our data to catchup, but using the really cool feature it was reduced to 4 hours.