Sharding has been around for a while and I have seen quite a few systems that utilize SQL Azure with 100s Databases and 100s of compute nodes and tweeted and written about them in the past. Like this case with Flavorus;

Ticketing Company Scales to Sell 150,000 Tickets in 10 Seconds by Moving to Cloud Computing Solution

James has certainly been a great partner to work with and he is an amazingly talented guy who can do magic to pull some amazing results together in a few weeks. You need to read the case study for full details but basically he got Jetstream to scale to sell 150,00 tickets in 10 seconds roughly with 550 SQL Azure databases and 750 compute nodes. And he did that in about 3 weeks time testing included!

These types of systems are now common but the scale problems at these levels certainly get complex. one of those that hit many of the customers is the problem with connection multiplexing or simply connection pool fragmentation. Bear with me, this one takes a paragraph to explain:

Imagine a sharded system with M shards and N middle tier servers and with a max of C concurrency requests. The number of connections you need to establish is M*N*C. That is because every middle tier server has to establish possibly C connection to every shard given requests come in randomly distributed to the entire site. Now imagine the formula with some numbers; I’ll be conservative and say 50 shards, 75 middle tier servers and 10 concurrent connections. Here is what you end up with;

M * N * C = 50 * 75 * 10 = 37,500 connections

image

Over 37K… That is a lot of connections! Here are some other numbers;

  • Each middle tier server ends up with M connections pools with C connections in each at the worst case. that is 500 connection from each middle tier machine.
  • Also every shard maintains 750 connections from the middle tier servers in the worst case. that is a lot of connections to maintain as well. Large number of connection are bad because that can cause you to become victim of throttling… not a good thing…

Another fact of life, is that life isn’t perfectly distributed so what happens most times is these 500 connections in the middle tier server don’t get used enough to stay alive. SQL Azure terminates them after a while of being idle. You end up with many of these idle connections dead in the pool. That means the next time you hit this dead connection to a shard from this app server, you have to reestablish a cold connection from the client all the way to the db with a full login and whole bunch of other handshakes that cost you orders of magnitude more in latency. This issue can be referred to as connection pool fragmentation… Connection Pooling documentation also makes references to the issue: http://msdn.microsoft.com/en-us/library/8xx3tyca.aspx and look for fragmentation. Push M or N or C higher and things get much worse.

Federation cure this for apps. With federations the connection string points to the root database. Since we ‘fool’ the middle tier to connect to root in the connection string, M=1 and the total connection from all middle tier servers to SQL Azure is only N*C = 750 total connections. Compared to over 37K that is a huge improvement!

Not only that but each middle tier server has 1 connection pool to maintain with only C=10 connection… Much better than 500. Member connections can get more complicated because of pooling in the gateway tier but it is much much better than 750 connections per shard as well. So this makes the problem go away! This is the magic of USE FEDERATION statement. No more connection pool fragmentation.

Another great benefit of USE FEDERATION is that it give you the ability to maintain your connection object and keep rewiring your connection from gateway to the db nodes in SQL Azure without a disconnect. With sharding, since we don’t support USE statement yet, the only way to connect to another shard is to disconnect and reconnect. With USE FEDERATION you can simply keep doing more USE FEDERATIONs and never have to close your connection!

If you made is this far, thanks for reading all this… If you use federations, you can forget what you read. Smile Just have to remember that you didn’t have to learn more about connection pool fragmentation or frequent disconnects & reconnects you have to face with sharding.

-Cihan