Federations: Knowing When and Where to Split? A Few Handy DMVs Make That Easy!

Federations: Knowing When and Where to Split? A Few Handy DMVs Make That Easy!

Rate This
  • Comments 2

Deciding on when to scale out to more members is one of the critical decisions you make when running the system. However it isn’t always clear to everyone as to when to split and where to split a member. We’ll pick that up this time around.

SQL Azure does not provide a specific set of numbers on IO rate or CPU cycles each SQL Azure database can achieve. This makes it a challenge to decide how to initially lay out your federation. How many members do you need and what are the split-points? The way to figure this out is testing and applying the principles here to your decision can help both decide your initial layout and your ongoing policy for when and where to split.

To decide when and where to split, you need to define the key metrics that best indicate the threshold of ‘good’ vs ‘bad’ experience for the app. To put it another way, what is the key metric for the apps healthy experience: Every app I look at have a signature resource utilization pattern that seems to utilize the resources in its own unique way. Some apps have small and chatty transactions with many concurrent users. Others have long batch processes over a few connections that chew over a lot of data. Some are CPU bound in performance and others are IO bound. There seems to be always a resource that seem to suggest when you are getting close to the edges of the apps deal experience. Beyond that point, either the latency becomes too high or you start getting errors for large number of your transactions etc. Identifying these few resources that are good predictors of the load of the app on a given DB is key to deciding when to split. Here are a few examples that are classic triggers for many apps:

- # users connections open to the db

select count( *) from sys.dm_exec_sessions

- # msecs of the worst blocking

select top 1 wait_type, wait_time, wait_resource from sys.dm_exec_requests where wait_time > 0 order by wait_time DESC

- # concurrent requests reaching a peak magic number

select count( *) from sys.dm_exec_requests where wait_time > 0 order by wait_time desc

Once you have identified the points where you want to split to improve the experience, you need to decide where to split for best distribution of the load. these mostly pivot off of the previous metrics. For example, if you identified #connections as an issue, what would be a good spot to issue a split to distribute the connections equally? Trying to be precise in these calculations could be costly and complicated. The trick is to identify another good indicator that will be a good metric that is easy to median. in most cases, split points that distribute based on data size works well. Half the db size means half the load in many cases. Many simplify this down to calculating the median for the the largest table. Median of the largest federated table may not be  super precise but it is dead simple… Figuring out the largest table is easy if you are uncertain about it in a member;

select top 1 object_name(a.object_id),c.name, used_page_count from sys.dm_db_partition_stats a join sys.federated_table_columns b on a.object_id=b.object_id join sys.all_columns c on a.object_id=c.object_id and c.column_id=b.column_id order by used_page_count desc

This can be done easily with something like this;

select  max(federation_column), tile from (select federation_column, ntile(2) over (order by federation_column) as [tile]
          from largest_federated_table with (nolock)) i
where tile = 1 group by tile

Lets see how this works on a database that has uniqueidentifier type as a federation key. Most people find sorting with GUID type hard so this should help clarify that difficulty as well. Uniqueidentifier in SQL (GUID) is sortable and given they are random it is an easy way to simulate hashing. Some argue their distribution isn’t always uniform and franly that does not matter given that we do not require members to have some symmetric split points. you can split at any uniqueidentifier to balance the data size in members. Well lets see how:

Here is what the existing member distribution looks like with the 10 members;

image

The size of the last member database is 326MBs. Lets assume that is the point where I want to split my member since I know when size gets larger, I bump into issues with # connections.

image

To get the middle point that will distribute the data equally to 2 federation members, I run the query on the largest federated table in the member “blog_entries_tbl” which has the blog_id as the federation column.

image

I issue a split at this point ‘00000000-0000-060A-0000-000000000000’ and a new split is kicked off.

image

Operation completes and here is what the new federation member layout looks like; Notice the new members at ‘00000000-0000-060A-0000-000000000000’

image

Here are the new sizes of the new members: 163 MBs each! Perfectly in the middle.

image

Until next time. Enjoy!

-cihan

  • Cihan,

    Your readers may also find it helpful to know that AzureWatch, an Azure-specific monitoring service is capable of live monitoring SQL Azure Federations for user-defined conditions and alerting DBA's when such conditions are true.  More info at http://www.paraleap.com

  • Thanks Igor! Love the flexibility of AzureWatch...

Page 1 of 1 (2 items)