Quick Tip - Federations in SQL Azure: How to automatically maintain the federation distribution key in federated tables with FILTERING connections.

Quick Tip - Federations in SQL Azure: How to automatically maintain the federation distribution key in federated tables with FILTERING connections.

Rate This
  • Comments 5

In federations, federations member contain part of the data that is in a federation. The range of data contained in a federation member depends on the layout of the partitioning you have in your federations. There could be 10s or 100s of federation members covering the full range of values of the federation key. When connecting to federation member, apps use the USE FEDERATION command. USE FEDERATION takes a federation name (f1 below), a federation distribution key name (id below) and a atomic unit value (the guid value in the example below).

   1:  use federation f1(id='82855FD2-BA4C-4F72-BFB5-78523741C5A8') 
   2:  with reset, filtering=off

USE FEDERATION provides 2 connection types.

#1 The unfiltering connection is just a regular connection to the whole federation member containing this atomic unit. This is no different than connecting to the db name of the federation member. In the example below, I express to connect to a GUID value, however when I query t1 for all rows, I get back all data that is in this federation member’s range.

   1:  use federation f1(id='00000000-0000-0000-0000-000000000000') 
   2:  with reset, filtering=off
   3:  go
   4:  select * from t1
   5:  go
Returns: 
   1:  c1                                   c2
   2:  ------------------------------------ ------
   3:  08FFE74B-1BC0-460B-AA47-044E2DD7C0E0 b
   4:  732ABAAE-8727-4C49-B40B-8013DFB3C735 b

#2 The Filtering connection is a new concept that is available with federations. Filtering connection connect to a slice of a database that contains your atomic unit as opposed to the whole federation member (database). The filtering connection simply eliminates the need to express the federation key value in every query against federated tables. SQL Azure automatically injects the WHERE <federation_column>=federation_filtering_value(…) into all queries that target a federated table. This means the above example returns no rows since there is not rows present for '00000000-0000-0000-0000-000000000000’ in t1. If I connect to an existing value such as … I only get the rows that are in the atomic unit for t1 for my query. See the example below;

   1:  use federation f1(id='08FFE74B-1BC0-460B-AA47-044E2DD7C0E0') 
   2:  with reset, filtering=on;
   3:  go
   4:  select * from t1
   5:  go
Returns: 
   1:  c1                                   c2
   2:  ------------------------------------ ------
   3:  08FFE74B-1BC0-460B-AA47-044E2DD7C0E0 b

If you establish a filtering connection to a federation member, we do provide the following handy function to return the value back to you in the federation member.

   1:  use federation f1(id='08FFE74B-1BC0-460B-AA47-044E2DD7C0E0') 
   2:  with reset, filtering=on
   3:  go
   4:  select federation_filtering_value('id') as filtering_value
   5:  go

Returns:

   1:  filtering_value
   2:  ------------------------------------
   3:  08FFE74B-1BC0-460B-AA47-044E2DD7C0E0

You can also use the function to maintain the federation distribution column in federated tables for DML statements such as INSERTs. In the example below, t1 is created with a default based on this function. once a filtering connection is established, insert statements only specify the value to insert to column c2 and value of c1 is maintained automatically for you by the default property on table t1. Here is a full end to end example;

   1:  use federation root with reset
   2:  go
   3:  create federation f1(id uniqueidentifier range)
   4:  go
   5:  select 'use federation f1(id='''+cast(newid() as varchar(100))+''') 
   6:  with reset, filtering=on'
   7:  go
   8:  use federation f1(id='82855FD2-BA4C-4F72-BFB5-78523741C5A8') 
   9:  with reset, filtering=off
  10:  go
  11:  create table t1(
  12:    c1 uniqueidentifier primary key default federation_filtering_value('id'), 
  13:    c2 nvarchar(100)) 
  14:  federated on (id=c1)
  15:  go
  16:  select 'use federation f1(id='''+cast(newid() as varchar(100))+''') 
  17:  with reset, filtering=on'
  18:  go
  19:  use federation f1(id='732ABAAE-8727-4C49-B40B-8013DFB3C735') 
  20:  with reset, filtering=on
  21:  go
  22:  -- value of c1 is set to the atomic unit value in the filtering connection
  23:  -- '732ABAAE-8727-4C49-B40B-8013DFB3C735'
  24:  insert into t1(c2) values('a')
  25:  go
  26:  select federation_filtering_value('id')
  27:  go
  28:  select 'use federation f1(id='''+cast(newid() as varchar(100))+''') 
  29:  with reset, filtering=on'
  30:  go
  31:  use federation f1(id='08FFE74B-1BC0-460B-AA47-044E2DD7C0E0') 
  32:  with reset, filtering=on
  33:  go
  34:  -- value of c1 is set to the atomic unit value in the filtering connection
  35:  -- '08FFE74B-1BC0-460B-AA47-044E2DD7C0E0'
  36:  insert into t1(c2) values('b')
  37:  go
  38:  select federation_filtering_value('id')
  39:  go
  40:  use federation f1(id='00000000-0000-0000-0000-000000000000') 
  41:  with reset, filtering=off;
  42:  go
  43:  select * from t1
  44:  go
Returns: 
   1:  c1                                   c2
   2:  ------------------------------------ ------
   3:  08FFE74B-1BC0-460B-AA47-044E2DD7C0E0 b
   4:  732ABAAE-8727-4C49-B40B-8013DFB3C735 b

If you happen to connect using a unfiltering connection, federation_filtering_value() will return NULL as we do not have a known filtering value. That will cause violation due to NOT NULL requirement for federation columns so app will be responsible for specifying an explicit value. I expect bulk inserting and updating of data for example will always explicitly specify federation key value.

Note: This functionality will be in the final version of the product however this does not work on the preview bits just yet. If you are in the technology preview program for federations today, you will get this functionality when we refresh the build to a more recent build.

  • I would like to use SQL Azure but I want to be able to scale to high volume of data in the future so I started to look at SQL Federations. But I am having issues designing this. Here's a sample

    scenario:

    Let's look at the example of a high volume auction site. I am going to use AuctionID as the federation key.

    We also have users on the site that both buy and sell at auctions (and also provide ratings to other buyers and sellers).

    My federation atomic unit is based on AuctionID. Now it is easy to select the appropriate federation and get the relevant data.

    But when I want to find out all auctions my UserID (say 101) sold in, which could be in different federations, how would I go about it? I will have to scan all federations, right?

    Now I also want to find out all auctions where my UserID bought from another seller. Again, I will have to scan all federations, right?

    The problem is that an atomic unit from Auctions standpoint is different from an atomic unit from a Buyer's (User table) stand point and an atomic unit from a Seller's (still the User's table) is

    still different from both.

    So how would I solve this problem where I need 3 different atomic units (based on AuctionID or based on UserID from Buyer's stand point or from Seller's standpoint) with SQL Azure federations when

    all of this data is related through foreign key relationships??

    The bottom line is that I want to be able to filter by Auction (Auctions table) properties, by the buyer properties (Users table) and by the Seller properties (Users table). There is no difference

    between buyer and seller from the User table stand point but they are on different side of transaction (the same user can be a buyer for some auctions and seller for other auctions).

    If you draw this on a piece of paper, the problem will be readily apparent or maybe I am completely missing something!

    Would you or someone on your team please look at this and provide suggestions on how I could do this with SQL Azure Federations?

  • Thanks for the example. Sharding as a pattern deals with this problem in a few ways. With federations This type of many-to-many relationship can be achieved but requires work in the app side. Here are a few options;

    #1 yes; You can search all data for finding the UserIDs: this won't be very efficient especially if this type of query is in a much exercised path of the product.

    #2 You can create more federations in your database; one federation scaling out the users and another for the auctions. User_fed can maintain a table that reference to the auctions that the user has a relation to. OR Auction_fed can contain users table to maintain a list of users. However the maintenance of the tables is not done by sql azure federations. It needs to be done by the app. the maintenance can be done lazy or eagerly. Some apps use background consistency checkers to maintain relationships like this across federations, others do cleanup as they retrieve data by validating the data, others utilize broadcast of updates to many tables to maintain the information).

    This is fairly brief but hopefully gets to the point. I will do a detailed blog on this topic soon and look forward to more questions there.

    Thank you.

  • Thank you. I like option #2. It does require more storage and application logic but gives the benefit of scalability.

    So in the example, I could create a table in Users_Fed called UserAuctions and maintain the AuctionIDs for each UserID. I might need to also maintain some kind of auction summary in this table so that I can build the "My Auctions" list for a user without scanning all individual Auction federations. When the user clicks on a particular auction from the list, I can then retrieve the auction details from that particular Auction federation.

    Now I need to figure out how to do the federations with Entity Framework!

  • In future we hope to make this more automated. Also, there is a post coming on entity framework and federations soon.

  • Thanks. Looking forward to the blog post with Entity Framework and Federations.

    Despite the learning curve, I am also considering Azure Table storage in conjunction with SQL Azure and Lucene.Net (text indexing/searching) as the cost of SQL Azure for low margin high data volume web application is prohibitively expensive (that's an understatement when you consider 50GB for $499.95 per month!)!

    I haven't found any good guidelines/tutorials for maintaining indexes in SQL Azure and storing text data in Azure Table storage. But I will keep looking.

Page 1 of 1 (5 items)