How to scale out an app with SQL Azure Federations… The Quintessential Sales DB with Customer and Orders.

How to scale out an app with SQL Azure Federations… The Quintessential Sales DB with Customer and Orders.

Rate This
  • Comments 1

Continuing on the SQL Azure Federation theme with this post, I’ll cover an important question; what would it take to build applications with the SQL Azure Federations. So lets walk that path… We’ll take the quintessential sales app with customers and orders. I’ll simplify the sample and schema to get you through the headlines. But, we won’t lose any fidelity to the basic relationships in the data model. To recap the sample; as you would expect from a sales db, customers have many orders and orders have many order details (line items). We’ll focus on implementation of 2 scenarios; get orders and order details for a given customer and get all orders of all customers.

Step 1 – Decide the Federation Key

Picking the right federation key is the most critical decision you will need to make. Simply because the federation key is something that the application needs to understand and will be tough thing to change later. How do you pick the federation key? Apps that fit the sharding pattern will have most of their workload focused on a subset of data. With the sample sales app, scenarios mostly revolve around a single customer and its orders. Customers log in place or view their orders or change demographic info etc. Some apps may not have a slam dunk choice like customer_id or may have more than one dimension to optimize. If you have more than one natural choice for a federation key, you will need to think about which specific workload you want to optimize to take a pick. We’ll cover techniques for optimizing multiple dimensions in the advanced class.

It will also be critical to pick a federation key that spreads the workload to many nodes instead of concentrating the workload. After picking a federation key, you don’t want all your load to be concentrated at a single federation member or node. In the sales app, customer id as a federation key help distribute the load across to many nodes in the sales app. One more important factor is the size of the largest federation key instance. Remember that atomic units cannot be split any further. So Atomic units, that is federation key instances, cannot span federation members. When you pick the federation key, the computational capacity requirements of the largest federation key instance should safely fit into a single SQL Azure database.

Step 2 – Denormalize the Schema

With the federation key is set, next step is to modify your schema a little to deploy it into SQL Azure federations. It is a fairly simple change; we need to carry the federation key to all tables that need to be federated. In the sales apps, customer and orders tables naturally have customer_id as the column. However order_details table also need to contain the customer_id column before the schema can be deployed. This is required for us to know how to redistribute your data when a SPLIT command is issued. The script below walks through how to deploy the schema to the SQL Azure federation member.

  1: -- Connect to 'master'
  2: CREATE DATABASE SalesDB (EDITION='business',MAXSIZE=50GB)
  3: GO
  4: 
  5: -- CONNECT to 'SalesDB' and Create Federation
6: CREATE FEDERATION Orders_Federation(cid BIGINT RANGE)
  7: GO
  8: 
  9: -- Connect to ‘Order_Federation’ federation member covering customer_id 0.
 10: USE FEDERATION Orders_Federation(cid=0) WITH RESET, FILTERING=OFF
 11: GO
 12: 
 13: CREATE TABLE orders(
 14:   Customer_id bigint,
 15:   Order_id bigint,
 16:   Order_date datetime,
 17:   primary key (order_id, customer_id))
 18: FEDERATED ON (cid = customer_id)
 19: GO
 20: 
 21: -- note that customerid – federation key - needs to be part of all unique indexes but not noneunique indexes.
 22: CREATE INDEX o_idx1 on orders(odate)
 23: GO
 24: 
 25: -- create a federated table 
 26: CREATE TABLE orderdetails(
 27:   customerid bigint,
 28:   orderdetailid bigint,
 29:   orderid bigint,
 30:   partid bigint,
 31:   primary key (orderdetailid, customerid))
 32: FEDERATED ON (cid = customerid)
 33: GO
 34: 
 35: ALTER TABLE orders 
 36: ADD CONSTRAINT orders_uq1 UNIQUE NONCLUSTERED (orderid,customerid)
 37: GO
 38: 
 39: ALTER TABLE orderdetails 
 40: ADD CONSTRAINT orderdetails_fk1 FOREIGN KEY(orderid,customerid) REFERENCES orders(orderid,customerid)
 41: GO
 42: 
 43: -- reference table
 44: CREATE TABLE uszipcodes(zipcode nvarchar(128) primary key, state nvarchar(128))
 45: GO

Step 3 – Make Application Federation Aware

For the large part of the workload that is focused on the atomic unit, life is fairly simple. Your app needs to issue an additional statement to rewire the connection to the right atomic unit. Here is the example statement that connect to customer_id 55.

  1: USE FEDERATION Orders_Federation(cid = 55) WITH RESET, FILTERING=ON
  2: GO

Once you are connected, you can safely operate on data that belongs to this customer_id value 55. Even if SPLITs were to redistribute the data, you don’ tneed to remember a new physical database name or do anything different in your connection string. You connection string still connects to the database containing the federation and then issue the USE FEDERATION statement to go to customer_id 55. We will guarantee to land you in the correct federation member. Here is a dead simple implementation of GetData for a classic sales app with Orders and OrderDetails tables. Interesting thing to note here is that the connection string points to the SalesDB database and that the net new code here is the lines between line 13 and 17 below.

  1: private void GetData(long customerid)
  2: {
  3:   long fedkeyvalue = customerid;
  4: 
  5:   // Connection

6: SqlConnection connection =

new SqlConnection(@"Server=tcp:sqlazure;Database=SalesDB;User ID=mylogin@myserver;Password=mypassword")

  7:   connection.Open();
  8: 
  9:   // Create a DataSet.
 10:   DataSet data = new DataSet();
 11: 
 12:   // Routing to Specific Customer
 13:   using (SqlCommand command = connection.CreateCommand())
 14:   {

15: command.CommandText = "USE FEDERATION orders_federation(cid="

+ fedkeyvalue.ToString() + ") WITH RESET, FILTERING=OFF";

 16:     command.ExecuteNonQuery();
 17:   }
 18: 
 19:   // Populate data from orders table to the DataSet.

20: SqlDataAdapter masterDataAdapter =

new SqlDataAdapter(@"select * from orders where (customerid=@customerid1)", connection);

 21:   masterDataAdapter.SelectCommand.Parameters.Add("@customerid1", SqlDbType.BigInt);
 22:   masterDataAdapter.SelectCommand.Parameters[0].Value = customerid;
 23:   masterDataAdapter.Fill(data, "orders");
 24: 
 25:   // Add data from the orderdetails table to the DataSet.

26: SqlDataAdapter detailsDataAdapter =

new SqlDataAdapter(@"select * from orderdetails where (customerid=@customerid1)", connection);

 27:   detailsDataAdapter.SelectCommand.Parameters.Add("@customerid1", SqlDbType.BigInt);
 28:   detailsDataAdapter.SelectCommand.Parameters[0].Value = customerid;
 29:   detailsDataAdapter.Fill(data, "orderdetails");
 30:   
 31:   connection.Close();
 32:   connection.Dispose();
 33: 
 34:   …
 35: }

For parts of the application that needs to fan out queries, in future versions there will be help from SQL Azure. However in v1, you will need to hand craft the queries by firing multiple queries to multiple federation members. If you are simply appending or union-ing resultset across your atomic units, you need to iterate through the federation members. This won’t be too hard. Here is the GetData function expanded to include the scenario where all customers across all federation members are displayed. The function assumes the special customer_id=0 returns all customers so T-SQL is adjusted to enable that through the added @customerid2 parameter in the where clause ; where (customerid=@customerid1 or @customerid2=0). The net new code here is the loop that wraps lines 14 to 39. Lines 35 to 39 basically grab the value required to rewire the connection to the next federation member to repeat the commands until the range high of NULL is hit. NULL is the special value used to represent the maximum value of federation key in the sys.federation_member_columns system view.

  1: private void GetData(long customerid)
  2: {
  3:   long? fedkeyvalue = customerid;
  4: 
  5:   // Connection
  6:   SqlConnection connection = new SqlConnection(@"Server=tcp:sqlazure;Database=SalesDB;User ID=mylogin@myserver;Password=mypassword");
  7:   connection.Open();
  8: 
  9:   // Create a DataSet.
 10:   DataSet data = new DataSet();
 11:   
 12:   do
 13:   {
 14:     // Routing to Specific Customer
 15:     using (SqlCommand command = connection.CreateCommand())
 16:     {

17: command.CommandText = "USE FEDERATION orders_federation(cid="

+ fedkeyvalue.ToString() + ") WITH RESET, FILTERING=OFF";

 18:       command.ExecuteNonQuery();
 19:     }
 20: 
 21:     // Populate data from orders table to the DataSet.

22: SqlDataAdapter masterDataAdapter = new SqlDataAdapter(

@"select * from orders where (customerid=@customerid1 or @customerid2=0)", connection);

 23:     masterDataAdapter.SelectCommand.Parameters.Add("@customerid1", SqlDbType.BigInt);
 24:     masterDataAdapter.SelectCommand.Parameters[0].Value = customerid;
 25:     masterDataAdapter.SelectCommand.Parameters.Add("@customerid2", SqlDbType.BigInt);
 26:     masterDataAdapter.SelectCommand.Parameters[1].Value = customerid;
 27:     masterDataAdapter.Fill(data, "orders");
 28: 
 29:     // Add data from the orderdetails table to the DataSet.

30: SqlDataAdapter detailsDataAdapter = new SqlDataAdapter(

@"select * from orderdetails where (customerid=@customerid1 or @customerid2=0)", connection);

 31:     detailsDataAdapter.SelectCommand.Parameters.Add("@customerid1", SqlDbType.BigInt);
 32:     detailsDataAdapter.SelectCommand.Parameters[0].Value = customerid;
 33:     detailsDataAdapter.SelectCommand.Parameters.Add("@customerid2", SqlDbType.BigInt);
 34:     detailsDataAdapter.SelectCommand.Parameters[1].Value = customerid;
 35:     detailsDataAdapter.Fill(data, "orderdetails");
 36: 
 37:     using (SqlCommand command = connection.CreateCommand())
 38:     {
 39:       command.CommandText = "select cast(range_high as bigint) from sys.federation_member_distributions";
 40:       fedkeyvalue = (long)command.ExecuteScalar();
 41:     }
 42:   } while (customerid == 0 && fedkeyvalue == null);
 43: 
 44:   connection.Close();
 45:   connection.Dispose();
 46:   
 47:   …
 48: }

Below is the output from a sample sys.federation_member_columns view for the orders_federation. With this federation setup, range_low is inclusive and range_high is exclusive to the given federation member. NULL represents the maximum bigint value +1.

federation_id

member_id

federation_key_name

range_low

range_high

1

51

customer_id

-9,223,372,036,854,775,80

1000

1

52

customer_id

1000

2000

1

53

customer_id

2000

3000

1

customer_id

1

100

customer_id

99000

NULL

The set of scenarios here is obviously not an exhaustive list. There will be more sophisticated types of queries that require cross federation joins. Those queries will need to utilize client side in v1. In future things will get better with support from SQL Azure for fanout queries. Fanout querying will enable easier querying across federation members.

Clearly, SQL Azure federations require changes to your application. The changes typically impact more the mechanics of the application to wire up coordination of the partitioning effort with the federations but less about the business logic of the application. These changes to the application comes with great benefits of massive scalability and ability to massively parallelize your workload by engaging man nodes of the cluster. There are many applications out there that have been designed utilizing the horizontal partitioning and sharding principals over the last decade. These applications did not have the luxury of having an RBDMS platform that had understanding of the data distribution and robust routing of connections. With SQL Azure federations, horizontal partitioning and sharding become first class citizens in the server and the bar for entry is lowered and it becomes much easier to get that flexibility to redistribute data at will, without downtime. These properties translates to great elasticity and best economics and unlimited scale for applications so I think the changes are well worth the effort.

  • Awesome!  Geezsh. I've been wanting "scale-out" for 10+ years in Sql Server, or at least a scale out architecture supported by MS.  Great stuff!

    Lee Everest

    Www.texastoo.com  

Page 1 of 1 (1 items)