In part 1, I focused on modeling your data for scale. That was the first step in the scale-first design process: identifying the table groups that need to scale, and the federations and federation keys that surround those table groups and configure their scale characteristics. In this post, we focus on the next step: schema deployment. We’ll talk about how to annotate the schema definition and deploy it to match the scalable data model you want to achieve.

Let’s start with the basics: an overview of the schema model in federations.

Schema Model in Federations:

Unlike a single database, with federations the schema is distributed. Some parts of the schema live in the root database while other parts are scaled out to federations. This distribution is important to note because, in the absence of distributed queries, only the local schema is visible when you connect to the root or a member. You can discover other members or the root from a member, but you cannot yet access anything other than the local schema.

Here is a quick example. Imagine the root db has table#1 and table#2, member#1 in federation#1 has table#3, and federation#2 has a member#1 with table#4 in it. In this setup, table#3 is not visible in the root, and table#1, table#2 and table#4 are not visible when you connect to member#1 in federation#1. So there is no globally visible single schema with federations today, and no single namespace to address all schema. All of this is fairly similar to what you would have if you were building your own sharding.
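To make the local-visibility point concrete, here is a sketch of what you can and cannot see from each connection. The federation key name below is hypothetical; sys.federations and sys.federation_member_distributions are the metadata views for discovering federations from the root.

```sql
-- Connected to the root: table#1 and table#2 are visible, table#3 and table#4 are not.
-- You can, however, discover the federations and their members from metadata.
SELECT name FROM sys.federations;
SELECT * FROM sys.federation_member_distributions;

-- Switch the connection to the member of federation#1 covering key value 0.
-- From here on, only that member's local schema (table#3) is visible.
USE FEDERATION [federation#1] (fedkey = 0) WITH RESET, FILTERING = OFF
GO
SELECT name FROM sys.tables;  -- lists table#3 only
```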

It is important to note that within a federation, it is possible to set up member#1 and member#2 with different schemas (different tables, indexes, triggers, sprocs, permissions etc). However, typically most apps eventually deploy the same schema to all members in a federation. That is, most apps do not have a permanent drift of schemas, only a transient drift during upgrades. I have worked on sharded models where, for example, members were indexed differently based on their size, but such cases are rare.

The ability to independently manipulate the schema in each part (root or members) of federations is critical for a few reasons:

1- Availability and Partition Loss Tolerance: Independent schema per part (root or member) allows for better partition loss tolerance and overall availability. With independent schemas, each part of the system has its own private copy of the schema and has little or no dependency on other parts of the system. If some parts of the system suffer a transient or permanent loss of availability, the rest of the system stays accessible.

2- It promotes a flexible upgrade pattern for 24x7 apps: you can upgrade a piece of your federated database (root + all members) at a time and can roll back with minimized impact. A single monolithic database does not give you the option to upgrade half of a table or try a fix to a stored proc for some of your users. I believe this is the experience of many DBAs: it is hard to figure out which schema change operations and updates can be done online in databases. With the separation of the schema into many mini private parts, you can upgrade your schema partially, for some of your data, and test. During the upgrade, even if the DDL exclusively locks the data, the lock is isolated to only one member. If the upgrade works out, roll the new schema out to all other members. If it does not, you can roll back faster because you only roll back the parts you updated. All of this is similar to how we do our rollouts in SQL Azure as well.
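As a sketch of this pattern (the table and column names are hypothetical; CustomerFederation is the federation used in the deployment examples later in this post), an incremental upgrade could look like this:

```sql
-- Upgrade a single member first; any exclusive locks the DDL takes
-- are isolated to this one member.
USE FEDERATION CustomerFederation (customerID = 0x0) WITH RESET, FILTERING = OFF
GO
ALTER TABLE Orders ADD discount_code nvarchar(16) NULL;  -- hypothetical schema change
GO
-- Validate the change against this member's data.
-- If it works: repeat the same batch against every other member,
-- switching with USE FEDERATION to each member's range.
-- If it does not: roll back this one member only.
```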

3- It allows for parallel, high-performance upgrades: given the ability to scale out your schema updates across independent nodes, each with its own transaction log, tempdb, memory, CPU and IO capacity, you can deploy your schema much faster than you can on a single monolithic database, because the schema update is massively parallelized across all members.

Annotating and Deploying Schema to Federations:

So you get it: all objects are local to where they are created. However, if you look at the collection of all parts together across the root database, the federations and their members, you can see tables used for three different purposes: central tables, federated tables and reference tables.

Types of Tables

Tables in the root are called central tables, and tables scaled out in federations are called federated tables. Within members, you can also create a second type of table called reference tables.

Central tables are the tables in your data model that you choose not to scale out. They are low traffic and happy with the resource constraints of a single node. For these tables life is easy: there are no changes to the shape of these tables or how you deploy them. Just connect to the root and deploy.

We talked about reference tables in part 1 as part of the logical data model, but let’s recap:

Reference tables are the tables we choose to distribute to all members because we want to optimize queryability at the members. You can choose to leave many lookup tables as central tables in the root database, but if a table is regularly used for lookups by queries hitting your federated tables, it may make sense to move it to the members as a reference table.

With reference tables, given that they will be part of the members, there are a few restrictions to ensure correctness:

  • Reference tables do not support the IDENTITY property or the timestamp/rowversion data type. We hope to remove this restriction in the future.
  • Reference tables cannot have a foreign key relationship back to a federated table. Other types of foreign key relationships work fine.
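For example, assuming hypothetical Zipcodes and States reference tables deployed to a member alongside a federated Orders table, the first foreign key below is allowed and the second is not:

```sql
-- Allowed: a foreign key from one reference table to another reference table.
CREATE TABLE States (state_id int PRIMARY KEY, name nvarchar(64) NOT NULL);
CREATE TABLE Zipcodes (
    zip_id   int PRIMARY KEY,
    code     nvarchar(16) NOT NULL,
    state_id int NOT NULL REFERENCES States (state_id)
);

-- Not allowed: a reference table pointing back to a federated table.
-- ALTER TABLE Zipcodes ADD customer_id uniqueidentifier
--     REFERENCES Orders (customer_id);  -- fails: Orders is a federated table
```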

We touched on federated tables in part 1 as well: federated tables have federation keys and are constrained to store values that comply with the range the member serves, whereas reference tables do not contain a federation key and are not constrained by the range of the member.

With federated tables, there are a few restrictions to ensure correctness as well:

  • Federated tables must contain the federation distribution column. Today that can only be a single column, and computed columns are not supported.
  • Federated tables do not support the IDENTITY property or the timestamp/rowversion data type.
  • All unique indexes and clustered indexes must contain the federation distribution column.
  • Foreign key relationships between federated tables must include the federation distribution column.

 


Figure#1 : Gray – Central Tables, Blue & Green & Orange – Federated Tables

Objects Other Than Tables

All other types of objects besides tables, such as views, triggers, stored procedures and functions, deploy without requiring any changes to their definitions, as long as they maintain local references. You may choose to deploy these objects to the root, to members, or to both, depending on their function. For objects with a clear single-table dependency, you can follow the table placement (root or one of the members) and place the object in the same location as the table. However, if you have objects that reference distributed pieces of your schema, like a stored procedure that references tables both in the root and in a member, that logic has to move to the client side for now, since SQL Azure does not support cross-database communication.

The only restriction within federations for non-table objects is on indexed views: indexed views cannot be created in federation members. We hope to remove that restriction in the future.

Deploying Central Tables

Let’s start with the tables you will leave in the root. These are central tables. Schema definitions for central tables require no new considerations in federations; that is, no change is required to how you design and deploy these tables.

-- connect to AdventureWorks; this is the root db

CREATE TABLE HumanResources.Employee(EmployeeID int primary key, NationalIDNumber nvarchar(15)…)

Deploying Reference Tables

Pieces of the schema that you scale out of the root database into federations, such as federated tables and reference tables, need to be deployed to the federation members. Reference tables require no changes to their single-database schemas either; however, they are subject to the limitations on federation members listed above.

Deploying reference tables requires you to deploy the schema to every member. You can do this with a script that loops through the members, or you can use the following tool or code sample:

Americas Deployment: http://federationsutility-scus.cloudapp.net/
European Deployment: http://federationsutility-weu.cloudapp.net/
Asian Deployment: http://federationsutility-seasia.cloudapp.net/
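If you would rather script the fan-out yourself, the loop is driven from the root: query the member ranges, then switch to each member and repeat the DDL. Here is a sketch, assuming the catalog views sys.federations and sys.federation_member_distributions and the range_low column they expose:

```sql
-- At the root: list the low range boundary of each member of CustomerFederation.
SELECT fmd.range_low
FROM sys.federations f
JOIN sys.federation_member_distributions fmd
    ON f.federation_id = fmd.federation_id
WHERE f.name = 'CustomerFederation';

-- Then, for each range_low value returned, client code issues:
-- USE FEDERATION CustomerFederation (customerID = <range_low>) WITH RESET, FILTERING = OFF
-- GO
-- CREATE TABLE ...  -- the reference table DDL
-- GO
```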

To deploy a reference table, you connect to the root, iterate over each member using USE FEDERATION, and repeat a CREATE TABLE statement such as the one below.

USE FEDERATION CustomerFederation(customerID=0x0) WITH RESET, FILTERING=OFF

GO

CREATE TABLE Zipcodes(id uniqueidentifier primary key, code nvarchar(16) not null, …)

GO

Deploying Federated Tables

Just like reference tables, federated tables need to be deployed to the federation members. Federated tables do go through a transformation from their original form in a single-database model. Federated tables are required to have the federation key as a column in the table. Some tables naturally contain the federation key, but for others, like the orderdetails table, you denormalize the table to include the federation key (customer_id). Federated tables are also annotated with a FEDERATED ON clause that points to the distribution column used in the table. Here is the deployment of the orderdetails table:

USE FEDERATION CustomerFederation(customerID=0x0) WITH RESET, FILTERING=OFF

GO

CREATE TABLE Orderdetails(orderdetail_id uniqueidentifier primary key,

  order_id uniqueidentifier not null,

  customer_id uniqueidentifier not null, 

…)

FEDERATED ON (customerID=customer_id)

GO

 

One useful piece of information to note here: even though denormalization is required, maintenance of the new federation key column, like the customer_id column in the Orderdetails table above, can be simplified with a default using the federation_filtering_value() function. The function grabs the federation key value you used to connect to the member in the USE FEDERATION statement and automatically adds that value to new rows. A great way to shortcut through this change.

USE FEDERATION CustomerFederation(customerID=0x0) WITH RESET, FILTERING=OFF

GO

CREATE TABLE Orderdetails(orderdetail_id uniqueidentifier primary key,

  order_id uniqueidentifier not null,

  customer_id uniqueidentifier not null DEFAULT federation_filtering_value('customerID'), 

…)

FEDERATED ON (customerID=customer_id)

GO

Deploying Objects Other Than Tables

For objects other than tables there is no refactoring to do, but you need to ensure the connection is switched to the right part of the federation for the deployment. For objects you need to deploy to members, the fan-out utility above can help, or the code sample can help you write your own scripts. Here is a sample script that deploys a stored procedure to a member.

USE FEDERATION CustomerFederation(customerID=0x0) WITH RESET, FILTERING=OFF

GO

CREATE PROCEDURE proc1 (@p1 int) AS

BEGIN

….

END

To recap, once the scale-first data model is defined, there are a few incremental changes to the T-SQL script that expresses the data model. There are the FEDERATED ON annotations for tables; there are changes to your foreign key relationships, like including the federation distribution key; and there are changes to how you deploy some parts of your schema to members, like using USE FEDERATION to switch the connection to the members and fanning out the statements to all members. With those changes, you get a scale-first data model deployed to SQL Azure. The last step along the way is to make some changes to the app. We’ll talk about that next.