Referential Integrity in Federations
Federations provide a great scale-out model with independent member databases. Even though many local referential integrity rules are supported, federations place restrictions on costly referential integrity rules that span federation members. Let me expand on that sentence: you can set up your familiar referential integrity with foreign keys between Customer, Orders and OrderDetails. All you need to do is ensure the federation distribution column is part of your primary and foreign keys, as the schema below does with customerid. Here is what the T-SQL would look like; lines 24 and 36 define the foreign key relationships.
1: -- Create Orders Federation
2: USE FEDERATION ROOT WITH RESET
4: CREATE FEDERATION Orders_Federation(cid BIGINT RANGE)
6: -- Deploying Schema for Orders_Federation
7: USE FEDERATION Orders_Federation(cid = 0) WITH RESET, FILTERING=OFF
9: -- Deploy Customer, Orders and OrderDetails
10: CREATE TABLE customers(
11: customerid BIGINT,
12: customername nvarchar(256),
14: primary key (customerid))
15: FEDERATED ON (cid = customerid)
17: CREATE TABLE orders(
18: customerid BIGINT,
19: orderid bigint,
20: odate datetime,
21: primary key (orderid, customerid))
22: FEDERATED ON (cid = customerid)
24: ALTER TABLE orders
25: ADD CONSTRAINT orders_fk1 FOREIGN KEY(customerid)
26: REFERENCES customers(customerid)
28: CREATE TABLE orderdetails(
29: customerid BIGINT,
30: orderdetailid bigint,
31: orderid bigint,
32: inventoryid uniqueidentifier,
33: primary key (orderdetailid, customerid))
34: FEDERATED ON (cid = customerid)
36: ALTER TABLE orderdetails
37: ADD CONSTRAINT orderdetails_fk1 FOREIGN KEY(orderid,customerid)
38: REFERENCES orders(orderid,customerid)
Take a look at line 32, the "inventoryid" column: inventoryid identifies each part instance individually in the inventory table and lets the app track each shipped item back to an individual part instance through production and delivery. Also imagine that inventory, much like orders, is a large table with high traffic, so you want to scale out inventory as well. However, the inventory table does not align with the partitioning style of orders: you mainly access the inventory table through a partid, to place an order or to produce the part. So you create another federation in your "salesdb" to scale out the inventory table. Here is the T-SQL for the schema:
-- Create Inventory Federation
USE FEDERATION ROOT WITH RESET
CREATE FEDERATION Inventory_Federation(pid UNIQUEIDENTIFIER RANGE)

-- Deploying Schema for Inventory_Federation
USE FEDERATION Inventory_Federation(pid = '00000000-0000-0000-0000-000000000000') WITH RESET, FILTERING=OFF

-- Deploy Warehouses and Inventory; warehouses is created first (as a
-- reference table, with no FEDERATED ON clause) so inventory_fk1 can
-- reference its primary key
CREATE TABLE warehouses(
warehouseid bigint,
zipcode int,
address nvarchar(4096),
primary key (warehouseid))

CREATE TABLE inventory(
inventoryid uniqueidentifier,
partid uniqueidentifier,
warehouseid bigint,
arrivaldatetime datetimeoffset,
primary key (inventoryid, partid))
FEDERATED ON (pid = partid)

ALTER TABLE inventory
ADD CONSTRAINT inventory_fk1 FOREIGN KEY(warehouseid)
REFERENCES warehouses(warehouseid)
In a single database without federations, one would naturally set up the foreign key relationship between the orderdetails table and the inventory table.
ALTER TABLE orderdetails
ADD CONSTRAINT orderdetails_fk2 FOREIGN KEY(inventoryid)
REFERENCES inventory(inventoryid)
However, the T-SQL above does not work with federations. Federations do not allow foreign key relationships across federation members. SQL Azure does not yet support distributed transactions, and such large-scale distributed transactions would likely cause scalability issues: the cost of enforcement is generally proportional to the number of rows involved in the transaction and the complexity of the relationships.
Imagine a case where you bulk insert 100 orders that need to transact across many parts in the inventory table. That is a single transaction that could involve hundreds of parts, and hundreds of members would need to participate in it; it would consume a large amount of system resources and would be unlikely to succeed. So what is the solution for federations?
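As a sketch of the practical alternative, the application can split the bulk insert by federation member and run one small local transaction per member instead of one distributed transaction. This is illustrative application-side pseudologic, not a federations API: the range boundaries, `member_for`, and `split_by_member` names are all made up for the example (real ranges would come from the federation metadata).

```python
# Hypothetical sketch: group the rows to insert by the federation member
# that owns their cid range, then commit each batch in its own local
# transaction against that member.
MEMBER_RANGES = [(0, 1000), (1000, 2000), (2000, 3000)]  # [low, high) per member

def member_for(cid):
    """Return the index of the federation member whose range owns this cid."""
    for i, (lo, hi) in enumerate(MEMBER_RANGES):
        if lo <= cid < hi:
            return i
    raise ValueError("cid %d outside all member ranges" % cid)

def split_by_member(orders):
    """Group order rows (dicts with a 'cid' key) into per-member batches."""
    batches = {}
    for order in orders:
        batches.setdefault(member_for(order["cid"]), []).append(order)
    return batches

orders = [{"cid": 5, "orderid": 1},
          {"cid": 1500, "orderid": 2},
          {"cid": 7, "orderid": 3}]
batches = split_by_member(orders)
# Each batch can now be committed in its own local member transaction:
#   USE FEDERATION Orders_Federation(cid = <cid>) ... INSERT ... COMMIT
```

The trade-off, of course, is that the batches no longer commit atomically as a unit, which is exactly where the eventual consistency discussion below comes in.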
Strictly vs. Eventually Consistent Data Models
Referential integrity in classic database applications is a powerful way to ensure full and strict consistency between normalized pieces of data. You can ensure that no order will be created for a customer that does not exist, or that a customer won't be dropped while orders for that customer still exist in the system. However, complex consistency checks can be resource intensive to validate. One can find many references to this added cost in the database literature; for example, you will regularly see recommendations to disable constraints and referential integrity for bulk data processing.
The cost of enforcement is amplified in scale-out systems, given that local transactions turn into distributed transactions. It is common to find that scale-out systems either do not support referential integrity at all or place restrictions on such expensive constraints.
Given all this, many web-scale databases choose to loosen consistency requirements for the app and gain scale and performance. Apps add instrumentation to detect and gracefully handle cases where data inconsistencies are found. Many systems implement offline data consistency checking through consistency checkers, which scan the data and validate it against the logical consistency rules of the app.
Consistency checkers are fairly commonplace in most system software; SQL Server exposes one through the DBCC command (DBCC stands for database consistency checker; see DBCC CHECKDB for information on how SQL Server does this for its lower-level structures such as pages and extents). Your application can do the same, and in its own "DBCC" it can automatically or manually fix consistency issues, much like the SQL Server version does.
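A minimal application-level checker for the schema above might scan orderdetails for inventoryid values that no longer exist in the inventory federation. This is a sketch with in-memory stand-ins for the rows the checker would fetch member by member; the function name and sample values are made up for the illustration.

```python
def check_orderdetail_inventory(orderdetails, inventory_ids):
    """App-level consistency check: return the orderdetail rows whose
    inventoryid no longer exists in the inventory federation."""
    return [row for row in orderdetails
            if row["inventoryid"] not in inventory_ids]

# Stand-ins for rows fetched from the two federations.
inventory_ids = {"inv-1", "inv-2"}
orderdetails = [
    {"orderdetailid": 10, "customerid": 1, "inventoryid": "inv-1"},
    {"orderdetailid": 11, "customerid": 1, "inventoryid": "inv-9"},  # orphan
]

violations = check_orderdetail_inventory(orderdetails, inventory_ids)
for row in violations:
    # auto-fix, or queue for manual repair, much like DBCC's repair options
    print("orphan orderdetail:", row["orderdetailid"])
```

Run periodically (or after bulk operations), such a checker gives you the detection half of the eventual consistency model; what to do with each violation is an app decision.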
Overall, in the absence of support for cross-member referential integrity, I recommend adopting the eventual consistency model. It does place some additional load on the developers, but this model allows the system to scale better.
This definitely gives some perspective to these issues. For years we have worked to make sure our data gets into the database in a consistent state and constraints have helped us here.
Now with Azure and Federations, the database engine itself is being broken apart into more modular pieces that cannot rely on assumptions of the past, and where operations are much more asynchronous.
What about the idea of still being able to use T-SQL to create a FOREIGN KEY constraint, but having federations do what you suggested: instead of a synchronous constraint, a built-in asynchronous checker at the server, with a formal interface (a stored procedure or similar) to examine the constraint violations it generates?
If the engine can identify both ends of the constraint (the tables and columns involved), that goes a long way toward letting us catch and solve this on the application end.
E.g., we could discover a discontinued or deleted part and then mark the order as needing attention. This would move some of the generic work closer to the server.
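The commenter's "mark the order as needing attention" idea could be sketched as a small violation handler on the application side. Everything here is hypothetical (the handler name, the violation shape, the flag fields); it only shows the pattern of flagging the affected row for later repair instead of failing the original write.

```python
def handle_violation(orders, violation):
    """Hypothetical async-violation handler: instead of rejecting the
    original transaction, mark the affected order as needing attention
    so a repair process (or a human) can resolve it later."""
    order = orders[violation["orderid"]]
    order["needs_attention"] = True
    order["reason"] = "inventory row %s no longer exists" % violation["inventoryid"]
    return order

# An order that referenced a part which has since been deleted or discontinued.
orders = {1: {"orderid": 1, "needs_attention": False}}
violation = {"orderid": 1, "inventoryid": "inv-9"}
flagged = handle_violation(orders, violation)
```

Fed by the output of an async checker like the one above, this keeps writes fast while still surfacing every broken reference for follow-up.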
Just a thought.