The official source of information on Managed Providers, DataSet & Entity Framework from Microsoft
One of the coolest features of the ADO.NET Entity Framework is the ability to use inheritance in your database! Very cool, but what does it mean, and why is it important?
War of the worlds
The world of relational data is based around tuples, which have an inherent two dimensional structure. Think of a database table with rows and columns; it is a very two dimensional idea. Relationships, UDTs, views, etc are all extensions of this two dimensional view. This structure allows for the power we know and love in products like SQL Server, with features like query optimization, indexes, views, etc.
Object oriented programming (OOP) drives higher developer productivity and allows for applications to better represent the world in which they operate. One of the ways that they do this is by utilizing inheritance. Inheritance allows for an application to better model the data, and also provides the ability to extend an existing model and application in interesting ways.
Union of worlds
While the relational model has a lot of power, its way of looking at the world is somewhat at odds with the gestalt of object oriented programming.
The Entity Framework helps bridge these worlds by bringing the power of inheritance to the world of data access in ways that make sense and allow for the best features of both to shine through. To illustrate this, we will look at some of the benefits of OOP, and see how the Entity Framework helps bring these benefits to the store.
In OOP, inheritance is often used to represent objects that have a IS-A relationship. This simply means that one object is a specialization of the other. For example, think of an online store with Products. A Product has certain attributes and data associated with it. Now, there is also a Discontinued Product which has all the same data as a regular Product, but also something more, namely a discontinued date. You could create a new object to represent the Discontinued Product, but it makes a lot more sense to create a new object that inherits from Product, and adds the additional field. After all, programmers are lazy, and why would I want to write more code then I need to!
You’ve done such a great job, and this online store has such a great database and application driving it, it’s ready to expand into new markets. The store is ready to branch out into seasonal products!
This scenario really shows off the extension power of OOP. You simply need to write a new object that inherits from Product and applies some different logic in its methods. For example, you might only show summer products in the summer or automatically provide a 10% discount for summer products in the fall. And all of the existing systems to manage inventory, shopping carts, etc just continue to work, as everything is still made up of Products!
How the Entity Framework bridges the gap
The ideas presented above are fairly basic; generally introduced in OOP 101. So what is the big deal? Let’s look at how we apply these scenarios in the Entity Framework, so we can see how powerful it is. This will also give us an opportunity to dive into the specifics of how the translation of OOP ideas to the relational world works.
To make the scenario specific, let’s use the example of an online hardware store, with both Discontinued Products, and Seasonal Products. The Entity Data Model (CSDL) for the Products might look something like the following.
<EntityType Name="Product" Key="ProductID">
<Property Name="ProductID" Type="Int32" Nullable="false" />
<Property Name="Name" Type="String" MaxLength="500" Nullable="false"/>
<Property Name="Price" Type="Decimal" Nullable="false" />
<EntityType Name="DiscontinuedProduct" BaseType="Product">
<Property Name="DiscontinuedDate" Type="DateTime" Nullable="false"/>
<EntityType Name="SeasonalProduct" BaseType="Product">
<Property Name="OffSeasonDiscount" Type="Float" Nullable="false"/>
The CLR code for this situation is not shown, but we would have the same inheritance hierarchy in the generated CLR classes, with the full capabilities that this implies. The interesting thing with respect to this post is the OO to relational mapping.
Table per Hierarchy
Let’s start with the Discontinued Product, and see how you might model this in a database. You know that you will need to maintain a list of discontinued products (to maintain referential integrity), so you might create a table that contains all products with data like the following.
How do you use inheritance with data like this? Looking at the data, you know that when the Discontinued field is true, we’re dealing with a discontinued product. This is exactly the way that Table per Hierarchy (TPH) works! The idea is that the entire OO hierarchy is contained in one table, with some column(s) acting as discriminator. The value of the discriminator is what informs the Entity Framework what OO type each row contains.
For this setup, the mapping file might look something like the following. The Condition tag identifies the discriminator for the type.
<ScalarProperty Name="ProductID" ColumnName="ProductID" />
<ScalarProperty Name="Name" ColumnName="Name" />
<ScalarProperty Name="Price" ColumnName="Price" />
<Condition ColumnName="Discontinued" Value="0"/>
<ScalarProperty Name="DiscontinuedDate" ColumnName="DiscontinuedDate" />
<Condition ColumnName="Discontinued" Value="1"/>
Table per Type
Perhaps we did not initially intend to sell Seasonal Products, so they were not introduced until after the application was deployed. Modifying a core table of a live application is a good way to get the adrenaline flowing, and risk that new boat you have been saving for. We do not want to modify the Products table to add any new information!
We can create a new table with the Seasonal Product information, and create a one-to-one relationship between the two. Then, we simply need to modify the mapping file to let the Entity Framework know that the additional information associated with Seasonal Products can be found in another table. This is the core of Table per Type (TPT), where each table represents a type, but with only those fields associated with that type in the table. The mapping for this situation might look like the following.
<ScalarProperty Name="OffSeasonDiscount" ColumnName="OffSeasonDiscount" />
Table per Concrete Type
We are going to have a long and successful run; our store is going to grow and prosper. The housing boom continues and we have millions of users. Competitors are entering the market, and so we want to make sure we do not have issues with scaling. One issue is that the number of products will increase and decrease over time, but it won’t change by a huge amount. However, the number of discontinued products is only going to increase with time.
To deal with this, we might want to move all the discontinued products to their own Discontinued Product table, perhaps on a linked server or on another filegroup. However, we don’t want to change the application to make this change. With the Entity Framework we don’t need to!
The Discontinued Product table contains all fields that exist on Discontinued Products, not just those that do not apply to other products (as was the case with TPT). This is the essence of Table per Concrete Type (TPC). Each table represents the entire Entity, and so when they are loaded, you only need to load from that one table. The mapping for TPC might look something like the following.
As you can see from all this, the Entity Framework’s separation of the model from the database allows for some very cool scenarios to work. By using the different inheritance models supported, you can have the best of both the relational and the OO worlds. Viva la difference!
Please leave a comment with any questions that you have, or if you’d like to see more detail on any of the technologies mentioned.
ADO.NET Program Manager