Implementing a hierarchy structure in a relational data base normally takes a bit of work. The new SQL Server data type for hierarchyID gives a good shortcut to the old methods, makes it faster to get a solution in place and makes it much easier to maintain.
Rather than repeat what is already on the MSDN tutorial, I will give examples of where it works well and where it doesn’t.
First, take a quick refresher course in the difference between a tree and a graph. These links are quick reads and will refresh your memory on those old courses you took at the university so many years ago. This is important to do before continuing because HierarchyID works well with trees but not graphs. Ok, that is a general statement for which there are some exceptions that I explain below.
WHERE IT WORKS WELL
The HierarchyID works well in projects where there is a hierarchical structure where each child has a single parent, like in a product/product classification hierarchy. In this example each product can only belong to one product class. You may have other levels, like product family where each product class can belong to only one product family.
Another, slightly more complex example would be a bill of materials structure for manufacturing a product. A part can belong to a subcomponent, which can belong to another subcomponent, which gets assembled into the final product. It is more complex than the first example because a part can belong to many subcomponents. HierarchyID may or may not work well in this case, it all depends on how many time a part can appear in the final product. If your graph is relatively unconnected, either due to relatively few parents or low links/node, then hierarchyId can be used for this type of graph using a primary key of (RootId int, Path hierarchyId).
All XML documents are trees and the HierarchyID works well with these projects.
WHERE IT DOESN’T WORK
1. When each child node in the graph has multiple parents. We tried this in a genealogy/ancestry project and discovered that to get this to work you would have to add a HierarchyID column for each parent. This is called a highly connected graph, where the number of paths is substantially more than the number of nodes. In this scenario, the traditional hierarchy design for relational databases works well, especially when combined with the CTE query pattern.
2. If subtree movement forms a substantial part of the workload. This is O(1) for parentiD/childId and O(subtree size) for hierarchyId. So if you are constantly updating the tree and cause the nodes to move, then the HierarchyID is not the best solution.
3. If subtree query isn’t a substantial part of the workload. This is O(1) for hierarchyId and O(subtree) for parentId/childId – but if not common this isn’t an advantage. In other words, if you don’t walk the tree often then it may not be worth using the Hierarchy data type.
Kevin Cox, Peter Carlin
Hi, very nice blog... Conglaturations !!
I've got a question ... You said that you've added a HierarchyId column for each parent when you have a multiple parents structure ...
Couldn't we use a strategy using several rows to represent a tree with multiple parents ? For example, one child has two rows that represent its tree (two diferents hierarchyid values) ...
Thank you !!
@Luiz, Although this would technically work the T-SQL queries to return a recordset that was easy to work with would be challenging. A better approach would be to use CLR to implement a graph data type.