If you’ve been following my blog you should remember a couple of posts about hierarchical data in SQL Server, and how Common Table Expressions can be a great way to extract that data. If you’ve not read them, you can find them here and here.
In SQL Server 2008 we have a brand new shiny data type called “HierarchyID”. This data type is designed precisely for storing hierarchical data in a compact, efficient manner that makes it easy to perform common operations, and hence massively simplifies matters when retrieving data.
So I thought we could briefly revisit the two scenarios I’ve mentioned before and see how SQL Server 2008 makes our lives easier. Remember in my previous posts we had a table that looked something like this;
CREATE TABLE dbo.Employee
[Id] INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
[Name] NVARCHAR(50) NOT NULL,
[ManagerId] INT NULL
ALTER TABLE [dbo].[Employee]
WITH CHECK ADD CONSTRAINT [FK_Employee_Employee]
REFERENCES [dbo].[Employee] ([Id])
This is really simple – we just have a record with an ID, and a foreign key to the “parent” record (the employee’s manager in this case).
With HierarchyID, the data type stores the whole path to the current record in the “Id” column. This means we don’t have to do crazy recursion to work out who my boss’ boss is – it’s all there in the path all the time. This also means I don’t need a foreign key back to another record in the same table! Confused? Let’s see it in action.
CREATE TABLE EmployeeWithHierarchyID
[Id] hierarchyid not null primary key,
[Name] nvarchar(50) not null
... and that’s it. To insert some records, a few lines such as the following can be used;
INSERT INTO EmployeeWithHierarchyID ([Id], [Name]) VALUES
(hierarchyid::GetRoot(), 'Simon Ince'),
('/1/', 'Someone Else'),
('/1/1/', 'Another Person')
* I’ve used another nice feature of SQL 2008 here – separating multiple lines of data to insert with commas.
This “path” syntax of ‘/1/1/’ and so on means that I will be at the root of the tree, Someone Else will report to me, and Another Person will in turn report to them. Once I’ve filled my table with data, I can select it out as follows;
Id.ToString() AS [Path],
Id.GetLevel() AS [Level],
I’ve used a couple of functions here on the HierarchyID data type to demonstrate what you can do;
What you should be able to see here is a few things;
1. The HierarchyID field is stored in a binary format, that isn’t human readable.
2. The ToString function immediately shows the path that the binary format represents. So I know for example that Andrea Dunker reports to Rob Caron, who in turn reports to Ellen Adams... just by reading the “path” output by the ToString function.
3. The GetLevel function calculates how many steps down the hierarchy the current record is. It can do all this based on just the binary content of the HierarchyID field again.
Good huh? OK, so let’s get to replicating the queries I created in my other posts.
In the first scenario all I wanted to do was retrieve all records for employees that reported (directly or indirectly) to a specific employee. For example, every employee reports at least indirectly to Ellen Adams in the data above, but only Terry Adams and, Dana Birkby, and Rob Caron report directly to her.
It turns out this query is unbelievably simple;
WHERE Id.IsDescendantOf('/5/') = 1
In this case, I select everyone that reports to Rob Caron. And that’s it! The IsDescendantOf function handles the rest for you – and of course it is very efficient because all it needs is the path information encoded in the Id field.
Of course, here I’ve used a hard coded path value as the parameter to IsDescendantOf, but I could equally have selected a value from the database based on a name, or passed one from somewhere else.
The next scenario was how to select the entire management chain of an employee. So given a junior staff member, how do I select his boss, his boss’ boss, his boss, and so on? This is a little more complex, as we need to walk the stack going back up the chain.
At first thought, I assumed I’d need another Common Table Expression; perhaps something like this?
WITH Ancestors(Id, [Name], AncestorId)
Id = '/5/6/9/10/'
INNER JOIN Ancestors a
ON e.Id = a.AncestorId
SELECT *, Id.ToString() FROM Ancestors
All you need to do is replace/parameterise the ‘/5/6/9/10/’ path that I’ve used to select the employee we’re interested in. This works fine, but is it the optimum way of achieving it?
Nope. Let’s try again!
DECLARE @TheEmployee hierarchyid
SELECT @TheEmployee = Id
WHERE [Name] = 'Johnson, Barry';
WHERE @TheEmployee.IsDescendantOf(Id) = 1
This time I’ve selected the employee by name, just so I can get hold of an instance of a HierarchyID data type that relates to him. This means I can perform operations on it just like I have done on columns in the table before. So next, I just reverse the order of my IsDescendantOf call in the WHERE clause – now I say “Is Barry Johnson a descendant of the current record?”. If he is, we include it in the result set.
Well, this has been a very quick post just to get you started with HierarchyID, but I hope it has made it obvious just how much easier and more efficient it can be for most hierarchies. Enjoy!