SQL Smack down is back!
It’s been a while since the last issue, but I think this issue should make up for having to wait so long.
The main focus of this article will be trees and hierarchies, and I am going to base most of this article around the example of an employee table, which will show subordination in the structure.
One of the problems with trees when using a database language like SQL, is that trees are hierarchal data, and SQL is designed to store relational data. Below I will go explain the 2 most common ways to model hierarchal data in SQL, and go over how to use one of them.
Most of you have probably seen something similar to this example in a book or on the web, and I am betting that the example told you to build a table with a structure similar to the one below:
Employee| emp | boss | Other Info |
|---|---|---|
| Jamie | NULL | |
| Rudy | Jamie | |
| Julie | Jamie | |
| Mike | Julie | |
| Ed | Mike | |
| Nate | Mike |
The above structure represents what is called an adjacency list model, and is fairly easy to use. The problems with this model are that it is not very normalized, and it does not model subordination very well (if you remove Mike, then Ed and Nate will be lost in the hierarchy). Another problem with this model is that in order to query a tree from it, you most likely will be relegated to using a cursor. There must be a better way…
The answer is nested sets. Let’s consider a table with the following structure:
Employee| emp | lft | rgt |
|---|---|---|
| Jamie | 1 | 12 |
| Rudy | 2 | 3 |
| Julie | 4 | 11 |
| Mike | 5 | 10 |
| Ed | 6 | 7 |
| Nate | 8 | 9 |
This structure is more normalized, since the emp column is not duplicated when used for the boss column. Also, if we removed Mike from the table, Ed and Nate would not be lost in the hierarchy (you will see why soon). There are also some simple queries we can run to get the tree structure.
In order to understand what is going on in the structure above, we will use a simple nested oval analogy. With the data from the tree above, we can see that Jamie, with the span in between his lft and rgt values encompassing all other lfg and rgt values, will have the largest oval, encompassing all other ovals. Rudy has a lft of 2 and a rgt of 3, which means he will be inside Jamie’s oval. Julie has a lft value of 4 and a rgt value of 11, so she will be inside of Jamie’s oval, but not inside of Rudy’s oval, since Rudy’s lft and rgt values do not fall in between Julie’s range. Mike’s lft and rgt values of 5 and 10 do, however, fall in between Julie’s, so his oval is located inside of Julie’s oval, which is in turn inside of Jamie’s oval. Ed and Nate have lft and rgt values not encompassing each other, but that fall inside of Mike’s range, so they will be inside of Mike’s oval. Below I have included an image to help you visualize this idea.
From the above model I hope you can get a basic understanding of how the lft and rgt fields work to group the employees.
Now that you have a basic understanding of how the Nested Sets models works, you may be thinking that it looks nice, but how is this supposed to make it easier to use the tree? While I will admit it looks like a daunting task to write queries to select trees from this structure, and even more difficult to insert an employee in the middle or even at the end of the tree, with the few examples I give below you should have a good foundation of how to write most, if not all of the queries you will need for the tree.
First, here is the code to create the table, and insert the sample data.
CREATE TABLE Employee (
emp VARCHAR(10) NOT NULL PRIMARY KEY,
lft INTEGER NOT NULL UNIQUE CHECK (lft > 0),
rgt INTEGER NOT NULL UNIQUE CHECK (rgt > 1),
CONSTRAINT order_okay CHECK (lft < rgt)
)
INSERT Employee (emp, lft, rgt) VALUES ('Jamie', 1, 12)
INSERT Employee (emp, lft, rgt) VALUES ('Rudy', 2, 3)
INSERT Employee (emp, lft, rgt) VALUES ('Julie', 4, 11)
INSERT Employee (emp, lft, rgt) VALUES ('Mike', 5, 10)
INSERT Employee (emp, lft, rgt) VALUES ('Ed', 6, 7)
INSERT Employee (emp, lft, rgt) VALUES ('Nate', 8, 9)
Now we will move on to the examples.
SELECT b.*
FROM Employee AS a
INNER JOIN Employee AS b
ON (a.lft BETWEEN b.lft AND b.rgt)
WHERE a.emp = 'Ed'
ORDER BY b.rgt
This will return the following results, showing Ed, then each one of his superiors in order of their subordination, all the way up to Jamie.
| emp | lft | rgt |
|---|---|---|
| Ed | 6 | 7 |
| Mike | 5 | 10 |
| Julie | 4 | 11 |
| Jamie | 1 | 12 |
SELECT a.*
FROM Employee AS a
INNER JOIN Employee AS b
ON (a.lft BETWEEN b.lft AND b.rgt)
WHERE b.emp = 'Mike'
ORDER BY a.rgt DESC
This query will return the following results, showing Mike, then all of his subordinates.
| emp | lft | rgt |
|---|---|---|
| Mike | 5 | 10 |
| Nate | 8 | 9 |
| Ed | 6 | 7 |
You will need to create a stored procedure to do this one, or execute it as a batch in the query analyzer.
BEGIN
DECLARE @right_most_sibling INTEGER;
SET @right_most_sibling = (SELECT rgt
FROM Employee
WHERE emp = 'Mike'); -- Supervisor
UPDATE Employee
SET lft = CASE
WHEN lft > @right_most_sibling THEN lft + 2
ELSE lft
END,
rgt = CASE
WHEN rgt >= @right_most_sibling THEN rgt + 2
ELSE rgt
END
WHERE rgt >= @right_most_sibling;
INSERT INTO Employee (emp, lft, rgt)
VALUES ('Dave', @right_most_sibling, (@right_most_sibling + 1))
END
This will add the new employee, and shift all necessary lft and rgt values accordingly.
SELECT COUNT(a.emp) AS indentation, a.emp, a.lft
FROM Employee AS a
INNER JOIN Employee AS b
ON (a.lft BETWEEN b.lft AND b.rgt)
GROUP BY a.emp, a.lft
ORDER BY a.lft
This will return a straight list that can be used to display the tree in your application, if you want to display the tree in query analyzer, try this:
SELECT CAST(REPLICATE(' ', COUNT(a.emp) - 1) + a.emp AS VARCHAR(15)) AS Tree, a.lft
FROM Employee AS a
INNER JOIN Employee AS b
ON (a.lft BETWEEN b.lft AND b.rgt)
GROUP BY a.emp, a.lft
ORDER BY a.lft
This should return a result set similar to the following, depending on which data you have inserted into the table:
| emp | lft |
|---|---|
| Jamie | 1 |
| Rudy | 2 |
| Art | 3 |
| Julie | 6 |
| Mike | 7 |
| Ed | 8 |
| Nate | 10 |
| Dave | 12 |
| Laura | 16 |
If you have any more questions about trees and hierarchies please let me know. This is a confusing subject, and I am sure I have not done a perfect job of explaining them here.
Disclaimer: Much of the information contained in this article was derived from Joe Celko's article at Intelligent Enterprise, various message board posts, and the author's own personal experience.
Well that's it for this edition of SQL Smack down. As always, if you have any questions about these or any other topics, please talk to Barry.