How Nested Sets Work

I got an email question today about how nested sets work, after the
developer started using my TreeManager component.  I figured that
was a good topic for a blog post, so here it is.

Nested sets operate based on two fields, rpos and lpos (right
position and left position).  They're calculated by doing a depth
first traversal of the tree; that is, numbering each node's left and
right sides as you get there.  So here's a sample tree:

food
/ \
meat fruit
| / \
beef apple pear

An depth first traversal will basically start to the left of 'food',
and run around the entire tree until it gets to the right side of
'food', and number each node it comes to.  Every node will get two
numbers, one for each side.  The numbering will look like this:

1.food.12
/ \
2.meat.5 6.fruit.11
| / \
3.beef.4 7.apple.8 9.pear.10

And in tabular form:

food 1,12
meat 2,5
beef 3,4
fruit 6,11
apple 7,8
pear 9,10

A couple things to notice.  First, the root node always has
number L=1 and R=n*2, where n is the number of nodes.  Leaf nodes
(those with no children) always R=L+1, and non-leaf nodes always have R
- L % 2 == 1 (and even number of interim numbers).  The real
magic, of course, is that any given node's subtree is entirely between the numbers
of the node, which is why they're called nested sets.

Nested sets
have a number of advantages over adjacency lists (using a parentID),
though they have some disadvantages as well.  Advantages include
the blazing speed of pulling out hierarchies,intrinsic ordering of
sibling nodes (meat and fruit or apple and pear in my examples), and
the fact that it's impossible to orphan a node by deleting a node in
the middle of the tree, just to name a few.  Disadvantages include
complexity (though TreeManager alleviates a lot of that), and expensive
structure changes.  However, the tradeoff is usually in favor of
nested sets over adjacency lists, since recall is almost always more
important than updating.

5 responses to “How Nested Sets Work”

  1. Mike Tangorre

    Barney,

    I used to use the adjacency list method until running into nested sets one day at a previous job. I just wanted to mention that Joe Celko is the man when it comes to nested sets and he has some great tutorials and articles over at DB Magazine dot com (not sure of the addy.. google it).

    Mike T

  2. Barney

    Yep, yep, yep. Here's a link to the tutorial I used to help me out (well, same content, different location): http://searchoracle.techtarget.com/tip/1,289483,sid13_gci537290,00.html

  3. Michael Conger

    Agreed. Joe Celko is the man all around. He lays out nested sets in great detail in his fantastic book "SQL for Smarties". I highly recommend it. My dog-eared, coffee-stained copy has been on my desk for 5yrs.

  4. Mike Kear

    I followed the linke to Joe Celko's 2001 Article, which mentions a follow up article. I guessed that would explain why the code didnt work on my database, so I clicked on it to follow it. Sadly, it went nowhere. It led to SearchORacle.com, so I seached in SearchOracle.com for the follow uparticle. I couldn't get access to SearchOracle.com unless I registered, which took me 45 minutes of giving them multiple opportunities to spam me (with 2000 emails a day, mostly spam, what I Really need is yet ANOTHER "newsletter"). After all that, his Follow UP article can be found in the search engine on that site, but when you click on it, it takes you to a "page not found". AAARRRRGH!!!!!!!!!!

    So the code doesnt work on SQLServer, and I still dont know why and still havent seen the followup.

    (sigh).

  5. Barney

    The TreeManager (as it says in the readme) was developed on MySQL, and makes use of the LIMIT clause, which to my knowledge, is only supported by MySQL and PostgreSQL. It's similar to the MSSQL TOP clause, though LIMIT has the ability to do an offset as well.

    Porting the SQL to MSSQL should be pretty trivial, and shouldn't require a real understanding of what's going, just the ability to provide the MSSQL syntax. I believe it'll just be LIMIT to TOP conversions, but without a SQL Server to test on, I can't say for sure.

    If you do port it, I'd be interested in the resulting code so I can include it in the distribution.