Tag Hierarchies

About four and a half years ago I wrote a little event tracking app that accepts a timestamp and a list of tags, and then provides a pile of ways to report on the data.  Think Twitter, except a couple of years earlier, and designed for consumption by software, not people, at least at the individual event level.  The app's been working magically since then, and now that there are a few people using it, I've realized I need to support tag hierarchies.

The specific use case in question is aggregate reporting.  Say you track what you eat, and then you want to get a report of how often you eat vegetables.  To this point you've had two choices: go back and add an extra 'vegetable' tag to every event tagged 'celery', 'corn', 'carrots', etc., or write your report to OR together all the different veggies (remembering to update it every time you eat a new one for the first time).

On the flip side, if you have a hierarchy of tags, you can drop all those tags underneath the 'vegetable' tag and then report on 'vegetable' directly, which has the implicit meaning of "itself and all its descendants".  This is obviously much more desirable.
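
To make that "itself and all its descendants" meaning concrete, here is a minimal Python sketch.  It is not the app's actual code: the parent map, the sample events, and the function names are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical hierarchy: each tag name maps to its parent (None for a root).
PARENTS = {
    "vegetable": None,
    "celery": "vegetable",
    "corn": "vegetable",
    "carrots": "vegetable",
    "fruit": None,
    "apple": "fruit",
}

# Hypothetical events: (timestamp, list of tags).
EVENTS = [
    ("2009-03-01T12:00", ["celery", "lunch"]),
    ("2009-03-01T18:30", ["corn", "dinner"]),
    ("2009-03-02T08:00", ["apple", "breakfast"]),
]


def descendants_and_self(tag, parents):
    """Return the tag plus every tag beneath it in the hierarchy."""
    children = defaultdict(list)
    for name, parent in parents.items():
        if parent is not None:
            children[parent].append(name)
    found, stack = set(), [tag]
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(children[current])
    return found


def count_events(events, tag, parents):
    """Count events carrying the tag or any of its descendants."""
    wanted = descendants_and_self(tag, parents)
    return sum(1 for _, tags in events if wanted & set(tags))


vegetable_count = count_events(EVENTS, "vegetable", PARENTS)
```

With the sample data, the 'vegetable' report picks up the celery and corn events without either event carrying a 'vegetable' tag, and a new veggie only has to be filed under 'vegetable' once.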

However, overlaying a hierarchy is not without problems.  Tags are inherently free-form, and this app is no exception.  As such, the unique key needs to be the tag name by itself, not the combination of the tag name and its parent.  So the solution I adopted is to expose tags in a flat structure everywhere except the actual hierarchy editor (for which I used the fantastic ExtJS Tree Control), and to make a slight tweak to the query language.

Previously, you searched for events using a "tag:celery" style query.  With no hierarchy, that matched exactly the celery tag.  With a hierarchy, the semantics have changed slightly: it now matches the celery tag or any of its descendants.  If you want the old behaviour, you'd use "tag:=celery".
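
The two query forms could be told apart with something like the following Python sketch.  The `TagQuery` type and `parse_tag_query` function are hypothetical names of my choosing, and the real query language may well support more than a single term.

```python
from typing import NamedTuple


class TagQuery(NamedTuple):
    name: str
    exact: bool  # True for "tag:=name", False for "tag:name"


def parse_tag_query(query: str) -> TagQuery:
    """Split a single 'tag:...' term into a tag name and an exact-match flag."""
    prefix = "tag:"
    if not query.startswith(prefix):
        raise ValueError(f"not a tag query: {query!r}")
    rest = query[len(prefix):]
    exact = rest.startswith("=")
    name = rest[1:] if exact else rest
    if not name:
        raise ValueError(f"missing tag name: {query!r}")
    return TagQuery(name=name, exact=exact)


hierarchical = parse_tag_query("tag:celery")   # celery or any descendant
exact_only = parse_tag_query("tag:=celery")    # the celery tag alone
```

Keeping the flag separate from the name means the tag name stays a plain, flat key, and only the lookup step needs to know whether to expand to descendants.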

Behind the scenes, I'm using nested sets for storage, which makes those descendant queries lightning fast, though I ran into some interesting issues because there is only one storage table for all tags across all users, and each user's tag tree is potentially multi-rooted.  Neither is inherently difficult to deal with, but both were new problems and required a bit of reworking of my treemanager component.
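
For anyone unfamiliar with nested sets, here is a rough sketch of one way a single shared table with per-user, multi-rooted trees could look, using Python's built-in sqlite3.  The table layout, column names, and sample rows are assumptions for illustration, not the app's real schema or its treemanager component.  Each user's left/right numbering lives in its own space (scoped by `user_id`), and multiple roots simply occupy adjacent, non-overlapping ranges.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory table holding every user's tags (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE tags (
               user_id INTEGER NOT NULL,
               name    TEXT    NOT NULL,
               lft     INTEGER NOT NULL,
               rgt     INTEGER NOT NULL,
               UNIQUE (user_id, name)  -- the tag name alone is the key per user
           )"""
    )
    conn.executemany(
        "INSERT INTO tags (user_id, name, lft, rgt) VALUES (?, ?, ?, ?)",
        [
            # User 1 has two roots: vegetable (1..8) and fruit (9..12).
            (1, "vegetable", 1, 8),
            (1, "celery", 2, 3),
            (1, "corn", 4, 5),
            (1, "carrots", 6, 7),
            (1, "fruit", 9, 12),
            (1, "apple", 10, 11),
            # User 2 reuses the same numbers for an unrelated tree.
            (2, "vegetable", 1, 4),
            (2, "kale", 2, 3),
        ],
    )
    return conn


def descendants(conn, user_id, name):
    """Return the named tag and all its descendants, in tree order."""
    rows = conn.execute(
        """SELECT d.name
             FROM tags AS a
             JOIN tags AS d
               ON d.user_id = a.user_id
              AND d.lft BETWEEN a.lft AND a.rgt
            WHERE a.user_id = ? AND a.name = ?
            ORDER BY d.lft""",
        (user_id, name),
    )
    return [row[0] for row in rows]


conn = build_demo_db()
user1_veggies = descendants(conn, 1, "vegetable")
```

The descendant lookup is a single range comparison with no recursion, which is where the speed comes from.  The `user_id` condition in the join is the part that matters for a shared table: without it, user 2's 'kale' (2..3) would fall inside user 1's 'vegetable' range.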

All in all, the process was really painless, and I'm quite pleased at how transparent the overlay of hierarchy ended up being to the general functioning of the system.  In particular, ExtJS was a dream to work with.  It's fast, easy to use, and easy to develop with, it pairs quite nicely with jQuery (which drives the rest of the app), and it ended up requiring less than 70 lines of code to create the control, do lazy loading of children, add new nodes to the tree, reorder and rename nodes, and do all the backend calls to update the DB as needed.
