Domain Model Integrity

Unlike my last several posts, this one isn't ORM-related.  At least not directly.  If you're using ORM, you necessarily care about your domain model's integrity, since it's a prerequisite for ORM doing its job, but the issue has nothing to do with ORM specifically.  The point of a domain model is to be a representation of your business rules and logic, and that means it needs to be internally consistent.

If you're building a SQL-heavy, procedural application, the database is probably the only place your domain model is represented.  But if you're building an object-oriented application, your domain model will also be represented in memory as object graphs.  In almost all cases, a given object graph is only a small slice of your entire domain model, but it is still a representation of that model and must be kept consistent.

Here's an example of a very simple domain model consisting of Person and Pet classes, where a Pet has an owner (a Person), and a Person has a collection of Pets:

component Person {
  property name="name" type="string";
  property name="pets" type="array[Pet]";
}

component Pet {
  property name="species" type="string";
  property name="owner" type="Person";
}

Just to reiterate, these are NOT persistent types.  They're simple types for in-memory use only.

So what semantics does this model imply?  Or to rephrase, what invariants does this model carry?  The most important semantic is that the relationship between pets and their owners is expressed from both sides (both classes).  More explicitly stated, the domain model is structured such that if you have a Person you can get their Pets, and if you have a Pet, you can get its owner (a Person).  The implications of this are expressed in these two invariants:

assert pet.owner is null || pet.owner.pets contains pet
assert person.pets.every { pet -> pet.owner == person }

The first one states that if a Pet has an owner, that owner's "pets" collection must contain it.  The second one states that every Pet in a Person's "pets" must have that Person set as its owner.  I'm trying to be really deliberate in spelling this out, because it's really important.
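To make the two invariants concrete, here's a minimal sketch in Python rather than CFML.  The class layout mirrors the Person and Pet components above; the function names (pet_invariant_holds, person_invariant_holds) and the sample data are made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


# eq=False keeps identity-based equality, so two objects are "the same pet"
# only if they are literally the same object in memory.
@dataclass(eq=False)
class Person:
    name: str
    pets: list[Pet] = field(default_factory=list)


@dataclass(eq=False)
class Pet:
    species: str
    owner: Optional[Person] = None


def pet_invariant_holds(pet: Pet) -> bool:
    # Invariant 1: a pet with an owner appears in that owner's pets.
    return pet.owner is None or any(p is pet for p in pet.owner.pets)


def person_invariant_holds(person: Person) -> bool:
    # Invariant 2: every pet in the collection points back at this person.
    return all(pet.owner is person for pet in person.pets)


alice = Person("Alice")
rex = Pet("dog")

rex.owner = alice                       # only one side of the relationship is set
half_set = pet_invariant_holds(rex)     # False: alice.pets does not contain rex

alice.pets.append(rex)                  # now both sides agree
both_set = pet_invariant_holds(rex) and person_invariant_holds(alice)
```

Setting just one reference is enough to break an invariant, and nothing in the language stops you from doing it.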

Just for a moment, let's take a detour to the relational database world.  If we were to express this domain model in the database we'd have a Person table and a Pet table, and the Pet table would have a foreign key (likely named 'owner_id') that references the Person table's primary key.  SQL allows us to traverse the relationship expressed by that foreign key in either direction, so both relationships (Person->Pet and Pet->Person) are expressed in a single place (the foreign key column).  Both directions are represented together.  A foreign key constraint on that column (which virtually every RDBMS supports) does nothing more than instruct the database to enforce these invariants.  This is all second nature, and we don't even think about it when we use a database to represent our domain model.
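Here's a small sketch of that idea using Python's built-in sqlite3 module and an in-memory database.  The table layout and sample rows are assumptions for illustration; the point is only that one owner_id column answers both questions and that the constraint rejects a dangling reference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE pet (
        id INTEGER PRIMARY KEY,
        species TEXT NOT NULL,
        owner_id INTEGER REFERENCES person(id)
    );
    INSERT INTO person (id, name) VALUES (1, 'Alice');
    INSERT INTO pet (id, species, owner_id) VALUES (10, 'dog', 1);
    INSERT INTO pet (id, species, owner_id) VALUES (11, 'cat', 1);
""")

# Person -> Pets: read the foreign key "backwards".
pets_of_alice = [
    row[0]
    for row in conn.execute(
        "SELECT species FROM pet WHERE owner_id = ? ORDER BY id", (1,)
    )
]

# Pet -> Person: read the same column "forwards".
owner_of_dog = conn.execute(
    "SELECT person.name FROM person "
    "JOIN pet ON pet.owner_id = person.id WHERE pet.id = ?",
    (10,),
).fetchone()[0]

# A pet pointing at a person who doesn't exist is refused by the constraint.
try:
    conn.execute("INSERT INTO pet (id, species, owner_id) VALUES (12, 'fish', 99)")
    dangling_rejected = False
except sqlite3.IntegrityError:
    dangling_rejected = True
```

There's no second column to keep in sync, so the two directions can't disagree with each other.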

Now back to the in-memory representation.  We still need to enforce these invariants, but in memory we have to deal with references (pointers), and references only point in one direction.  That's why we need both the 'pets' property (a Person's references to Pets) and the 'owner' property (a Pet's reference to a Person), whereas in the database we only need one foreign key column (Pet.owner_id).  The relationship between Person and Pet objects is actually expressed in a pair of references.

The problem with this arrangement is that you, in effect, double represent your relationships.  Both invariants must remain true, and since each invariant is represented by its own reference in the model, you have to synchronize changes to those references.  When you set the owner of a Pet, you must also add that Pet to the owner's "pets" collection.  When you remove a Pet from a Person's "pets" collection, you must also clear the Pet's owner reference.  If you don't keep these in sync, one of your invariants will be false, and that means your domain model is in an invalid/inconsistent state.
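One common way to keep the two references in sync is to funnel every change through a single method that updates both sides.  The following is a Python sketch of that approach, not the only way to do it; the set_owner method and the sample names are hypothetical.

```python
from __future__ import annotations

from typing import Optional


class Person:
    def __init__(self, name: str) -> None:
        self.name = name
        self.pets: list[Pet] = []


class Pet:
    def __init__(self, species: str) -> None:
        self.species = species
        self._owner: Optional[Person] = None

    @property
    def owner(self) -> Optional[Person]:
        return self._owner

    def set_owner(self, new_owner: Optional[Person]) -> None:
        """Change the owner and update both 'pets' collections in the same call."""
        if new_owner is self._owner:
            return
        if self._owner is not None:
            # Drop this pet (by identity) from the previous owner's collection.
            self._owner.pets = [p for p in self._owner.pets if p is not self]
        self._owner = new_owner
        if new_owner is not None:
            new_owner.pets.append(self)


alice, bob = Person("Alice"), Person("Bob")
rex = Pet("dog")

rex.set_owner(alice)   # alice.pets now contains rex
rex.set_owner(bob)     # rex moves: removed from alice.pets, added to bob.pets
```

Note that this sketch still leaves the "pets" list open to direct mutation.  A stricter version would hide the list and expose only methods that go through set_owner.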

When your domain model is in an invalid state, your application falls apart.  Every assumption you make in your application is suddenly unreliable, because each one is predicated on your business rules, and your business rules are expressed through your domain model.  An invalid domain model means your business rules were violated, and anything you do from this point forward will be suspect.  I'm going to say it again: this is really important.  If your domain model is in an invalid state, your application has failed.  Period.  End of story.  Your only recourse is to revert it to its last known consistent state and throw away all pending operations.

What about relationships that are not bi-directional?  For example, perhaps your model has PrivateEye and Subject types.  Clearly the PrivateEye needs to know about his Subject, but it'd be kind of silly if the Subject knew about the PrivateEye.  In this case the relationship only moves one way, so there is only one reference (from PrivateEye->Subject), and there is no invariant.  When we put this in the database, however, we have exactly the same structure as the bi-directional Person<->Pet relationship: a foreign key that can be traversed in two directions.  With the database representation of the model you can't express the concept of a one-directional relationship.  This is a powerful differentiator for in-memory models, since it gives you much finer control over the semantics of your model than a database could ever provide.
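For contrast, here's what the one-directional case looks like as a Python sketch.  The field names are assumptions; the only thing that matters is that Subject carries no reference back to PrivateEye, so there's nothing to keep in sync.

```python
from dataclasses import dataclass


@dataclass
class Subject:
    name: str            # no reference back to whoever is watching


@dataclass
class PrivateEye:
    name: str
    subject: Subject     # the only reference in the relationship


mark = Subject("Mark")
sam = PrivateEye("Sam", subject=mark)

# The subject can be reached from the private eye, but not the other way around.
watched = sam.subject.name
subject_knows_watcher = hasattr(mark, "private_eye")
```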

So how does this relate to ORM?  Just like everything else in your application, Hibernate depends on your invariants being true in order to persist your model to the database.  If they're not true, Hibernate can't hope to do its job correctly.  A huge number of "problems" that people starting out with Hibernate face have nothing to do with Hibernate itself, but rather are caused by an invalid in-memory domain model.  Coming from the world of procedural, SQL-based persistence, you don't necessarily have to worry about an in-memory domain model's integrity, which means that you can write what amount to buggy applications where the bugs never manifest themselves.
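To see why an inconsistent model persists badly, consider this toy persister in Python.  It is emphatically not Hibernate's API or algorithm.  It's a sketch that assumes a mapping where the owner_id column is written from the Pet's "owner" reference alone, and the "pets" collection is only used to find which pets to write.  All names here are hypothetical.

```python
from __future__ import annotations

from typing import Optional


class Person:
    def __init__(self, person_id: int, name: str) -> None:
        self.id = person_id
        self.name = name
        self.pets: list[Pet] = []


class Pet:
    def __init__(self, pet_id: int, species: str) -> None:
        self.id = pet_id
        self.species = species
        self.owner: Optional[Person] = None


def pet_rows(people: list[Person]) -> list[tuple[int, str, Optional[int]]]:
    """Toy persister: build an (id, species, owner_id) row for every reachable pet.

    Assumption: owner_id comes from pet.owner only; person.pets is used
    just to discover the pets.
    """
    rows = []
    for person in people:
        for pet in person.pets:
            owner_id = pet.owner.id if pet.owner is not None else None
            rows.append((pet.id, pet.species, owner_id))
    return rows


alice = Person(1, "Alice")
rex = Pet(10, "dog")

alice.pets.append(rex)              # one side only: rex.owner is still None
broken_rows = pet_rows([alice])     # the row is written with a NULL owner_id

rex.owner = alice                   # now both sides agree
consistent_rows = pet_rows([alice])
```

With only one side set, the row that gets written says the dog has no owner, even though the in-memory Person clearly lists it.  The persister did exactly what it was told; the model was wrong.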

Bottom line is that if you're using an in-memory domain model, you simply must ensure the invariants of that model remain true.  More specifically, you must set both sides of every bi-directional relationship.  If you don't, you're just asking for punishment, both from your software tooling and from users/clients of your application.

6 responses to “Domain Model Integrity”

  1. Martijn van der Woud

    Great post Barney, as usual I am impressed by your analytic capabilities.

  2. Bob Silverberg

    Another gem indeed.

    It might be worthwhile to spell out specifically how this relates to ColdFusion ORM. I assume that you are hoping that your readers will infer from this that it is critically important to set both sides of a bi-directional relationship, as that is a logical conclusion from this discussion.

    I fear, however, that some people may not infer that simple fact from this excellent explanation of _why_ this is so important. I'm certainly not advocating watering down your message, but, if you see fit, a final note explicitly referencing that rule (always set both sides of a bi-directional relationship) might enable more people to make use of this information.

  3. Raymond Camden

    I hate to add a "Me Too" comment w/ nothing intelligent to it – but you've really given me something to think about here. I've just released my first large scale ORM app and I really need to review the model and ensure I apply what you've talked about here.

  4. Mike Chandler

    Hi Barney, this is an excellent post. I have situations like these in a document management system I'm trying to design. An Employee has a collection of Documents. A Document has an owner of type Employee. Your points are very eye opening. Do you believe it's better to avoid a design where there are these bi-directional relationships, or are you just suggesting that we need to be careful and mindful of considering them in our implementations/unit tests?

  5. Jamie Krug

    Great stuff, thanks, Barney. This is timely, as I was recently re-reading some of this excellent book: "Fundamentals of Object-Oriented Design in UML" by Meilir Page-Jones (http://www.amazon.com/Fundamentals-Object-Oriented-Design-Meilir-Page-Jones/dp/020169946X/ref=ntt_at_ep_dpt_1). There's some serious in-depth discussion of class invariants, and so much more. Some of it is semi-intuitive (with experience anyway), but much of it is just great reading and knowledge to have when you tackle object-oriented design. Anyway, this is just related, so I thought I'd suggest that great book for anyone looking to dive deeper into this kind of goodness. Cheers.