BarneyBlog » database

Railo And AOP Transactions

barneyb — Fri, 23 Jul 2010 19:03:15 +0000

If you're reading this, chances are good you're a software developer.Â And that means you never touch the DB outside a transaction, right?Â Of course, right.

Most of the time, transactions are a bit of a bear because their use is orthagonal to your business tasks.Â To put that another way, business operations flow vertically from UI to backend to UI, and the various operations in your app typically flow mostly in separate parallel channels.Â Transactions go across your application's structure, needing to be uniformly applied to all those operations, but being independent of all of them.Â The "right" term for this is a "cross-cutting concern."

Fortunately, there are a pile of people in this world who are way smarter than I, and they've come up with Aspect Oriented Programming as a way to deal with these cross-cutting concerns.Â The typical example shown is with logging, but transaction management is rather more useful, I think, isn't any more complicated, and is also more specific to AOP (whereas logging could be done with event-driven programming as well).Â I'm not going to go into AOP, but if you're not using it, you should spend the time to learn a bit about it.Â Really.

So the idea is to leverage AOP to automatically wrap your business methods with transactions.Â In the CFML world, that means ColdSpring and its AOP support (which is modelled after Java's Spring framework).Â Unfortunately, that's the easy part.Â Since CFML servers abstract away DB connections (behind DSNs), it's hard to get ahold of the proper context to do the transaction demarcation correctly from your application code.Â Somehow you have to know what the engine is doing behind the scenes so you can start/stop transactions at the right places.

When I wrote transactionadvice.cfc many years ago, I used a psuedo-ThreadLocal for tracking transactional method depth.Â That way it can apply transactions at the top level, but skip them lower (nested) levels.Â It works, but it doesn't handle multithreaded requests (i.e., using CFTHREAD) correctly, so it's far from ideal.Â Using a real ThreadLocal would be better, but it requires Java or Groovy, and that's a dependency I don't want to require.Â But for the majority of cases, it works fine.

Today, however, I discovered that Railo has this method:

getPageContext().getDataSourceManager().isAutoCommit()

It will tell you whether you're currently in autocommit (non-transaction) mode.Â With this direct check of the transactional status, all the issues simply vanish, and the advice can be whittled down to this:

Copiously excellent, if you ask me.Â About the only thing that would make it better is if CFML grew a isAutoCommit() or isInTransaction() method so you don't have to dig into the underlying Java bits in an engine-specific way.

Embrace Your [HS]QL

barneyb — Wed, 05 May 2010 19:13:08 +0000

I'd like to start with a confession. I love SQL (and it's Hibernate counterpart HQL). Its simplicity and elegance is remarkable. Its expressive power is stunning. It is almost universally disliked by developers. Why is that?

The answer, at least to me, is pretty straightforward. Developers spend a lot of their time programming, so that's where their strengths lie. Most programming is done with general-purpose languages (GPLs – not to be confused with the Gnu Public License), which are generic, Turing-complete environments. They a real-world-like typing system (numbers, objects, lists, etc.), and often mutable data structures. Developers get very good at using such languages to get stuff done. This isn't a revelation.

SQL, on the other hand, is a domain-specific language (a DSL). It is not Turing-complete, uses set-based data structures (which are immutable – more on this later), and just generally does stuff in a totally different way than a GPL. This can create friction for developers, because they have to change mindsets when shifting between their general programming tasks (with a GPL) and their querying tasks (with SQL).

This bit of friction, however, is a source of great power if you're willing to invest a little in the SQL way of doing things.

The Bad Stuff

I'm not going to deny that writing out big-ass INSERT and UPDATE statements is a bear. CRUD in general – SQL or otherwise – is simply an unpleasant task that should be avoided at all costs. Fortunately there exist myriad tools and frameworks to alleviate with this problem (Hibernate being a shining example). For the purposes of this discussion I'm only going to be talking about data recall (i.e., SELECT statements). The C_UD part of CRUD is a different discussion, and hopefully you're using some "magic" way of dealing with that stuff.

The thing that is interesting to note, however, is that all the toolkits that I've seen to isolate you from the C_UD part of CRUD still provide a query language for the _R__ part. So clearly a query language (SQL or otherwise) is an important part of the way things work, even if the whole point of a toolkit is to isolate you from SQL itself.

The Good Stuff

First, the "S" in "SQL" stands for "Structured". Queries using SQL should be structured into little atomic pieces (subqueries) that are easily understandable and then assembled in to larger structures (with no semantic change to the composed bits). Think of SQL as a box of LEGOs where you build up larger and larger structure from incredibly simple base components. Here's an example of what I mean, working through it first with HQL, and then with SQL.

HQL

Consider this HQL example (from the cf-orm-dev mailing list):

select distinct p
from Post p
  left join p.categories c
where c.name like 'top%'

This selects the distinct Posts that have a category which starts with "top". Straightforward. But not really a very good use of the structured nature of HQL, because we're really asking two nested questions. The first is the obvious – which Posts – but the second is more subtle: which Categories start with "top". Consider this version:

from Post p
where exists (
  from p.categories
  where name like 'top%'
)

This is better; now we have a representation of both questions we're asking, as well as the "nested" relationship between them. But this is still a little wonky because we're doing a dependent subquery when the nested question (which Categories start with "top") is really completely isolated from any association with Posts. So consider this one:

from Post p
where any elements(p.categories) in (
    from Category
    where name like 'top%'
  )

Now we have the inner question totally isolated from the outer question, and the association between the two questions is expressed via the "any elements in" clause. Very expressive, very modular, very structured. And unfortunately, very unsupported by RDBMSs. This is a valid query construct, but I don't know of any databases that actually support this ANY..IN construct (I tried MS SQL, MySQL, and Postgres). So you can't actually use it. Of course, a better HQL and/or SQL preprocessor could transform it, but they don't do that either. In my opinion this is really Hibernate's problem to fix; it should generate different SQL for this construct.

Unfortunately, there's an … uh, "un-silver" … lining. In this particular case, most DBs will execute the JOIN more quickly than the subquery, because the subquery is correlated (i.e. must be executed once per row in the outer query). Aside from the semantic separation, that's another reason that the third (unsupported) query is better. Since the subquery in the third rev is not correlated it only needs to be run once and the same result can be used to filter all the Post rows.

SQL

Here's the SQL equivalent of the first HQL query:

select distinct p.*
from Post p
  left join link_post_category l on p.id = l.post_id
  left join Category c on l.category_id = c.id
where c.name like 'top%'

This illustrates an interesting fact that the HQL doesn't show as clearly: in order to do the DISTINCT, the SQL engine must do a comparason on every column of every row. If you're pulling back a few BLOB or CLOB fields this can be prohibitively expensive. But if your column width is small, this is still fast because it suits the optimizer. Now here's the correlated subquery version:

select *
from Post p
where exists (
    select *
    from link_post_category l
      left join Category c on l.category_id = c.id
    where l.post_id = p.id
      and c.name like 'top%'
  )

Now we've got rid of that "every column of every row" problem, but replaced it with a correlated subquery. So lets ditch that next:

select *
from Post p
where id in (
    select l.post_id
    from link_post_category l
      left join Category c on l.category_id = c.id
    where c.name like 'top%'
  )

Now the subquery is pulling back all the post_ids we want to retrieve (in one step), and the outer query is just doing a primary key lookup to grab those rows. This is going to be crazy fast, but we still have that JOIN in there which makes the subquery kind of nasty. One more revision:

select *
from Post p
where id in (
    select post_id
    from link_post_category
    where category_id in (
        select id
        from Category
        where name like 'top%'
      )
  )

With a pair of nested subqueries we're down to that LEGO-like ideal. Starting at the innermost piece, we pull all the Categories that start with "top". Then we find all the post_ids that are assigned those Categories. Finally we pull all the Posts with those post_ids. With MS SQL's query optimizer this will have the same performance characteristics as the prior version (i.e., crazy fast), but MySQL will unfortunately struggle a little bit. online casino Malaysia

Remember that final HQL query that won't actually execute on any DB I found? Here's the SQL for it:

select *
from Post p
where any (
    select category_id
    from link_post_category
    where post_id = p.id
  ) in (
    select id
    from Category
    where name like 'top%'
  )

This is roughly equivalent to the second SQL version, but since it won't run, it's academic.

The SQL Way

If you're going to use SQL effectively, you have to think about SQL in it's terms, it's domain. As a DSL, it's not designed to be a GPL. It's designed to be used from a set theory perspective and expressed in a very structured fashion. If you try to use it like a GPL you'll be able to get the job done, but you'll also get frustrated and build some really nasty queries that will be hard to maintain/reuse. And then you'll hate SQL, and then you'll try to avoid it (and other query languages) at all cost, and really be missing out.

As an example, I challenge you to dig into the Hibernate Criteria API, which is a programmatic way of querying (i.e., an API for a GPL). Try and create a Criteria query that expresses this question "what is the average age of males with either zero or two or more children" in a way that is even close to readable and concise. Here is the HQL version (which is both readable and concise):

select avg(age)
from Person
where sex = 'male'
  and (children.size >= 2
    or children.size = 0
  )

As way of a hint, you'll need a few Restrictions instances along with a Projections instance, plus some glue bits.

Immutability

I mentioned the immutability of SQL data structures way back at the beginning, and now I'll follow through. Obviously state can be changed via UPDATE, INSERT, and DELETE statements, but that's not what I mean. When you're running a query you have the ability to create temporary data structures (typically subquery results). They are immutable. Even if you're using a "programming" SQL extension (e.g., T-SQL or PL/SQL), state is still typically immutable. You can change variable references, but you can't go changing the objects they reference.

For example, if you have a rowset and want to add a new row, how do you do it? You don't add a new row, you select the UNION of the current rowset and a new (very likely ad hoc) rowset containing the row to add. This is a very different paradigm than most GPLs use, where state is typically mutable, at least for reference types. Consider a Java List. Calling .add(o) doesn't create a new List with the new item added to it, it just adds it to the existing list.

Functional languages are an exception, as many (most?) embrace the concept of immutable state (or zero side effect) which removes any need to consider concurrency, allows programs' behaviour to be "proven", etc. In a zero side effect environment, calling .add(o) would create a new List with the new item added and return the new list (the original list would be discarded). It's definitely a different approach to doing things from what most people are used to, but provides some incredibly powerful capabilities.

Stop Already

Ok, ok. That ended up being about ten times longer than I intended. I'm sorry. But if you got this far you've hopefully grown a bit of an appreciation for SQL, or at least the knowledge that there might be something there to appreciate. And do keep in mind that the examples I've given are all really simple. The power of the "Structured" is sort of nice with queries just a few lines long, but it really shines when you're dealing with queries that stretch over hundreds of lines, especially when they're dynamically constructed. That incredibly granular breakdown of functionality allows you to reuse bits and pieces very effectively, and the zero side effect nature means that you are guaranteed not to have some random side effect ruin your day.

Domain Model Integrity

barneyb — Wed, 14 Apr 2010 16:35:23 +0000

Unlike my last several posts, this one isn't ORM related.Â At least not directly.Â If you're using ORM, you necessarily care about your domain model's integrity, as it's a prerequisite for ORM doing it's job, but it has nothing to do with ORM specifically.Â The point of a domain model is to be a representation of your business rules and logic, and that means it needs to be internally consistent.

If you're building a SQL-heavy, procedural application, the database is probably the only place your domain model is represented.Â But if you're building an object oriented application, your domain model will also be represented in memory as object graphs.Â In almost all cases, a given object graph is only a small slice of your entire domain model, but it is a representation and must be kept consistent.

Here's an example of a very simple domain model consisting of Person and Pet classes, where a Pet has an owner (a Person), and a Person has a collection of Pets:

component Person {
  property name="name" type="string";
  property name="pets" type="array[Pet]";
}

component Pet {
  property name="species" type="string";
  property name="owner" type="Person";
}

Just to reiterate, these are NOT persistent types.Â They're simple types for in-memory use only.

So what semantics does this model imply?Â Or to rephrase, what invariants does this model carry?Â The most important semantic is that the relationship between pets and their owners is expressed from both sides (both classes).Â More explicitly stated, the domain model is structured such that if you have a Person you can get their Pets, and if you have a Pet, you can get their owner (a Person).Â The implications of this is expressed in these two invariants:

assert pet.owner is null || pet.owner.pets contains pet
assert person.pets.every { pet.owner == person }

The first one states that if a Pet has an owner, that owner's "pets" collection must contain it.Â The second one states that every Pet in a Person's "pets" must have that Person set as it's owner.Â I'm trying to be really deliberate in spelling this out, because it's really important.

Just for a moment, let's take a detour to the relational database world.Â If we were to express this domain model in the database we'd have a Person table and a Pet table, and the Pet table would have a foreign key (likely named 'owner_id') that references the Person table's primary key.Â SQL allows us to traverse the relationship expressed by that foreign key in either direction, so both relationships (Person->Pet and Pet->Person) are expressed in a single place (the foreign key column).Â Both directions are represented together.Â A foreign key constraint (which all RDBMSes support), on the column is doing nothing more than instructing the database to enforce these invariants.Â This is all second nature, and we don't even think about it when we use a database to represent our domain model.

Now back to the in-memory representation.Â We still need to enforce these invariants, but in memory we have to deal with references (pointers), and references only point in one direction.Â That's why we have to have both the 'pets' property (a Person's references to Pets) and the 'owner' property (a Pet's reference to a Person), but in the database we only need one foreign key column (Pet.owner_id).Â The relationship between Person and Pet objects is actually expressed in a pair of references.

The problem with this arrangement is that you, in effect, double represent your relationships.Â Both invariants must remain true, and since each invariant is represented by it's own reference in the model, you have to synchronize changes to those references.Â Â When you set the owner of a Pet, you must also add that Pet to the owner's "pets" collection.Â When you remove a Pet from a Person's "pets" collection, you must also remove the Pet's owner reference.Â If you don't keep these in sync, one of your invariants will be false, and that means your domain model is in an invalid/inconsistent state.

When your domain model is in an invalid state, your application falls apart.Â Every assumption you make in your application is suddenly unreliable, because they're predicated on your business rules, and your business rules are expressed through your domain model.Â An invalid domain model means your business rules were violated, and anything you do from this point forward will be suspect.Â I'm going to say it again: this is really important.Â If your domain model is in an invalid state, your application has failed.Â Period.Â End of story.Â Your only recourse is to revert it back to it's last known consistent state and throw away all pending operations.

What about relationships that are not bi-directional?Â For example, perhaps your model has PrivateEye and Subject types.Â Clearly the PrivateEye needs to know about his Subject, but it'd be kind of silly if the Subject knew about the PrivateEye.Â In this case the relationship only moves one way, so there is only one reference (from PrivateEye->Subject), and there is no invariant.Â When we put this in the database, however, we have exactly the same structure as the bi-directional Person<->Pet relationship: a foreign key that can be traversed in two directions.Â With the database representation of the model you can't express the concept of a one-directional relationship.Â This is a powerful differentiator for in-memory models, since it gives you much finer control over the semantics of your model than a database could ever provide.

So where does this relate to ORM?Â Just like everything else in your application, Hibernate depends on your invariants being true in order to persist your model to the database.Â If they're not true, Hibernate can't hope to do it's job correctly.Â A huge number of "problems" that people starting out with Hibernate face have nothing to do with Hibernate itself, but rather are caused by an invalid in-memory domain model.Â Coming from the world of procedural, SQL-based persistence, you don't necessarily have to worry about an in-memory domain model's integrity, which means that you can write what amount to buggy applications where the bugs never manifest themselves.

Bottom line is that if you're using an in-memory domain model, you simply must ensure the invariants of that model remain true.Â More specifically, you must set both sides of every bi-directional relationship.Â If you don't, you're just asking for punishment, both from your software tooling and from users/clients of your application.

Don't Forget Inverse="true"!

barneyb — Fri, 09 Apr 2010 20:36:37 +0000

When you're mapping relationships with Hibernate, you have to reconcile the uni-directional nature of Java references with the bi-directional nature of RDBMS foreign keys. Consider this simple database schema with a single relationship:

              +----------+
+--------+    | PET      |
+ PERSON |    +----------+
+--------+    | id       |
| id     |----| owner_id |
| name   |    | species  |
+--------+    +----------+

That works all well and good, and we can query across the relationship in either direction:

-- get pets
select pet.*
from pet
where ownerid = 456

-- get owner
select person.*
from person
where id = (
    select ownerId
    from pet
    where petId = 123
  )

When you get into the world of objects, however, the interplay between Person and Pet is a little different because we actually have two associations (Person.getPets() and Pet.getOwner()). With the SQL model there are two tables, one key, and two queries. WIth the object model there are two objects and two properties. So when mapping these two entities (at least with CF9 ORM) you'll use code something like this:

component Person {
  property name="id" fieldType="id" generator="native";
  property name="name" type="string" unique="true";

  property name="pets" fieldType="one-to-many" cfc="Pet";
}

component Pet {
  property name="id" fieldType="id" generator="native";
  property name="species" type="string" unique="true";

  property name="owner" fieldType="many-to-one" cfc="Person" fkColumn="owner_id";
}

And then, if I were to have a cat, you might want to run some code like this:

p = new Person();
p.setName("Barney");
c = new Pet();
c.setSpecies("cat");
c.setOwner(p);
p.addPet(c);

In particular note that two references are created between the objects: c.setOwner(p) creates a c -> p reference, and then p.addPet(c) creates a p -> c reference. If you don't create both references, you're going to get weird behaviour because either I'll believe I own a cat while the cat believes it has no owner, or I'll believe I'm petless while the cat thinks it belongs to me. Either of those situations is invalid, at least from the perspective of this contrived example.

That double reference causes an issue, however, when Hibernate comes along and tries to persist the object graph to the relational database structures, because both the Person and the Pet claim ownership over the in-DB relationship between them. So you'll get an invalid series of SQL statements as Hibernate tries to persist the relationship twice, once from each end.

Fortunately, the solution is trivially simple: inverse="true". By adding that attribute to one of the related properties you're telling Hibernate that it's the "inverse" of some other already-mapped property. So when it comes to persistence operations, Hibernate should ignore the references because they're the inverse of some other already-existing references. I know that girl porn stars are not all that interested in what I'm doing. I know it is a little bit embarrassing on the whole but I like it and I love it. Sometimes I like to touch myself and get off on it. I have a little bit of a problem with my own face. This is the other reason why it's vitally important to set up both references when you're creating objects: not only will you get the invalid state discussed above, depending on how you've set up inverse relationships you might not get the transitive persistence you expected.

So the corrected entity code from above looks like this:

component Person {
  property name="id" fieldType="id" generator="native";
  property name="name" type="string" unique="true";

  property name="pets" fieldType="one-to-many" cfc="Pet" inverse="true";
}

component Pet {
  property name="id" fieldType="id" generator="native";
  property name="species" type="string" unique="true";

  property name="owner" fieldType="many-to-one" cfc="Person" fkColumn="owner_id";
}

It's worth mentioning that if you don't know about inverse, the "correct" solution might appear to be only creating one reference. Yes, that'll let you get your entities into the database, and then when Hibernate pulls them back out for you on the next request the bi-directionality of the relationship will be created. But you'll run into problems with certain types of update operations, so it's not really a solution. You need to use inverse="true".

This applies to all bi-directional relationships, not just one-to-many/many-to-one. A two-way one-to-one or many-to-many carries the same requirments. The symptoms are a little different, however, if you forget. In particular, with many-to-many you won't get an error. Instead, you'll just get duplicate rows in your link table. This can lead to really weird duplicate membership problems in collections, and well a troubles with deleting and/or updating relationships, but again, they're not going to cause hard fails.

In general, however, real many-to-many relationships are rare. In the majority of cases, the "middle" of that relationship really ought to be an entity in it's own right, splitting the many-to-many into two one-to-many relationships. For example, people are often on many mailing lists , so you might think Person-MailingList would be a perfect fit for a many-to-many. But really there is a Subscription object in the middle that links the Person to the MailingList and has info like createDate and stuff. So we're back to a pair of one-to-many relationships. I'm not saying that many-to-many doesn't exist in the real world, just that if you find one, you should step back for a second and see if you're actually missing an entity in the middle.

But in any case, the point is that when you're using ORM, you have to be really careful about your semantics, and inverse="true" (and it's not-always-needed compatriot mappedby="prop") is an easy one to overlook.

Why ORM Transactions are Crucial

barneyb — Wed, 07 Apr 2010 18:18:58 +0000

Continuing the transactions thread, I wanted to talk about a couple specific reasons that transactions are vitally important in ORM applications.Â Even more than in raw-SQL applications.Â I consider them mandatory anywhere, but I would argue that if you feel compelled to skip them for some reason, you'd be better off leaving them out of your SQL applications than your ORM applications.

But before I dive into the specifics, I wanted to point out a rather fundamental issue I see with the CFML community: transactions are an afterthought.Â I've been using CFML for over a decade and the laissez faire approach to transactions never ceases to amaze me.Â Not individual developers, mind you, I'm talking about CFML developers as a group.Â If you look at the Spring documentation, for example, the first topic under "Data Access" is transactions.Â Not DAO support, not JDBC, not ORM.Â Transactions.Â And that is absolutely the correct order.Â We as a community need to get this one sorted.Â It'll result in better CFML applications, richer CFML toolkits (like native transaction support in ColdSpring and others), improve the public opinion of CFML developers, and just generally be better all around.

Now back to transactions and ORM…

Reason #1

Let's assume for the moment that you're processing the submission of an account creation form (name, username, password) using CF9 ORM.Â Here's some code you might use (minus any architecture and validation):

a = new com.site.Account();
a.setName(attributes.name);
a.setUsername(attributes.username);
a.setPassword(attributes.password);
entitySave(a);

This is pretty straightforward: create an Account instance, set the properties, and all entitySave().Â So where does the actual INSERT statement happen?

No one knows!

It's possible that it'll happen at the entitySave() call (depending on how you're doing unique identifiers).Â It's possible it'll happen at the end of the request (depending on how "this.ormsettings.flushAtRequestEnd" is set in Application.cfc).Â It's possible it will never happen at all and the submitted data will be lost.

Egads!Â Really?

Yes, really.

If we simply wrap that code with a transaction block, all our problems go away:

transaction {
  a = new com.site.Account();
  a.setName(attributes.name);
  a.setUsername(attributes.username);
  a.setPassword(attributes.password);
  entitySave(a);
}

Now we can state with complete certainty that by the time the transaction block closes not only will the INSERT have executed, but the new record is assured to be safely persisted within the database.Â Thank goodness!

It's not all roses though, because using transaction blocks like this can screw with your sessions (ORM sessions, not the session scope).Â Adobe has fixed the issue, but the initial CF9 bits are broken.Â So you really want to use Hibernate transactions instead of CFML ones to avoid the issue.Â If you use ColdSpring, I have a TransactionAdvice specifically for addressing this problem.Â If not, you just need to write the above code like this:

txn = ormGetSession().beginTransaction();
try {
  a = new com.site.Account();
  a.setName(attributes.name);
  a.setUsername(attributes.username);
  a.setPassword(attributes.password);
  entitySave(a);
  txn.commit();
} catch (e) {
  txn.rollback();
  throw e;
}

It's a couple extra lines of code, but it'll save your sanity – particularly on transactions that return a result which you want to manipulate later.Â And really, you should just use ColdSpring to manage your service layer and use AOP to do it all magically.Â :)

Reason #2

CF9 shipped with "this.ormsettings.flushAtRequestEnd = true" as the default, which means that side effects can trickle into your database without you being aware of them.Â The first thing you should do in any CF9 ORM application is set flushAtRequestEnd to false.Â In fact, stop reading and go do that right now for any existing applications you have as well.Â I'll wait.

The primary reason you want this turned off is that you don't want changes being made to your database inadvertently.Â If you're flushing at request end, innocently calling some setter somewhere (maybe as part of validation) will change your database.Â Even worse, the change won't make it into the database until after the current request is complete, so you won't be aware of it until the next request at the earliest.Â More likely, however, you won't notice in development and then your customer will call a week or two after production deployment and ask why all of the records have invalid values in some column or another.Â Oops.

If you ignored my advice a couple paragraphs back, I beg of you, please go set flushAtRequestEnd to false.Â Right now.

By disabling flushAtRequestEnd, changes to persistent entities will no longer make it to the database without your intervention.Â Believe me when I say that this is the way it should be.Â You, as the application developer, need to be in charge of when and what stuff is made persistent.Â ORM is about removing the busy work of persistence, it's not about making you unaware of persistence.Â There are two ways you can do it:

ormFlush() – this will do a non-transactional flush of every modification in the current ORM session.Â This is what flushAtRequestEnd invokes at the end of the request.Â At least before you went and disabled it everywhere, right?Â Right?
transactions – allows wrapping of specific changes in a transactional context so the stuff you want get saved when you want.Â Plus all the "normal" benefits.

Even if you're doing transactions manually, the work required on your part is pretty much identical for the two options, so can get all the benefits of transactionality for "free" by choosing that one.Â And if you go with transactions, you can probably make things even easier by using AOP to apply them by default, basically freeing you from doing any work at all.

The bottom line is that transactions need to be a carefully considered part of any and every application.Â And if you're using ORM (which is state based) instead of SQL (which is statement based), then it's even more important.

Transaction Demarcation

barneyb — Tue, 06 Apr 2010 19:43:45 +0000

Transactions are important, and I wanted to talk about them a little, particularly as they relate to ORM.Â If you're on the cf-orm-dev mailing list you've undoubtedly seen this topic come up over and over again.Â And probably started muting conversations when my name shows up.Â But before digging into the details, we need to clear up a couple misconceptions, starting with no ORM context at all…

First, transactions are controlled in the business layer, even though they are typically implemented in the database.Â What do I mean by this?Â When you begin/commit/rollback a transaction, you're actually issuing commands to your database server.Â It does the work for you.Â But the transaction demarcation is controlled by the business logic of the application.Â Only your business logic knows what series of SQL statements comprise a unit of work.Â With client-server apps (including web apps) there is often a one-to-one correspondence between requests and transactions, but not always.

Consider a simple account signup form like we've all filled out a million times.Â It looks like that over there to the right.Â There is the core account information, and then there's the checkbox for subscribing to the mailing list.Â So what is the unit(s) of work?Â There are [at least] two: creating an account and signing an account up for updates.Â This means that submitting this form should trigger a request with two transactions (one for each unit of work).

In plain English, when processing this form the account creation should be successful regardless of whether the subscription is successful.Â Far more important to have the account set up and ready for use, since you can just reprompt the user about the subscription if the subscribe action fails.

If you were to implement this with simple SQL statments, your code might look something like this (I've omitted all architecture, validation and security – you shouldn't do that):

transaction {
  executeQuery('
    insert into account
      (name, username, password,
       email
      )
    values
      ($params.name, $params.username, $passhash,
       $params.email
      )
  ');
  accountId = executeQuery('
    
  ');
}
transaction {
  executeQuery('
    insert into subscription
      (accountId, type
      )
    values
      ($accountId, 'inside scoop'
      )
  ');
}

This is a (the?) fundamental concept of transactions.Â Let me state that again: transaction demarcation is business logic.Â It belongs in your business layer.Â Not in your persistence layer and certainly not in your presentation layer.Â Doing this demarcation can be a gnarly problem, but you're already doing the hard part: identifying the units of work.Â If you're using an object backend then those backend business methods are very likely your units of work.Â Now it's just a matter of applying transactionality.

The easiest way that I've found to address transaction demarcation is through Aspect Oriented Programming (AOP).Â That's a big scary term that simply means "running code when stuff happens, without having to explicitly call it."Â For example, I want to run transaction demarcation code whenever a business method is invoked.Â Not that complicatedÂ (As an aide, if you're using ColdSpring, my TransactionAdvice CFC will help you.)

AOP gives you transparent transactionality which matches your business rules (assuming you have business methods that correspond to units of work) which is a huge win.Â It makes transactionality simply disappear.Â You don't have to think about it, you don't have to worry about it's various problems, everything just works.Â Using the signup example (with a bit of architecture, though still no validation or security), your UI controller might look like this:

acctId = request.beanFactory.getBean("accountService).createAccount(
  attributes.name,
  attributes.username,
 Â attributes.password,
  attributes.email
);
if (attributes.signUp) {
  request.beanFactory.getBean("subscriptionService").subscribe(
    acctId,
    'inside scoop'
  );
}

The implementation of 'createAccount' might look like this:

function createAccount(name, username, password, email) {
  executeQuery('
    insert into account
      (name, username, password,
       email
      )
    values
      ($params.name, $params.username, $passhash,
       $params.email
      )
  ');
  var accountId = executeQuery('
    
  ');
  return accountId;
}

Note that there is no transaction demarcation anywhere.Â Instead the beanFactory object is going to use AOP to ensure that every method invoked on accountService or subscriptionService will execute in a transactional context.Â Transparent transactionality is achieved.

But don't think that this is an inflexible solution.Â AOP provides ways to exclude methods, so if you have some method that shouldn't be transactional you can just exclude it and then do manual demarcation.Â So it's really the best of both worlds: totally transparent when you don't care, but with the ability to take back complete control when you do care.

The bottom line is that if you care about the correctness of your applications and the integrity of your clients' data, then transactions are required.Â But that doesn't have to be a burden on you as a developer, since transaction demarcation can be made almost entirely transparent.Â So there is really no excuse for avoiding comprehensive transaction demarcation in every app you build.Â The correct mindset should be that a non-transactional write to your persistent store is a bug, no questions asked.

Now that that's out of the way, I'll dive into some of the ORM-specific problems transactions present (and solve!) in my next post.

Tag-Based HQL Queries on CF9

barneyb — Mon, 22 Mar 2010 17:43:56 +0000

I'm sure I wasn't the only one that was sorely disappointed when Adobe released CF9 without the ability to execute HQL queries via the CFQUERY tag.Â Using a function (ormExecuteQuery) works, but it's really ungainly if you have complex – or even just long – HQL or need to build your statement conditionally, and you can't get syntax highlighting/code completion.Â Being one to solve the Hibernate problem myself (I've been using Hibernate on CF8 for years via CFGroovy), I wrote a couple really simple custom tags to give me the functionality.Â Pure CFML, 14 lines total.Â Yes, 14 lines.

First, the query tag:

And then the queryparam tag:

Wheee!Â And here's how you might use them:



  from FtoPolicy
  order by effectiveDate

unique="true">
  from Employee
  where username =

Note, in particular, that I support a 'unique' attriute on query which behaves exactly like the 'unique' argument to ormExecuteQuery returning a single entity instead of an array of entities.

MySQL Tuning

barneyb — Fri, 01 May 2009 06:41:41 +0000

While tuning my new server, I came across an article onÂ MySQL tuning that does a really great job of laying out the main options that need to be tweaked, as well as explaining what they do.Â It's several years old, but still applicable.Â Also worth mentioning that MySQL ships with four "sample" configuration files for small, medium, large, and huge servers.Â The compiled-in defaults (at least with RedHat binaries) are somewhere around the "medium" server, which is for "systems up to 128M where MySQL is used together with other programs (such as a web server)".Â To put that another way, if your hardware was built this century, the defaults are horribly conservative.

In particular, if you're using InnoDB, you simply must set 'innodb_flush_log_at_trx_commit' to either zero or two.Â The default of one will kill you.Â I've never picked two (for the data I collect that extra second of safety is irrelevant), but you might want to.

Holding Out for a [MySQL] Hero

barneyb — Tue, 28 Apr 2009 06:02:04 +0000

I've been fighting this god-forsaken query for a week, and can come up with absolutely no explaination for its behaviour.Â Next time I see you, I'll buy you a beer if you can help me understand what's going on.Â There are two tables involved, and here are their CREATE statements:

CREATE TABLE `sequence` ( -- contains rows from -50,000 to 50,000
  `num` int(11) NOT NULL,
  PRIMARY KEY  (`num`)
)
CREATE TABLE `count_datum` ( -- ~550K rows, ~22K rows with seriesId = 47
  `id` int(10) unsigned NOT NULL auto_increment,
  `seriesId` int(10) unsigned NOT NULL,
  `timestamp` datetime NOT NULL default '0000-00-00 00:00:00',
  `count` double NOT NULL default '0',
  PRIMARY KEY  (`id`),
  KEY `k_seriesId_timestamp` (`seriesId`,`timestamp`)
)

And then we have the query itself.Â Note that one line is commented out.

select ts.ts as timestamp,
  (
    select `count`
    from count_datum
    where seriesId = 47
      and timestamp <= ts.ts
--    and timestamp > ts.pts
    order by timestamp desc
    limit 1
  ) as `count`
from (
    select '2009-04-27 8:00:00' + interval 20 * num minute as ts,
      '2009-04-27 8:00:00' + interval 20 * (num - 1) minute as pts
    from sequence
    where num between 0 and 12
  ) ts

The `ts` subquery uses the sequence table to create a fixed-length list list of timestamps 20 minutes apart for four hours, starting at 8am on April 27th (in the `ts` column), along with the previous timestamp (in the `pts` column).Â Only the former is used in the query as it's written.Â It then uses the timestamps to pull the `count` column from the last record in the `count_datum` table recorded prior or equal to the timestamp.Â In plain English, it pull the last known value for every 20 minutes, regardless of how old that value was (might be a couple minutes, might be a couple hours).Â The query runs quite quickly as written, somewhere around 50ms on my server.

If you uncomment the commented out line, you'll get the same behaviour, except that it pulls the last known value at each timestamp, as long as it was recorded after the previous timestamp.Â To put that another way, if a given 20 minute window didn't have a value recorded with it you'll get a null back instead of a value from a previous window.Â This version of the query runs in around 7,000ms (seven seconds) on the same server. Â Yes, well over 100 times slower.

Here is some example data from count_datum, followed by the result from running each version of the query.Â The rows that I've bolded are the ones that actually get returned by the queries, the unbolded rows are ignored (because they're not the last record in any target window).

raw data in count_datum        | with carryover                 | without carryover
-------------------------------+--------------------------------+-------------------------------
timestamp                count | timestamp                count | timestamp                count
2009-04-27 08:00:00          1 | 2009-04-27 08:00:00          1 | 2009-04-27 08:00:00          1
2009-04-27 08:50:00          1 | 2009-04-27 08:20:00          1 | 2009-04-27 08:20:00       NULL
2009-04-27 08:55:00          2 | 2009-04-27 08:40:00          1 | 2009-04-27 08:40:00       NULL
2009-04-27 09:55:00          0 | 2009-04-27 09:00:00          2 | 2009-04-27 09:00:00          2
2009-04-27 10:42:00          1 | 2009-04-27 09:20:00          2 | 2009-04-27 09:20:00       NULL
2009-04-27 10:45:00          2 | 2009-04-27 09:40:00          2 | 2009-04-27 09:40:00       NULL
2009-04-27 10:50:00          8 | 2009-04-27 10:00:00          0 | 2009-04-27 10:00:00          0
2009-04-27 10:52:00          7 | 2009-04-27 10:20:00          0 | 2009-04-27 10:20:00       NULL
2009-04-27 10:57:00          6 | 2009-04-27 10:40:00          0 | 2009-04-27 10:40:00       NULL
2009-04-27 12:00:00          2 | 2009-04-27 11:00:00          6 | 2009-04-27 11:00:00          6
                               | 2009-04-27 11:20:00          6 | 2009-04-27 11:20:00       NULL
                               | 2009-04-27 11:40:00          6 | 2009-04-27 11:40:00       NULL
                               | 2009-04-27 12:00:00          2 | 2009-04-27 12:00:00          2

So the 64 thousand dollar one beer question is "why does it do that?"Â Running each version through EXPLAIN returns exactly the same information, including rowcounts and index usage.Â Converting the pair of timestamp comparisons to a BETWEEN … AND yields a slight improvement (down to just under 6,000ms), but nothing to write home about.Â Here's the EXPLAIN:

id  type                table       type    keys                    key                     key_len     ref     rows    extra
1   PRIMARY               ALL                                                                         13
3   DERIVED             sequence    range   PRIMARY                 PRIMARY                 4                   13      Using where; Using index
2   DEPENDENT SUBQUERY  count_datum ref     k_seriesId_timestamp    k_seriesId_timestamp    4           const   12224   Using where

I'm running on MySQL 5.0.45 on CentOS 5.Â Nearly identical behaviour on my old server which was CentOS 4 with an older MySQL (though I'm not sure which exactly).

Watch Your Column Types

barneyb — Wed, 15 Apr 2009 21:03:30 +0000

I've been bit by this twice in the past few months: comparing database columns that aren't the same type is really really expensive.Â If you've only got a few rows, no big deal, but if you've got a few hundred thousand (or a few tens of millions) it makes a huge difference.Â And varchar is NOT the same as nvarchar on SQL Server.