Monthly Archive for September, 2005

The Encapsulated Hack

As is pretty obvious, I'm a big fan of good OO design.  But I'm
a bigger fan of maintainable apps, and one of my favorite tricks when
evolving apps is what I call the "encapsulated hack."

Take, for
example, a CFC that needs access to an
application-scope variable.  What's the right solution? 
Package the variable up in such a way that it can be passed to the CFC,
so the CFC doesn't have to break encapsulation by calling the
application-scope variable.  Much of the time, that's easy to do,
but not always.  When it's not possible, I hack it. 

Here's what I'd do:

<cffunction name="getSomeVar" access="private" …>
  <!--- TODO: encapsulated hack --->
  <cfreturn application.someVar />
</cffunction>

"Ick!" you say, "that's breaking encapsulation!" And you'd be right.
But notice that I'm breaking encapsulation in exactly one place,
and that place is itself encapsulated.  Later, when I get around to doing it the right
way, I can change the implementation of this one method, and nothing
else has to change.
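
For illustration, here is a rough sketch of what the "right way" version might eventually look like, assuming the value gets handed to the CFC through an init method.  The init method, the someVar argument, and the variables.someVar storage are all made-up names for this sketch, not code from a real app.

```cfml
<cfcomponent output="false">

  <!--- hypothetical constructor: the caller passes the value in, e.g.
        createObject("component", "MyComponent").init(application.someVar) --->
  <cffunction name="init" access="public" returntype="any" output="false">
    <cfargument name="someVar" type="any" required="true" />
    <cfset variables.someVar = arguments.someVar />
    <cfreturn this />
  </cffunction>

  <!--- same method name and signature as the hack; only the body changed --->
  <cffunction name="getSomeVar" access="private" returntype="any" output="false">
    <cfreturn variables.someVar />
  </cffunction>

</cfcomponent>
```

Everything else in the CFC keeps calling getSomeVar() exactly as it did before, which is the whole point of wrapping the hack in a method.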

Speaking from experience working with a single large
app over several years, I can tell you that this is very helpful in
allowing you to move forward with a good OO design, without having to
do everything
all at once.  Just make sure you clearly
indicate that you've got a hack, and that it needs to be fixed. 
If you're using CFEclipse, you can use a TODO comment as I have, open
the 'tasks' view, and you'll get a listing of all such todos in your
code.

Designing an OO Backend

As a follow-up to my 'Impetus for an OO Backend' post from a couple of
weeks ago, I wanted to talk about some approaches to designing an OO
backend. Again, I'm going to attempt brevity and undoubtedly end up
being long-winded, and I'm again going to intentionally skip
implementation details. Finally, there will be almost no mention of UIs
at all; I'm talking exclusively about the backend of an application. UIs
will use the exposed APIs to do "stuff", based on user input. How they
do it is a totally different discussion.

For the sake of having a consistent example, I'm going to be talking
about some unnamed app that deals with users. No need to be more specific
than that, I don't think, as pretty much every CF developer has had to
build such functionality for one app or another, or at the very least,
had to use such an application.

To jump into the meat of things, the first concepts are really the
central tenets of good OO design: abstraction and encapsulation.
Abstraction refers to the separation of a thing's interface from its
implementation. In other words, someone can be told what this thing
does without having to know how it does it. Encapsulation refers to
the state of being self-contained; not depending on other things. The
"black box" on a commercial airliner is a good example of both.
Everyone knows what it does (records everything about the plane), but
most people don't know how (for instance, what's the recording media?),
and it's also totally self contained so that if the plane completely
breaks, it'll keep functioning. These two core concepts are the central ideas of almost
every good OO design.

But on to our user management app. Pretty simple, right? Add a user,
delete a user, edit a user, and get a list of users in the system.
First, where does abstraction fit in? Well, I just told you what the
app needs to do, but I haven't told you anything about how it does it,
so I'd say we've found it. But how do we implement the abstraction? For that, we
need a service object.

Service objects are objects that expose "business operations" to
something. I listed the four core business operations of our app above
(add, edit, delete, and list users), so that'll be what's exposed in
our service object. Services are also singletons, which means that
there should never be more than a single instance of them within an
application. Since they're singletons, that also means that they must
be thread safe, but again, that's a whole different topic.
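
To make the idea concrete, a bare-bones UserService might be sketched like this.  The argument names, the users table, and its columns are assumptions made purely for illustration, and the bodies of the write operations are left empty on purpose.

```cfml
<!--- UserService.cfc: an illustrative skeleton, not a real implementation --->
<cfcomponent output="false">

  <cffunction name="init" access="public" returntype="any" output="false">
    <cfargument name="dsn" type="string" required="true" />
    <cfset variables.dsn = arguments.dsn />
    <cfreturn this />
  </cffunction>

  <cffunction name="addUser" access="public" returntype="void" output="false">
    <cfargument name="username" type="string" required="true" />
    <cfargument name="email" type="string" required="true" />
    <!--- the "how" goes here --->
  </cffunction>

  <cffunction name="editUser" access="public" returntype="void" output="false">
    <cfargument name="userId" type="numeric" required="true" />
    <cfargument name="username" type="string" required="true" />
    <cfargument name="email" type="string" required="true" />
    <!--- the "how" goes here --->
  </cffunction>

  <cffunction name="deleteUser" access="public" returntype="void" output="false">
    <cfargument name="userId" type="numeric" required="true" />
    <!--- the "how" goes here --->
  </cffunction>

  <cffunction name="listUsers" access="public" returntype="query" output="false">
    <!--- var-scope the query so concurrent requests can't trample each other --->
    <cfset var qUsers = "" />
    <!--- hypothetical table and column names --->
    <cfquery name="qUsers" datasource="#variables.dsn#">
      SELECT id, username, email
      FROM users
      ORDER BY username
    </cfquery>
    <cfreturn qUsers />
  </cffunction>

</cfcomponent>
```

The only per-instance state is the datasource name set once in init, and everything else lives in var-scoped locals, which is most of what keeps a simple singleton like this thread safe.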

So now that we've got our service (named UserService, of course),
how do we use it? The easiest answer is to do <cfset application.userservice = createObject("component", "userservice").init(application.dsn) /> during application
startup (in Application.cfc or Application.cfm), modified
to reflect whatever specific parameters the service needs. Now the
whole application can access the service (via the application scope),
and since the createObject call only happens at application startup,
we're [mostly] assured that there will only be a single instance of the
component. As a note, this isn't my recommended technique for larger
apps, but for this scenario, it's perfectly adequate.

Now it doesn't seem like we've gotten much out of packaging the
business operations up in this fashion. We have a nice object with
everything in it, sure, but that just makes it harder, because it's a
single large file, right? Those are valid concerns (which can be dealt
with later), but that arrangement brings a very important benefit that
isn't quite so obvious. Your UI knows nothing about the business logic
of the app, except that a given method of application.userservice, when
passed a certain set of parameters, will do a certain thing.  I.e., the what but not the how.  That's a
ridiculously beneficial thing to be able to say, because it means the
implementation of the methods can change at will, and the UI needn't
care at all, as long as the end result is the same. If you don't see
why that's a wonderful trait to have, keep reading.

Let's say we've got our app running in production, and then the
managers come back with a set of changes, as they always do. But
they're not just little tweaks, they're some pretty massive changes to
core functionality. So you go off and basically gut your UserService
object to implement these changes. But while you're doing it, you're
careful not to change any of the cffunction and cfargument tags. When
you're done, guess what? You're done. Since the methods didn't change,
your UI needn't change, so you don't have any more coding to do, and
your testing should be equally streamlined. That, my friends, is the
power of abstraction.

So what have we learned? Having a clearly defined API (exposed via
service objects) that your UI connects to can save you enormous amounts
of work as changes happen over time. That same API also lets you
redesign the backend, as well as change the business logic, without affecting anything else.

Now let's say you're working along, and it comes time to add some new
functionality that doesn't tie directly to users (let's say document
creation). Time for another service object: DocumentService, which will
have the corresponding effect of expanding our application's API. Same
rules apply, except now our UserService isn't the entire backend, so
we're no longer completely encapsulated. It's possible, likely even,
that the DocumentService will need to call methods on the UserService.
This is different than the UI calling methods, since the API the UI
uses doesn't know about implementation, but the guts of the
DocumentService do (the guts, after all, are the implementation).  We also don't want the service objects talking to the application scope, because there's no guarantee that application.userservice will always be there.  It might be server.userservice, or application.services.user.

We still have our abstraction, just like before, but the
encapsulation needs some work. Enter a factory. Not literally, but the
Factory design pattern (you may all now cringe and shudder in fear).
The Factory design pattern simply says "instead of instantiating
objects directly, create an object whose sole purpose is to create
those instances for you." Again, not very earth-shattering at first
blush, but digging deeper will reveal some great advantages. The more important question, however, is
how do we design it?

Remember where we called createObject to instantiate the UserService
directly into the application scope? Well instead, we'll create a
factory that has a getUserService method, and instantiate the factory
into the application scope, and then we'll immediately call that method
and assign the result to application.userservice. This second step
isn't necessary (or desirable, even), but it's important, because our
app currently depends on application.userservice being available, and
we don't want to have to change all the references to it right now,
though that's something that should be done.

Inside the factory, the getUserService method does about what you'd expect, except that it
caches the instance it creates in the factory's variables scope and uses
that cached instance for subsequent requests so that the instances are
true singletons.
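
A minimal sketch of such a factory follows.  The ServiceFactory name, the variables.instances struct, and the lock name are my own choices for the example, and the locking is just one way to guard against two requests creating the service at the same moment.

```cfml
<!--- ServiceFactory.cfc: hypothetical name for the factory component --->
<cfcomponent output="false">

  <cffunction name="init" access="public" returntype="any" output="false">
    <cfargument name="dsn" type="string" required="true" />
    <cfset variables.dsn = arguments.dsn />
    <!--- cache of the instances this factory has already created --->
    <cfset variables.instances = structNew() />
    <cfreturn this />
  </cffunction>

  <cffunction name="getUserService" access="public" returntype="any" output="false">
    <cfif NOT structKeyExists(variables.instances, "userService")>
      <cflock name="ServiceFactory.userService" type="exclusive" timeout="10">
        <!--- check again inside the lock in case another request got here first --->
        <cfif NOT structKeyExists(variables.instances, "userService")>
          <cfset variables.instances.userService =
              createObject("component", "UserService").init(variables.dsn) />
        </cfif>
      </cflock>
    </cfif>
    <cfreturn variables.instances.userService />
  </cffunction>

</cfcomponent>

<!--- at application startup: the factory goes into the application scope, and
      the old application.userservice reference is kept alive for existing code --->
<cfset application.factory = createObject("component", "ServiceFactory").init(application.dsn) />
<cfset application.userservice = application.factory.getUserService() />
```

Every later call to getUserService() hands back the cached instance, so there is only ever one UserService no matter who asks for it.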

Now back to our original problem: how does the DocumentService call
methods on the UserService? Well, both services, in addition to their
"normal" constructor arguments should be changed to accept a reference
to the factory as well. So now either service can easily obtain a
reference to any other service in the backend. The factory is allowing
the backend as a whole to remain encapsulated, and it's also allowing
each individual service to be abstracted from the others. Note that
this means the service objects are now exposing two APIs, one for the
UIs to use, and one for other services to use. The former is almost
certainly a subset of the latter, so it doesn't really appear to be two
different APIs, but it is.
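
As a rough sketch of that change, a DocumentService that takes the factory as a constructor argument might look like the following.  The createDocument method, its arguments, and the getUser call on the UserService are all hypothetical; they exist only to show one service reaching another through the factory.

```cfml
<!--- DocumentService.cfc: illustrative only --->
<cfcomponent output="false">

  <cffunction name="init" access="public" returntype="any" output="false">
    <cfargument name="dsn" type="string" required="true" />
    <cfargument name="factory" type="any" required="true" />
    <cfset variables.dsn = arguments.dsn />
    <!--- hold on to the factory itself rather than to individual services --->
    <cfset variables.factory = arguments.factory />
    <cfreturn this />
  </cffunction>

  <cffunction name="createDocument" access="public" returntype="void" output="false">
    <cfargument name="authorId" type="numeric" required="true" />
    <cfargument name="title" type="string" required="true" />
    <!--- ask the factory at call time; no application scope in sight --->
    <cfset var userService = variables.factory.getUserService() />
    <!--- getUser is a hypothetical method on the service-to-service API --->
    <cfset var author = userService.getUser(arguments.authorId) />
    <!--- ... create the document on behalf of that author ... --->
  </cffunction>

</cfcomponent>
```

Asking the factory for the other service inside the method, rather than in init, is a deliberate choice in this sketch: it sidesteps problems if two services ever need references to each other during construction.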

Ok, quick recap to this point. We have a factory singleton
instantiated into the application scope of the app. It can be asked for
singleton instances of the application's service objects. Each of those
service objects also has a reference to the factory, so they can
request instances of other services for complex business operations. The
backend as a whole is fully abstracted from the UI and fully
encapsulated, and each individual service object is also fully
abstracted and mostly encapsulated from the other service objects. I
used the 'mostly' hedge because it's impossible to get fully
encapsulated within the backend (otherwise it'd be multiple separate
backends), but we want each piece to be as isolated as is reasonable to
reduce the maintenance headache.

Once we have this arrangement, the possibilities are nearly
limitless. We can wantonly reimplement any of our service objects
without affecting anything else. So when you're ready to give using
entity objects (business objects, entity beans, etc.) and their
corresponding DAOs a whirl, you can do it with a minimum of fuss. As
long as everything remains encapsulated behind the service objects, no
other pieces of the application (both the backend and the UI) need care.

It's probably worth a bit more discussion of encapsulation at this
point. Encapsulation isn't a wall around an object. It's a wall around
a system, where a system can be quite large or very small. All but the
simplest systems are made up of smaller subsystems, each with their own
wall. A system can't see outside its wall, but it can see inside its
subsystems' walls. Note that I'm talking only about encapsulation here,
not abstraction. A system can see its subsystems' interfaces, but not
their implementations, and a system can't even see the interface of
its parent system.

Two more concepts to mention (by name) before I go: coupling and cohesion.

Coupling is when objects are tightly bound together in some way.
This is pretty obviously a Bad Thing, as it means that encapsulation
and abstraction are both missing to some degree or another. In dollars
and cents, it means that any changes to one thing have a great chance
of the "ripple effect", which usually means a lot more wild goose
chases across your codebase fixing bugs, and at the very least,
requires a longer testing and QA cycle for every code change.

Cohesion is the idea that a given thing (be it a system, an object,
or a method) does a single, clearly definable thing. This has the
benefit of making your code easier to follow, and it also reduces the
possibility that a method will semantically change, and therefore
require an API adjustment, which reduces the ripple effect, and again
saves you money with reduced maintenance costs.  Finally, it
promotes code reuse, since small atomic blocks are easier to reuse than
larger blocks.

Two scary terms, but speaking from experience, you don't have to
think about them much. If you've done your homework and designed a system that is both
abstracted and encapsulated, you'll usually end up with loose
coupling and tight cohesion without having to think about it. I only
mention them because they're helpful checks along the way while
designing a system. When you get stuck, ask yourself which alternative
is more cohesive and more loosely coupled, and that's usually the right
way to go.

In conclusion (if it's a conclusion at all), the APIs you build into
your backend are of utmost importance.  The rest of the design is
pretty much meaningless without this piece.  If it's present,
however, your developers will be enormously more productive and able to
meet changing needs without massive costs to rework the application as
a whole.

Good OO design is difficult, almost always requiring an
iterative approach, but using the coupling and cohesion tests along the
way can help you avoid potential pitfalls.  But most of all, it
just takes a lot of thoughtful practice and experimentation to learn what
works and what doesn't.

So I'm again signing off after more typing than I intended. I
realize now that I didn't offer up a whiskey break breather in the middle,
but hopefully you'll forgive me.

CF7, Web Services, and null

I discovered today (after many hours of debugging) that if you have
a web service method declared to return 'any' on CF6.1, you can return
null without incident.  However, if you're on CF7, it causes a
NullPointerException (NPE) to be thrown.  To illustrate what I mean, take this
[abbreviated] code:

<cffunction name="test" returntype="any">
<cfreturn functionThatHasReturnTypeVoid() />
</cffunction>

That function will work happily if served by CF6.1 and consumed by
CF6.1 and CF7.  However, if it's served by CF7, it will error
out.  The solution is to do something like this:

<cffunction name="test" returntype="any">
  <cfset var local = structNew() />
  <cfset local.result = functionThatHasReturnTypeVoid() />
  <!--- a null result makes structKeyExists report the key as missing --->
  <cfif NOT structKeyExists(local, "result")>
    <cfset local.result = "" />
  </cfif>
  <cfreturn local.result />
</cffunction>

The structKeyExists call is checking for null (it will return false
if the key exists but has a null value - something that I'm not sure is
documented anywhere).  If the result is null, it's reset to the
empty string before being returned.

Note that non-web service
method invocations on CF7 can return null with a declared returntype of
'any' without incident; this is a web services-only problem.

Remote Diff

Ever wanted to do a diff between a local file and a remote
one?  I have, so I wrote a simple shell script that'll do it for
you.  It requires scp, diff, and rm to be available on your system, which should be the case on any modern *nix.  The -b
option to diff tells it to ignore white-space differences.  If you
don't have password-less SSH authentication set up, you'll want to
remove the > /dev/null 2>&1 trailers from the two scp lines so you get your prompts.

To run, just drop this into /usr/bin (or somewhere else on your path), make it executable, and call it just like diff, except with scp-compatible file paths.

#!/bin/sh
#
# this acts as a remote diff program, accepting two files and displaying
# a diff for them.  Zero, one, or both files can be remote.  File paths
# must be in a format `scp` understands: [[user@]host:]file

if [ -z "$1" ] || [ -z "$2" ]; then
    echo "Usage: $(basename "$0") file1 file2"
    exit 1
fi

# pull both files into the current directory (scp handles local paths too)
scp "$1" rdiff.1 > /dev/null 2>&1
scp "$2" rdiff.2 > /dev/null 2>&1
diff -b rdiff.1 rdiff.2
# clean up the temporary copies
rm -f rdiff.1 rdiff.2

Good UI Design

I'm a stickler for good UI design.  I don't claim to be a wiz
at it myself, though I'd like to think I'm better than many.  I
love coming across web sites that are a breeze to use.  Hell, I
love coming across web sites that aren't painful to use.  In a fit
of spastic home-all-alone Googling, I hit the "I'm Feeling Lucky"
button on a search for "monkey nuts", hit the 'enter' button on the
splash page, and was rewarded by this site,
which had a quasi-prominent UI element that not only made me smile, but
happened to be EXACTLY what the page needed to elevate it from bad to quite
good.  Enough of a shift that I couldn't help sharing.

For
those of you who don't see it, it's the handwritten element.  The
one that says, "hey, I'm a dumbass and put a scrolling frame in the
middle of my page, but I have the courtesy to flat out tell you I did
it, and help you deal with it, because while I know it's not a very
good way to do things, it was the only way to get what I wanted, so I'm
going out of my way to make it as painless as possible for you: 'this
is a scrollbar -->'."

I love it.

Eclipse's Incremental Find

Little Eclipse gem I just found.  On the in-file Find dialog,
there's an "incremental" checkbox.  If you check it, Eclipse will
find the first match as you type, updating for every keystroke,
just like Firefox's in-page text searching.  I've longed for that
feature for months, and never noticed that it was sitting there right
under my nose.

Ant For Server Configuration

I use Apache ant for a lot of things, almost none of which have anything
to do with building software. Simeon knows
much of what I do with it from the course of various discussions, and while
he was using it for something a week or two ago, he suggested that I blog
about some of my experience. Within 10 minutes. It didn't happen, of
course, but hopefully late is better than never.

ant, for those who don't know, is a Java-based build tool, somewhat in the
same vein as make. Major differences include that it's XML based, runs on
Java, and has built-in commands, rather than relying on the shell. Build tool
or not, where I use ant the most is in server configuration tasks. I'm going
to consider BIND config files for my example, as it's relatively straightforward.
Apache config is my other major use; it makes it very easy to have a single
config template managed with version control, but still be able to build
actual configuration files for multiple servers in the cluster, all of which
are not exactly equal. But that's not what I'm talking about here.

DNS is pretty simple to deal with, but there is a LOT of repetition,
especially across a large number of domains that are all basically aliases
for a single application (which is what I've got). So rather than maintain
a couple hundred nearly identical zone files, I use ant to do all the dirty
work for me. Before I delve into the guts, here's a typical zone file:

$TTL 1h ; default TTL
@       IN  SOA  ns1.piersystem.com. root.piersystem.com. (
                 2005090901  ; serial
                 3h          ; refresh
                 15m         ; retry
                 30d         ; expire
                 1h          ; negative-caching TTL
                 )

; NS records
@       IN  NS   ns1.piersystem.com.
        IN  NS   ns2.piersystem.com.

; Address records
@       IN  A    216.57.200.38
www     IN  A    216.57.200.38

; other stuff
@       IN  TXT  "v=spf1 a mx ptr ip4:216.57.200.32/27 ip4:66.235.70.224/27 ~all"

I have about 90 copies of that zone, another 40 or so that are very close to
copies, and finally perhaps 10 that are pretty different. This is where ant
really shines, because it lets me templatize and parameterize the zone files
so that the entire set can be created from a very small amount of data.

How this works is via ant's wonderful filtering and property expansion
capabilities. Basically, when you copy a file from one place to another with
ant, you can also define filters to be performed as part of the copy. One of
those filters does property expansion, where properties are things like
${myPropName}, and defined in external properties files. Details to
come later. So those 90 cloned zone templates just contain a single line:

${basic.zone.pier}

That expands via this definition:

basic.zone.pier=${ttl} \n\
${soa} \n\
\n\
; NS records \n\
${ns} \n\
\n\
; Address records \n\
@ IN A ${ip.pier} \n\
www IN A ${ip.pier} \n\
\n\
; other stuff \n\
${spf1} \n\

As you can see, that definition includes even more properties, which continue
to expand until you arrive at the zone file I showed above. So all the data for
every single zone file is enclosed in two properties files (one for structure,
and another for IP addresses). But that's just the 90 or so clones.

The next batch of about 40 almost-clones are all additive changes. Most
require defining a subdomain or three, or some records for external infrastructure
that we don't manage. So those zone templates look like this (this for the
uscgstormwatch.com zone):

${basic.zone.pier}

dennis IN CNAME www
emily IN CNAME www
katrina IN CNAME www

Not much to see there, just the same thing with a couple extra records
defined afterwards. Now the last 10 or so totally custom zones. Here's an
example (the audiencecentral.com
template):

${ttl}
${soa}

${ns}

intranet IN NS ns1.piersystem.com.
         IN NS ns2.piersystem.com.

; Address records
@ IN A ${ip.audiencecentral}
www IN A ${ip.audiencecentral}
testdrive IN A ${ip.audiencecentral}
shrike IN A ${ip.shrike}

; PIER sites
news IN A ${ip.pier}
sales IN A ${ip.pier}

; Other stuff
office IN A ${ip.office}

${mx.plands}

; SPF record
${spf1}

If you look back at the expansion of the basic.zone.pier, you'll
see a lot of similarities. Almost all of the pieces are reused, and there are a
few new pieces mixed in as well. There are also some new IP addresses.

That's enough examples; let's get to the meat and potatoes of this whole thing,
the ant build file: build.xml. Here's the guts of it:

<target name="generate" depends="getSerial">
  <property file="ip.properties" />
  <property file="common.properties" />

  <copy todir="${build.dir}" overwrite="true">
    <fileset dir="${src.dir}">
      <include name="**/*.tmpl" />
    </fileset>
    <mapper type="glob" from="*.tmpl" to="*.dns" />
    <filterchain>
      <expandproperties/>
    </filterchain>
  </copy>

  <copy todir="${dest.dir}" overwrite="true">
    <fileset dir="${static.dir}">
      <include name="*" />
    </fileset>
  </copy>

  <move todir="${dest.dir}" overwrite="true">
    <fileset dir="${build.dir}">
      <include name="**/*.dns" />
    </fileset>
    <mapper type="flatten" />
  </move>
</target>

This defines a target (ant's name for a piece of work) named "generate", and
declares that it depends on the target "getSerial". getSerial, as you might imagine,
creates the serial number for the zone files and stores it in a property so that
it can be injected as part of the ${soa} expansion. Anyone who's interested in
how that works, let me know; I'm going to skip it here because it's complex,
nasty, and doesn't really lend anything to this post.

First thing the target does is include a couple property files (which I've
mentioned before), that contain all the expansions, including the
basic.zone.pier one. Next it does a couple copy operations and
then finishes up with a move operation.

The first copy tag copies "something"
to my build dir (specified by the ${build.dir} property, which happens to point
at ./build). The fileset tag it contains specifies what that
something is: all files in the ${src.dir} directory (including subdirectories)
that end with .tmpl. Next, the mapper tag converts all file extensions from
.tmpl to .dns as part of the copy (since that's my extension of choice
for zone files). Finally, the innocent-looking filterchain and expandproperties
tags do all the magic of expanding all those properties in the files that are
copied.

The second copy is much simpler, doing nothing more than a vanilla copy
of the files in the ${static.dir} to ${dest.dir} (which points to
/var/named). Note that this is different than where the first
copy went for reasons we shall see in a moment.

The last piece of magic happens in the move tag. It moves the newly created
.dns files from the build directory into the destination directory where the
static files just went, and it applies another mapper that flattens the directory
structure. ant only allows a single mapper per copy/move operation,
which is why I copy to a temp location, and then move to the real place. As you
can probably guess, doing a flatten allows me to keep all my zone templates
organized into a neat hierarchy for easy management, but not have to deal with
the pathing issues when it comes time to actually give BIND the zone files.

Impetus for an OO Backend

After a long, multi-faceted discussion on CFCDev a few weeks ago,
one of the participants contacted me off-list wondering about a sample
app that illustrated some of the concepts I'd mentioned in the
discussion. I don't have one, and while I could make one up, the
implementation is the easy part. It's the reasoning behind doing
things a certain way that is what's important. So what I'm going
to endeavor upon is a quick overview of how I've come to building apps
the way I have. It will undoubtedly be long winded, but I'll try
to be as brief as I can. ;) Hopefully at the end you'll see
three things:
First, OO is hard to do 'right', since 'right' is both subjective and
ever-changing, second, that the 'right' way is always defined by need
and not by how well you use Design Pattern X, and third, that I don't
have any magical OO power, I've just spent the past three years
tuning a single app and building a large body of experience.

[Image: Application Layers]

To put the cart before the horse, here's an image that roughly lays out
what I'm going for. At the top we have the Presentation and UI
Controller layers. The first is just HTML, the second is a
framework of your choice (I use FB3 primarily). The dark blue
section, however, is what I'm going to be talking about. And note
that this diagram is about how the tiers work, not necessarily how the
implementation is structured.

First some back-story. I took over an app that was utter spaghetti
code. About 80,000 lines of it. I spent a month converting
everything to Fusebox 3 (the current version at the time), and that
dropped to about 50,000 lines simply from code reuse. (ed.
note: That's a 37% decrease in code size; use a framework, people.) There was no MVC in the app; it was all single
fuseactions, so the only reuse happened at the fuse level. This
was all on CF4.5, and when CFMX 6.1 came out (and alleviated most of the
bugs and inadequacies in CFMX 6.0's CFC implementation), we upgraded,
and the CFC-ization began in earnest.

My first objective was to stop having
to write so many damn queries, so automated entity persistence was top
of the pile. Fortunately, that's really easy to do, since entity
persistence operations are very closely allied to your database
schema. A few hours of working and I'd built a generator that
would read a table schema from the DB and generate a skeleton BO and
fully implemented DAO for performing persistence operations for
it. It'd also generate a boilerplate factory/manager for the
entity type (getNewUser, getUserById, createUser, deleteUser,
updateUser). With that, I no longer had to have any single-entity
queries in my fuses, rather I request a BO from the appropriate manager
(all of which were singletons in the application scope), do what I
needed, and then call createXXX or updateXXX on the manager with the
modified BO. That saved me enormous amounts of time, particularly
with changes to entity fields (like adding a country to user info),
because I could just regenerate the DAO, and all the persistence
operations were magically updated.

That saved a lot of work, but
it didn't help abstract the business logic out of the UI. My
fbx_Switch.cfm files were still littered with a mix of business logic
and UI processing. So the next step was to start creating service
objects to put the business logic in. This became doubly
important since about this time we started exposing certain
functionality over web services as well as the HTML UI, and that was
starting to lead to enough duplicate logic code to trigger warning
bells. So much of the business logic moved into the service
objects.

An important point to make here is that while managers, BOs, and DAOs are
all pretty much one-to-one-to-one, services are not. An example
would be a permission-based security model, where you have users,
groups, and permissions. That's three distinct manager/BO/DAO
sets, but you probably only need a single SecurityService that deals
with all of them. And you'll probably have a LoggingService that
doesn't care about security, but does care about some aspects of users.

As soon as service objects popped into existence, however, up reared
a major issue with encapsulation. A good issue, mind you, but one that
required creating a solution for. Simply put, the services needed to be
able to talk to each other, but couldn't do that without going to the
application scope to get a reference. The same thing applied to manager
instances as well. The "right" solution, like so many other things, was
driven by time constraints not good design (though it was good enough).
At this time, all managers (and now services) were instantiated into the
application scope directly (i.e. application.securityservice). That was
quickly revised to instantiate them into a managers and services struct
respectively, copy the keys into the application scope, and then pass the
struct into each service (via a setManagers and a setServices method in
the AbstractService superclass). None of the existing app needed to change,
but all the services had references to everything else, and it took all
of ten minutes to put together.
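
Reconstructed as a sketch (this is not the original code, and the SecurityService and UserManager component names are stand-ins), the struct approach would go roughly like this:

```cfml
<!--- AbstractService.cfc: superclass that every service extends --->
<cfcomponent output="false">

  <cffunction name="setManagers" access="public" returntype="void" output="false">
    <cfargument name="managers" type="struct" required="true" />
    <cfset variables.managers = arguments.managers />
  </cffunction>

  <cffunction name="setServices" access="public" returntype="void" output="false">
    <cfargument name="services" type="struct" required="true" />
    <cfset variables.services = arguments.services />
  </cffunction>

</cfcomponent>

<!--- application startup: build the two structs first --->
<cfset managers = structNew() />
<cfset services = structNew() />
<cfset managers.usermanager = createObject("component", "UserManager") />
<cfset services.securityservice = createObject("component", "SecurityService") />

<!--- copy the keys into the application scope so application.securityservice
      and friends keep working for the existing code --->
<cfset structAppend(application, managers) />
<cfset structAppend(application, services) />

<!--- hand both structs to every service --->
<cfloop collection="#services#" item="key">
  <cfset svc = services[key] />
  <cfset svc.setManagers(managers) />
  <cfset svc.setServices(services) />
</cfloop>
```

Because structs are passed by reference, every service ends up looking at the same two structs, so inside a service another service is just variables.services.whatever away.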

The right solution (from a design perspective) is to use a factory,
and then just pass that factory around. The struct solution is similar
in concept, though not nearly as encapsulated. Needless to say, the
lesson was learned, and factories are now properly used from the start,
rather than being an afterthought.

Ok, time for a breather/cigarette/shot of whiskey. Whew! And here
we go again…

The next problem was that the services often needed to do arbitrary queries
as part of their business logic. This is a two-pronged issue:
SELECT queries, and other queries. And with the SELECT queries,
there are queries that the UI will also need (which still resided in
qry_ files at this point), and others that are strictly needed by the
backend. Along came gateways to solve the first part of the problem.

The set of gateways falls somewhere between the set of services and the
set of BOs in makeup. For example, a UserGateway for users and a
SecurityGateway for groups and permissions. Gateways were also the first
objects to be created via an application-scope factory and use lazy loading
for faster app startup. The methods of the gateways included all the SELECT
statements needed by both the UI and the services, though that may change at
some point in the future.

The problem of the non-SELECT queries was solved by deciding to let the
services perform those queries directly against the DB. It's worth mentioning
that this was only for queries that weren't tied to a single entity; those are
always performed through the entity. Initially, I thought this was a rather
poor way of handling it, but upon further reflection, I've become pretty
comfortable with it. The other solution would be a dedicated DB modifier object
that the service delegates to. I'm not sure it's worth the complexity,
unless having all your SQL in SQL-specific objects is important for your app,
because you lose the ability to modify the business logic all in one place.

So now we've got business logic in the services, but we have BOs
(Business Objects, mind you) that are little more than DTOs
(Data Transfer Objects) for easing persistence operations. So entity-specific
business logic was moved into the entities from the services. An example
would be a 'post document' operation. Previously, the DocumentService
would pull the Document BO, set 'isPosted' to true, 'postDate' to now(), etc.,
persist it, and then go on with the other tasks (like distributing
notifications, or clearing the cached list of "recent updates" for the
site). With the new setup, the DocumentService simply calls postDocument()
on the Document BO, which takes care of all the document-related stuff in
one magic step. The service is still in charge of the other non-entity
operations, but all the entity operations are now part of the entity.
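
As a rough illustration of that split, here's a sketch in Python rather than CFML. The field names, the save() stub, and the notification and cache lists are all stand-ins I've assumed for the example, not the real app's code.

```python
from datetime import datetime


class Document:
    """Hypothetical Document BO; persistence is stubbed out."""

    def __init__(self, title):
        self.title = title
        self.is_posted = False
        self.post_date = None
        self.saved = False

    def save(self):
        self.saved = True  # stand-in for the real persistence call

    def post_document(self):
        # Entity-specific logic lives on the entity itself.
        self.is_posted = True
        self.post_date = datetime.now()
        self.save()


class DocumentService:
    """Hypothetical service; keeps only the non-entity work."""

    def __init__(self):
        self.notifications = []
        self.recent_updates_cache = ["stale entry"]

    def post_document(self, document):
        document.post_document()  # all the document-related work
        # Non-entity operations stay in the service.
        self.notifications.append("posted: " + document.title)
        self.recent_updates_cache.clear()


service = DocumentService()
doc = Document("Q3 report")
service.post_document(doc)
```

The service no longer needs to know which fields make a document "posted"; if that definition changes, only the BO changes.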

So what do we have at the end of it all? A rather complex arrangement of
CFCs (all told, about 200) that makes working with the application much easier.

What other things would I like to see? A real centralized application object
that contains the entire application, and can be passed around instead of the
structs of services and managers, for one. This is already partially implemented,
but it's not complete. I'd also like to see user security and logging be
integrated in a more transparent way. Right now, both must be explicitly coded
for, which has led to errors in the past. I'd much rather have that applied
magically by some framework (probably dynamic wrapping of the service objects)
so that it's guaranteed to be consistent across the board.
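
Here's one way that kind of dynamic wrapping might look, sketched in Python rather than CFML. Everything in it is an assumption for illustration: the SecuredService wrapper, the convention that the user is passed as the first argument, and the permission names. It's the general shape of the idea, not a framework that exists in the app.

```python
class DocumentService:
    """Hypothetical service with no security or logging code of its own."""

    def post_document(self, title):
        return "posted " + title


class SecuredService:
    """Wraps any service object and applies the same checks to every call."""

    def __init__(self, target, permissions, log):
        self._target = target
        self._permissions = permissions  # {method name: required permission}
        self._log = log

    def __getattr__(self, name):
        # Only reached for names not found on the wrapper itself,
        # i.e. the wrapped service's methods.
        method = getattr(self._target, name)

        def wrapper(user, *args, **kwargs):
            required = self._permissions.get(name)
            if required is not None and required not in user["permissions"]:
                self._log.append(user["name"] + " denied " + name)
                raise PermissionError(name)
            self._log.append(user["name"] + " called " + name)
            return method(*args, **kwargs)

        return wrapper


log = []
service = SecuredService(DocumentService(), {"post_document": "document.post"}, log)
editor = {"name": "alice", "permissions": {"document.post"}}
result = service.post_document(editor, "Q3 report")
```

The service methods never mention security or logging, so there's no way to forget either one when adding a new method.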

What about problems? Funny you should ask. ;) Probably the biggest problem
is dealing with DB transactions. CF only exposes the CFTRANSACTION tag, which, like
other tags, acts upon its body. More to the point, there isn't a way to say "start
a transaction, if one isn't already active." Certain business operations
are both standalone operations and part of larger operations, and can be invoked
either way. The solution we've employed is to have both transactional and
non-transactional versions of those methods (the transactional method being nothing
more than a CFTRANSACTION tag wrapping a call to the non-transactional version).
It works, but it's hardly elegant, particularly with methods that take a lot of
arguments.
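
The paired-method pattern looks roughly like this. The sketch is in Python with an in-memory SQLite database standing in for CF and CFTRANSACTION, and the audit table and function names are invented for the example.

```python
import sqlite3

# isolation_level=None puts sqlite3 in autocommit mode, so the explicit
# BEGIN/COMMIT below play the role of the CFTRANSACTION tag.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE audit (message TEXT)")


def add_audit_entry(message):
    """Non-transactional version: assumes the caller opened a transaction."""
    conn.execute("INSERT INTO audit (message) VALUES (?)", (message,))


def add_audit_entry_tx(message):
    """Transactional version: nothing more than a wrapper around the above."""
    conn.execute("BEGIN")
    try:
        add_audit_entry(message)
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


def post_document_tx(title):
    """A larger operation reuses the non-transactional version internally."""
    conn.execute("BEGIN")
    try:
        add_audit_entry("posted " + title)
        add_audit_entry("notified subscribers of " + title)
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


add_audit_entry_tx("standalone entry")
post_document_tx("Q3 report")
row_count = conn.execute("SELECT COUNT(*) FROM audit").fetchone()[0]
```

Every operation that can run both ways needs two entry points with identical argument lists, which is exactly the part that gets tedious.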

One solution that I tried out on another app I threw together was to manage
transactions via a TransactionManager object that didn't use CFTRANSACTION at all,
but rather used CFQUERY to talk to the database directly. It worked fairly well,
but it depends on an implementation detail of CFMX: namely, that a given request
gets a single DB connection for the duration of the request, and that it is the
ONLY request that has access to the connection. Just for reference, BD (BlueDragon)
doesn't have this feature. It also gets complicated because you must
ensure that your transaction either gets committed or rolled back before the request
finishes, or you'll have some weird issues. However, it does allow you to have
the "start a transaction, if one isn't already active" functionality, which makes
things ENORMOUSLY easier. I'd really love to see this behaviour appear in future
versions of CF, but who knows.
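
For the curious, the core of the "start a transaction, if one isn't already active" behaviour can be sketched with a simple depth counter. This is Python with SQLite standing in for CFQUERY against a per-request connection. It's an assumed shape for such an object, not the TransactionManager I actually wrote, and the table and function names are made up.

```python
import sqlite3


class TransactionManager:
    """Only the outermost begin/commit pair actually talks to the database."""

    def __init__(self, conn):
        self.conn = conn
        self.depth = 0

    def begin(self):
        if self.depth == 0:
            self.conn.execute("BEGIN")  # no transaction active yet: start one
        self.depth += 1

    def commit(self):
        self.depth -= 1
        if self.depth == 0:
            self.conn.execute("COMMIT")  # outermost caller finished

    def rollback(self):
        # Any rollback abandons the whole transaction, however deeply nested.
        if self.depth > 0:
            self.conn.execute("ROLLBACK")
        self.depth = 0


conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit mode
conn.execute("CREATE TABLE audit (message TEXT)")
tx = TransactionManager(conn)


def add_audit_entry(message):
    """Works standalone or as part of a larger operation; one version only."""
    tx.begin()
    try:
        conn.execute("INSERT INTO audit (message) VALUES (?)", (message,))
        tx.commit()
    except Exception:
        tx.rollback()
        raise


def post_document(title):
    tx.begin()
    try:
        add_audit_entry("posted " + title)
        add_audit_entry("notified subscribers of " + title)
        tx.commit()
    except Exception:
        tx.rollback()
        raise


add_audit_entry("standalone entry")
post_document("Q3 report")
row_count = conn.execute("SELECT COUNT(*) FROM audit").fetchone()[0]
```

Compared with the paired-method approach, each operation exists exactly once, and it doesn't care whether it's the outermost caller.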

Another problem is that the sheer weight of a complex system can make otherwise
simple tasks fairly daunting. In that case, as long as encapsulation isn't broken,
we simplify. There are a few subsystems that consist only of a service object
that contains everything. To come clean, the 'permission' entity from my security
examples above is one. There isn't a permission manager, BO, or DAO, just the methods
in the SecurityService. These are also a major source of the non-SELECT queries that
the service objects need to perform. But like everything else, it's all about picking
the right compromises in each situation. If we need to have a full OO implementation
of permissions, injecting it will be very simple, since the service methods won't
change at all, they'll just stop doing everything themselves and start delegating to
the new backing objects. And that's the real power of encapsulation.

Just for reference, that flat, service-only architecture is usually where I start
with everything because it's quick to develop, and easily facilitates growth down the
road. One good (and publicly known) example is Ray's BlogCFC. It's all in there as
one massive file and that's exactly as it should be, particularly since installation
simplicity is important.

So, after much typing, I'm calling it quits. Hopefully I've met the three
objectives I laid out at the beginning, and haven't caused anyone to shoot smoke out
their ears or pass out on their keyboard. At some point, I may actually show some
code, but I'd like to at least pretend that anyone who gets example code will have
understood the reasons behind why it is the way it is.

cvs2svn Rules!

I've been moving all my stuff from CVS to Subversion over the past
few months.  Some of the stuff (like server config files) I've
just been moving the top revision across since the history is of little
concern.  The larger projects, however, I've migrated the whole
history with the fantastic cvs2svn
tool.  It somehow reads your CVS directory, figures out what each
changeset should be (presumably by finding nearly identical timestamps
with the same commit message), and then imports it all into Subversion
just as if you'd made the same commits as you did to CVS. 
Needless to say, this is fantastic, since it basically lets you switch
from CVS to Subversion with almost no cost.  But how well does it
work?

Tonight, after a couple months of wussing out, I finally
sacked it up and moved my company's main app across and it went
flawlessly.  It's not a huge app by any stretch (2500 files or
so), but it had several years of history with a lot of branches and
tags in there.  Much to my relief, it all came across perfectly.

I ran into two minor problems with the export, undoubtedly due to newbie
screwups I made, since both were from very early in the history.
First, I had a tag that was both a tag and a branch; easily solved with
the --force-branch="branch_name" option to cvs2svn.  Second, I had
an invalid keyword expansion command, and that was solved by using the
--use-cvs option so it used CVS rather than RCS's checkout command.

After those tweaks, and about two hours of spinning, I had a shiny new
Subversion repository.  Disconnected the projects in Eclipse,
moved my CVS working directories out of the way, checked the same
branches out from Subversion, copied my .project file (for Eclipse)
over, and reconnected the projects in Eclipse (to Subversion this
time), and off I went.

If you're considering moving from CVS to
Subversion, but haven't because you don't want to lose all that
history, definitely check out this tool.

How Nested Sets Work

I got an email question today about how nested sets work, after the
developer started using my TreeManager component.  I figured that
was a good topic for a blog post, so here it is.

Nested sets operate based on two fields, lpos and rpos (left
position and right position).  They're calculated by doing a
depth-first traversal of the tree; that is, numbering each node's left and
right sides as you get there.  So here's a sample tree:

          food
         /    \
      meat    fruit
       |      /   \
     beef  apple  pear

A depth-first traversal will basically start to the left of 'food',
and run around the entire tree until it gets to the right side of
'food', and number each node it comes to.  Every node will get two
numbers, one for each side.  The numbering will look like this:

             1.food.12
            /         \
     2.meat.5       6.fruit.11
        |            /      \
     3.beef.4   7.apple.8  9.pear.10

And in tabular form:

node    lpos  rpos
food       1    12
meat       2     5
beef       3     4
fruit      6    11
apple      7     8
pear       9    10
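
If you'd rather see the numbering as code, here's a minimal sketch of the traversal in Python. The dict-of-children representation and the number_tree() function are just for this example; they aren't how TreeManager stores or computes anything.

```python
def number_tree(tree, root):
    """Return {node: (lpos, rpos)} from a depth-first walk of the tree.

    `tree` maps each node name to the ordered list of its children.
    """
    positions = {}
    counter = 0

    def visit(node):
        nonlocal counter
        counter += 1
        left = counter  # number the left side on the way down
        for child in tree.get(node, []):
            visit(child)
        counter += 1
        positions[node] = (left, counter)  # right side on the way back up

    visit(root)
    return positions


tree = {
    "food": ["meat", "fruit"],
    "meat": ["beef"],
    "fruit": ["apple", "pear"],
}
positions = number_tree(tree, "food")
```

Running it on the sample tree gives the same lpos/rpos pairs as the table.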

A couple things to notice.  First, the root node always has
L=1 and R=n*2, where n is the number of nodes.  Leaf nodes
(those with no children) always have R=L+1, and non-leaf nodes always have
(R - L) % 2 == 1 (an even number of interim numbers).  The real
magic, of course, is that any given node's subtree lies entirely between the numbers
of the node, which is why they're called nested sets.
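
That "entirely between" property is what makes the queries so simple. Here's a small sketch using Python and an in-memory SQLite table. The table name `nodes` and the helper functions are assumptions for the example, not TreeManager's actual schema or API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (name TEXT, lpos INTEGER, rpos INTEGER)")
conn.executemany(
    "INSERT INTO nodes VALUES (?, ?, ?)",
    [
        ("food", 1, 12),
        ("meat", 2, 5),
        ("beef", 3, 4),
        ("fruit", 6, 11),
        ("apple", 7, 8),
        ("pear", 9, 10),
    ],
)


def bounds(name):
    return conn.execute(
        "SELECT lpos, rpos FROM nodes WHERE name = ?", (name,)
    ).fetchone()


def descendants(name):
    """Everything numbered strictly inside the node's own lpos/rpos."""
    lpos, rpos = bounds(name)
    rows = conn.execute(
        "SELECT name FROM nodes WHERE lpos > ? AND rpos < ? ORDER BY lpos",
        (lpos, rpos),
    )
    return [row[0] for row in rows]


def ancestors(name):
    """Every node whose lpos/rpos surround this node's numbers."""
    lpos, rpos = bounds(name)
    rows = conn.execute(
        "SELECT name FROM nodes WHERE lpos < ? AND rpos > ? ORDER BY lpos",
        (lpos, rpos),
    )
    return [row[0] for row in rows]


def descendant_count(name):
    """Each descendant uses two numbers, so no query over the subtree is needed."""
    lpos, rpos = bounds(name)
    return (rpos - lpos - 1) // 2
```

A whole subtree, or a whole path back to the root, comes out of a single query with no recursion, already in tree order.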

Nested sets
have a number of advantages over adjacency lists (using a parentID),
though they have some disadvantages as well.  Advantages include
the blazing speed of pulling out hierarchies, intrinsic ordering of
sibling nodes (meat and fruit or apple and pear in my examples), and
the fact that it's impossible to orphan a node by deleting a node in
the middle of the tree, just to name a few.  Disadvantages include
complexity (though TreeManager alleviates a lot of that), and expensive
structure changes.  However, the tradeoff is usually in favor of
nested sets over adjacency lists, since recall is almost always more
important than updating.
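
To show why structure changes are the expensive part, here's a sketch of adding a node, again in Python with a plain dict standing in for the table. The add_last_child() function is hypothetical; it just illustrates the renumbering that any nested set implementation has to do in some form.

```python
def add_last_child(positions, parent, child):
    """Insert `child` as the last child of `parent`, renumbering in place.

    Returns how many existing nodes had at least one number changed.
    """
    gap = positions[parent][1]  # the new node takes over the parent's old rpos
    touched = 0
    for name, (lpos, rpos) in list(positions.items()):
        # Everything at or to the right of the gap shifts over by two.
        new_lpos = lpos + 2 if lpos > gap else lpos
        new_rpos = rpos + 2 if rpos >= gap else rpos
        if (new_lpos, new_rpos) != (lpos, rpos):
            positions[name] = (new_lpos, new_rpos)
            touched += 1
    positions[child] = (gap, gap + 1)
    return touched


positions = {
    "food": (1, 12),
    "meat": (2, 5),
    "beef": (3, 4),
    "fruit": (6, 11),
    "apple": (7, 8),
    "pear": (9, 10),
}
touched = add_last_child(positions, "meat", "pork")
```

Adding 'pork' under 'meat' renumbers five of the six existing nodes, where an adjacency list would need a single insert. That's the cost being traded for the cheap reads.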