Medians and Quartiles

A number of years ago I built this little app called EventLog.  It's a really simple data journal: you enter a set of tags and a timestamp (defaulting to "now"), and it saves them off to a database.  Then you can build all kinds of reports and such based on your data to help you track "stuff".  As an example, here's a chart of my Benadryl consumption since mid-July (Benadryl helps my skin enormously; click the chart for the full report):

All kinds of neat, you might say.

One of the other reports that you can access if you're logged in (the report above was made publicly available) is called a hiatus report, which instead of reporting on the events themselves, reports on the length of the hiatuses between events.  For example, I aim to have a Benadryl every 8 hours or so, so monitoring my hiatuses is more useful than the actual data points themselves.  However, the raw hiatuses aren't terribly interesting as the number of events goes up; you want stats on them (average, median, deviation, etc.), so I added that this evening.
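To make the idea of a hiatus concrete, here's a rough Python sketch.  It is not EventLog's actual code; the `hiatuses` function and the sample timestamps are made up for illustration.  It sorts the event timestamps, takes the gap between each consecutive pair, and summarizes the gaps with the standard `statistics` module.

```python
from datetime import datetime
from statistics import mean, median, stdev


def hiatuses(timestamps):
    """Return the gaps, in hours, between consecutive events (hypothetical helper)."""
    ordered = sorted(timestamps)
    return [(later - earlier).total_seconds() / 3600
            for earlier, later in zip(ordered, ordered[1:])]


# Made-up sample events, purely for illustration.
events = [
    datetime(2024, 1, 1, 8, 0),
    datetime(2024, 1, 1, 16, 30),
    datetime(2024, 1, 2, 0, 0),
    datetime(2024, 1, 2, 9, 0),
]

gaps = hiatuses(events)  # [8.5, 7.5, 9.0]
summary = {
    "average": mean(gaps),
    "median": median(gaps),
    "deviation": stdev(gaps),  # sample standard deviation; needs at least two gaps
}
print(summary)
```

Note that four events only give you three hiatuses, so the stats get meaningful only once a fair number of events have piled up.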

And then Kim (aka Dr. Repp, microbiologist and bio-statistician) wanted quartiles as well as median.

Quartiles are kind of a pain.  It's not obvious how exactly they are computed.  Median is easy: just line up your points in ascending order and take either the middle one (if there is one), or the average of the middle two (if there isn't).  Quartiles are not the same algorithm applied to the 25% and 75% points (the 50% quartile is the median, of course).  Rather, they are the medians of the two halves of the data on either side of the median.  In particular, the median value (if there is one) is not part of either half; it's the pivot.  Think quicksort.
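In code, the description above might look something like this.  It's a minimal Python sketch of the median-of-halves approach, not what EventLog actually runs; the function names `median_of_sorted` and `quartiles` are my own for the sake of illustration.

```python
def median_of_sorted(values):
    """Median of an already-sorted, non-empty list."""
    n = len(values)
    mid = n // 2
    if n % 2:  # odd count: there is a single middle point
        return values[mid]
    return (values[mid - 1] + values[mid]) / 2  # even count: average the middle two


def quartiles(values):
    """Return (25% quartile, median, 75% quartile) using medians of the two halves."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values to form two halves")
    half = len(data) // 2
    lower = data[:half]
    # With an odd count the middle point is the pivot and belongs to neither half.
    upper = data[half + 1:] if len(data) % 2 else data[half:]
    return median_of_sorted(lower), median_of_sorted(data), median_of_sorted(upper)


print(quartiles([2, 4, 4, 5, 7, 9, 11]))  # (4, 5, 9)
```

In that last line the 5 is the pivot, so the halves are [2, 4, 4] and [7, 9, 11].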

An example will make this clearer.  Consider this set of values (already sorted):

[1, 2, 3, 4, 5, 6, 7, 8, 9]

The median is obviously five, which leaves four points on either side of it which will be used to compute the quartiles.  So the 25% quartile is the median of this subset:

[1, 2, 3, 4]

The 75% quartile is the median of this subset:

[6, 7, 8, 9]

Note that five isn't in either one.  These subsets yield medians of 2.5 and 7.5, which correspond to the 25% and 75% quartile values.

Now consider this set of values (also already sorted):

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

Here the median is 5.5 (the average of 5 and 6), and as that isn't an actual value in the set, the entire set will be represented in the two quartile subsets:

[1, 2, 3, 4, 5]
[6, 7, 8, 9, 10]

This yields quartiles of 3 and 8.
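As an aside on the "not obvious how exactly they are computed" point: other tools use other conventions, so don't be surprised if they disagree with the numbers here.  A quick check, assuming Python 3.8 or later, where `statistics.quantiles` defaults to an interpolating "exclusive" method rather than the median-of-halves approach described in this post:

```python
from statistics import quantiles

nine = [1, 2, 3, 4, 5, 6, 7, 8, 9]
ten = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# n=4 asks for the three cut points that split the data into four groups.
print(quantiles(nine, n=4))  # [2.5, 5.0, 7.5], same as the median-of-halves result
print(quantiles(ten, n=4))   # [2.75, 5.5, 8.25], not the 3 and 8 computed here
```

The two methods happen to agree on the nine-value set and differ on the ten-value one, which is exactly the sort of thing that makes quartiles a pain.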

Not a difficult algorithm, but a bit tricky, since you don't actually figure out the quartiles directly on the data: you have to compute the median, split the data, and then take the medians of the two subsets.

One response to “Medians and Quartiles”

  1. Rich Ehmer

    I stumbled upon this blog while researching ways to split datasets into quartiles programmatically for my Android App. I'm trying to make a pie chart of logged quantities by quartile of recorded quantity (so you can see how much each quartile contributes to the total sum). Charting data like this can certainly be a head-exploder! I can empathize.