Equal. Different.

I explained here last week that we subjected the Castillo and Caturra samples we collected for the Colombia Sensory Trial to two different kinds of sensory evaluation: two cupping panels at Intelligentsia Roasting Works in Chicago that applied the CQI’s Q protocols and two sensory panels at the Sensory Analysis Center at Kansas State University that applied the new World Coffee Research lexicon.

The rock-star cuppers who gathered at Intelligentsia told us the samples were equal.

The highly trained assessors who gathered at KSU told us they were different.

Equal, but different.


Equal. Different.

Consider these two photos.

The one on the left is Brasil’s National Cathedral in the capital Brasilia.

The one on the right is the Iglesia San Francisco in the Historic Center of Ecuador’s capital, Quito.

Both churches are architectural icons.  Both are visited by large numbers of tourists, many of whom, like me, post photos like these on Instagram.  And both are regarded in elite and popular opinion alike as beautiful.

In these important regards, I suppose you could say the buildings are “equal.” But clearly, they are different.

Imagine for a moment–and I know this will be hard, but give it a shot–that you didn’t have a smartphone and there was no such thing as Instagram.  Imagine that you had to describe each of these buildings to someone using only–gasp!–words.

Think of three or four terms you might use to describe the building on the left.

Now think of three or four terms you might use to describe the building on the right.

There may be some overlap, but chances are those sets of terms are mostly different.

Equal, but different.  It was kind of like that with the Castillo and Caturra samples in the Colombia Sensory Trial.

Over two panels, the cuppers at Intelli evaluated 22 samples of Castillo and 22 samples of Caturra using the Q form.  Two repetitions per sample per panel.  Eight cuppers.  Hundreds of data points.  When we totaled them up and averaged them out, both coffees came in right around 83 points.  There was a 0.3-point difference between the average for Castillo and the average for Caturra, but it was not statistically significant.  The coffees were, in this regard, equal.

Over two panels, the assessors at KSU evaluated the same samples using 36 of the 108 attributes in the new WCR lexicon.  These 36 were selected based on a preliminary analysis of the Trial samples and the lab’s previous experience with Colombian coffees.


WCR Lexicon - 36 Attributes.

What did the assessors find?

First, they found that all 36 attributes were present in both varieties.  In other words, it wasn’t that one variety presented certain attributes and the other variety presented different attributes.  The difference was one of degree: certain attributes were more intense in the Castillo samples and other attributes were more intense in the Caturra samples.

So even though the assessors at KSU did find evidence of statistically significant, if narrow, differences between Castillo and Caturra, those differences came against the backdrop of a significant degree of overlap in the “sensory footprints” of the two varieties.  These attributes were more intense in Castillo than Caturra:


Castillo Attributes.

What jumps out at me is how many attributes were more intense in the Castillo samples: 27 of the 36 attributes the assessors considered, or 3 in every 4.  These 27 include some that we would consider desirable in a coffee–caramelized, fruity dark, cocoa–and others we would almost certainly not: petroleum, moldy damp and astringent aftertaste.

These are the attributes that were more intense in Caturra than Castillo:


Caturra Attributes.

Only nine of the 36 attributes were more intense in the Caturra samples than the Castillo samples.  As in the case of Castillo, the attributes that were more intense for Caturra than Castillo include both attributes we would want to taste in our coffees and others we wouldn’t.

Here’s the thing: most of these differences, those registered for 23 of the 36 attributes, are not statistically significant.


Statistically Insignificant Differences.

For the 13 remaining attributes, an analysis of variance test produced a p-value below 0.05, which means these differences are statistically significant at the conventional 95 percent confidence level.
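For readers who want to see what a test like this does mechanically, here is a minimal sketch of a one-way analysis of variance for a single attribute. It rests on several assumptions: the ratings below are made up for illustration and are not the Trial's data, the one-way layout is a simplification (the lab's actual model is not described in this post and may well account for panel and repetition), and the function name `one_way_anova_f` is hypothetical. Instead of computing a p-value, it compares the F statistic against the tabulated critical value for a 0.05 threshold, which amounts to the same decision.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """Return (F statistic, df between groups, df within groups)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual ratings around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical intensity ratings for ONE attribute (not Trial data).
castillo = [3.0, 3.5, 4.0, 3.5, 4.0, 3.0, 3.5, 4.0, 3.5, 4.0]
caturra = [2.5, 3.0, 3.0, 2.5, 3.5, 3.0, 2.5, 3.0, 3.5, 3.0]

# Critical value from a standard F table: alpha = 0.05, df = (1, 18).
F_CRIT_1_18 = 4.41

f_stat, df_b, df_w = one_way_anova_f(castillo, caturra)
significant = f_stat > F_CRIT_1_18

print(f"F({df_b}, {df_w}) = {f_stat:.2f}, significant at 0.05: {significant}")
```

An F statistic above the critical value corresponds to a p-value below 0.05. With real data you would run a check like this once per attribute, ideally with a statistics package that reports the p-value directly.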


Statistically Significant Differences.

For 10 of those attributes, Castillo registered a significantly higher intensity than Caturra.  For three, Caturra was more intense than Castillo.
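The counts in this post are easy to lose track of, so here is a small sketch that tallies them and checks that they add up. Every number comes from the text above; the variable names are mine.

```python
TOTAL_ATTRIBUTES = 36

# Attributes whose average intensity was higher in each variety.
more_intense = {"Castillo": 27, "Caturra": 9}

# Of those, the ones where the difference was statistically significant.
significant = {"Castillo": 10, "Caturra": 3}

total_significant = sum(significant.values())
total_not_significant = TOTAL_ATTRIBUTES - total_significant
castillo_share = more_intense["Castillo"] / TOTAL_ATTRIBUTES

# Every attribute leaned one way or the other.
assert sum(more_intense.values()) == TOTAL_ATTRIBUTES

print(f"Significant differences: {total_significant} of {TOTAL_ATTRIBUTES}")
print(f"Not significant: {total_not_significant} of {TOTAL_ATTRIBUTES}")
print(f"Share more intense in Castillo: {castillo_share:.0%}")
```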


Castillo and Caturra.  Equal and Different.


The “83” that anchors each side of this graphic reminds us that even though individual cuppers involved in the Trial did have a significant preference for one variety over another, the cuppers who evaluated the coffees did not, on average, find any separation between Castillo and Caturra on Q’s vertical scale.  They were equal.

The attributes assigned to each variety show how the sensory assessors who evaluated the coffees “horizontally” using the WCR lexicon found them to be different.  The differences were narrow and occurred in a context of substantial overlap in the “sensory footprints” of the two varieties, but they were statistically significant.


  • What an awesome and easily digestible breakdown of the nuances between these beans. Thoroughly fascinating from start to end. Thank you.

  • Ewan Reid says:

    Maybe what this also tells us is that the Q vocab scoring doesn’t stack up statistically given it blends subjective, objective and hedonics into a reductive score?
