I explained here last week that we subjected the Castillo and Caturra samples we collected for the Colombia Sensory Trial to two different kinds of sensory evaluation: two cupping panels at Intelligentsia Roasting Works in Chicago that applied the CQI’s Q protocols and two sensory panels at the Sensory Analysis Center at Kansas State University that applied the new World Coffee Research lexicon.
The rock-star cuppers who gathered at Intelligentsia told us the samples were equal.
The highly trained assessors who gathered at KSU told us they were different.
Equal, but different.
Consider these two photos.
The one on the left is Brazil’s National Cathedral in the capital, Brasília.
The one on the right is the Iglesia San Francisco in the Historic Center of Ecuador’s capital, Quito.
Both churches are architectural icons. Both are visited by large numbers of tourists, many of whom, like me, post photos like these on Instagram. And both are regarded in elite and popular opinion alike as beautiful.
In these important regards, I suppose you could say the buildings are “equal.” But clearly, they are different.
Imagine for a moment–and I know this will be hard, but give it a shot–that you didn’t have a smartphone and there was no such thing as Instagram. Imagine that you had to describe each of these buildings to someone using only–gasp!–words.
Think of three or four terms you might use to describe the building on the left.
Now think of three or four terms you might use to describe the building on the right.
There may be some overlap, but chances are those sets of terms are mostly different.
Equal, but different. It was kind of like that with the Castillo and Caturra samples in the Colombia Sensory Trial.
Over two panels, the cuppers at Intelli evaluated 22 samples of Castillo and 22 samples of Caturra using the Q form. Two repetitions per sample per panel. Eight cuppers. Hundreds of data points. When we totaled them up and averaged them out, both coffees came in right around 83 points. There was a 0.3-point difference between the average for Castillo and the average for Caturra, but it was not statistically significant. The coffees were, in this regard, equal.
Over two panels, the assessors at KSU evaluated the same samples using 36 of the 108 attributes in the new WCR lexicon. These 36 were selected based on a preliminary analysis of the Trial samples and the lab’s previous experience with Colombian coffees.
What did the assessors find?
First, they found that all 36 attributes were present in both varieties. In other words, it wasn’t that one variety presented certain attributes and the other variety presented different attributes. The difference was one of degree: certain attributes were more intense in the Castillo samples and other attributes were more intense in the Caturra samples.
So even though the assessors at KSU did find statistically significant, if narrow, differences between Castillo and Caturra, those differences came against the backdrop of a significant degree of overlap in the “sensory footprints” of the two varieties. These attributes were more intense in Castillo than Caturra:
What jumps out at me is how many attributes were more intense in the Castillo samples: 27 of the 36 attributes the assessors considered, or 3 in every 4. These 27 include some that we would consider desirable in a coffee–caramelized, fruity dark, cocoa–and others we would almost certainly not: petroleum, moldy damp and astringent aftertaste.
These are the attributes that were more intense in Caturra than Castillo:
Only nine of the 36 attributes were more intense in the Caturra samples than the Castillo samples. As in the case of Castillo, the attributes that were more intense for Caturra than Castillo include both attributes we would want to taste in our coffees and others we wouldn’t.
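The 27/9 split is simple bookkeeping over per-attribute averages. Here is a minimal sketch, assuming each attribute's panel-average intensity for each variety is already in hand. The function name and the toy values are hypothetical and are not Trial data.

```python
from fractions import Fraction


def tally_direction(intensities):
    """Count lexicon attributes scored as more intense in each variety.

    `intensities` maps an attribute name to a
    (castillo_mean, caturra_mean) pair of panel-average intensities.
    Returns (castillo_higher, caturra_higher, ties).
    """
    castillo_higher = sum(1 for c, k in intensities.values() if c > k)
    caturra_higher = sum(1 for c, k in intensities.values() if k > c)
    ties = len(intensities) - castillo_higher - caturra_higher
    return castillo_higher, caturra_higher, ties


# Hypothetical toy means with placeholder attribute names, NOT Trial data.
toy_means = {
    "attribute_a": (3.1, 2.8),
    "attribute_b": (0.6, 0.4),
    "attribute_c": (1.2, 1.5),
    "attribute_d": (2.0, 2.0),
}
print(tally_direction(toy_means))  # (2, 1, 1)

# The post's split: 27 of 36 attributes more intense in Castillo.
print(Fraction(27, 36))  # 3/4
```

In the Trial there were no ties: 27 attributes leaned Castillo and 9 leaned Caturra, adding up to all 36.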
Here’s the thing: most of these differences (those registered for 23 of the 36 attributes) are not statistically significant.
For the 13 remaining attributes, an analysis of variance (ANOVA) test produced a p-value below 0.05, which means these differences are statistically significant at the 95 percent confidence level.
For 10 of those attributes, Castillo registered a significantly higher intensity than Caturra. For three, Caturra was more intense than Castillo.
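To show what that kind of test involves, here is a minimal sketch of a one-way analysis of variance for a single attribute, comparing two groups of intensity ratings. It assumes variety is the only factor. The actual KSU analysis may well have accounted for assessor and repetition effects, which this sketch ignores. The function names and the ratings in the usage lines are hypothetical. The p-value uses a stdlib-only regularized incomplete beta function adapted from the Numerical Recipes continued-fraction routine. If SciPy is available, `scipy.stats.f_oneway` does the same job.

```python
import math
from statistics import fmean


def _beta_cf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function
    (modified Lentz method, after Numerical Recipes' betacf)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use whichever form of the continued fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def one_way_anova(*groups):
    """Return (F, p) for a one-way ANOVA across the given groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [fmean(g) for g in groups]
    grand = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    # Upper tail of the F distribution:
    # P(F > f) = I_{d2 / (d2 + d1 * f)}(d2 / 2, d1 / 2)
    p_value = reg_inc_beta(df_within / 2.0, df_between / 2.0,
                           df_within / (df_within + df_between * f_stat))
    return f_stat, p_value


# Hypothetical ratings for one attribute, NOT Trial data.
castillo = [1, 2, 3]
caturra = [4, 5, 6]
f_stat, p_value = one_way_anova(castillo, caturra)
print(f_stat, round(p_value, 4))  # 13.5 0.0213
print(p_value < 0.05)  # True: significant at the 95 percent level
```

With only two groups, the F statistic is the square of the pooled-variance two-sample t statistic, so this is equivalent to a classic t-test. Running it once per attribute is what would produce a tally like 13 significant and 23 not.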
The “83” that anchors each side of this graphic reminds us that even though individual cuppers involved in the Trial did have a significant preference for one variety over another, the cuppers who evaluated the coffees did not, on average, find any separation between Castillo and Caturra on Q’s vertical scale. They were equal.
The attributes assigned to each variety show how the sensory assessors who evaluated the coffees “horizontally” using the WCR lexicon found them to be different. The differences were narrow and occurred in a context of significant overlap in the “sensory footprints” of the two varieties, but they were significant.