431. The CRS Colombian Varietal Cuppings: Counter Culture

Counter Culture Coffee is a roaster that needs no introduction.  Its innovations in coffee sourcing, quality and sustainability have outsized influence in the coffee industry.  Counter Culture is, in short, a tastemaker.  And Counter Culture likes Caturra.

A lot.

As part of the CRS Colombian Varietal Cuppings, Counter Culture’s QC Director Timothy Hill cupped 22 sample pairs (each pair consists of 1 Castillo sample and 1 Caturra sample taken from the same farm) twice in his lab in Durham.  He also cupped those samples as part of the first Colombia Sensory Trial panel in Chicago.

Across the three cupping events, he preferred Caturra by a ratio of more than three-to-one.

When he preferred Caturra, he preferred it by a wider margin than when he preferred Castillo.

He felt that the best coffees were consistently Caturras and that the worst coffees were overwhelmingly Castillo.

Perhaps most importantly, the average score he awarded to Caturra was over 85 points while the average Castillo was under 83; the company buys coffees that score 85+ but doesn’t buy coffees <83.

Today I talk with Tim about the results of the cuppings, summarized in the graphic below.



What were your expectations going into these trials?

My hypothesis was that Castillo would perform better than expected, and even take some of the best scores.  I thought that it would have at least four places in the top 10 scores, maybe even more. But I also believed it would really dominate the low scores.  I thought as many as seven out of the bottom 10 would be Castillo.  In general I was expecting the Caturra to do slightly better overall, with more of the highest scores.

I will say that honestly I was rooting for the Castillo.  I always root for the underdog, and if indeed the Castillo could outperform the Caturra, it would be a victory for quality and for producers since it would combine the quality I look for with the yields and disease resistance growers need.


Did you bring any baggage with you to this process?

The only bias I had going into the tastings was my experience with the coffee we have historically bought from Colombia, which comes from a group in Cauca.  Over the past seven years, as the percentage of Castillo has gone up, the scores have steadily gone down in our lab.  When we first started buying this coffee, it was averaging 87 points.  At that time the percentage of Castillo was zero.  Now we estimate that coffee is 80 percent Castillo, and it doesn’t crack 86 points.  The last lot we bought scored 84.5.  While there is likely more at play than just the percentage of Castillo, it is certainly a factor we recognize as part of the quality problem.


You have had the opportunity now to cup these coffees three times—twice at your lab in Durham with colleagues and once as part of the first Colombia Sensory Trial in Chicago last month.  I want to talk about all three tastings but I want to start in Durham.  How did you set up the first cupping there?

We cupped the 44 samples over a period of three days.  All coffees were given internal reference numbers, and the tasters had no prior knowledge of the samples. All samples were cupped in randomized order with no consideration for what farm they were from or what variety they were.  The set-up did not take into consideration the sets or sample pairs; the coffees were just assigned blindly in flights of 10-14 coffees. All coffees were cupped blind 18-24 hours after roasting. Green samples were placed in front of the coffees when tasting.

Only 3 cups were used, as consistency was not a consideration of score.  We used our internal scoring sheet, which puts the primary focus on Flavor, Acidity, Fragrance/Aroma, with a small allowance for Body.

The only two people who cupped consistently throughout the three days were me and one trainee.  Two other cuppers were part of the process but did not score each round or every coffee, so I have submitted only my results.


And what were those results?

Of the 22 sets, I preferred Caturra 17 times and Castillo three times.  On two sets quality was equal.

The average Caturra score was 85.27 and the average Castillo score was 82.3.

Most notably, Castillo was given a below-specialty score seven times while Caturra was not given a single below-specialty score.


Wow.  That is a decisive preference for Caturra.  I know you prepped the coffees and cupped them a second time at your lab.  How did you set up the second cupping?

To focus in on the decision a farm would make regarding what variety to plant, we decided to set up the coffees in pairs from the same farm–Castillo and Caturra side-by-side.  This time around we covered every sample so there was no visual clue or information on roast level or moisture, and the order was completely randomized.  We used all of the same samples from round one that were roasted over a three-day period (and put in a one way valve) and re-cupped the same samples.


And what happened that time?

Eighteen preferences for Caturra.  Four preferences for Castillo.

Average score for Caturra was 85.05.  Average Score for Castillo was 81.
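The arithmetic behind the two Durham rounds can be checked with a short sketch. It uses only the counts and averages Tim quotes in his two answers above; the function name and the dictionary keys are illustrative assumptions, not part of Counter Culture's scoring system, and figures quoted elsewhere in this interview may have been calculated on a different basis.

```python
def summarize_round(caturra_wins, castillo_wins, ties, caturra_avg, castillo_avg):
    """Summarize one cupping round from preference counts and average scores.

    Hypothetical helper for illustration only.
    """
    total_pairs = caturra_wins + castillo_wins + ties
    return {
        "pairs": total_pairs,
        # Share of sample pairs in which each variety was preferred, in percent.
        "caturra_share": round(caturra_wins / total_pairs * 100, 1),
        "castillo_share": round(castillo_wins / total_pairs * 100, 1),
        # Difference between the two average scores, in points.
        "avg_gap": round(caturra_avg - castillo_avg, 2),
    }


# First Durham cupping: 17 Caturra, 3 Castillo, 2 ties; averages 85.27 and 82.3.
durham_1 = summarize_round(17, 3, 2, 85.27, 82.3)

# Second Durham cupping: 18 Caturra, 4 Castillo, no ties; averages 85.05 and 81.
durham_2 = summarize_round(18, 4, 0, 85.05, 81.0)

print(durham_1)
print(durham_2)
```

On these inputs the gap between the average scores is roughly 3 points in the first round and roughly 4 in the second.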


The preference was even more decisive the second time, then.  But that’s not to say that Castillo is not capable of producing a nice cup of coffee.  You awarded some high scores to Castillo samples: a 90, an 89, a couple of 88s, an 87.5, a pair of 87s.  Were you surprised by that?

90 points from a Castillo is surprising.

But when our Castillo scores are averaged (even the scores from the round that produced the 89- and the 90-point Castillos) the average was 82.3 points—a coffee we would not consider for purchase. The Caturra for that round averaged 85.3—a coffee we would purchase. This is a 3-point difference and the difference between what we purchase and what we do not. This is what coffee buyers worry about. While it is true that Castillo can be great, it seems that on average Castillos compared to the Caturras from the same farm are considerably less good and less valuable.


I have to ask: what happened between Chicago and Durham?  In Chicago, you preferred Caturra 52 percent of the time by an average margin of 2.81 points and gave it an average score of 83.59.  By the second Durham cupping, you preferred Caturra 86 percent of the time, by an average margin of 6 points and gave it an average score of 85.05 points.  Those are some serious shifts.  To what do you attribute them?

Various things.

  • Sensory preference.  From the Chicago tasting I did discover one important thing that likely influenced the following tastings greatly: once you focus in on the flavors of Castillo and Caturra, they become very easy to recognize.  The flavor that I came to recognize in the Castillo samples (slightly vegetal tendency, very good-to-slightly-too-aggressive acidity, and sometimes even earthy or woody notes) was indeed the flavor that I was tasting in Counter Culture’s Colombian offerings and the flavor to which I attributed the lower scores over the years of purchasing that coffee.  Once those flavors are recognized, when a coffee shows them, it becomes very hard for me to reward the good character and almost impossible not to punish what is perceived as a flaw.  This is what I found as I tasted these samples all over again in our Durham lab.
  • Roasting. This turned out to be by far the most challenging aspect of the process of evaluation.  With the moisture known, and the weight loss known for each sample, one can determine if the roast was spot on or slightly off.  If a roast was slightly off (all coffees were within our specified roasting specifications of weight loss, so there was only slight variation) it was still accounted for and tasters tried to assess the quality of the coffee independently of roast.  This is extremely challenging, as the score then becomes only the best guess and not a reflection of exactly what is on the table.  It is also extremely important to note that the coffees that were on the dark side of the appropriate roast level were by far the most debated coffees in the trials, and overall these samples were almost always Caturra. Which brings me to the next point.
  • Visual bias.  This is something that is tough to change.  Being a roaster and sample roaster, one becomes calibrated for what denser higher quality coffee looks like.  In this regard, Castillo is at a huge disadvantage.  In my experience, 90 percent of Castillo samples develop in an entirely different way, and look and act like coffee that is much less dense than Caturra samples.  Their outer color generally (but not always) appears lighter in roast level, and the surface is much more monotone, making it appear much less dense than Caturra.  Visually I would say it is easy to pick which samples are the Castillo and which are Caturra.  This was noted and cuppers were directed to not play into any bias, but it has to be noted that this of course could have played into the results.
  • Roasted sample.  Our decision to put a roasted sample on the cupping table likely swayed results to a small degree since the information that is given or can be gathered from the sample—moisture, weight loss during the roasting process, bean shape, and of course roasted consistency—influence a cupper’s perceptions of quality.  The positive to having the reference sample, is that we of course take into consideration the visual of a coffee when we purchase it — and this is how we cup for purchasing.  If 5 or 10 cups taste great, but we can see a lot of “quakers” or inconsistency in the sample, we would be skeptical of the coffee and not purchase it.
  • Moisture.  Because the moisture percentage was visible to the taster, the coffees with high moisture (at least one Caturra sample and at least one Castillo) were also scored on intrinsic quality rather than on the flavors associated with high moisture (savory, fruity notes).

That said, at the end of the day, it is fascinating that both sets, regardless of score variation, did in fact yield the same basic results: my preference leans very heavily towards Caturra, and by a magnitude of 3-4 points per coffee.


So this was a case of you dialing in your preferences through multiple repetitions, is that it?

I think panels are all about calibration.  Bringing people together in terms of how they are scoring coffees.  But when you get into your own lab and your own domain you start to apply more of your individual ideology.  That was true here.

I left Chicago with some clarity about my preference and struggling with the Castillo profile.  When I started cupping the coffees here I really started polarizing them based on what was a clear and consistent difference in their profiles.


If the most notable result from these three cupping sessions is your overwhelming preference for Caturra, then a close second has to be the enormous variation in the scores you assigned to the same coffee over four reps in Chicago and Durham.  I don’t know many people out there who have been more frustrated than you with the accepted formats for measuring coffee quality, or done more tinkering to improve them.  But even your approach has not succeeded in assigning coffees the same scores over multiple reps.  What gives?

Cuppers are terrible.

I found myself mystified and somewhat ashamed at the variation in cup score.  I recognize that I am a taster that likes to reward and punish coffees to great degrees with a larger range than many of my peers, but the numbers were still surprising.

Almost every set has one coffee that has a wide variation (greater than 3 points) from one round to the next, which makes it hard to consider these reliable data.  These data really point to the fine, very fickle/temperamental points of quality, and how nearly impossible it is to set up a proper protocol for quality evaluation.  We found that the subtle differences in roast date, in roast level, moisture content, and the inconsistency of the samples make for an impossible “true” laboratory setting.

We believe that much more work needs to go into the proper protocols for evaluating these and other sample sets.  It seems more repetition, calibration samples, more unique table setups, and other metrics need to be put in place for any true statistician to take the cupping numbers seriously.
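The check Tim describes, a swing of more than 3 points for the same coffee from one round to the next, can be sketched in a few lines. The sample names and scores below are invented purely for illustration and are not results from these cuppings; the function name and the threshold parameter are likewise assumptions.

```python
def flag_unstable(scores_by_sample, threshold=3.0):
    """Return the samples whose score moved by more than `threshold` points
    between any two consecutive rounds. Hypothetical helper for illustration."""
    flagged = []
    for sample_id, scores in scores_by_sample.items():
        # Absolute change between each round and the one that follows it.
        jumps = [abs(later - earlier) for earlier, later in zip(scores, scores[1:])]
        if jumps and max(jumps) > threshold:
            flagged.append(sample_id)
    return flagged


# Made-up scores for two hypothetical samples over three rounds (NOT trial data).
example_scores = {
    "sample_A": [84.0, 85.5, 84.5],  # largest swing is 1.5 points
    "sample_B": [83.0, 87.5, 86.0],  # swings 4.5 points between rounds 1 and 2
}

unstable = flag_unstable(example_scores)
print(unstable)
```

A stricter or looser threshold can be passed in to see how many samples would count as stable under a different tolerance.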


So, what?  In the end, what are the big takeaways for you as a result of this exercise?

The main conclusion that can be drawn is this: there is a flavor difference between Castillo and Caturra, and it is a recognizable and relatively clear difference.  If the reports I have heard are true, then Colombia’s coffee is upwards of 60 percent Castillo. I believe this is the single largest flavor profile shift a country has ever made in such a short time.


What are the implications of this shift for the marketplace?

I firmly believe that many buyers will actually gravitate towards the brighter acidity of Castillo.  For me it seems that this was a goal for this variety, and it was executed well.  I personally did not gravitate towards this profile, and opted for the slightly rounder, sweeter, and what I found to be more delicate, complex coffees that had a softer less aggressive finish—traits I found mostly in the Caturra samples.


What about growers? 

The ultimate question for me is this: is the 50- to 60-cent-per-pound premium that I believe the Caturra variety is worth based on this trial enough for the producer?  If that answer is “no,” then at the end of the day, I cannot and will not recommend Caturra. But if that answer is “yes,” then it is up to the buyers that prefer this profile to pay up for it.


This conversation with Tim Hill of Counter Culture Coffee is the first in a series of weekly interviews on the CRS Colombian Varietal Cuppings.

Next week: Tim Wendelboe >>

 – – – – –

The Colombia Sensory Trial and the CRS Colombian Varietal Cuppings are supported by a grant from the Howard G. Buffett Foundation.


  • P Baker says:

    Hmmm – that’s interesting and disappointing, but not conclusive!

    The killer fact concerns density: less dense beans will of course give a poorer taste – the beans have not filled optimally so have less of the compounds that deliver cup quality.

    But why were the Castillo samples less dense? I think there are three possible reasons:

    1 The Castillo came from lower altitude/more sun-exposed parts of the farm than the Caturra; lower altitude means higher temperatures, so the pulp ripens quicker and the farmer has to harvest before the beans inside have matured. I’m assuming that height differences are small and so it’s an unlikely cause, but needs to be accounted for – farmers would naturally replace their most rust-susceptible coffee plots and these would tend to be lower altitude parts of the farm.

    2 Castillo bore more berries than the Caturra; heavy bearing means the tree struggles to feed them all, so filling is incomplete. This could be because of less shading that provokes more flowering, or just because the Castillo was younger and/or in a full production year, so flowered more heavily. (BTW: if Caturra premiums are 20% but Castillo yields are 20% higher, then the farmer is better off because he will have spent less on fungicides? What is the reality in the field?)

    3 It could be a varietal trait, some varieties synthesize more ethylene than others and ripen more quickly, hence, all other things being equal, quality would tend to be lower.

    So it’s tricky to do a really rigorous comparison, but quantifying bean density is absolutely vital and I’m always surprised that roasters don’t do this routinely for their most prized coffee origins. By doing this, they should be able to spot subtle changes before roasting (e.g. an El Niño year ought to give a lower density on average) and modify their roasting time and temperature regime accordingly. With global warming, one would expect that bean density from the same farm has been falling over the decades, a pity no one is looking at this.

    The acid test would be to equalize flowering loads on adjacent Castillo and Caturra plots. My bet is that then there would be no detectable quality difference!

    • Michael Sheridan says:


      Stay tuned for more installments in this series. The preference for Caturra was NOT universal.

      Meantime, let me reject your first hypothesis: there was no appreciable difference in elevation or solar exposure between the Castillo and Caturra lots on participating farms. In one case out of 22 there was a 106-m difference between the lots where the two varieties were grown, but across the remaining 21 sample pairs, the average difference in elevation was just 3 m. While we did not collect measurements of shade cover, visual inspection of the Castillo and Caturra lots was a part of the screening process and growers with significant difference in shade cover were eliminated from participation.


  • Very interesting study. Great work, keep it going!

    • Tim Hill says:

      Just to clarify one point about what I said in the interview: the Castillo looked like, and roasted like, a less dense coffee. We were not, however, able to do any proper evaluation or true test to determine density. I think this should be looked at, as it is important even when just thinking about roasting a blend of these two varieties (which is something that is very likely in a real-world setting) and the challenge that could pose for quality and roast consistency.
