Coffee Design of Experiments (Part 1: Screening)

I will invoke a variety of principles from experiment design methods. The truth is, though, that I’m self-taught in experiment design. It is altogether probable that there are better ways of conducting this experiment which my ignorance has hidden from me. I’m open to suggestions!

Quick Review

The variables in control are temperature T, extraction time t, and coffee-to-water ratio (by weight) r. Each of these will be considered over a domain of 5 values. Temperature and coffee-to-water ratio are linearly divided into 5 steps between their minimum and maximum values, which I selected based on published coffee brewing advice. Extraction time is divided logarithmically over its domain, since my intuition is that extraction is an approximately logarithmic process in time.
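As a minimal sketch of how those five levels per factor can be generated, the snippet below uses NumPy's linspace for the linear factors and geomspace for the logarithmic one. The endpoints are the ones that appear in the sweeps in this post; the function and variable names are my own and purely illustrative.

```python
import numpy as np

# Endpoints taken from the sweeps listed in this post.
TEMP_RANGE_F = (185.0, 205.0)      # temperature, degrees F
TIME_RANGE_S = (10.0, 300.0)       # extraction time, seconds
RATIO_RANGE = (0.035, 0.075)       # coffee-to-water ratio, g/ml


def factor_levels(n=5):
    """Return n levels per factor: linear for temperature and ratio,
    logarithmic (constant ratio between steps) for extraction time."""
    temperature_f = np.linspace(*TEMP_RANGE_F, n)
    time_s = np.geomspace(*TIME_RANGE_S, n)
    ratio = np.linspace(*RATIO_RANGE, n)
    return temperature_f, time_s, ratio


temperature_f, time_s, ratio = factor_levels()
print(temperature_f)
print(np.round(time_s))      # rounded to whole seconds
print(np.round(ratio, 3))
```

Rounded to whole seconds, the logarithmic time steps come out as 10, 23, 55, 128, and 300.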

Linear Uncorrelated Approach

I might assume that t, T, and r are uncorrelated and independent. Then, without assuming that the quality score Q is linear in t, T, or r, I could easily conduct the following experiments (if the mathese bothers you, skip ahead to the text):

  • T = 195 F, t = 55 seconds, r = (0.035, 0.045, 0.055, 0.065, 0.075) g/ml
  • T = 195 F, t = (10, 23, 55, 128, 300) seconds, r = r_max = argmax_r Q(r, T = 195, t = 55)
  • T = (185, 190, 195, 200, 205) F, t = t_max = argmax_t Q(r = r_max, T = 195, t), r = r_max

In English, first find the best point by finding the optimum coffee-to-water ratio for a fixed extraction period and temperature. My experience suggests that the ratio is likely to be one of the more sensitive parameters, so sweeping this first should provide good insight. Then, using that optimum C/W ratio, sweep the extraction time to find the best duration. Finally, sweep the temperature.
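The sweep order described above can be sketched in a few lines. This is only an illustration: toy_quality is a made-up stand-in for my actual taste score (its peak location is arbitrary), and the function names are hypothetical.

```python
import math


def one_factor_at_a_time(quality, levels, start):
    """Sweep each factor in turn, holding the others fixed, and keep the
    best level found before moving on to the next factor."""
    best = dict(start)
    trials = 0
    for name, values in levels.items():   # dict order is the sweep order
        scores = []
        for value in values:
            candidate = {**best, name: value}
            scores.append(quality(**candidate))
            trials += 1
        best[name] = values[scores.index(max(scores))]
    return best, trials


def toy_quality(ratio, time_s, temp_f):
    """Hypothetical quality surface with no interaction between factors."""
    return (-((ratio - 0.055) / 0.02) ** 2
            - math.log(time_s / 128) ** 2
            - ((temp_f - 200) / 10) ** 2)


levels = {
    "ratio": [0.035, 0.045, 0.055, 0.065, 0.075],   # swept first
    "time_s": [10, 23, 55, 128, 300],               # swept second
    "temp_f": [185, 190, 195, 200, 205],            # swept last
}
start = {"ratio": 0.055, "time_s": 55, "temp_f": 195}

best, trials = one_factor_at_a_time(toy_quality, levels, start)
print(best, trials)
```

Because the toy surface has no interaction terms, the sweep lands on its true peak in 15 trials. A surface where the best extraction time depends on the ratio would not be so forgiving.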

Really, this is a sort of bastardized gradient search that varies one factor at a time. It doesn’t account for interactions among parameters. For example, you might imagine that a very high C/W ratio combined with a short extraction time would produce a very different flavor than the same high C/W ratio and a long extraction time. In fact, one might be delicious and the other dreadful. The proposed methodology, however, would not reveal that condition.

Another problem is that it really isn’t safe to assume only 15 trials would be required. The quality function (my enjoyment) is likely to be quite subjective and quite variable; I’ll assume for argument that it is also Gaussian. Q can take integer values from 1 to 5, and it is quite reasonable to assume that my variability has a standard deviation of 1.5 or so. To know the actual quality accurate to a single value of Q (that is, to within ±0.5) with 95% certainty requires the standard deviation of the estimate to be about 0.25, since 95% corresponds to roughly two standard deviations. We can reduce the standard deviation of the estimated Q by averaging multiple experiments. How many? (1.5/0.25)^2 = 36. Now the original 15 trials have turned into 15×36 = 540, and at 1 experiment per day pushed the answer about a year and a half into the future.

A savvy person might ask if there is a way to find an equally good answer with fewer experiments or if there is a better way to arrange those 540 experiments to get more broadly useful answers.
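The replication arithmetic can be checked with a short sketch. It assumes the trials are independent and share the same standard deviation, so that the standard deviation of an average of n trials is sigma/sqrt(n). The function name and arguments are illustrative only.

```python
import math


def replicates_needed(sigma_single, sigma_target):
    """Number of repeated trials needed so that the mean of the repeats
    has a standard deviation no larger than sigma_target.

    Assumes independent trials: sd(mean of n) = sigma_single / sqrt(n).
    """
    n = (sigma_single / sigma_target) ** 2
    # Small tolerance so floating-point noise does not add a spurious trial.
    return math.ceil(n - 1e-9)


per_setting = replicates_needed(sigma_single=1.5, sigma_target=0.25)
total_trials = 15 * per_setting       # 15 settings in the three sweeps
print(per_setting, total_trials)
```

With a standard deviation of 1.5 and a target of 0.25, the ratio is 6, so this gives 36 repeats per setting and 540 trials across the 15 settings.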

NIST provides a sublime search tree for selecting an experiment design, and it is clear that what I’ve outlined is best served by a response surface objective (RSO) on 3 factors, or possibly a main effects design first, followed by a response surface objective on 2 or 3 factors, or even a simple 1-factor search.

I expect to consider the RSO approach first in terms of number of experiments, and then examine a screen+RSO approach to see whether it offers a reduced number of experiments. The Box-Behnken design for RSO in 3 factors requires a paltry 15 experiments. Still, that is three weeks away, and I would prefer to have some data to work with sooner.
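For the curious, the 15-run count can be reproduced with a small sketch of the 3-factor Box-Behnken layout in coded units (-1, 0, +1). It assumes three replicated center points, which is the convention that yields 15 runs; other center-point counts are possible. The function name is my own.

```python
from itertools import combinations, product


def box_behnken_3(center_points=3):
    """Coded runs for a 3-factor Box-Behnken design.

    For each pair of factors, take the four (+/-1, +/-1) combinations with
    the remaining factor held at its middle level (0), then add replicated
    center points.
    """
    runs = []
    for i, j in combinations(range(3), 2):
        for a, b in product((-1, 1), repeat=2):
            run = [0, 0, 0]
            run[i], run[j] = a, b
            runs.append(tuple(run))
    runs.extend([(0, 0, 0)] * center_points)
    return runs


design = box_behnken_3()
for run in design:
    print(run)
print(len(design), "runs")
```

Each non-center run sits at the midpoint of an edge of the cube, so no run combines the extreme settings of all three factors at once.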

The screening test is much smaller, and perhaps worthwhile in that the data can be included in the RSO experiment, even though not strictly required. For my 3-factor system the Resolution III screening design requires a modest 4 runs. This would be small enough that I could even do two trials of each, which would be of great help in reducing my variability. The experiments are listed in the following table.

Screening Design

         Actual Values                   Statistics Jargon
Trial    r (g/ml)   t (sec)   T (F)      r     t     T
1        0.035      10        205        -1    -1    +1
2        0.075      10        185        +1    -1    -1
3        0.035      300       185        -1    +1    -1
4        0.075      300       205        +1    +1    +1
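The pattern in the table is a half-fraction of the full 2×2×2 factorial: r and t take every low/high combination, and the coded level of T is the product of the other two. A minimal sketch that regenerates the table, with names of my own choosing:

```python
from itertools import product

# (low, high) actual values for each factor, from the table above.
LOW_HIGH = {
    "r_g_per_ml": (0.035, 0.075),
    "t_sec": (10, 300),
    "T_degF": (185, 205),
}


def screening_design():
    """Coded (r, t, T) runs for the 4-run half-fraction, with T = r * t."""
    runs = []
    for t, r in product((-1, 1), repeat=2):   # r alternates fastest
        runs.append((r, t, r * t))
    return runs


def decode(run):
    """Map coded levels (-1 or +1) to actual low or high values."""
    return tuple(LOW_HIGH[name][(code + 1) // 2]
                 for name, code in zip(LOW_HIGH, run))


for trial, run in enumerate(screening_design(), start=1):
    print(trial, decode(run), run)
```

Tying T to the product r * t is what saves the four extra runs. The price is that the main effect of T cannot be separated from an r-by-t interaction, which is acceptable for a first screen.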

If I Had a Grinder

The other major effect that I would like to measure is grind size, which is only meaningful if the grinder produces a substantially uniform product. My grinder doesn’t. Consider, though, the 4-factor version of this same exercise. The four-factor RSO requires 27 experiments (Box-Behnken) or 30 (central composite), depending on the method. The four-factor screen requires 8 experiments. Even if the screening doesn’t reduce the number of factors substantially, the experiment is tractable at 35 to 38 trials. Of course all this assumes Q has tolerable variance.

If you donate the grinder, I’ll do the experiments…