Sunday, 28 February 2010

The Shake Experiment: Here Are The Final Facts

At long last, the shake experiment has concluded. But what conclusions can be concluded from that conclusion?

Well... maybe not all that much. Judging by the pie chart below, it seems rather as though there's no real difference between the seven groupings. "Goo" looks a little slender in comparison to its colleagues, but that's about all. If we consider the mean value of each shake category:

Fruit 5.10
Sweets 5.75
Chocolate 6.70
Breakfast Cereal 6.05
Cakes 6.65
Biscuit 5.30
Goo 4.40

it's a bit easier to gauge the differences. Broadly speaking, the seven categories can be divided into four sets, {Chocolate, Cakes}, {Breakfast Cereal, Sweets}, {Biscuit, Fruit} and {Goo}, where the difference in mean value between elements of different sets is greater than the difference between elements of the same set. From this, we might conclude that the first set is the optimal choice.
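That grouping can be sketched in a few lines of Python. The 0.4 gap threshold is my own assumption, chosen simply because it reproduces the four sets above; the post itself doesn't specify a cut-off.

```python
# Category means as reported in the post.
means = {
    "Fruit": 5.10, "Sweets": 5.75, "Chocolate": 6.70,
    "Breakfast Cereal": 6.05, "Cakes": 6.65, "Biscuit": 5.30, "Goo": 4.40,
}

def group_by_gap(means, gap=0.4):
    """Sort categories by mean (descending) and start a new set
    whenever the drop to the next category exceeds `gap`."""
    ranked = sorted(means, key=means.get, reverse=True)
    sets = [[ranked[0]]]
    for prev, cur in zip(ranked, ranked[1:]):
        if means[prev] - means[cur] > gap:
            sets.append([cur])  # big drop: new set
        else:
            sets[-1].append(cur)  # small drop: same set
    return sets

print(group_by_gap(means))
# [['Chocolate', 'Cakes'], ['Breakfast Cereal', 'Sweets'],
#  ['Biscuit', 'Fruit'], ['Goo']]
```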

But is that the whole story? What about the variance of each category? Once again, we can illustrate this (rather crudely) with a bar chart:
This time the graph is a bit more helpful. Clearly chocolate is not only a good bet, but a reasonably consistent one as well. Cakes are even more consistent, which will make it difficult to choose between them. At the other end of the scale, goo is not particularly enjoyable and is desperately variable to boot. About the only further statement one might be willing to make is that the higher mean of biscuit over fruit, combined with its slightly reduced variability, might make us want to split up their category above.

Is that variation real, however, or just what the graph suggests? Let's check using the standard deviation of each category:

Fruit 1.834
Sweets 1.953
Chocolate 1.237
Breakfast Cereal 1.690
Cakes 0.487
Biscuit 1.634
Goo 2.155

We argue then that category X dominates category Y if the mean of X is greater than that of Y and the standard deviation of X is less than that of Y.

This process creates the following ordering: {Chocolate, Cakes}>{Breakfast Cereal}>{Sweets}>{Biscuit}>{Fruit}>{Goo}.
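A minimal sketch of that dominance check, using only the summary figures quoted above (no raw shake scores assumed). Note that some pairs, such as Chocolate and Cakes, dominate in neither direction, which is why they end up bracketed together.

```python
# (mean, standard deviation) pairs as reported in the post.
stats = {
    "Fruit": (5.10, 1.834), "Sweets": (5.75, 1.953),
    "Chocolate": (6.70, 1.237), "Breakfast Cereal": (6.05, 1.690),
    "Cakes": (6.65, 0.487), "Biscuit": (5.30, 1.634), "Goo": (4.40, 2.155),
}

def dominates(x, y):
    """X dominates Y if X has the higher mean AND the lower standard deviation."""
    (mx, sx), (my, sy) = stats[x], stats[y]
    return mx > my and sx < sy

print(dominates("Cakes", "Goo"))        # True: higher mean, lower sd
print(dominates("Chocolate", "Cakes"))  # False: higher mean but higher sd
print(dominates("Sweets", "Biscuit"))   # False: higher mean but more variable
```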

Obviously, this is only one method. But I have to say it tracks with my experience as a shake expert (I suppose this makes it a Bayesian experiment, though BigHead will argue that's been true for some time). I am thus content to recommend that those who visit Shake'a'Holic (though for Durham dwellers at least you might want to do it quickly, before the shop's seemingly inevitable lingering death) partake of either a chocolate- or cake-based shake.

Right. That's that. Time to start planning the experimentation process by which I can compare cheese and beer...


BigHead said...

I dislike your ordering criterion. It doesn't seem entirely obvious that a category providing, say, 3 shakes at 10 and one at 0 should be considered inferior to a category providing 4 shakes at 7.5 + \epsilon.

Instead you could think up some sort of parametric model and test the hypothesis that two categories have the same mean.

But of course, giving recommendations for purchases based on these average scores doesn't take into account the full decision problem with which the shake-drinker is confronted. In practice, a shake of score zero and a shake of score 1 are effectively equivalent, since the optimal action upon taking a taste of such a stinky shake is to immediately throw it away. Therefore categories providing awesome shakes with occasional stinkers should be considered superior to categories with less awesome shakes and the occasional stinker that isn't quite so bad as the previous category's, even if the mean of the latter category is higher.

SpaceSquid said...

I agree entirely that it's a hideously limited method, but I have to consider the non-mathmos amongst my readers. And in your example, I agree that you can argue the point, but I think most people would agree that arguing a 75% chance of a 10 shake and a 25% chance of wasting your money is less wise than choosing a category for which every shake is 7.500001 (in other situations of course this might not hold). If we sit and attempt to create an ordering system that no-one will ever question, we'll be here forever, and neither of us will get to do any real maths.

I take the point that 0 and 1 are probably immediately bin-bound. Maybe a 2 as well, actually (2.5 I managed to get most of the way through). Regardless, does it really follow from that that less variation around a slightly lower mean is preferable to higher variation about a higher mean, even when (as you're effectively doing) you assume a symmetrical distribution? I'd say if anything this was an argument for rescaling the data somehow.

Senior Spielbergo said...

Ohhhhh I smell utility theory...