Tuesday, August 16, 2011

the cereal experiment: ranking system

I assume that if you are actually taking the time to read this you are in it for the nerdy facts, not witty humor. So, I'll tell it to you straight - here is how my ranking system for the cereal experiment works. I'm open to modifications and suggestions so feel free to comment. Hooray statistics!

Standardization.

Each variable (e.g., grams of protein, price per ounce) was standardized by converting it to Z-scores (i.e., a common metric with a mean of 0 and a standard deviation of 1). To do this, an average and a standard deviation were calculated for each variable across all of the cereals. In this way, the Z-scores are relative to the cereals in the experiment, not to an absolute standard. The Z-score was calculated as follows:

Z = [(variable value) - (average of variable values for all cereals)] /
[standard deviation of variable values for all cereals]

I assumed a higher Z-score indicated 'worse' cereal characteristics. Variables where higher values instead indicated 'better' characteristics (e.g., taste, texture, fiber, protein) were reverse coded by multiplying their Z-scores by negative one (-1).
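
For the code-minded, the standardization and reverse-coding steps can be sketched in a few lines of Python. This is only an illustration: the function names and the protein values are made up, and it assumes the sample standard deviation (the calculation above works the same way with the population version).

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize values to a mean of 0 and a standard deviation of 1."""
    avg = mean(values)
    sd = stdev(values)  # assumption: sample standard deviation
    return [(v - avg) / sd for v in values]


def reverse_code(zs):
    """Flip the sign so that a higher raw value gives a lower (better) Z-score."""
    return [-z for z in zs]


# Hypothetical grams of protein for three made-up cereals
protein = [2.0, 4.0, 6.0]
protein_z = reverse_code(z_scores(protein))
```

Because protein is a 'higher is better' variable, the cereal with the most protein ends up with the lowest Z-score after reverse coding.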

Weighting.

Each Z-score was then multiplied by its weight. The weights, which I chose subjectively, are listed below and add up to 100%.

Taste: 20%
Texture: 15%
Calories per cup: 10%
Fat: 5%
Fiber: 15%
Protein: 15%
% calories from sugar: 10%
Price: 10%

Composite Score and Ranking.

All of the weighted Z-scores were then summed to create a weighted composite score for each cereal. The composite scores were sorted by value, with the lowest composite score indicating the best cereal. Based on this sorted list, each cereal was assigned a rank.
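
The weighting, summing, and ranking steps can be sketched in Python as well. The weights are the ones listed above; the variable keys, function names, and the two example cereals are hypothetical, and the Z-scores passed in are assumed to be already reverse coded where needed. Ties are not handled specially here.

```python
# Weights from the list above, written as fractions of 1
WEIGHTS = {
    "taste": 0.20,
    "texture": 0.15,
    "calories_per_cup": 0.10,
    "fat": 0.05,
    "fiber": 0.15,
    "protein": 0.15,
    "pct_calories_from_sugar": 0.10,
    "price": 0.10,
}


def composite_score(z_by_variable):
    """Sum of each variable's Z-score multiplied by its weight."""
    return sum(WEIGHTS[name] * z_by_variable[name] for name in WEIGHTS)


def rank_cereals(z_table):
    """Rank cereals so that the lowest composite score gets rank 1."""
    scores = {cereal: composite_score(zs) for cereal, zs in z_table.items()}
    ordered = sorted(scores, key=scores.get)
    return {cereal: rank for rank, cereal in enumerate(ordered, start=1)}


# Two made-up cereals with made-up Z-scores (not data from the experiment)
z_table = {
    "Cereal A": {name: -0.5 for name in WEIGHTS},
    "Cereal B": {name: 0.5 for name in WEIGHTS},
}
ranks = rank_cereals(z_table)
```

Since a lower Z-score means 'better' on every variable, Cereal A comes out ahead of Cereal B in this toy example.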

You can see the current table with these ranks on my experiment page.

7 comments:

  1. Fun!

    What is your method of identifying outliers? Using a standard deviation is good, however with a small sample size I wonder if you need to remove outliers in order to obtain a better distribution.

  2. @Matt. This thought crossed my mind, but I haven't dealt with outliers yet at all. Something to consider, for sure, especially with the small number of cereals, as you point out. Any ideas? ;) I imagine this formula will get tweaked a bit at the end of the experiment. I also want to play around with the weights before I make my final conclusion.

  3. But I thought Kashi Golden Goodness was the front runner -- this is why we cannot underestimate the value of qualitative data. ;)

  4. @Imagine: No, I'm afraid although I liked Golden Goodness, it has dropped to number 3! The formula just can't ignore Kashi GoLean Original's nutrition - even though GG gets higher marks for taste and texture.

  5. This comment has been removed by the author.

  6. Perhaps, given small numbers and problems with outliers, you should use Huynh's (1982, 2000) robust z statistic (z_r). It can be calculated as z_r = (D - Md) / (0.74 x IQR), where D is the variable, Md is the median, and IQR is the inter-quartile range. This makes the same assumption about D as the standard z (i.e., that it is drawn from a normal distribution), although that assumption does not hold here. However, it does add beneficial properties with small numbers and with ordinal data beyond the standard z.

    Just my $0.02.
