“Ok, who wants to explain statistics to me?”
No, seriously. I used to be a big math geek (though I was never a Mathlete). I expect to be able to understand these sorts of things; I just never got around to bothering with statistics. One concept that’s come up again and again, and that I’ve only just now looked up, is standard deviation.
I’ve always understood, on a basic level, that standard deviation is a way of measuring how spread out your data is, on average. There are actually two ways to look at this, it turns out. There’s variance, which is the average of the squares of the distances from each value to the mean of your data (which glosses over the difference between “mean” and “expected value,” which is also something I don’t understand), and standard deviation, which is the square root of the variance.
This is where I get somewhat confused, however, because the articles linked above mention that there are TWO formulae or methods for getting variance and standard deviation: one used when you’ve got the whole population (and these formulae are the basic versions you can derive from what I’ve written above), and one you use when you’ve only got a sample. Why is this?
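For reference, here’s how I’d write the two versions down. The whole-population one is just the “average of squared distances” business from above; the sample one, if I’m reading things right (and judging from the Excel numbers below), divides by N – 1 instead of N. The symbols are just the conventional ones I keep running into, nothing official:

```latex
% Whole-population version (\mu is the mean of all N values):
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2,
\qquad
\sigma = \sqrt{\sigma^2}

% Sample version (\bar{x} is the mean of the N sampled values):
s^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2,
\qquad
s = \sqrt{s^2}
```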
The differences seem big; using Excel and the set (1, 2, 3, 4, 5), the variance for the whole population is 2, and the St Dev is 1.41.
We get there simply: the average is 3, so we add (1 – 3)^2 + (2 – 3)^2 + (3 – 3)^2 + (4 – 3)^2 + (5 – 3)^2 to get 4 + 1 + 0 + 1 + 4 = 10, and then divide the lot by the number of values (5) to get 2. The square root of 2 is about 1.41.
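And just to double-check that arithmetic outside of Excel, here’s the same calculation as a quick Python sketch (the variable names are mine):

```python
from math import sqrt

data = [1, 2, 3, 4, 5]

# The average of the values: (1 + 2 + 3 + 4 + 5) / 5 = 3
mean = sum(data) / len(data)

# Sum of squared distances from the mean: 4 + 1 + 0 + 1 + 4 = 10
squared_distances = sum((x - mean) ** 2 for x in data)

# Whole-population variance: divide by the number of values
variance = squared_distances / len(data)    # 10 / 5 = 2.0

# Standard deviation: the square root of the variance
std_dev = sqrt(variance)                    # about 1.414

print(mean, variance, std_dev)              # 3.0 2.0 1.4142135623730951
```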
However, Excel tells me that the sample variance is 2.5, and the sample St Dev 1.58. So I ask you, lazy Heathen: can someone enlighten me? (Does it have something to do with assuming a normal distribution?)
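In the meantime, for anyone else poking at this: Python’s statistics module reproduces both sets of numbers, and the only mechanical difference I can find is that the sample versions divide by N – 1 instead of N. Whether that explains the why (or has anything to do with normal distributions) is exactly what I’m asking:

```python
import statistics

data = [1, 2, 3, 4, 5]

# Whole-population versions: squared distances divided by N
print(statistics.pvariance(data))   # 2, matching Excel's population variance
print(statistics.pstdev(data))      # about 1.414

# Sample versions: divided by N - 1 instead
print(statistics.variance(data))    # 2.5, matching Excel's sample variance
print(statistics.stdev(data))       # about 1.581 (Excel's 1.58)

# The sample variance by hand: the same 10 from before, divided by 5 - 1
mean = statistics.mean(data)
print(sum((x - mean) ** 2 for x in data) / (len(data) - 1))   # 2.5
```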