Wednesday, March 24, 2010

Using statistical significance (95% confidence) to your advantage

If you've taken a science or statistics course, you're most likely familiar with the Holy Grail of statistical significance: the 95% confidence level. We've all read something to the effect of, "Group A and Group B differ on variable C due to drug D, statistically significant at a 95% confidence level." The 95% confidence level is today's gold standard.

However, these numbers tend to lie. Here's a great example:

Consider the use of drug tests to detect cheaters in sports. Suppose the test for steroid use among baseball players is 95 percent accurate — that is, it correctly identifies actual steroid users 95 percent of the time, and misidentifies non-users as users 5 percent of the time.

Suppose an anonymous player tests positive. What is the probability that he really is using steroids? Since the test really is accurate 95 percent of the time, the naïve answer would be that the probability of guilt is 95 percent. But a Bayesian knows that such a conclusion cannot be drawn from the test alone. You would need to know some additional facts not included in this evidence. In this case, you need to know how many baseball players use steroids to begin with — that is what a Bayesian would call the prior probability.

Now suppose, based on previous testing, that experts have established that about 5 percent of professional baseball players use steroids. Now suppose you test 400 players. How many would test positive?

• Out of the 400 players, 20 are users (5 percent) and 380 are not users.

• Of the 20 users, 19 (95 percent) would be identified correctly as users.

• Of the 380 nonusers, 19 (5 percent) would incorrectly be indicated as users.

So if you tested 400 players, 38 would test positive. Of those, 19 would be guilty users and 19 would be innocent nonusers. So if any single player’s test is positive, the chances that he really is a user are 50 percent, since an equal number of users and nonusers test positive.
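The arithmetic above is just Bayes' rule applied to the test numbers from the post (5 percent prevalence, 95 percent accuracy). A minimal sketch of that calculation — the function name is mine, for illustration:

```python
def positive_test_posterior(prior, sensitivity, false_positive_rate):
    """P(user | positive test) via Bayes' rule."""
    # Overall rate of positive tests: true positives plus false positives.
    p_positive = prior * sensitivity + (1 - prior) * false_positive_rate
    # Fraction of positives that are true positives.
    return prior * sensitivity / p_positive

# 5% of players are users; the test catches 95% of users
# and flags 5% of non-users.
prob = positive_test_posterior(prior=0.05, sensitivity=0.95,
                               false_positive_rate=0.05)
print(prob)  # 0.5 — a positive test is only a coin flip
```

With these numbers the true positives (19 of 400) and false positives (19 of 400) exactly balance, which is why the posterior lands at 50 percent rather than the naïve 95.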

Because of the way statistical significance works, it takes a large difference to achieve significance with a small sample, but as the sample size grows, smaller and smaller differences will clear the bar. With a large enough sample, you can get statistical significance for a difference that is tiny, unimportant, and sometimes just plain wrong.
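To see the sample-size effect concretely, here is a rough sketch (my own illustrative numbers, not from the post) using a standard two-proportion z-test built from the standard library. The same 1-point difference in rates is nowhere near significant with 1,000 people per group but sails past the 95% threshold with 100,000:

```python
import math

def two_proportion_p_value(p1, p2, n):
    """Two-sided p-value for observed rates p1, p2 in two groups of size n."""
    pooled = (p1 + p2) / 2
    # Standard error of the difference under the null of equal rates.
    se = math.sqrt(2 * pooled * (1 - pooled) / n)
    z = abs(p1 - p2) / se
    # Two-sided tail probability of the standard normal.
    return math.erfc(z / math.sqrt(2))

# A 51% vs. 50% difference at two very different sample sizes.
for n in (1_000, 100_000):
    p = two_proportion_p_value(0.51, 0.50, n)
    print(n, "significant" if p < 0.05 else "not significant")
```

The difference itself never changes — only the sample size does — which is exactly why a "statistically significant" result can still be practically meaningless.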

Just something to keep in mind next time you read test results, and how numbers can lie when the entire story is not told.
