by Greg Kotkowski

Recently I read an interesting article by Gerd Gigerenzer and Adrian Edwards (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC200816/). It deals with patients and doctors misunderstanding the statistical nomenclature used in testing and screening. I'd like to discuss the article's subject, because the majority of the population doesn't understand the results of statistical inference and makes wrong decisions on a daily basis.

The first thing worth mentioning is the single-event probability. For example, what does it mean when the weather forecast says that the probability of rain is 30%? Some people understand that it is going to rain for 30% of the day; others, that it will rain over 30% of the area. The reference class (e.g., "on 30% of days like this one, it rains") is rarely communicated.

The same applies to medical diagnosis. When my friend got pregnant, a nuchal scan of her baby was performed and she was informed that the probability of her baby having Down syndrome was about 28%. Is that low or high? What should she do with such information?

The nuchal scan is a prenatal examination consisting of a few measurements of the baby at the end of the first trimester. From the results, the risk of abnormal development is calculated based on multiple statistical tests. But what does that mean? How is the risk of an abnormality really calculated and, most importantly, how should it be interpreted? Wouldn't it be easier to say, for example, that 2 out of 5 tests turned out to be positive?

First of all, it is wrong to say, for example, that you as a patient have a 40% probability of having cancer. You are not a random variable: you are either sick (100%) or not (0%). The right thing to say would be: your test result is positive, and 40% of all people with a positive result on this test have cancer. But would you bother to take another examination to confirm the diagnosis? After all, you have a higher chance of being in the lucky group of healthy people, don't you?

Tommaso once wrote about the two types of errors in statistical testing. Let's recall them with an example. Let 70% of the population have some given disease (this is the prevalence). We can perform a diagnostic test with a sensitivity of 95% and a specificity of 60%. This means that on average 95% of the people who are sick are correctly identified, but the remaining 5% test negative: these false negatives are the type II error. The test also makes mistakes in the other direction. On average, 60% of the healthy people get negative results, but the remaining 40% test positive: these false positives are the type I error.
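The two error rates above can be sketched directly from the numbers in the example (a minimal illustration, not part of the original article):

```python
# Example numbers from the text: sensitivity 95%, specificity 60%.
sensitivity = 0.95  # P(test positive | sick)
specificity = 0.60  # P(test negative | healthy)

# Type II error: a sick person tests negative (false negative).
false_negative_rate = 1 - sensitivity
# Type I error: a healthy person tests positive (false positive).
false_positive_rate = 1 - specificity

print(f"Type I error (false positives): {false_positive_rate:.0%}")   # 40%
print(f"Type II error (false negatives): {false_negative_rate:.0%}")  # 5%
```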

Let us consider a positive result on this test. Without going deeper into the mathematical equations, the probability of having the disease given a positive result is 84.7%. Therefore, after testing positive, the probability of actually being ill has increased by 14.7 percentage points compared to the 70% prevalence in the total population, but we are still not sure whether any treatment should be performed.
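For readers who want to see where the 84.7% comes from, here is the Bayes'-theorem arithmetic spelled out (a sketch using only the numbers given above):

```python
# Bayes' theorem for the positive predictive value.
prevalence = 0.70
sensitivity = 0.95   # P(+ | sick)
specificity = 0.60   # P(- | healthy)

# Total probability of a positive result.
p_positive = (sensitivity * prevalence
              + (1 - specificity) * (1 - prevalence))

# P(sick | +) = P(+ | sick) * P(sick) / P(+)
ppv = sensitivity * prevalence / p_positive
print(f"P(sick | positive) = {ppv:.1%}")  # 84.7%
```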

Therefore, another (more exhaustive) test could be carried out. Suppose the second test has a sensitivity of 99% and a specificity of 96%, and that a negative result is obtained. Now the probability of being sick is 5.46%. The patients who test positive on the first test and negative on the second one are sent back home with the instruction to repeat the test after a certain time (when the disease would be more developed). The pity is that we can never be 100% sure when any randomness is present (even at a 5σ distance).
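The 5.46% is obtained by chaining the two tests: the posterior from the first (positive) test becomes the prior for the second (negative) one. A minimal sketch, assuming the two test results are conditionally independent given the disease status:

```python
# First test: posterior after a positive result (as computed above).
prevalence = 0.70
sens1, spec1 = 0.95, 0.60
prior = (sens1 * prevalence
         / (sens1 * prevalence + (1 - spec1) * (1 - prevalence)))

# Second test: update on a negative result.
sens2, spec2 = 0.99, 0.96
p_negative = (1 - sens2) * prior + spec2 * (1 - prior)
posterior = (1 - sens2) * prior / p_negative
print(f"P(sick | first +, second -) = {posterior:.2%}")  # 5.46%
```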

Coming back to the article, it states that about half of the doctors understand terms like sensitivity, specificity, or prevalence in a completely wrong way (meaning they don't have ANY idea what they mean). Because of this, countless patients are misled and make improper decisions about their health care.

Think about the nuchal scan, which has a false-positive rate (type I error) of about 5% in detecting Down syndrome. Think about all the healthy babies who were aborted because of a false positive. Because the condition is rare, most positive results come from healthy babies, so the positive predictive value is low; yet uninformed parents and doctors call the result "risky" or "probable" and do what should not be done.
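To see why a low prevalence makes most positives false, here is a hypothetical illustration. The prevalence and sensitivity below are my own rough assumptions for the sake of the arithmetic, not figures from the article; only the 5% false-positive rate comes from the text:

```python
# Hypothetical numbers: a rare condition with a 5% false-positive rate.
prevalence = 1 / 700   # assumed rough prevalence (illustration only)
sensitivity = 0.80     # assumed detection rate (illustration only)
fpr = 0.05             # the 5% false-positive rate from the text

# P(condition | positive result) via Bayes' theorem.
ppv = (sensitivity * prevalence
       / (sensitivity * prevalence + fpr * (1 - prevalence)))
print(f"P(condition | positive) = {ppv:.1%}")  # roughly 2%
```

Even with these generous assumptions, only about 2 positives in 100 would be true positives; the other 98 or so would be healthy babies flagged by chance.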

Think about some physicists who are under huge pressure to publish and might claim false positives as discoveries. It's a pity, but it seems that statistics can be cruel and play tricks on us.
