Significance Testing

Significance testing lies at the heart of every inference we draw from a sampling exercise. We always start with what the jargon calls a ‘Null Hypothesis’. A test of significance is a test of that hypothesis. We analyse the data from the sample and estimate the probability of getting such data if the hypothesis were true in the universe.

Before reading on, it is important to understand the difference between accuracy and precision.

ACCURACY

Accuracy is proximity to the truth. But if we knew the truth, we would not need to estimate it at all. So, when we are estimating a population statistic, the accuracy of the estimate can never be measured, which makes it a concept of no practical use.

PRECISION

Suppose you have the task of adding up a long list of numbers, perhaps your daily expenditures over a month. You do your sum and get a particular result. But you’re not sure whether you got it right. You may have made a mistake in adding, or in punching in the numbers if you were using a calculator.

What do you do? You do the sum again. And if you’re a cautious accountant you might even do it a third time. If you get the same result every time you feel you have got it right.

Lesson: When in doubt, repeat. Repeatability of the result generates confidence in it. Repeatability is reliability.

Actually, our example of adding up a list of numbers is not a good one, because in that case there is only one true answer and we shall get it every time we do our sum correctly. The real-life situations that we are interested in are the results we get from measuring a sample of people from some universe. Again, we are not sure if the results are true. So, in line with our commonsense philosophy, we should repeat the sampling exercise. If we did, it is highly unlikely that we would get exactly the same result, because different people would be included this time. In fact, if we repeated the sampling exercise many times and measured the same thing on different samples of people, we would find that most of the results fall within a range.

We would be entitled to come to a conclusion that, most probably, the truth that we are trying to estimate must lie somewhere in that range.

If we had a method of being more precise and if we could say, for example, that after repeating the sampling exercise many times, 95 percent of the results would fall within a certain range, then there would be a 95 percent chance that the truth would lie in that range.

The width of this range is a measure of the precision of our estimate: the narrower the range, the higher the precision. Our objective is to narrow this range as much as possible, because that would bring us closer to the elusive truth. Precision replaces the concept of accuracy. We will never be able to say how accurate our estimate of the truth is, but we can say how precise it is.
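The repeated-sampling idea can be tried out on a computer, where repeating is cheap. The sketch below is only an illustration: the universe of 100,000 heights, its mean and spread, and the helper names `repeated_sample_means` and `central_range` are all made up for this example. It draws many samples, records the mean of each, and reports the range that holds the central 95 percent of those means.

```python
import random
import statistics


def repeated_sample_means(population, sample_size, repeats, seed=0):
    """Draw `repeats` random samples and return the mean of each one."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.sample(population, sample_size))
            for _ in range(repeats)]


def central_range(values, coverage=0.95):
    """Return (low, high) bounds holding the central `coverage` share of values."""
    ordered = sorted(values)
    tail = (1.0 - coverage) / 2.0
    cut = round(tail * len(ordered))  # number of values trimmed from each end
    return ordered[cut], ordered[len(ordered) - 1 - cut]


# Hypothetical universe: 100,000 heights in cm (mean 165, standard deviation 8).
_rng = random.Random(42)
universe = [_rng.gauss(165, 8) for _ in range(100_000)]
true_mean = statistics.fmean(universe)

for n in (25, 100, 400):
    means = repeated_sample_means(universe, n, repeats=2000)
    low, high = central_range(means)
    print(f"n={n:>3}: 95% of sample means fall in "
          f"[{low:.2f}, {high:.2f}], width {high - low:.2f}")
print(f"true mean of the universe: {true_mean:.2f}")
```

The printed widths shrink as the sample size grows, which is the sense in which a larger sample gives a more precise estimate.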

But how do we get a fix on this range? Taking just one sample in real life is problematic and costly enough. Repeating the exercise many times may be conceptually brilliant, but completely undoable in practice.

Actually, you don’t have to repeat the sampling exercise. This is where the science of inferential statistics comes in. By analysing the data in one sample that you have taken, specifically the variation contained in it, and by making some assumptions about the pattern of variation in the total universe, it can calculate the 95 percent or 99 percent or any other precision range that would actually come to pass if you did take the repeated samples. The whole purpose of inferential statistics is to save you the trouble of actually repeating the sampling exercise by inferring what would happen if you did.

It sounds like magic, but it is only logic. This logic completely depends on a crucial aspect of reality, namely the ‘Laws of Chance’, more commonly known as ‘Probability’.

So the whole exercise is about how precise we are in our estimate of a population statistic. After all, we always know the statistics of the sample. The problem is to infer, say, the average height of the population of India from a sample whose average height is known. This is where it all starts, and this is the role of the Central Limit Theorem (CLT). The CLT-based calculation assumes that the population is normally distributed; otherwise, as a rule of thumb, the sample size ‘n’ has to be at least 30.

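CLT says that if you have a sample of size n with sample mean x̄ (x-bar) and sample standard deviation σ, then the probability that the population mean (µ) lies between the confidence limits is the desired confidence level, which fixes the value of z (read z as a confidence level for now; I will come back to it later):

$$P\left(\bar{x} - z\,\frac{\sigma}{\sqrt{n}} \;\le\; \mu \;\le\; \bar{x} + z\,\frac{\sigma}{\sqrt{n}}\right) = \text{confidence level}$$

which is nothing but

$$\mu = \bar{x} \pm z\,\frac{\sigma}{\sqrt{n}}$$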

For now, understand that CLT gives you confidence limits for the population statistic if you know the sample statistic and the standard deviation. The nuances of how CLT works are fairly complicated, and I will come back to them later.
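As a rough sketch of that calculation, the function below computes confidence limits from a single sample using the usual normal-approximation form, sample mean ± z × (standard deviation / √n). The function name `confidence_interval` and the list of scores are hypothetical, invented only for this illustration; z is looked up from the standard normal distribution in Python's `statistics` module.

```python
import math
import statistics
from statistics import NormalDist


def confidence_interval(sample, confidence=0.95):
    """Return (low, high) confidence limits for the population mean.

    Uses mean +/- z * s / sqrt(n), where s is the sample standard deviation
    and z is the standard normal quantile for the chosen confidence level.
    """
    n = len(sample)
    mean = statistics.fmean(sample)
    s = statistics.stdev(sample)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z * s / math.sqrt(n)
    return mean - margin, mean + margin


# Hypothetical purchase-intention scores on a 1-5 scale (illustrative only).
scores = [4, 5, 3, 4, 4, 2, 5, 4, 3, 4, 5, 4, 3, 4, 4, 5, 2, 4, 3, 4,
          4, 3, 5, 4, 4, 3, 4, 5, 4, 3]

for level in (0.90, 0.95, 0.99):
    low, high = confidence_interval(scores, level)
    print(f"{level:.0%} confidence limits: {low:.2f} to {high:.2f}")
```

Asking for a higher confidence level widens the limits, and a larger sample narrows them. With only 30 scores the normal approximation is at the edge of the rule of thumb mentioned above, so treat the output as indicative.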

Let us take a practical example. Suppose we have run a product test among men and women in a population and asked about their purchase intention for a product. Let us say the results look as follows (the numbers quoted are only meant to illustrate the concept and may not obey the laws of statistics):

Since we want to examine the differences in scores between men and women, we formulate the ‘null’ hypothesis that ‘there are no differences in the real scores in the population among men and women’ implying that the differences in the scores in the sample have come about by chance, and if we had repeated the sampling exercise, the differences would have disappeared.

The first thing to do is to calculate the confidence belts for both scores by analysing the ‘variance’ (using CLT) in the sample scores among men and women. Various situations can arise, as follows:

Situation 1

Sample Size: 100 each

95% range of scores of men: 3.5 to 4.5 (centred on 4)

95% range of scores of women: 2.5 to 3.4 (centred on 3)

There is only a small chance that men’s true score is lower than 3.5 or that women’s true score is higher than 3.4. Therefore, the statement that ‘Men score higher than women’ has only about a 5% chance of being wrong. Scores are significantly different at the 5% level.

So if the 95% confidence belts don’t overlap much, we can say that the scores are significantly different and are unlikely to have come about by chance. So, we reject the null hypothesis. Here the degree of risk in rejecting the hypothesis is 5%.

Situation 2

Sample Size: 100 each

95% range of scores of men: 2 to 6 (centred on 4)

95% range of scores of women: 1 to 5 (centred on 3)

The two ranges overlap heavily, so there is no evidence for believing that men score higher.

Scores are not significantly different at 5% level. We therefore don’t reject the null hypothesis.
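The comparison of confidence belts in Situations 1 and 2 can be written as a small check. This is only a sketch of the overlap rule used in this article, with a hypothetical helper name `belts_overlap`; it takes the belts as given and does not calculate them.

```python
def belts_overlap(belt_a, belt_b):
    """Return True if two (low, high) confidence belts share any values."""
    low_a, high_a = belt_a
    low_b, high_b = belt_b
    return low_a <= high_b and low_b <= high_a


# The 95% belts quoted in the two situations, written as (low, high).
situations = {
    "Situation 1": {"men": (3.5, 4.5), "women": (2.5, 3.4)},
    "Situation 2": {"men": (2.0, 6.0), "women": (1.0, 5.0)},
}

for name, belts in situations.items():
    if belts_overlap(belts["men"], belts["women"]):
        print(f"{name}: belts overlap, do not reject the null hypothesis")
    else:
        print(f"{name}: belts do not overlap, reject the null hypothesis")
```

One caution on the design: requiring no overlap at all is a conservative rule. Two belts can overlap slightly while a formal test on the difference between the scores still comes out significant, which is why the text says the belts should not overlap "much".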

We can make the scores in this case significantly different by decreasing the confidence level or by increasing the sample size, as follows:

90% range of scores of men: 3.6 to 4.6 (centred on 4)

90% range of scores of women: 2.5 to 3.5 (centred on 3)

Scores significantly different at the increased risk level of 10%

We can also increase the sample size.

Situation 3

Sample Size: 200 (as we increase the sample size, the range for a given confidence level narrows, which may lead to a significant difference even when the confidence level is unchanged)

95% range of scores of men: 3.8 to 4.2 (centred on 4)

95% range of scores of women: 2.7 to 3.3 (centred on 3)

Scores significantly different at 5% level

Therefore, increasing the sample size will make smaller differences significant.
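Both levers, the confidence level and the sample size, act on the half-width of the confidence belt, z × σ / √n. The sketch below tabulates that half-width for a few combinations. The standard deviation of 2.5 and the function name `margin_of_error` are assumptions made only for this illustration, so the numbers will not match the illustrative belts quoted above.

```python
import math
from statistics import NormalDist


def margin_of_error(sigma, n, confidence=0.95):
    """Half-width of the confidence belt: z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sigma / math.sqrt(n)


sigma = 2.5  # hypothetical standard deviation of the scores

for confidence in (0.90, 0.95, 0.99):
    for n in (100, 200, 400):
        half_width = margin_of_error(sigma, n, confidence)
        print(f"confidence {confidence:.0%}, n={n:>3}: "
              f"belt is mean +/- {half_width:.2f}")
```

Because the sample size sits under a square root, halving the width of the belt takes four times the sample, not twice.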

Interpretation of Significant Results

The fact that a survey result is found to be significant, by carrying out a statistical significance test, often leads to confusion when such a result is presented to people unfamiliar with the methodology. The layman, when told that something is significant, often assumes that the researcher considers the result to be “important”. Always remember that when the researcher says significant, he means that the result is statistically significant. In statistical terms, if, for example, a difference between two percentages is declared significant, it simply means that this difference, no matter whether it is large or small, is unlikely to have occurred by chance alone.
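To see that significant does not mean important, the sketch below applies a standard two-sample z-test to the same tiny difference in mean scores at two sample sizes. The z-test is not spelled out in this article and is brought in here only as an illustration; the function name `two_sample_z_test` and all the figures are hypothetical.

```python
import math
from statistics import NormalDist


def two_sample_z_test(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return (z, two-sided p-value) for the difference between two means.

    The p-value is the probability of seeing a difference at least this
    large if the null hypothesis of no real difference were true.
    """
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    z = (mean_a - mean_b) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# The same small difference (4.05 vs 4.00, standard deviation 1.0) at two sample sizes.
for n in (100, 20_000):
    z, p = two_sample_z_test(4.05, 1.0, n, 4.00, 1.0, n)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"n={n:>6} each: z={z:.2f}, p={p:.4f}, {verdict} at the 5% level")
```

A difference of 0.05 on a five-point scale is unlikely to matter commercially in either case. Only the sample size has changed, and with it the verdict of the test.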
