About this lesson
When inferring statistical values from a sample, there is a band of uncertainty around the sample statistic within which the population statistic is likely to lie. This band of uncertainty can be calculated and, to an extent, controlled by the sampling approach used.
When to use
When inferential statistics are used instead of descriptive statistics, a confidence interval and confidence level should always accompany the statistical values.
Descriptive statistics provide a complete statistical description of a data population. However, the full population is often not available, so inferential statistics are used instead. Descriptive statistics are calculated for a sample from the population, and the likely population statistics are inferred from them. Because the sample does not include every data point in the population, the actual population statistics will likely differ from the sample statistics. Using information from the sample and the population, it is possible to calculate a range in which the population statistic is likely to fall. This range is called the confidence interval. Its size depends in part on the desired level of confidence that the actual statistic lies within the interval. This level is known as the confidence level.
The formula for the confidence interval is:

$$CI = \bar{X} \pm Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$
Where: CI is the confidence interval range.
X-bar is the mean of the sample.
Sigma is the standard deviation of the sample, used as an estimate of the population standard deviation (see Hints & tips).
n is the number of items in the sample.
Alpha is 1 − the confidence level (for example, alpha = 0.05 for a 95% confidence level).
Z is the standard normal value that leaves an area of alpha/2 in each tail of the distribution curve (see diagram below).
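The sketch below computes CI = X-bar ± Z × sigma / √n in Python using only the standard library. It makes a few assumptions. The function name `confidence_interval` and the sample values are made up for illustration. Following the lesson, the sample standard deviation stands in for sigma. Z is taken from `statistics.NormalDist().inv_cdf` so that alpha/2 lies in each tail.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def confidence_interval(data, confidence=0.95):
    """Return (lower, upper) for CI = X-bar +/- Z(alpha/2) * sigma / sqrt(n).

    Hypothetical helper: sigma is the sample standard deviation, used as an
    estimate of the population standard deviation.
    """
    n = len(data)
    x_bar = mean(data)
    sigma = stdev(data)                      # sample standard deviation (n - 1 denominator)
    alpha = 1 - confidence                   # e.g. 0.05 for 95% confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # leaves alpha/2 in the upper tail
    half_width = z * sigma / sqrt(n)         # the "plus and minus" term
    return x_bar - half_width, x_bar + half_width


# Usage with made-up measurements
data = [10, 12, 11, 13, 9, 10, 12, 11]
low, high = confidence_interval(data, confidence=0.95)
print(f"95% CI for the mean: {low:.2f} to {high:.2f}")
```

The interval is symmetric about the sample mean because the same half-width is added and subtracted.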
A diagram of the confidence interval formula is shown below.
Generally, we want the confidence interval to be as narrow as possible so that there is little uncertainty about the population statistics. This formula leads to three important conclusions. First, if the standard deviation decreases, the confidence interval will decrease. Second, if the sample size increases, the confidence interval will decrease. Third, if the confidence level is reduced, the confidence interval will decrease. This third conclusion follows from the value of Z for common confidence levels. Z is about 1.645 at 90%, 1.96 at 95% and 2.576 at 99%, so a lower confidence level gives a smaller multiplier.
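The three conclusions can be checked numerically with a short sketch. It uses two hypothetical helpers. `z_value` returns Z for a given confidence level. `ci_width` returns the full interval width, 2 × Z × sigma / √n. The sigma and n values are arbitrary illustrations.

```python
from math import sqrt
from statistics import NormalDist


def z_value(confidence):
    """Z that leaves alpha/2 in each tail, where alpha = 1 - confidence."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


def ci_width(sigma, n, confidence):
    """Full width of the interval: 2 * Z * sigma / sqrt(n)."""
    return 2 * z_value(confidence) * sigma / sqrt(n)


base = ci_width(sigma=2.0, n=36, confidence=0.95)

print(ci_width(sigma=1.0, n=36, confidence=0.95) < base)   # smaller sigma -> narrower
print(ci_width(sigma=2.0, n=144, confidence=0.95) < base)  # larger sample -> narrower
print(ci_width(sigma=2.0, n=36, confidence=0.90) < base)   # lower confidence -> narrower

# Z for common confidence levels
for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: Z = {z_value(level):.3f}")
```

Because n sits under a square root, quadrupling the sample size (36 to 144 here) only halves the width. Collecting more data therefore narrows the interval reliably, but with diminishing returns.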
The confidence interval formula can be rearranged to find the sample size (n) needed to achieve a desired confidence interval width. Take just the plus-and-minus term of the formula, which is the half-width of the interval (written E here), set it equal to the desired value and solve for n. This formula is:

$$E = Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \quad\Longrightarrow\quad n = \left(\frac{Z_{\alpha/2}\,\sigma}{E}\right)^{2}$$
This formula can be used when planning sample data collection. Because n must be a whole number of items, round the result up.
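Here is a minimal sketch of the sample-size calculation, n = (Z × sigma / E)². It assumes sigma has already been estimated, for example from a pilot sample. E is the desired half-width of the interval. `required_sample_size` and the example values are hypothetical. The result is rounded up because a sample cannot contain a fraction of an item.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(sigma, half_width, confidence=0.95):
    """Return n = (Z(alpha/2) * sigma / E)^2, rounded up to a whole number.

    Hypothetical helper: sigma is an estimated standard deviation and
    half_width (E) is the desired plus-and-minus term of the interval.
    """
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil((z * sigma / half_width) ** 2)


# Usage: estimated sigma of 2.0, desired interval of +/- 0.5 at 95% confidence
print(required_sample_size(sigma=2.0, half_width=0.5))  # 62
```

Halving the desired half-width to 0.25 roughly quadruples the requirement, to 246 items.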
Hints & tips
- The only two elements that you can control are the confidence level and the sample size. The mean and standard deviation come from the data. If you want to narrow your confidence interval without reducing your confidence level, your only option is to collect more data in your sample.
- The exact formula for these calculations uses the standard deviation of the full population, not of the sample. However, Walter Shewhart's research showed that once a sample has at least 30 points, its standard deviation changes very little as more points are added. It is then an excellent approximation of the full population standard deviation, provided, of course, that the sample is representative and random.