About this lesson
One of the most important criteria for selecting a hypothesis test is whether the data is normal or non-normal. The normality question does not prove or disprove the hypothesis; rather, it steers the nature of the analysis. This lesson reviews this concept and its application in hypothesis testing.
Normal versus Non-Normal
Hypothesis tests can be done with either normal or non-normal data, but different tests are used for each. Therefore, a Lean Six Sigma team must be able to determine whether their data is normal or non-normal so that they can choose the correct hypothesis test.
When to use
Before conducting the hypothesis test, always check the data to determine whether it is normal or non-normal so that you can choose the correct test.
The normal distribution, also called the Gaussian distribution or the bell-shaped curve, is symmetric: there are as many data points above the mean as below it. It also has a strong central tendency, with the points clustered near the mean. That is why, when it is graphed, the curve is high in the center and the edges, or tails, become very small and approach zero.
A normal data distribution represents the random variation that occurs within every physical system. Characteristics that can make a data distribution non-normal include too many extreme points or outliers, an overlap of several different processes in the data, or a physical limit that truncates one of the tails prematurely.
Because different tests apply to each type of data, normality is the first question asked in the Hypothesis Testing Decision Tree after special causes have been removed. Depending on the answer, a completely different set of tests is used.
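For reference, the bell-shaped curve described above is the normal probability density function. It can be written as follows, with mean μ and standard deviation σ. The squared term (x - μ)² takes the same value for points equally far above and below μ, and that is what makes the curve symmetric. The exponential factor shrinks toward zero as x moves away from μ, which produces the thin tails.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
```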
Normality is assessed using basic descriptive statistics of the data sample. These include the following parameters:
- Mean – the average of all the data points. This is often used in hypothesis tests with normal data.
- Median – the midpoint of the data points. This is often used in hypothesis tests with non-normal data.
- Standard Deviation – a measure of the spread or width of the distribution. This measure and the variance, which is the standard deviation squared, are often used in hypothesis testing.
- Skewness – a measure of symmetry. A perfectly symmetrical distribution has a skewness value of zero. The distribution is considered normal as long as the value is between -0.8 and +0.8.
- Kurtosis – a measure of whether the tails are “heavy” or “light.” Light tails taper down to near zero at the upper and lower edges of the distribution. Kurtosis can be measured in several ways; Excel uses “sample excess kurtosis.” This measure has the advantage that a normal distribution scores zero, just as with skewness. For this measure, values from -3 to +3 are still considered normal.
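You can also compute these parameters outside Excel. The following Python sketch uses only the standard library. It assumes that Excel's SKEW and KURT worksheet functions use the standard sample formulas for skewness and sample excess kurtosis, and it reproduces those formulas. The names `describe`, `Summary`, and `looks_normal` are hypothetical. The ±0.8 and ±3 limits are this lesson's rules of thumb, not a formal statistical test.

```python
import statistics
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Summary:
    mean: float
    median: float
    stdev: float
    variance: float
    skewness: float
    kurtosis: float  # sample excess kurtosis: about 0 for normal data


def describe(data: Sequence[float]) -> Summary:
    """Descriptive statistics using sample formulas assumed to match
    Excel's STDEV.S, VAR.S, SKEW and KURT functions."""
    n = len(data)
    if n < 4:
        raise ValueError("kurtosis needs at least 4 data points")
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (divides by n - 1)
    if s == 0:
        raise ValueError("all values are identical; shape is undefined")
    z = [(x - mean) / s for x in data]  # standardized values
    skew = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return Summary(mean, statistics.median(data), s,
                   statistics.variance(data), skew, kurt)


def looks_normal(summary: Summary, skew_limit: float = 0.8,
                 kurt_limit: float = 3.0) -> bool:
    """Apply the lesson's rule-of-thumb limits to skewness and kurtosis."""
    return (abs(summary.skewness) <= skew_limit
            and abs(summary.kurtosis) <= kurt_limit)


if __name__ == "__main__":
    result = describe([1, 2, 3, 4, 5])
    print(f"skewness={result.skewness:.2f} kurtosis={result.kurtosis:.2f}")
    print(looks_normal(result))
```

Take the evenly spaced values 1 to 5 as an example. The skewness is zero and the excess kurtosis works out to -1.2, which indicates a flat, light-tailed shape. Both values are still within the lesson's limits.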
Normality can be checked in either Excel or Minitab.
In Excel:
- Select “Data Analysis” on the “Data” ribbon.
- Select “Descriptive Statistics” and click “OK.”
- Enter the range for your data in “Input Range.”
- Select where you want the results – in a new worksheet or in a location in the current worksheet.
- Select “Summary Statistics.”
- Click on “OK.”
In Minitab:
- Open the “Stat” menu.
- Select “Basic Statistics.”
- Select “Normality Test.”
- Enter the name of the column containing your sample data in the “Variable” box.
- Ensure that the “Anderson-Darling” option is selected.
- Click “OK.”
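Excel provides a table of statistical values, and you can use it to decide whether the data is normal or non-normal. Minitab provides a probability plot of the data against a fitted normal line, along with the Anderson-Darling statistic and a p-value. If the p-value is below the chosen significance level (commonly 0.05), the data is treated as non-normal. Otherwise, there is no evidence against normality, and the data can be treated as normal.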
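The following minimal Python sketch of an Anderson-Darling normality test shows roughly where a normality p-value comes from. It uses only the standard library. The mean and standard deviation are estimated from the sample. The statistic is then converted to a p-value with the small-sample adjustment and piecewise approximation commonly attributed to D'Agostino and Stephens. It is an assumption, not a confirmed fact, that Minitab's internal calculation matches this one, so treat the numbers as illustrative. The function name `anderson_darling_normal` is hypothetical.

```python
import math
import statistics
from statistics import NormalDist
from typing import Sequence, Tuple


def anderson_darling_normal(data: Sequence[float]) -> Tuple[float, float, float]:
    """Return (A2, adjusted A2*, approximate p-value) for the null hypothesis
    that the data comes from a normal distribution whose mean and standard
    deviation are estimated from the sample."""
    n = len(data)
    if n < 3:
        raise ValueError("need at least 3 data points")
    xs = sorted(data)
    mean = statistics.mean(xs)
    s = statistics.stdev(xs)
    if s == 0:
        raise ValueError("all values are identical")
    cdf = NormalDist().cdf
    f = [cdf((x - mean) / s) for x in xs]  # fitted normal CDF at each point
    total = 0.0
    for i in range(n):  # i is 0-based, so (2i + 1) is the textbook (2i - 1) weight
        total += (2 * i + 1) * (math.log(f[i]) + math.log(1.0 - f[n - 1 - i]))
    a2 = -n - total / n
    # Small-sample adjustment used with the p-value approximation below.
    a2_star = a2 * (1 + 0.75 / n + 2.25 / n ** 2)
    if a2_star >= 0.6:
        p = math.exp(1.2937 - 5.709 * a2_star + 0.0186 * a2_star ** 2)
    elif a2_star >= 0.34:
        p = math.exp(0.9177 - 4.279 * a2_star - 1.38 * a2_star ** 2)
    elif a2_star >= 0.2:
        p = 1 - math.exp(-8.318 + 42.796 * a2_star - 59.938 * a2_star ** 2)
    else:
        p = 1 - math.exp(-13.436 + 101.14 * a2_star - 223.73 * a2_star ** 2)
    return a2, a2_star, min(max(p, 0.0), 1.0)  # keep p inside [0, 1]


if __name__ == "__main__":
    n = 50
    # Idealized data sets: evenly spaced quantiles of a normal and an exponential.
    normal_like = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    skewed = [-math.log(1 - (i - 0.5) / n) for i in range(1, n + 1)]
    for name, sample in (("normal-like", normal_like), ("skewed", skewed)):
        a2, a2_star, p = anderson_darling_normal(sample)
        print(f"{name}: A2={a2:.3f} A2*={a2_star:.3f} p={p:.4f}")
```

As with Minitab's output, a p-value below your significance level (commonly 0.05) is evidence that the data is non-normal.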
Hints & tips
- If “Data Analysis” does not appear on the Data ribbon in Excel, you need to enable the Analysis ToolPak add-in. Go to the File menu, select Options, then select Add-ins. Choose “Excel Add-ins” in the Manage box, click Go, and check Analysis ToolPak. This is a free feature that is already included in Excel; you just need to enable it. You may need to close and reopen Excel for the menu to appear.
- If you don’t have Minitab, consider downloading a free trial; Minitab normally offers a 30-day free trial period. All of the hypothesis tests will be demonstrated in Minitab. About half of the tests will also be demonstrated in Excel; the other half are not available in Excel. If you want to practice all the tests, you will need Minitab, so be sure to complete the course within the 30 days before your trial expires.
- When using Minitab, always enter data into columns, never into rows. Minitab uses the column names to identify data sets.
- Data can be copied and pasted back and forth between Excel and Minitab. I often collect data in Excel because that is easier for data collection, and then copy it to Minitab for analysis.