## Exercise files

Download this lesson’s related exercise files.

- Non-normal Data.xlsx (14.5 KB)
- Non-normal Data - Solution.docx (229.5 KB)

## Quick reference

### Non-normal Data

Many processes have non-normal variation and therefore generate non-normal data. Several conditions can cause this. This lesson presents the Central Limit Theorem as a tool for normalizing non-normal data.

### When to use

A process either generates non-normal data or it does not. When non-normal data exists, the underlying cause should be determined. In many cases, the non-normal data can be transformed into normal data and then controlled using SPC.

### Instructions

Non-normal data can exist for many reasons. To use SPC with a process, that non-normal data must first be transformed into normal data. Some causes of non-normal data indicate a process that is out of control; others can occur in a process that is in control. Let’s first consider the causes and then what can be done with non-normal data.

- Too many extreme points. This indicates a process that is out of control. The extreme points make it impossible to predict process performance. In this case, identify and remove the special causes that created the extreme points. You cannot use SPC until this is done.
- Overlap of multiple processes in the data. This will often generate a distribution that is lumpy. A lump occurs at the center value for each of the processes in the data. The best approach is to stratify and separate the processes. However, you can also use the Central Limit Theorem to create a normal distribution.
- Sorted data. In this case, the process or system automatically sorts the data into a specific order, or the data at the extremes is automatically reworked so that it is closer to the central value. Move upstream in the data collection and use the original data points instead of the “reworked” data. If you cannot do that, the Central Limit Theorem can normalize this data.
- Natural limit. In this case, one of the tails of the bell-shaped curve is truncated. This is due to an equipment or natural limit in the process. You can transform data skewed by a physical limitation either through the Central Limit Theorem or through another transformation method.
- Insufficient data discrimination. In this case, the data is only able to take on a few values such as on/off or true/false. The raw data can never form a normal bell-shaped curve. You may be able to improve the measurement system. Otherwise, use the Central Limit Theorem to transform the data.

The primary technique for normalizing non-normal data is the Central Limit Theorem. This states, “The average of many values tends to have a normal distribution.” So instead of plotting the raw data, a small sample or subset of data is collected and a total (or average) value for the subset is determined. These subset values will likely be normal for all of the causes above except extreme data points. The obvious question is, “How many data points in a subset?” This depends upon the nature of the non-normality. The rule of thumb is that if the data is symmetrical, use at least 5 points. If the data is not symmetrical, such as data that is skewed or truncated by a physical limit, use 30 data points. Regardless, the more data points, the more likely the transformed data will be normal.
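The subset idea can be illustrated with a minimal Python sketch. It assumes simulated, strongly skewed raw data (exponential values standing in for a process with a natural limit at zero) and a subset size of 30. The helper names `subgroup_means` and `skewness` are illustrative and do not come from any SPC package.

```python
import random
import statistics


def subgroup_means(values, size):
    """Average consecutive, non-overlapping subsets of `size` points."""
    usable = len(values) - len(values) % size  # drop an incomplete final subset
    return [statistics.fmean(values[i:i + size]) for i in range(0, usable, size)]


def skewness(values):
    """Sample skewness: close to 0 for symmetric, bell-shaped data."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


random.seed(1)  # fixed seed so the sketch is repeatable
# Hypothetical raw data: skewed, with a hard lower limit at 0.
raw = [random.expovariate(1.0) for _ in range(6000)]
means = subgroup_means(raw, 30)

print(f"raw skewness:    {skewness(raw):.2f}")
print(f"subset skewness: {skewness(means):.2f}")
```

The skewness of the subset averages should be much closer to zero than that of the raw values, which is the behavior the rule of thumb relies on. A smaller subset size would leave more of the original skew in place.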

Data can also be transformed through several transformation algorithms. The most popular transformation in the SPC world is the Box-Cox transformation. Most statistical software applications, such as Minitab, can do this transformation with just a few mouse clicks.
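For readers without statistical software, the following is a rough Python sketch of the standard Box-Cox formula, with the exponent lambda chosen by a simple grid search over the log-likelihood. The function names, the grid range of -2 to 2, and the simulated lognormal data are all assumptions made for illustration; this is not how Minitab or any other package implements the transformation.

```python
import math
import random
import statistics


def box_cox(values, lam):
    """Box-Cox transform; every value must be strictly positive."""
    if abs(lam) < 1e-12:
        return [math.log(v) for v in values]  # lambda = 0 is the log transform
    return [(v ** lam - 1.0) / lam for v in values]


def log_likelihood(values, lam):
    """Profile log-likelihood of lambda, assuming the transformed data is normal."""
    transformed = box_cox(values, lam)
    variance = statistics.pvariance(transformed)
    log_sum = sum(math.log(v) for v in values)
    return -0.5 * len(values) * math.log(variance) + (lam - 1.0) * log_sum


def best_lambda(values, low=-2.0, high=2.0, step=0.1):
    """Coarse grid search; real packages typically use a finer optimizer."""
    count = int(round((high - low) / step))
    candidates = [round(low + i * step, 10) for i in range(count + 1)]
    return max(candidates, key=lambda lam: log_likelihood(values, lam))


random.seed(2)  # fixed seed so the sketch is repeatable
# Hypothetical skewed, strictly positive measurements.
raw = [random.lognormvariate(0.0, 0.5) for _ in range(500)]
lam = best_lambda(raw)
transformed = box_cox(raw, lam)
print(f"chosen lambda: {lam:.1f}")
```

Box-Cox requires strictly positive values, so data containing zeros or negatives would need to be shifted before this sketch could be applied.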

Once the data is transformed either through the Central Limit Theorem or an algorithm, the new normal data can now be evaluated using SPC tools.
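As one possible illustration of evaluating transformed data, the sketch below computes limits for an individuals control chart from a baseline of subset averages and flags new points that fall outside them. An individuals chart is only one of several SPC charts that could be used here. The data values and function names are hypothetical; the constant 1.128 is the standard d2 factor for moving ranges of two points.

```python
import statistics

D2_FOR_PAIRS = 1.128  # standard d2 constant for moving ranges of size 2


def individuals_limits(values):
    """Return (lower limit, center line, upper limit) for an individuals chart."""
    center = statistics.fmean(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    sigma = statistics.fmean(moving_ranges) / D2_FOR_PAIRS
    return center - 3 * sigma, center, center + 3 * sigma


def out_of_control(values, limits):
    """Indexes of points that fall beyond the control limits."""
    lower, _, upper = limits
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical subset averages from a stable baseline period.
baseline = [10.1, 9.8, 10.0, 10.3, 9.9, 10.2, 9.7, 10.0, 10.1, 9.9]
limits = individuals_limits(baseline)

# Hypothetical new subset averages; the last one simulates a process shift.
new_points = [10.0, 10.2, 12.5]
flagged = out_of_control(new_points, limits)
print(f"limits: {limits[0]:.2f} to {limits[2]:.2f}, flagged indexes: {flagged}")
```

The limits are set from the baseline period only, so new points are judged against what the process did when it was stable.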

### Hints & tips

- If the non-normality is due to extreme points, you must first get the process under control by eliminating those causes. Until then, SPC will not add much value.
- If there are multiple processes present, it is best to separate those and put each one under statistical control. Otherwise, it is difficult to know what to fix when SPC indicates a problem.
- When creating the subsets or samples to support a Central Limit Theorem transformation, try to use logical subsets such as all the points in a shift. The key is that they are from the same time period.
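The last tip can be sketched in Python as grouping measurements by date and shift before averaging, so that each subset contains points from the same time period. The record layout, dates, and values are hypothetical, and `subset_means_by_shift` is an illustrative name rather than a function from any SPC tool.

```python
from collections import defaultdict
import statistics

# Hypothetical records: (date, shift, measurement).
records = [
    ("2024-03-01", "day", 5.0),
    ("2024-03-01", "day", 7.0),
    ("2024-03-01", "night", 4.0),
    ("2024-03-01", "night", 6.0),
    ("2024-03-02", "day", 8.0),
    ("2024-03-02", "day", 6.0),
]


def subset_means_by_shift(rows):
    """Average the measurements within each (date, shift) subset."""
    groups = defaultdict(list)
    for date, shift, value in rows:
        groups[(date, shift)].append(value)
    return {key: statistics.fmean(values) for key, values in groups.items()}


means = subset_means_by_shift(records)
for (date, shift), mean in means.items():
    print(date, shift, mean)
```

In practice each shift would contribute enough points to meet the subset-size rule of thumb; two points per shift are used here only to keep the example short.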
