## About this lesson

Statistical tests are often used to aid problem analysis. Statistical analysis of a small sample of data can point to root causes of problems in the full data set.

## Exercise files

Download this lesson’s related exercise files.

- Statistical Analysis.docx (201.5 KB)
- Statistical Analysis - Solution.docx (202.4 KB)

## Quick reference

### Statistical Analysis

Statistical tests are often used to aid problem analysis by indicating whether to reject or fail to reject the Null hypothesis. Statistical analysis of a small sample of data can point to root causes of problems in the full data set.

### When to use

In some cases, the problem analysis will point to an obvious root cause, but often the cause is not as apparent. Statistical analysis is used to test a hypothesis as part of the Analyze phase of a Lean Six Sigma project.

### Instructions

Statistical analysis has a context, and it is important to understand that context. Normally, statistical analysis is used in conjunction with hypothesis testing. The statistical analysis indicates whether the Null hypothesis should be rejected or not rejected (failing to reject is often loosely called "accepting" the Null hypothesis). There are dozens of possible statistical tests. Which test to use depends upon the structure of the hypothesis and the characteristics of the data. Selecting the correct test will be discussed in the Hypothesis Testing course.

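When the statistical analysis is completed, a “P” value, or probability value, will be generated. The “P” value is compared with a chosen significance level (commonly 0.05, which corresponds to a 95% confidence level). If the “P” value is below that level, the Null hypothesis is rejected; otherwise it is not rejected.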
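To make the decision rule concrete, here is a minimal Python sketch of one of the simplest tests, a two-sided one-sample z-test. It is an illustration only: the cycle-time data, the 30-minute target, and the assumption that the population standard deviation is known (1.5 minutes) are all hypothetical, and a real project would choose the test based on the hypothesis and the data.

```python
from math import sqrt
from statistics import NormalDist, mean


def z_test_p_value(sample, hypothesized_mean, known_sigma):
    """Two-sided p-value for H0: population mean == hypothesized_mean.

    Assumes the population standard deviation (known_sigma) is known.
    """
    n = len(sample)
    standard_error = known_sigma / sqrt(n)
    z = (mean(sample) - hypothesized_mean) / standard_error
    # Probability of a result at least this extreme if H0 is true.
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical cycle times in minutes; the assumed process target is 30.
cycle_times = [31.2, 29.8, 32.5, 30.9, 33.1, 31.7, 30.4, 32.0, 31.5, 32.8]
alpha = 0.05  # significance level that pairs with 95% confidence

p_value = z_test_p_value(cycle_times, hypothesized_mean=30.0, known_sigma=1.5)
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"p-value = {p_value:.4f} -> {decision}")
```

A small “P” value means the sample would be unlikely if the Null hypothesis were true, which is why it leads to rejection.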

#### Inferential Statistics

In most cases, the data set being used in the Lean Six Sigma project will not represent the entire possible population of the problem. It is difficult or impossible to get data from all potential instances with all relevant customers, relevant products, relevant locations, across all time periods – past, present, and future. Therefore, a subset, or sample, of the data is used. The statistical analysis is applied to the data from the sample and then the results are inferred to apply to the entire population.

To make this inference, additional analysis is needed based on the sampling approach and the confidence level. A confidence level expresses how confident we can be, typically 90%, 95%, or 99%, in a statement about the entire population that is based on the sample data. The detailed analysis of confidence levels and confidence intervals is addressed in the Hypothesis Testing course.

The desired confidence level, the sample size, and the descriptive statistics of the sample data (mean and standard deviation) can be used to calculate a confidence interval for the location of the total population mean. The confidence interval calculation can also be used in reverse: given a particular value, it can determine the confidence level associated with that value being the mean of the total population, based upon the sample mean.
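The calculation described above can be sketched in Python as follows. This is a simplified illustration, not the method used in the course: it uses the normal (z) approximation, whereas small samples are usually handled with the t distribution, and the fill-weight data and function names are hypothetical. The "reverse" function reflects one reading of the reverse calculation: it returns the confidence level whose interval just reaches a given value.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Interval for the population mean using the normal approximation."""
    standard_error = stdev(sample) / sqrt(len(sample))
    # Critical z value, e.g. about 1.96 for 95% confidence.
    z_critical = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z_critical * standard_error
    return mean(sample) - margin, mean(sample) + margin


def confidence_for_value(sample, value):
    """Reverse calculation: confidence level whose interval just reaches value."""
    standard_error = stdev(sample) / sqrt(len(sample))
    z = abs(value - mean(sample)) / standard_error
    return 2 * NormalDist().cdf(z) - 1


# Hypothetical fill weights in grams.
fill_weights = [502.1, 498.7, 500.4, 501.9, 499.2, 500.8,
                503.0, 497.9, 500.1, 501.3, 499.6, 500.9]

lower, upper = mean_confidence_interval(fill_weights, confidence=0.95)
print(f"95% interval for the mean: {lower:.2f} to {upper:.2f}")
print(f"confidence level reaching 500.0: {confidence_for_value(fill_weights, 500.0):.3f}")
```

Raising the confidence level widens the interval, and a larger sample narrows it.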

### Hints & tips

- Don’t blindly apply a statistical analysis to your project. If the root cause is obvious, no analysis is needed. If the root cause is not obvious, create a set of hypotheses and, based upon each hypothesis and the characteristics of the data, apply the appropriate test for that hypothesis.
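- Strictly speaking, a confidence level is not the probability that the population statistic falls within one particular interval. It is the share of intervals, built the same way from repeated samples, that would contain the true value. For practical purposes it can be treated as essentially that probability.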
- Most organizations use a 95% confidence level, and that is the default in Minitab.
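The tip about what a confidence level means can be illustrated with a small simulation. The sketch below is hypothetical throughout: it assumes a normally distributed process with a known true mean and standard deviation, draws many samples, and counts how often a 95% interval (normal approximation) contains the true mean.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev


def coverage_rate(true_mean, true_sigma, sample_size, confidence, trials, seed=0):
    """Fraction of simulated confidence intervals that contain true_mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z_critical = NormalDist().inv_cdf(0.5 + confidence / 2)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, true_sigma) for _ in range(sample_size)]
        margin = z_critical * stdev(sample) / sqrt(sample_size)
        # The interval covers the true mean when the sample mean is within the margin.
        if abs(mean(sample) - true_mean) <= margin:
            hits += 1
    return hits / trials


rate = coverage_rate(true_mean=50.0, true_sigma=5.0, sample_size=40,
                     confidence=0.95, trials=2000)
print(f"share of intervals containing the true mean: {rate:.3f}")
```

The share should land close to the chosen confidence level, which is the long-run sense in which a 95% confidence level holds.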
