## Exercise files

Download this lesson’s related exercise files.

- Descriptive Statistics.docx (205.9 KB)
- Descriptive Statistics - Solution.docx (201.7 KB)

## Quick reference

### Descriptive Statistics

Lean Six Sigma methodology relies heavily on statistical analysis of problems and solutions. A single data point is not sufficient; a collection of data is needed for analysis. This collection will have some natural variability within it, and descriptive statistics summarize the center and extent of that variability.

### When to use

Descriptive statistics are used whenever there is a data set to be analyzed. That will occur in the Measure, Analyze, Improve, and Control phases.

### Instructions

While a single data point is interesting, a set of data values for a process parameter provides a much richer and more complete picture of that aspect of the process. In general, the more data points, the more accurate the picture. However, a large set of numbers is awkward to work with, so the data set is described using a set of standard statistical measures. These descriptive statistics are used throughout the Lean Six Sigma process.

The first three statistics are often used to describe the central tendency of the data set.

Mean – the average value of the data set. It is calculated by adding all the values in the data set and dividing that sum by the number of data values. It is often expressed as (x̄). The mean can be heavily influenced by outlier values.

Median – the middle point in the data set. Order the data set from smallest value to largest value. The center point is the median. If the data set has an even number of data points, the average of the two center points is the median. This is a better measure of central tendency when the data is skewed or there are outliers.

Mode – the most frequently occurring value within the data set. This statistic is seldom used in Lean Six Sigma.
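The three central tendency measures above can be sketched in a few lines of Python using the standard library `statistics` module. The cycle-time values are hypothetical and chosen only for illustration; the last value is a deliberate outlier, included to show how it pulls the mean while leaving the median almost unchanged.

```python
import statistics

# Hypothetical cycle times in minutes (illustrative data, not from the lesson).
# The last value, 45, is a deliberate outlier.
cycle_times = [12, 14, 14, 15, 16, 17, 45]

# Mean: add all the values and divide by the number of values.
mean_value = statistics.mean(cycle_times)

# Median: the middle value once the data is ordered smallest to largest.
median_value = statistics.median(cycle_times)

# Mode: the most frequently occurring value.
mode_value = statistics.mode(cycle_times)

# Drop the outlier to see its effect. The data set now has an even
# number of points, so the median is the average of the two center points.
without_outlier = cycle_times[:-1]
mean_no_outlier = statistics.mean(without_outlier)
median_no_outlier = statistics.median(without_outlier)

print("With outlier:    mean =", mean_value, " median =", median_value, " mode =", mode_value)
print("Without outlier: mean =", round(mean_no_outlier, 2), " median =", median_no_outlier)
```

In this sample, removing the single outlier moves the mean from 19 to about 14.67, while the median only moves from 15 to 14.5. That is the reason the median is preferred for skewed data.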

The next three statistics are used to describe some aspect of the span or width of the data set.

Range – the span of the data set. It is calculated by subtracting the smallest data value from the largest data value.

Deviation – the span from the average value of the data set to a specific data point. Deviation is always associated with a specific point, not the entire data set.

Standard Deviation – the square root of the average of the squared deviations. This value provides a measure of the width of the data set that accounts for the central tendency and the full range of the data. This statistic is often represented by the Greek symbol, σ.
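As a minimal sketch of the three spread measures above, the following Python computes each one step by step on a small hypothetical data set. Two assumptions are made here that the lesson does not state: deviations are kept as signed values (data point minus mean), and the standard deviation divides by the number of data points, which is the population form matching the definition given above. Many tools default to the sample form, which divides by one less than the number of data points and gives a slightly larger result.

```python
import math
import statistics

# Hypothetical measurements (illustrative data, not from the lesson).
data = [4, 8, 6, 5, 7]
n = len(data)

# Mean of the data set, needed for the deviations.
mean_value = sum(data) / n

# Range: largest value minus smallest value.
data_range = max(data) - min(data)

# Deviation: one value per data point (assumed signed: point minus mean).
deviations = [x - mean_value for x in data]

# Standard deviation: square root of the average of the squared deviations.
# Dividing by n gives the population form (an assumption, see above).
average_squared_deviation = sum(d ** 2 for d in deviations) / n
sigma = math.sqrt(average_squared_deviation)

# Cross-check against the standard library's population standard deviation.
sigma_check = statistics.pstdev(data)

print("Range:", data_range)
print("Deviations:", deviations)
print("Standard deviation:", round(sigma, 4))
```

For comparison, `statistics.stdev(data)` returns the sample standard deviation for the same values.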

### Hints & tips

- Know these definitions and how to find these values. You will be using them often.
