Statistical Analysis

About this lesson

Statistical tests are often used to aid the problem analysis. The statistical analysis of a small sample of data can point to root causes of problems in the full data set.

Statistical Analysis

Statistical tests are often used to aid the problem analysis by indicating whether to accept or reject the Null hypothesis. The statistical analysis of a small sample of data can point to root causes of problems in the full data set.

When to use

In some cases the problem analysis will point to an obvious root cause, but often the cause is not as apparent. Statistical analysis is used to confirm a hypothesis when doing hypothesis testing as part of the Analyze phase of a Lean Six Sigma project.

Instructions

There is a context to statistical analysis and it is important to understand that context. Normally, statistical analysis is being used in conjunction with hypothesis testing. The statistical analysis confirms whether the Null hypothesis should be accepted or rejected. There are literally dozens of possible statistical tests. Which test to use depends upon the structure of the hypothesis and the characteristics of the data. Selecting the correct test will be discussed in the Hypothesis Testing course.

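When the statistical analysis is completed, a “P” value, or probability value, will be generated. The Null hypothesis is rejected when the “P” value falls below the chosen significance threshold (commonly 0.05, which corresponds to a 95% confidence level). Otherwise the Null hypothesis is not rejected, which this course describes as accepting it.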
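
As a minimal sketch of how a P value drives the decision, the example below runs a two-sided one-sample z-test using only Python's standard library. The hold-time values, the hypothesized mean of 5 minutes, the assumed known population standard deviation, and the 0.05 threshold are all illustrative assumptions, not values from this lesson. The right test for a real project depends on the hypothesis and the data.

```python
from math import sqrt
from statistics import NormalDist, mean


def z_test_p_value(sample, hypothesized_mean, population_sd):
    """Two-sided P value for a one-sample z-test (population_sd is assumed known)."""
    standard_error = population_sd / sqrt(len(sample))
    z = (mean(sample) - hypothesized_mean) / standard_error
    # Probability of a result at least this extreme if the Null hypothesis is true.
    return 2 * (1 - NormalDist().cdf(abs(z)))


def decide(p_value, alpha=0.05):
    """Decision for a given significance threshold (alpha = 0.05 is an assumption)."""
    return "reject Null" if p_value < alpha else "fail to reject Null"


# Hypothetical hold times in minutes; Null hypothesis: the mean hold time is 5 minutes.
hold_times = [6.1, 7.4, 5.9, 8.2, 6.8, 7.1, 6.5, 7.9]
p = z_test_p_value(hold_times, hypothesized_mean=5.0, population_sd=1.5)
print(f"P value = {p:.4f}: {decide(p)}")
```

Note that "fail to reject" is the conventional wording for what the lesson calls accepting the Null hypothesis: a large P value means the sample gives no strong evidence against it, not that it has been proven.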

Inferential Statistics

In most cases, the data set being used in the Lean Six Sigma project will not represent the entire possible population of the problem. It is difficult or impossible to get data from all potential instances with all relevant customers, relevant products, relevant locations, across all time periods – past, present, and future. Therefore, a subset, or sample, of the data is used. The statistical analysis is applied to the data from the sample and then the results are inferred to apply to the entire population.

In order to do this inference, additional analysis should be done based upon sampling approach and confidence levels. By confidence level we mean being able to state something about the entire population based upon the sample data with a 90%, 95%, or 99% confidence. The detailed analysis of confidence levels and confidence intervals is addressed in the Hypothesis Testing course.

The desired confidence level and the descriptive statistics of the sample data (mean and standard deviation) can be used to calculate a confidence interval for the location of the total population mean. The confidence interval calculation can be used in reverse to determine the confidence that a particular value is the mean of the total population based upon the sample mean.
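
The calculation described above can be sketched as follows, using only Python's standard library. This is a simplified illustration under stated assumptions: it uses the normal (z) approximation, the sample values are hypothetical, and the function names are made up for this example. For small samples a t-distribution is usually more appropriate.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def confidence_interval(sample, confidence=0.95):
    """Approximate interval for the population mean (normal approximation)."""
    standard_error = stdev(sample) / sqrt(len(sample))
    # z multiplier: about 1.645 for 90%, 1.960 for 95%, 2.576 for 99%.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error
    return mean(sample) - margin, mean(sample) + margin


def confidence_for_margin(sample, margin):
    """Reverse calculation: confidence level implied by a chosen half-width."""
    standard_error = stdev(sample) / sqrt(len(sample))
    z = margin / standard_error
    return 2 * NormalDist().cdf(z) - 1


# Hypothetical sample of hold times in minutes.
sample = [6.1, 7.4, 5.9, 8.2, 6.8, 7.1, 6.5, 7.9]
for level in (0.90, 0.95, 0.99):
    low, high = confidence_interval(sample, level)
    print(f"{level:.0%} interval: {low:.2f} to {high:.2f}")

# Starting from an interval width instead of a confidence level.
print(f"Confidence for +/- 0.5 minutes: {confidence_for_margin(sample, 0.5):.1%}")
```

A higher confidence level produces a wider interval for the same sample, which is the trade-off behind choosing either the level or the width first.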

Hints & tips

Don’t blindly apply a statistical analysis to your project. If the root cause is obvious, no analysis is needed. If the root cause is not obvious, create a set of hypotheses and based upon the hypotheses and characteristics of the data, apply the one right test for each hypothesis.
Confidence level is not precisely the probability that the population statistic falls within the range associated with that confidence level, but it is essentially that. Strictly, it is the long-run proportion of intervals built this way that would contain the population statistic.
Most organizations use a 95% confidence level, and that is the default in Minitab.

00:05 Hi, I'm Ray Sheen.
00:06 During the Analyze stage of a Lean Six Sigma project,
00:09 we will often be conducting statistical analysis.
00:13 Let's cover a few of the principles involved in this analysis.
00:18 Let me start by saying you need to understand your statistics so
00:22 that you don't fool yourself into thinking something is good or bad when it isn't.
00:27 I'll illustrate this point by using a simple statistical measure, the mean or
00:32 average value.
00:33 No one statistic provides everything you need to know about the data set.
00:36 So while the mean is a very important number, and we use it often,
00:40 it still does not tell the whole story.
00:43 That's because statistical analysis works with the data set, not an individual
00:48 data point, but your customer feels their individual data point.
00:52 Their instance is what matters to them, so
00:54 while the average value in a data set may be a problem for most customers,
00:59 you may still have some customers with perfectly acceptable instances and
01:03 vice versa.
01:04 The average may be fine, but
01:06 there are isolated customers who are not satisfied with the process performance.
01:11 Let me illustrate with an example.
01:13 We have two call centers that are answering customers' questions.
01:17 The customer calls and stays on hold until a customer service rep speaks with them.
01:21 The average hold time in both call centers is the same, seven minutes.
01:26 Now, that level itself may be unacceptable for some industries or customer groups but
01:30 may be considered normal performance for others.
01:33 However, let's look in a little more detail at the data.
01:36 In the first call center, customers waited between five and
01:40 eight minutes with an average of seven minutes.
01:42 Depending upon the industry that might mean four angry customers or
01:45 four customers who are enjoying the sound of that beautiful telephone hold music.
01:50 In the second call center, three of the four customers only waited for
01:54 1 minute, but one customer waited for 25 minutes.
01:58 So in that case, there are three customers who are pleased with
02:01 the responsiveness and one customer who is contemplating how to
02:04 create a viral YouTube post complaining about the awful service.
02:08 The average in both cases is the same, but
02:10 the customer experiences are very different.
02:14 So be careful just throwing around statistics,
02:16 make sure you understand what they mean.
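
The call-center comparison can be reproduced with a few lines of Python. This is only a sketch: the values for the second call center follow the transcript, but the individual values for the first call center are assumed, since the transcript gives only the range (five to eight minutes) and the average (seven minutes).

```python
from statistics import mean, pstdev

# Hold times in minutes. call_center_1 values are assumed to fit the stated
# range and average; call_center_2 values come from the transcript.
call_center_1 = [5, 7, 8, 8]
call_center_2 = [1, 1, 1, 25]

for name, times in [("Call center 1", call_center_1),
                    ("Call center 2", call_center_2)]:
    # Same mean, very different spread between shortest and longest wait.
    print(f"{name}: mean={mean(times)}, min={min(times)}, "
          f"max={max(times)}, std dev={pstdev(times):.1f}")
```

Reporting the minimum, maximum, and standard deviation alongside the mean exposes the difference that the average alone hides.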
02:19 So how should we approach the use of statistical analysis in our problem?
02:23 Well, start with the basic problem-solving elements that we've already discussed.
02:28 Use inductive and deductive reasoning to create a null and alternative hypothesis.
02:33 We will then use statistical analysis to accept or reject the hypothesis.
02:37 Depending upon the structure of your hypothesis and the characteristics of your
02:41 data set, such as whether it is variable or attribute data
02:44 and how many data sets you have to work with,
02:46 there are a number of statistical tests that you can use.
02:50 Different tests work with different types of data. Unfortunately,
02:54 there's no universal test that works with everything.
02:57 That is why we created a separate hypothesis testing course.
03:01 It allows us to focus on each of the statistical tests used
03:04 in Lean Six Sigma analysis.
03:06 And while there's no universal test, there is a universal measure for
03:11 the statistical test analysis used when doing hypothesis testing.
03:16 That universal measure is the P value or probability value.
03:21 This measure tells you the probability of getting results at least as extreme as yours if your null hypothesis is true.
03:26 Based upon the size of the P value, you can either accept or
03:29 reject the null hypothesis.
03:32 Now there's a caution here.
03:34 The reason we will be picky about how to write the null and alternative hypothesis
03:38 is that the P value tells you whether you can reject the null hypothesis or not.
03:43 It does not specifically say that your alternative hypothesis is true.
03:47 A poorly constructed alternative hypothesis
03:50 that is not a true inverse of the null hypothesis may or may not be true.
03:55 But more about all that in the hypothesis testing course.
03:58 For now, it's enough to know that when I write a good hypothesis,
04:02 there are statistical tests that will tell me whether to accept or reject them.
04:07 Another important topic to discuss is inferential statistics.
04:12 That means answering the question,
04:14 is the data in your analysis similar to all the problem data?
04:18 Inferential statistics allows us to draw conclusions about all the data in
04:22 a population, even though we only have access to a portion of that data.
04:26 Political pollsters do this all the time. They do a poll of several hundred or
04:30 thousand people, and use the statistics from that
04:33 data sample to predict the outcome of a national election.
04:37 Occasionally, you'll be lucky enough to have all the data for your problem.
04:41 But most of the time, we only have a portion of a data set,
04:44 and that is why we need to do inferential statistics.
04:48 We can't go back in time and collect data that was never recorded.
04:52 And we may not have access to all the products or customers or locations.
04:56 Now let's be clear, when we analyze a data set,
04:59 we are only getting information about the items represented by that data set.
05:04 But with some additional analysis,
05:06 we can determine to what degree the analysis of that data set can be used
05:11 to infer the characteristics of the full population of instances.
05:17 This additional analysis is based upon sampling rules and confidence intervals.
05:22 The purpose of inferential statistics is to allow us to use sample statistics
05:27 as a surrogate for the entire population statistics.
05:31 Now the statistical value calculated such as a mean is a point, but
05:36 there's some uncertainty as to how close the sample point is to the real point for
05:41 the entire population.
05:42 That uncertainty is described with a confidence interval.
05:46 Essentially, that is the range around the calculated point in which
05:50 the real point is expected to lie.
05:53 The size of this range is based upon the characteristics of your sample data
05:58 and the desired confidence level: 90%, 95%, or 99%.
06:02 Although not precisely correct,
06:03 it is essentially correct to say that the confidence level is the probability that the actual mean will
06:07 be within the confidence interval around the sample mean.
06:10 I won't go through all the calculations right now; we'll do that in
06:14 the hypothesis testing course.
06:17 But to use this confidence interval, we need an estimate of the standard error
06:21 of the mean, which we can derive from the size of the sample and
06:25 the standard deviation of the sample.
06:27 Then we either need to decide what confidence level to use (most
06:32 people use 95%), or we need to decide what interval width we are comfortable with.
06:37 If we start with a confidence level, we will calculate the confidence interval;
06:41 if we start with the interval, we will determine the confidence level.
06:45 The confidence interval will ultimately be described with a point estimate and
06:49 a range around that point estimate.
06:51 The size of the range depends upon the confidence level, and
06:55 is essentially plus or minus the Z value that you see in this table.
07:00 Statistical analysis will allow us to move from guessing the problem to knowing
07:05 the problem.
07:06 And with inferential statistics,
07:08 we can even apply a confidence level to our knowledge of this situation.
