Retired course
This course has been retired and is no longer supported.
Quick reference
The P Value
The P value is the universal measure that is used in hypothesis testing to determine whether to reject the Null hypothesis or fail to reject it. Each hypothesis test will calculate the appropriate P value based upon the test statistic.
When to use
P Values are associated with all hypothesis tests. If you are doing hypothesis testing, you will be using a P value.
Instructions
The P Value is the probability of seeing differences at least as large as those in your data if only purely random effects were at work, that is, if the Null Hypothesis were true. The P Value therefore represents how compatible the data are with the Null Hypothesis. Each hypothesis test will calculate the P value based upon the characteristics of the test calculations. In addition, the P value for a test can change based upon the width of the confidence interval, so the sample size and sample standard deviation will impact the P value.
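The effect of sample size and standard deviation on the P value can be illustrated with a rough sketch. It is not the calculation any particular statistical package performs: it assumes two groups of equal size with a shared, known standard deviation and uses a large-sample normal approximation. The function name `two_sided_p_value` and all of the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p_value(mean_diff, std_dev, n_per_group):
    """Approximate two-sided P value for a difference between two sample means.

    Assumption: both groups have the same standard deviation and the same
    sample size, and the samples are large enough for a normal approximation.
    """
    # Standard error of the difference between two independent sample means.
    standard_error = std_dev * sqrt(2.0 / n_per_group)
    z = abs(mean_diff) / standard_error
    # Probability of a z score at least this far from zero, in either direction.
    return 2.0 * (1.0 - NormalDist().cdf(z))


# Hypothetical case: the same observed difference (2.0 units) and the same
# standard deviation (5.0 units), with more and more data points per group.
for n in (10, 40, 160):
    print(n, round(two_sided_p_value(2.0, 5.0, n), 4))
```

Under these assumptions the same observed difference produces a smaller P value as the sample size grows, and a larger P value when the standard deviation is larger.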
The P Value can range from 0 to 1.0. A P value threshold is set based upon the confidence level. The P threshold value is normally set to the alpha value, which is 1 minus the confidence level. In most Lean Six Sigma projects the confidence level is 0.95, so the P threshold value is 0.05. Whenever the P value from the test is less than 0.05, the Null hypothesis is rejected. If the P value is greater than 0.05, the decision is to fail to reject the Null hypothesis.
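The decision rule above can be written as a short sketch. The function name `decide` is hypothetical, and the two P values passed to it (0.016 and 0.099) are the ones quoted in this lesson's examples. The text does not say what to do when the P value exactly equals the threshold; this sketch assumes that case falls under "fail to reject".

```python
def decide(p_value, alpha=0.05):
    """Apply the threshold rule: reject the Null hypothesis only if P < alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a P value must lie between 0 and 1")
    if p_value < alpha:
        return "reject the Null hypothesis"
    # Assumption: a P value exactly equal to alpha is treated as "fail to reject".
    return "fail to reject the Null hypothesis"


confidence_level = 0.95
# Alpha is 1 minus the confidence level; rounding removes floating-point noise.
alpha = round(1.0 - confidence_level, 10)

print(decide(0.016, alpha))  # reject the Null hypothesis
print(decide(0.099, alpha))  # fail to reject the Null hypothesis
```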
The P value is the universal measure used with Lean Six Sigma hypothesis testing. Each test will calculate a P value based upon the results of the test. However, keep in mind, the P value is testing the Null hypothesis. The Null and Alternative hypotheses must be well-written for a Null hypothesis rejection to lead to an Alternative hypothesis acceptance.
Hints & tips
- The P value will always be based upon the Null hypothesis, and the Null hypothesis always supports the position that there is nothing unusual about the data or samples.
- 00:04 Hi, I'm Ray Sheen.
- 00:06 In this session I'll introduce the P Value.
- 00:09 This is the most universally used statistic in hypothesis testing.
- 00:13 It will be the P Value which will tell us whether or
- 00:16 not we should reject the null hypothesis, or fail to reject it.
- 00:20 So let's start with background on the P value.
- 00:24 The P value statistic is used with inferential statistics.
- 00:28 Remember that with inferential statistics,
- 00:30 we are inferring a statistical conclusion about an entire population of data
- 00:34 based upon the analysis of only a subset of that data.
- 00:38 The P value tests that assumption.
- 00:41 The P value is the probability that any difference we see between the data sets
- 00:46 that are being analyzed is due to strictly random events and
- 00:50 not due to the alternative hypothesis.
- 00:53 In other words, the P Value tells us whether the statistics
- 00:56 are supporting the Null Hypothesis.
- 00:59 Is there a statistically significant relationship between the data sets?
- 01:03 And the way the P value is calculated, a high P value says that the null hypothesis
- 01:07 should be accepted but a low P value says that the null must be rejected.
- 01:15 So let's look at the P value and the level it can take on.
- 01:18 The P Value is based upon the statistical analysis
- 01:21 of the data sets that are being analyzed as part of the hypothesis test.
- 01:25 Different tests will analyze different aspects of the data set.
- 01:29 The test may be focused on the mean or the median of the data.
- 01:32 The test may be using the standard deviation or
- 01:34 the variance, which is just the standard deviation squared.
- 01:38 And the test will often take into consideration the number of data points in
- 01:41 the sample.
- 01:42 Whatever the nature of the statistical value,
- 01:44 the P value is calculated by comparing the data sets to determine
- 01:48 that the difference is statistically significant.
- 01:51 Of course, there are always differences between the data sets.
- 01:54 There's always some level of inherent common cause variation that's occurring.
- 01:58 This P Value represents the probability that this difference is due to random
- 02:02 chance variation, as compared to it being due to a fundamental difference
- 02:07 in the processes that the data sets represent.
- 02:10 The P value is a probability number, so that value is always between zero and one.
- 02:16 Essentially, between 0% and 100%.
- 02:20 The P value is also related to our confidence interval.
- 02:23 Since we normally use the .95 confidence interval in Lean Six Sigma projects, a P
- 02:29 value of .05 is our threshold for whether to accept or reject the Null Hypothesis.
- 02:35 Another way of saying this is that the P value threshold
- 02:38 is the alpha value in the confidence level equation.
- 02:42 While this next statement is not precisely true, it is essentially true.
- 02:47 A P value of 0.05 says that there is less than a 5% chance
- 02:51 that if I reject the null hypothesis I made the wrong decision.
- 02:55 Let's run through a few examples to illustrate how the P Value works.
- 02:59 We'll start with the null hypothesis that says that the difference in the median
- 03:03 household income in the different counties of Florida is due to random chance.
- 03:08 And the alternative hypothesis is that the differences are not due to random chance,
- 03:13 and therefore the median household income across the counties of Florida
- 03:16 is not normally distributed.
- 03:18 Now, the way to check these hypotheses is to look at the median values of income across
- 03:23 Florida and determine if the resulting distribution is normal.
- 03:26 So that is what I did, and you can see the plot of that distribution on this graph.
- 03:31 A normal curve should have a high peak in the center and
- 03:34 be reducing down to a very low value at the upper and lower ends.
- 03:38 That is not what the distribution shows.
- 03:40 And in fact, there's a P Value metric that was calculated by my statistical software,
- 03:45 in this case Minitab, that is a value of 0.016.
- 03:50 This is less than our threshold of 0.05, so reject the null hypothesis that
- 03:54 said the median income in Florida was randomly distributed.
- 03:59 Now let's look at another example.
- 04:01 In this case I wanna compare the median income between Florida and
- 04:05 Louisiana to see if they are the same.
- 04:07 So my null hypothesis is that there's no statistical difference between the median
- 04:11 income in Florida and Louisiana.
- 04:14 My alternative hypothesis is that the median incomes are not the same.
- 04:18 I could've been even more specific and
- 04:20 said that the median income of Florida is higher than Louisiana.
- 04:23 Now, if I find that there is a difference, I will check to see which one is higher.
- 04:28 You can see in the results that Florida is definitely higher than Louisiana.
- 04:32 The Florida median income in 2000 was $42,047, and
- 04:36 the Louisiana median income was only $39,823.
- 04:42 Well that sounds like a lot.
- 04:43 But when we check the P Value,
- 04:45 it turns out that the difference is not statistically significant.
- 04:49 The P value is 0.099, which is greater than 0.05,
- 04:54 so we do not reject the null hypothesis.
- 04:57 For our purposes, we can consider the median income between these two states
- 05:02 to be in the same data set.
- 05:05 The P Value is a convenient metric to use
- 05:08 to determine whether the observed differences are statistically significant.
- 05:12 It is with the P value that we either reject the null hypothesis or
- 05:16 fail to reject it.
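The transcript notes that some difference between data sets always arises from random variation. One way to see how a P value separates that background variation from a real difference is a permutation test, sketched below. This is an illustration only, not the method Minitab used for the examples in this lesson: the function name `permutation_p_value` is hypothetical and the measurements are made up. The sketch shuffles the group labels many times and counts how often random relabelling alone produces a difference in means at least as large as the one observed.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_shuffles=5000, seed=0):
    """Estimate a two-sided P value for a difference in means by shuffling labels."""
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_large = 0
    for _ in range(n_shuffles):
        # Under the Null hypothesis the group labels carry no information,
        # so any random split of the pooled data is equally plausible.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_large += 1
    return at_least_as_large / n_shuffles


# Made-up measurements from two hypothetical processes.
process_a = [10.1, 9.8, 10.4, 10.0, 9.9, 10.3, 10.2, 9.7]
process_b = [10.0, 10.2, 9.9, 10.1, 10.3, 9.8, 10.0, 10.4]
# The same data as process_b with a deliberate shift of 1.0 added.
shifted_b = [value + 1.0 for value in process_b]

print("similar processes:", permutation_p_value(process_a, process_b))
print("shifted process:  ", permutation_p_value(process_a, shifted_b))
```

With these made-up numbers, the two similar processes give a P value well above 0.05, while the deliberately shifted data give a P value well below it.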