About this lesson
A statistical analysis or test fits a mathematical model to the data in the sample. Real-world data seldom fits the model exactly. The differences between the model and the actual data are known as residuals. The residuals in any analysis, whether a regression analysis or another statistical analysis, indicate how well the statistical model fits the data. When the residuals indicate a bad fit, a different analytical approach should be selected. This lesson explains how to read residual graphs and analysis.
Residuals are the differences between the actual data values and the values predicted by the hypothesis test solution. Analyzing the residuals is a way of assessing the validity of the hypothesis test.
When to use
When a hypothesis test creates a formula or prediction of the data values, residuals can be calculated. Residuals are produced by hypothesis tests that use regression analysis and ANOVA.
Some hypothesis tests form a “best fit” equation to model the system performance based upon the data set. These “best fit” equations should closely approximate the real world, but the actual values will normally be slightly different. When creating the “best fit,” each actual value is compared to its predicted value, and the difference is a residual. The “best fit” solution is determined by a set of calculations on these residuals: in the usual least-squares approach, the sum of the squared residuals is at a minimum and, when the equation includes a constant term, the mean of the residuals is zero.
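As an illustration of how residuals come out of a “best fit,” here is a minimal sketch in Python using only the standard library. It assumes an ordinary least-squares straight line and a made-up data set; the helper names (`fit_line`, `residuals`, `sum_sq`) are hypothetical and are not part of Minitab or of this lesson's materials.

```python
import random

def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope

def residuals(x, y, intercept, slope):
    """Residual = actual value minus predicted value."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

def sum_sq(values):
    """Sum of squared values."""
    return sum(v * v for v in values)

# Hypothetical sample: a linear process plus random noise.
rng = random.Random(1)
x = [float(i) for i in range(20)]
y = [3.0 + 2.0 * xi + rng.gauss(0.0, 1.5) for xi in x]

intercept, slope = fit_line(x, y)
res = residuals(x, y, intercept, slope)

mean_residual = sum(res) / len(res)
sse_best = sum_sq(res)
# One arbitrary alternative line, to compare against the least-squares line.
sse_other = sum_sq(residuals(x, y, intercept + 0.5, slope - 0.1))

print(f"fitted line: y = {intercept:.3f} + {slope:.3f} x")
print(f"mean residual: {mean_residual:.2e}")
print(f"SSE best fit: {sse_best:.3f}, SSE other line: {sse_other:.3f}")
```

The mean residual prints as zero apart from floating-point rounding, and the alternative line has a larger sum of squared residuals than the fitted line, which is what “best fit” means in the least-squares sense.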
The residuals can be plotted, and a review of the plots provides an assessment of whether the “best fit” equation is truly a good model for the data set. There are several things to consider when reviewing the residuals. The first is whether the residuals are normally distributed. A valid “best fit” should produce approximately normal residuals. That is characterized by a mean of zero. In addition, there are approximately the same number of points above and below the zero line, so the distribution is not skewed, and the points cluster around the center with appropriate kurtosis. When plotted in a histogram, the residuals should form a bell-shaped curve. When plotted on a normal probability plot, the residuals should fall on or very near the straight line.
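The normality checks can also be done numerically. The sketch below is one possible approach, not the method Minitab uses: it computes skewness, excess kurtosis, and the correlation between the sorted residuals and their theoretical normal quantiles (the points that a normal probability plot would show). The two residual sets are simulated, and the plotting position `(i - 0.5) / n` is one common convention chosen here as an assumption.

```python
import random
from statistics import NormalDist, mean, pstdev

def skewness(values):
    """Near 0 for a symmetric distribution."""
    m, s = mean(values), pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def excess_kurtosis(values):
    """Near 0 for a normal distribution."""
    m, s = mean(values), pstdev(values)
    return sum(((v - m) / s) ** 4 for v in values) / len(values) - 3.0

def normal_plot_points(values):
    """Pair each sorted residual with its theoretical normal quantile."""
    n = len(values)
    std_normal = NormalDist()
    quantiles = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(quantiles, sorted(values)))

def correlation(pairs):
    """Pearson correlation; close to 1 when the points fall on a line."""
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / (sxx * syy) ** 0.5

# Simulated residuals: one roughly normal set and one skewed set.
rng = random.Random(7)
well_behaved = [rng.gauss(0.0, 1.0) for _ in range(200)]
skewed = [rng.expovariate(1.0) - 1.0 for _ in range(200)]

r_normal = correlation(normal_plot_points(well_behaved))
r_skewed = correlation(normal_plot_points(skewed))

for name, res, r in [("normal", well_behaved, r_normal),
                     ("skewed", skewed, r_skewed)]:
    print(f"{name}: skew={skewness(res):.2f}, "
          f"excess kurtosis={excess_kurtosis(res):.2f}, "
          f"plot correlation={r:.3f}")
```

A plot correlation close to 1 corresponds to points that hug the line on a normal probability plot; the skewed set should score visibly lower and show a clearly positive skewness.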
In addition, the value of the residuals should not depend on the time order of the process. That means that neither the mean nor the magnitude of the residuals changes over time. In the example shown below the value is time dependent, which indicates that the “best fit” equation is missing a term that would capture this effect.
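A time dependency can be checked numerically as well as by eye. The following sketch assumes simulated data in which the response drifts slowly with run order while the fitted equation only includes `x`. It uses the Durbin-Watson statistic, a standard measure that is near 2 when successive residuals are unrelated and well below 2 when they trend together. The data, the drift rate, and the helper names are all hypothetical.

```python
import random

def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope

def residuals_in_run_order(x, y):
    """Residuals listed in the order the observations were taken."""
    intercept, slope = fit_line(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

def durbin_watson(res):
    """Near 2: no serial correlation. Well below 2: drift or trend."""
    num = sum((res[i] - res[i - 1]) ** 2 for i in range(1, len(res)))
    return num / sum(r * r for r in res)

def half_means(res):
    """Mean residual in the first and second half of the run."""
    half = len(res) // 2
    return sum(res[:half]) / half, sum(res[half:]) / (len(res) - half)

rng = random.Random(3)
n = 60
x = [rng.uniform(0.0, 10.0) for _ in range(n)]
noise = [rng.gauss(0.0, 0.5) for _ in range(n)]
y_stable = [1.0 + 2.0 * xi + e for xi, e in zip(x, noise)]
# Same process, plus a slow drift over run order that the model ignores.
y_drift = [yi + 0.15 * t for t, yi in enumerate(y_stable)]

res_stable = residuals_in_run_order(x, y_stable)
res_drift = residuals_in_run_order(x, y_drift)
dw_stable = durbin_watson(res_stable)
dw_drift = durbin_watson(res_drift)
early, late = half_means(res_drift)

print(f"Durbin-Watson, stable process: {dw_stable:.2f}")
print(f"Durbin-Watson, drifting process: {dw_drift:.2f}")
print(f"drifting process mean residual: early={early:.2f}, late={late:.2f}")
```

For the drifting process, the early residuals sit below zero and the late ones above it, which is the numerical counterpart of a sloped residuals-versus-order plot.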
Finally, when reviewing a residual plot based on either the order in which the residuals occurred or the value of the response variable, watch for patterns in the data. Again, a strong pattern is an indication that the “best fit” is missing something. The graph below illustrates this point.
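One simple way to quantify a pattern is to count runs, meaning stretches of consecutive residuals with the same sign. Random scatter changes sign often; a curve or trend produces a few long runs. The sketch below is a rough illustration under assumed data: a straight line is fitted to one truly linear data set and to one curved data set, and the observed number of runs is compared with the number expected from random scatter.

```python
import random

def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope

def line_residuals(x, y):
    intercept, slope = fit_line(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

def count_runs(res):
    """Number of stretches of consecutive same-sign residuals."""
    signs = [r >= 0 for r in res]
    return 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)

def expected_runs(res):
    """Expected number of runs if the signs were randomly ordered."""
    pos = sum(1 for r in res if r >= 0)
    neg = len(res) - pos
    return 2.0 * pos * neg / len(res) + 1.0

# x is sorted and both fitted slopes are positive, so the list order is
# also the order of the fitted values.
rng = random.Random(11)
x = [float(i) for i in range(30)]
y_linear = [5.0 + 3.0 * xi + rng.gauss(0.0, 2.0) for xi in x]
y_curved = [5.0 + 3.0 * xi + 0.4 * xi ** 2 + rng.gauss(0.0, 2.0) for xi in x]

res_linear = line_residuals(x, y_linear)
res_curved = line_residuals(x, y_curved)
runs_linear = count_runs(res_linear)
runs_curved = count_runs(res_curved)

print(f"linear data: {runs_linear} runs, "
      f"about {expected_runs(res_linear):.1f} expected")
print(f"curved data: {runs_curved} runs, "
      f"about {expected_runs(res_curved):.1f} expected")
```

Far fewer runs than expected means the residuals stay on one side of zero for long stretches, which is what a U-shaped or trending residual plot looks like.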
When the residual analysis indicates a problem with the “best fit,” the solution is normally to switch to a multivariate or non-linear solution. Both approaches introduce additional terms into the “best fit” equation that account for the observed issues. These topics are discussed further in later lessons.
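To show the effect of an additional term, the sketch below fits the same simulated curved data twice: once with a straight line and once with an added squared term. This is only one assumed way of adding a term, and the later lessons may take a different route. The fit solves the least-squares normal equations with a small Gaussian elimination routine; `solve` and `fit_polynomial` are hypothetical helpers written for this example.

```python
import random

def solve(a, b):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (m[r][n] - known) / m[r][r]
    return coeffs

def fit_polynomial(x, y, degree):
    """Least-squares polynomial fit; returns coefficients, lowest power first."""
    size = degree + 1
    a = [[sum(xi ** (i + j) for xi in x) for j in range(size)]
         for i in range(size)]
    b = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(size)]
    return solve(a, b)

def residuals(x, y, coeffs):
    """Actual value minus the value predicted by the polynomial."""
    return [yi - sum(c * xi ** k for k, c in enumerate(coeffs))
            for xi, yi in zip(x, y)]

def sum_sq(values):
    return sum(v * v for v in values)

# Hypothetical process with a curved response.
rng = random.Random(5)
x = [0.5 * i for i in range(20)]
y = [2.0 + 1.5 * xi + 0.8 * xi ** 2 + rng.gauss(0.0, 1.0) for xi in x]

line_coeffs = fit_polynomial(x, y, 1)
quad_coeffs = fit_polynomial(x, y, 2)
sse_linear = sum_sq(residuals(x, y, line_coeffs))
sse_quadratic = sum_sq(residuals(x, y, quad_coeffs))

print(f"SSE, straight line: {sse_linear:.1f}")
print(f"SSE, with squared term: {sse_quadratic:.1f}")
print(f"estimated squared-term coefficient: {quad_coeffs[2]:.3f}")
```

The drop in the sum of squared residuals shows the extra term absorbing the curvature that the straight line left behind in its residuals.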
Hints & tips
- In Minitab, create the residual plots by selecting the Graphs button and choosing which residual graphs to use. I normally select the three-in-one or four-in-one view.