## About this lesson

Often, several factors influence the response variable in a problem. Multiple regression determines the relationship between the response variable and multiple control factors. As with simple linear regression, the analysis produces an equation that supports both analysis and prediction of the process and problem.

## Exercise files

Download this lesson’s related exercise files.

- Multiple Linear Regression (11.5 KB)
- Multiple Linear Regression - Solution (229.7 KB)

## Quick reference

### Multiple Linear Regression

Multiple linear regression analysis is the creation of an equation with multiple independent X variables that all influence a Y response variable. This equation is based upon an existing data set and models the conditions represented in the data.

### When to use

When there are multiple independent variables that correlate with the system response, a multiple linear regression should be done. This can be used to predict process performance and identify which factors have the primary impact on process performance.

### Instructions

Multiple linear regression is the appropriate technique to use when the data set has multiple continuous independent input variables and a continuous response variable. The technique determines which variables are statistically significant and creates an equation that shows the relationship of the variables to the response. To improve the accuracy of the analysis, there should be at least ten data points for each independent variable. The equation takes on the form:

$$Y = a + b_{1}X_{1} + b_{2}X_{2} + b_{3}X_{3} + \dots$$

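Here "a" is the intercept, and each "b" coefficient is the change in Y for a one-unit change in its X variable while the other variables are held constant. The absolute values of the "b" coefficients show the relative importance of each variable only when the X variables are on comparable scales (for example, after standardizing them); otherwise a coefficient's size also reflects the units of its variable.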
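To make the form of the equation concrete, here is a minimal sketch that fits the intercept and coefficients by ordinary least squares using only the Python standard library. The function name `fit_multiple_regression` and the data are made up for illustration. The data set is noise-free and far smaller than the ten-points-per-variable guideline, so it shows only the mechanics, not a realistic analysis.

```python
def fit_multiple_regression(rows, y):
    """Least-squares fit of Y = a + b1*X1 + b2*X2 + ... via the normal equations.

    rows: list of [x1, x2, ...] observations; y: list of responses.
    Returns [a, b1, b2, ...].
    """
    # Design matrix with a leading 1 in each row for the intercept a.
    X = [[1.0] + [float(v) for v in r] for r in rows]
    k = len(X[0])
    # Augmented system (X^T X | X^T y).
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("inputs are collinear; the equation is not unique")
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


# Hypothetical data generated from Y = 2 + 3*X1 - 1.5*X2 with no noise.
inputs = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5]]
response = [2 + 3 * x1 - 1.5 * x2 for x1, x2 in inputs]

coeffs = fit_multiple_regression(inputs, response)
print("a, b1, b2 =", [round(c, 6) for c in coeffs])
```

Because the data were generated without noise, the fit should recover the generating values of a, b1, and b2 up to floating-point rounding. With real process data the coefficients are estimates, and a statistics package also reports whether each one is statistically significant.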

Multiple linear regression can be used to predict process performance based on the values of the inputs. Input levels for ideal performance can be defined and tolerance levels that ensure acceptable performance can be determined using the regression equation. The equation will also be helpful for setting process controls.
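As a rough illustration of prediction and tolerance setting, the sketch below assumes a regression equation has already been fitted. The coefficients, the nominal input levels, and the acceptable range for Y are all hypothetical, and the helper names `predict` and `input_range` are invented for this example.

```python
# Hypothetical fitted equation: Y = 2 + 3*X1 - 1.5*X2
INTERCEPT = 2.0
SLOPES = [3.0, -1.5]


def predict(x):
    """Predicted response for one set of input levels [x1, x2, ...]."""
    return INTERCEPT + sum(b * v for b, v in zip(SLOPES, x))


def input_range(index, x, y_low, y_high):
    """Range of one input that keeps Y within [y_low, y_high].

    All other inputs are held at the levels given in x.
    """
    b = SLOPES[index]
    if b == 0:
        raise ValueError("this input has no effect on Y in the equation")
    # Contribution of the intercept and every other input.
    rest = predict(x) - b * x[index]
    lo, hi = (y_low - rest) / b, (y_high - rest) / b
    # A negative coefficient reverses the order of the limits.
    return (min(lo, hi), max(lo, hi))


nominal = [4.0, 2.0]  # assumed target input levels
y_nominal = predict(nominal)
x1_limits = input_range(0, nominal, 10.0, 12.0)
x2_limits = input_range(1, nominal, 10.0, 12.0)
print("predicted Y:", y_nominal)
print("X1 tolerance:", x1_limits)
print("X2 tolerance:", x2_limits)
```

This treats each input one at a time. If several inputs can drift together, their combined effect on Y has to be checked against the limits as well.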

Excel has no dedicated menu command for multiple linear regression, although its LINEST function and the Regression tool in the Analysis ToolPak add-in can fit one. The analysis can be done in Minitab using the “Fit Regression Model” option in the Regression menu. This displays an input panel where the response variable and input variables can be selected. If the analysis shows that a variable is not statistically significant, check the residual plots to see whether the residuals are normally distributed. If they are not, remove the variable that is not statistically significant and rerun the analysis. The normality of the residuals should improve.

### Hints & tips

- Too many variables increase uncertainty in the analysis. There should be at least ten data points for each variable (e.g., with three variables, use at least 30 data points).
- Drop variables that are not statistically significant to improve the accuracy of the equation.
- The analysis assumes a linear (straight line) effect. If the residuals indicate a bad fit, you will need to add higher-order terms and create a non-linear analysis. This is discussed in another lesson.
- Always check the residual analysis to ensure the residuals are normally distributed, have equal variance, and are independent of one another.
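Two of the tips above can be sketched numerically. The example below checks the ten-points-per-variable guideline and computes the residuals along with the Durbin-Watson statistic, one common indicator of independence. The function names and the observed and fitted values are made up for illustration. It does not test normality or equal variance, which are usually judged from the residual plots.

```python
from statistics import mean


def enough_data(n_points, n_variables, per_variable=10):
    """Guideline from the tips: at least ten data points per variable."""
    return n_points >= per_variable * n_variables


def residual_summary(observed, fitted):
    """Residuals, their mean, and the Durbin-Watson statistic.

    Durbin-Watson lies between 0 and 4. Values near 2 suggest no
    first-order autocorrelation. It is meaningful only when the
    observations are listed in time (run) order.
    """
    residuals = [o - f for o, f in zip(observed, fitted)]
    sse = sum(e * e for e in residuals)
    if sse == 0:
        raise ValueError("all residuals are zero; the statistic is undefined")
    squared_steps = sum((residuals[i] - residuals[i - 1]) ** 2
                        for i in range(1, len(residuals)))
    return {
        "residuals": residuals,
        "mean": mean(residuals),
        "durbin_watson": squared_steps / sse,
    }


# Hypothetical observed responses and the values predicted by a fitted equation.
observed = [10.2, 11.9, 14.1, 15.8, 18.1, 19.9]
fitted = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]

summary = residual_summary(observed, fitted)
print("mean residual:", round(summary["mean"], 6))
print("Durbin-Watson:", round(summary["durbin_watson"], 3))
print("30 points enough for 3 variables?", enough_data(30, 3))
print("6 points enough for 2 variables?", enough_data(len(observed), 2))
```

A mean residual far from zero, or a Durbin-Watson value well away from 2, would be a prompt to look more closely at the residual plots before trusting the equation.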


PMI, PMP, CAPM and PMBOK are registered marks of the Project Management Institute, Inc.