## Exercise files

Download this lesson’s related exercise files.

- Multiple Linear Regression.xlsx (11.4 KB)
- Multiple Linear Regression - Solution.docx (373.2 KB)

## Quick reference

### Multiple Linear Regression

Multiple linear regression analysis fits an equation in which several independent X variables jointly predict a Y response variable. The equation is derived from the data set and models only the conditions represented in that data.

### When to use

Use multiple linear regression when several independent variables correlate with the system response. The resulting equation can be used to predict process performance and to identify which factors have the primary impact on it.

### Instructions

Multiple linear regression is the appropriate technique to use when the data set has multiple continuous independent input variables and a continuous response variable. The technique determines which variables are statistically significant and creates an equation that shows the relationship of the variables to the response. To improve the accuracy of the analysis, there should be at least ten data points for each independent variable. The equation takes on the form:

- $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots$

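Each beta coefficient after the intercept gives the expected change in Y for a one-unit change in its X variable while the other variables are held constant. The coefficients show the relative importance of the variables only when the variables are measured on comparable scales.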
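As a minimal sketch of how an equation of this form can be fitted, the Python below estimates the coefficients by ordinary least squares using only the standard library. The function name and the small data set are hypothetical and invented for illustration. The data were generated from Y = 2 + 3*X1 - X2 with no noise, so the fit should recover approximately those coefficients. The data set is far smaller than the ten-points-per-variable guideline, which is acceptable here only because the toy data is noise-free.

```python
def fit_multiple_linear_regression(rows, y):
    """Return [b0, b1, ..., bk] for y ~ b0 + b1*x1 + ... + bk*xk (least squares)."""
    # Design matrix with a leading 1.0 in each row for the intercept b0.
    design = [[1.0] + [float(v) for v in row] for row in rows]
    p = len(design[0])
    # Normal equations (X^T X) b = X^T y, stored as an augmented matrix.
    aug = [
        [sum(r[i] * r[j] for r in design) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(design, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("inputs are collinear; coefficients are not unique")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][p] for i in range(p)]


# Hypothetical data generated from Y = 2 + 3*X1 - X2 (no noise).
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5)]
y = [2 + 3 * x1 - x2 for x1, x2 in rows]

coefs = fit_multiple_linear_regression(rows, y)
print("b0, b1, b2 =", [round(b, 4) for b in coefs])
```

A statistics package performs an equivalent fit and also reports significance tests, which this sketch omits.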

Multiple linear regression can be used to predict process performance based upon the values of the inputs. Input levels for ideal performance can be defined and tolerance levels that ensure acceptable performance can be determined using the regression equation. The equation will also be helpful for setting process controls.
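The following sketch illustrates one way a fitted equation might be used to predict the response and to derive an input tolerance. The coefficients, nominal input values, response limits, and function names are all hypothetical assumptions chosen for illustration; they do not come from the lesson's exercise file.

```python
def predict(coefs, inputs):
    """Evaluate Y = b0 + b1*x1 + b2*x2 + ... for one set of input values."""
    intercept, *slopes = coefs
    if len(slopes) != len(inputs):
        raise ValueError("need exactly one input value per slope coefficient")
    return intercept + sum(b * x for b, x in zip(slopes, inputs))


def input_tolerance(coefs, inputs, index, y_low, y_high):
    """Range of inputs[index] that keeps predicted Y in [y_low, y_high],
    with every other input held at its given value."""
    slope = coefs[index + 1]
    if slope == 0:
        raise ValueError("this input has no effect on the predicted response")
    # Contribution of the intercept and all other inputs, which stay fixed.
    fixed_part = predict(coefs, inputs) - slope * inputs[index]
    low, high = sorted(((y_low - fixed_part) / slope, (y_high - fixed_part) / slope))
    return low, high


# Hypothetical fitted equation: Y = 2 + 3*X1 - 1*X2
coefs = [2.0, 3.0, -1.0]
nominal = [4.0, 3.0]  # assumed nominal settings for X1 and X2

y_nominal = predict(coefs, nominal)
x1_range = input_tolerance(coefs, nominal, 0, 10.0, 12.0)
x2_range = input_tolerance(coefs, nominal, 1, 10.0, 12.0)

print("predicted Y at nominal settings:", y_nominal)
print("X1 range for Y within 10..12:", x1_range)
print("X2 range for Y within 10..12:", x2_range)
```

The tolerance calculation varies one input at a time and ignores the uncertainty in the fitted coefficients, so the ranges it returns are best treated as a starting point for setting process controls rather than as guaranteed limits.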

Excel has no dedicated multiple linear regression menu command by default, although its LINEST function and the Regression tool in the Analysis ToolPak add-in accept multiple input variables. The analysis can be done in Minitab using the “Fit Regression Model” option in the Regression menu. This displays an input panel where the response variable and input variables can be selected. If the analysis shows that a variable is not statistically significant, check the residual plots to see whether the residuals are normally distributed. If they are not, remove the variable that is not statistically significant and rerun the analysis. The normality of the residuals should improve.
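For readers working outside a statistics package, the drop-and-rerun idea can be sketched in plain Python. This is an illustrative assumption, not the Minitab procedure: it computes a t-statistic for each coefficient and uses a rough cutoff of |t| >= 2 as a stand-in for a proper p-value test, and it returns the residuals so they can be plotted and checked separately. The data set, function names, and cutoff are hypothetical. Y is generated from X1 only, so X2 is expected to look non-significant.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular; inputs may be collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_with_t_stats(rows, y):
    """Return (coefficients, t_statistics, residuals) for y ~ b0 + sum(bj*xj)."""
    design = [[1.0] + [float(v) for v in row] for row in rows]
    n, p = len(design), len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    inv = invert(xtx)
    coefs = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    residuals = [yi - sum(b * v for b, v in zip(coefs, r))
                 for r, yi in zip(design, y)]
    # Residual variance with n - p degrees of freedom.
    variance = sum(e * e for e in residuals) / (n - p)
    t_stats = [b / math.sqrt(variance * inv[i][i]) for i, b in enumerate(coefs)]
    return coefs, t_stats, residuals


# Hypothetical data: 10 settings, each run twice (20 points for 2 variables).
x1 = list(range(1, 11)) * 2
x2 = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3] * 2
noise = [0.1, -0.2, 0.15, -0.1, 0.05, -0.15, 0.2, -0.05, 0.1, -0.1] * 2
y = [5 + 2 * a + e for a, e in zip(x1, noise)]  # Y depends on X1 only

rows = list(zip(x1, x2))
coefs, t_stats, residuals = fit_with_t_stats(rows, y)

T_CUTOFF = 2.0  # assumed rough threshold, not an exact significance test
keep = [j for j in range(len(rows[0])) if abs(t_stats[j + 1]) >= T_CUTOFF]

# Rerun the analysis with only the variables that passed the cutoff.
reduced_rows = [[row[j] for j in keep] for row in rows]
reduced_coefs, reduced_t, reduced_residuals = fit_with_t_stats(reduced_rows, y)

print("t-statistics (b0, b1, b2):", [round(t, 2) for t in t_stats])
print("kept variable indexes:", keep)
print("reduced model coefficients:", [round(b, 4) for b in reduced_coefs])
```

In practice the p-values and residual plots reported by the statistics package are the better guide; the fixed cutoff here only approximates a 5% significance level for moderately sized data sets.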

### Hints & tips

- Too many variables increase uncertainty in the analysis. There should be at least ten data points for each variable. Drop variables that are not statistically significant to improve the accuracy of the equation.
- The analysis assumes a linear (straight-line) effect for each variable. If the residuals indicate a poor fit, you will need to add higher-order terms to model the non-linear behavior. This is discussed in another lesson.
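The ten-data-points-per-variable guideline from the first tip can be checked before running an analysis. The helper names below are hypothetical, and the default of ten points per variable simply encodes the rule of thumb stated above.

```python
def has_enough_data(n_points, n_variables, points_per_variable=10):
    """True when the data set meets the points-per-variable rule of thumb."""
    return n_points >= points_per_variable * n_variables


def max_variables_supported(n_points, points_per_variable=10):
    """Largest number of input variables the rule of thumb allows."""
    return n_points // points_per_variable


print(has_enough_data(30, 3))       # 30 points, 3 variables
print(has_enough_data(25, 3))       # 25 points, 3 variables
print(max_variables_supported(25))  # variables supported by 25 points
```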
