Multiple Linear Regression Definition
Multiple linear regression (MLR) is a statistical technique used to examine the relationship between several explanatory variables and a single response variable. MLR examines and explains how these variables are related to one another. It uses two or more explanatory variables together to predict the outcome of the response variable.
The model for multiple linear regression is: $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip} + \epsilon$.
MLR models and explains the relationship between two or more explanatory variables and a response variable.
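To make the model concrete, the sketch below evaluates its right-hand side for a single observation, leaving out the error term. The function name `predict` and the coefficient values are hypothetical and chosen only for illustration; they do not come from any fitted model.

```python
def predict(intercept, slopes, x):
    """Return B0 + B1*x1 + ... + Bp*xp for one observation (error term omitted)."""
    if len(slopes) != len(x):
        raise ValueError("need exactly one slope per explanatory variable")
    return intercept + sum(b * v for b, v in zip(slopes, x))


# Hypothetical coefficients: B0 = 1.0, B1 = 2.0, B2 = -0.5
# Hypothetical observation:  x1 = 3.0, x2 = 4.0
y_hat = predict(1.0, [2.0, -0.5], [3.0, 4.0])
print(y_hat)  # 1.0 + 2.0*3.0 + (-0.5)*4.0 = 5.0
```

Each slope scales its own explanatory variable, and the intercept is the predicted value when every explanatory variable is 0.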
A Little More on What is Multiple Linear Regression
Multiple linear regression differs from simple linear regression. Simple linear regression predicts and explains one variable using the information known about one other variable. Multiple linear regression, on the other hand, determines the relationship between one response variable and several explanatory variables. At least three variables must be present before multiple linear regression is used: one response variable and two or more explanatory variables.
MLR is used to explain the relationship between one continuous dependent variable and two or more independent variables. With only two continuous variables (one independent variable and one dependent variable), the model reduces to simple linear regression. Basically, MLR examines how several independent variables are related to one dependent variable.
When written for a single observation i, the model for MLR is:
$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip} + \epsilon$. Below is a breakdown of the items in the model:
$y_i$ = dependent variable
$x_{i1}, \dots, x_{ip}$ = explanatory variables or independent variables
$\beta_0$ = y-intercept (constant term)
$\beta_1, \dots, \beta_p$ = slope coefficients, one for each explanatory variable
$\epsilon$ = the model's error term (estimated by the residuals once the model is fitted)
When multiple linear regression is used to explain the relationship between explanatory and response variables, each explanatory variable is distinguished by a numeric subscript: 1, 2, 3, 4, and so on.
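The coefficients in the model are usually estimated from data. The text does not name an estimation method, so the sketch below assumes ordinary least squares, computed with NumPy's `np.linalg.lstsq`. The data are synthetic, and the helper name `fit_mlr` and the "true" coefficients are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 200 observations, 2 explanatory variables.
n = 200
X = rng.normal(size=(n, 2))
true_beta = np.array([1.0, 2.0, -0.5])   # B0, B1, B2 (made-up values)
noise = rng.normal(scale=0.1, size=n)    # plays the role of the error term
y = true_beta[0] + X @ true_beta[1:] + noise


def fit_mlr(X, y):
    """Estimate [B0, B1, ..., Bp] by ordinary least squares."""
    # A leading column of ones lets the first coefficient act as the intercept.
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta


beta_hat = fit_mlr(X, y)
print(beta_hat)  # estimates should land close to true_beta
```

Because the noise is small relative to the signal, the estimates should sit close to the values used to generate the data. With real data the true coefficients are unknown and only the estimates are available.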
The use of the multiple linear regression (MLR) model rests on several assumptions, including the following:
- A linear relationship exists between the independent variables and the dependent variable, which are otherwise called the explanatory variables and the response variable.
- The explanatory variables are not highly correlated with one another.
- The residuals are normally distributed with a mean of 0 and variance σ².
- The observations yi are selected independently and randomly from the population.
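Some of these assumptions can be given a rough first check in code. The sketch below, which assumes NumPy and synthetic data, reports the largest absolute pairwise correlation between explanatory variables and computes the residuals of a least-squares fit. The helper names and the data are hypothetical, and no cutoff for "highly correlated" is implied.

```python
import numpy as np


def max_abs_correlation(X):
    """Largest absolute pairwise correlation between the columns of X."""
    corr = np.corrcoef(X, rowvar=False)
    # Ignore the diagonal, where each variable is compared with itself.
    off_diag = corr[~np.eye(corr.shape[0], dtype=bool)]
    return float(np.max(np.abs(off_diag)))


def residuals(X, y):
    """Residuals of an ordinary least-squares fit with an intercept."""
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ beta


rng = np.random.default_rng(1)
x1 = rng.normal(size=300)
x2 = rng.normal(size=300)                    # generated independently of x1
x3 = x1 + rng.normal(scale=0.01, size=300)   # nearly a copy of x1

X_ok = np.column_stack([x1, x2])
X_bad = np.column_stack([x1, x3])
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=0.1, size=300)

print(max_abs_correlation(X_ok))    # small: the columns are unrelated
print(max_abs_correlation(X_bad))   # near 1: the columns are almost identical
print(residuals(X_ok, y).mean())    # numerically zero
```

When the fit includes an intercept, least-squares residuals average to zero by construction, so the last line is only a sanity check. Judging whether the residuals are normally distributed would take a further step, such as a histogram or a normality test, which is not shown here.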
R-squared (R²), the coefficient of determination, is a statistical metric that measures how much of the variation in an outcome is explained by the variation in the independent variables.
R² never decreases as more predictors are added to the MLR model, and in practice it almost always increases, even if the added predictors have no connection with the outcome variable. Hence R² on its own is not an appropriate metric for deciding which predictors should be included in or excluded from a model.
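This behaviour of the coefficient of determination can be seen in a small experiment. The sketch below assumes an ordinary least-squares fit with NumPy on synthetic data. It computes R-squared as 1 minus the ratio of the residual sum of squares to the total sum of squares, then refits after adding a predictor that is pure random noise. The helper name `r_squared` and all data are made up for the example.

```python
import numpy as np


def r_squared(X, y):
    """R-squared of an ordinary least-squares fit with an intercept."""
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    ss_res = np.sum((y - design @ beta) ** 2)   # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)        # total variation in y
    return 1.0 - ss_res / ss_tot


rng = np.random.default_rng(2)
n = 100
X = rng.normal(size=(n, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=n)

# A predictor generated independently of y, so it carries no real information.
junk = rng.normal(size=(n, 1))

r2_base = r_squared(X, y)
r2_junk = r_squared(np.column_stack([X, junk]), y)
print(r2_base, r2_junk)  # the second value is not smaller than the first
```

The extra column gives the fit one more degree of freedom, so the residual sum of squares cannot go up and the computed value cannot go down, even though the new predictor is unrelated to the outcome.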
References for Multiple Regression Analysis