R is one of the most widely used languages for data analysis, so it is worth knowing how to run multiple linear regression in R. Multiple linear regression models the case where a single response variable Y depends linearly on multiple predictor variables.
What is Multiple Linear Regression?
Multiple linear regression is a technique for predicting the value of one variable from two or more other variables. It is also called multiple regression, and it extends simple linear regression. The variable being predicted is the dependent variable; the variables used to predict it are called independent or explanatory variables.
Multiple linear regression allows researchers to assess how much of the variation in the response the model explains and the relative contribution of each independent variable. Multiple regression comes in two forms, linear and nonlinear.
The general mathematical equation for multiple regression is:
y = b + b1x1 + b2x2 + ... + bnxn
Description of the parameters used −
- y is the response variable.
- b is the intercept, and b1, b2, ..., bn are the coefficients of the predictor variables.
- x1, x2, …xn are the predictor variables.
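To make the equation concrete, here is a minimal sketch in R that evaluates it by hand. The helper name predict_linear and the numbers passed to it are made up for illustration; they do not come from any data set in this article.

```r
# Hypothetical helper: evaluates y = b + b1*x1 + ... + bn*xn
predict_linear <- function(intercept, coefs, x) {
  # One coefficient is needed per predictor value
  stopifnot(length(coefs) == length(x))
  intercept + sum(coefs * x)
}

# Illustrative values only: b = 1, b1 = 2, b2 = 3, x1 = 4, x2 = 5
y_hat <- predict_linear(1, c(2, 3), c(4, 5))
print(y_hat)  # 1 + 2*4 + 3*5 = 24
```

In practice the coefficients are not chosen by hand but estimated from data, which is what lm() does.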
We create the regression model using the lm() function in R. The model estimates the coefficients from the input data. We can then use these coefficients to predict the value of the response variable for a given set of predictor values.
This function creates the relationship model between the predictor and the response variable.
The basic syntax for lm() function in multiple regression is −
> lm(y ~ x1+x2+x3...,data)
Description of the parameters used −
- formula: a symbolic description of the relationship between the response variable and the predictor variables.
- data: the data frame containing the variables used in the formula.
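The two arguments can be seen in action on a small made-up data frame. This is only a sketch: the names toy, x1, x2 and y are assumptions for illustration, and y is constructed without noise from known coefficients so that the fit should recover them.

```r
# Made-up data: y is built exactly as 1 + 2*x1 + 3*x2 (no noise)
toy <- data.frame(
  x1 = c(1, 2, 3, 4, 5),
  x2 = c(2, 1, 4, 3, 6)
)
toy$y <- 1 + 2 * toy$x1 + 3 * toy$x2

# formula: response on the left of ~, predictors joined by + on the right
# data: the data frame in which lm() looks up y, x1 and x2
fit <- lm(y ~ x1 + x2, data = toy)

# Estimated intercept and slopes, rounded for display
toy_coefs <- round(coef(fit), 6)
print(toy_coefs)
```

Real data contains noise, so the estimates will not match any "true" values exactly; the noise-free setup here is only meant to show how formula and data fit together.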
Let’s do Multiple Regression
Loading the data
Consider the data set “freeny” available in the R environment.
In our data set, market potential is the dependent variable, and the model below uses the price index and income level as the independent variables (the data set also contains quarterly revenue columns). Now let’s see the code to establish the relationship between these variables.
> df <- freeny
> head(df)
              y lag.quarterly.revenue price.index income.level market.potential
1962.25 8.79236               8.79636     4.70997      5.82110          12.9699
1962.5  8.79137               8.79236     4.70217      5.82558          12.9733
1962.75 8.81486               8.79137     4.68944      5.83112          12.9774
1963    8.81301               8.81486     4.68558      5.84046          12.9806
1963.25 8.90751               8.81301     4.64019      5.85036          12.9831
1963.5  8.93673               8.90751     4.62553      5.86464          12.9854
# plotting the data to determine the linearity
> plot(df, col="navy", main="Matrix Scatterplot")
Create Relationship Model
As the scatter plot matrix above suggests, the variables in the freeny data set are approximately linearly related.
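As a rough numeric complement to the scatter plot, the pairwise correlations can be inspected as well. This is a sketch rather than a formal linearity test: a correlation close to +1 or -1 is consistent with a linear relationship but does not prove one, so the plot remains the primary check.

```r
# freeny ships with base R (package 'datasets'), so no download is needed
df <- freeny

# Number of observations (rows) and variables (columns)
print(dim(df))

# Pairwise Pearson correlations between all columns
cors <- cor(df)

# Correlation of every variable with market.potential, rounded for display
print(round(cors[, "market.potential"], 3))
```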
# Constructing a model that predicts market.potential from price.index and income.level
> model <- lm(market.potential ~ price.index + income.level, data = df)
> model

Call:
lm(formula = market.potential ~ price.index + income.level, data = df)

Coefficients:
 (Intercept)   price.index  income.level
     13.2720       -0.3093        0.1963
The sample code above shows how to build a linear model with two predictors. In this example, price.index and income.level are the two predictors used to predict market potential. From the above output, the intercept is 13.2720, the coefficient for price.index is -0.3093, and the coefficient for income.level is 0.1963. Hence the complete regression equation is market.potential = 13.2720 + (-0.3093)*price.index + 0.1963*income.level.
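The coefficients do not have to be copied from the printed output. As a minimal sketch, coef() returns them as a named vector, and the regression equation can be rebuilt by hand to confirm that it reproduces the fitted values stored in the model. The variable names b, manual_fit and same are chosen here for illustration.

```r
# Refit the model from the article so this block runs on its own
model <- lm(market.potential ~ price.index + income.level, data = freeny)

# Named vector of estimated coefficients
b <- coef(model)
print(round(b, 4))

# Rebuild the fitted values by hand from the regression equation
manual_fit <- b["(Intercept)"] +
  b["price.index"] * freeny$price.index +
  b["income.level"] * freeny$income.level

# Should agree with what lm() stored, up to floating-point error
same <- isTRUE(all.equal(unname(manual_fit), unname(fitted(model))))
print(same)
```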
> summary(model)

Call:
lm(formula = market.potential ~ price.index + income.level, data = df)

Residuals:
       Min         1Q     Median         3Q        Max
-0.0101512 -0.0054213 -0.0005416  0.0036681  0.0119975

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  13.27199    0.29086  45.631  < 2e-16 ***
price.index  -0.30929    0.02630 -11.758 6.92e-14 ***
income.level  0.19631    0.02912   6.741 7.20e-08 ***
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.006491 on 36 degrees of freedom
Multiple R-squared: 0.9904, Adjusted R-squared: 0.9899
F-statistic: 1858 on 2 and 36 DF, p-value: < 2.2e-16
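In the summary, a Multiple R-squared of 0.9904 means the model explains about 99% of the variation in market.potential in this sample, and the p-values for both predictors are far below 0.05. The sketch below shows one way to pull these numbers out of the summary object programmatically instead of reading them off the printout; the variable names are chosen for illustration.

```r
# Refit the model from the article so this block runs on its own
model <- lm(market.potential ~ price.index + income.level, data = freeny)
s <- summary(model)

# Share of variance explained, plain and adjusted for the number of predictors
r2 <- s$r.squared
adj_r2 <- s$adj.r.squared
print(round(c(r2 = r2, adj_r2 = adj_r2), 4))

# p-values of the t-tests for each coefficient
p_values <- s$coefficients[, "Pr(>|t|)"]
print(p_values)

# 95% confidence intervals for the coefficients
ci <- confint(model, level = 0.95)
print(ci)
```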
Predicting New Values
We can use the regression equation created above to predict the market.potential when a new set of values for price.index and income.level is provided.
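For income.level = 6.555 and price.index = 7.555, substituting into the regression equation gives a predicted market.potential of 13.2720 + (-0.3093)*7.555 + 0.1963*6.555, which is approximately 12.222.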
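A prediction for these inputs can be computed with R's predict() function rather than by hand. This is a minimal sketch: the name new_obs is an assumption for illustration, and the optional prediction interval is an addition not discussed in the article. Note that these input values are far from the ones shown in the head(df) output, so the result is an extrapolation and should be treated with caution.

```r
# Refit the model from the article so this block runs on its own
model <- lm(market.potential ~ price.index + income.level, data = freeny)

# Column names must match the predictor names used in the formula
new_obs <- data.frame(price.index = 7.555, income.level = 6.555)

# Point prediction of market.potential for the new observation
pred <- predict(model, newdata = new_obs)
print(pred)

# Optional: a 95% prediction interval around the point estimate
pred_int <- predict(model, newdata = new_obs,
                    interval = "prediction", level = 0.95)
print(pred_int)
```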
In this article, we have shown how to predict the value of a dependent variable from two or more independent variables using a multiple linear regression model. We first checked the scatter plot matrix for linearity; since the variables appeared linearly related, we proceeded with multiple linear regression. Using the price index and income level as predictors, we were able to predict market potential.