Multiple Linear Regression

Definition

Multiple Linear Regression models the relationship between a dependent variable and two or more independent variables, assuming that relationship is linear. It extends simple linear regression to account for multiple predictors.

Model Equation

y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \epsilon

Where:

  • y = dependent variable
  • x_i = independent variables
  • \beta_0 = intercept
  • \beta_i = regression coefficients
  • \epsilon = error term
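
Read as code, the model says each response value is an intercept plus a weighted sum of the predictor values plus random noise. A minimal numpy sketch with made-up coefficients (none of these numbers come from the example below):

import numpy as np

rng = np.random.default_rng(0)

beta0 = 10.0                       # made-up intercept
beta = np.array([2.0, -1.5, 0.5])  # made-up coefficients for k = 3 predictors
x = np.array([4.0, 3.0, 8.0])      # one observation's predictor values
eps = rng.normal(0, 1)             # error term

y = beta0 + beta @ x + eps         # y = b0 + b1*x1 + b2*x2 + b3*x3 + eps
print(y)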

Key Formulas:

Sum of Squares:

SST = \sum(y_i - \bar{y})^2
SSR = \sum(\hat{y}_i - \bar{y})^2
SSE = \sum(y_i - \hat{y}_i)^2

Where \hat{y}_i is the predicted value and \bar{y} is the mean of y. For a least-squares fit with an intercept, these decompose as SST = SSR + SSE.
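
To make the definitions concrete, here is a tiny numpy sketch (the observed and fitted values are made up for illustration, not taken from a real fit):

import numpy as np

y = np.array([300., 250., 400., 550., 317., 389.])       # observed values
y_hat = np.array([293., 255., 405., 548., 321., 384.])   # hypothetical fitted values

sst = np.sum((y - y.mean()) ** 2)      # total sum of squares
ssr = np.sum((y_hat - y.mean()) ** 2)  # regression (explained) sum of squares
sse = np.sum((y - y_hat) ** 2)         # error (residual) sum of squares

print(sst, ssr, sse)  # SST = SSR + SSE holds exactly only for a true OLS fit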

R-squared:

R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}

Adjusted R-squared:

R^2_{adj} = 1 - (1 - R^2)\frac{n-1}{n-k-1}

where n is the number of observations and k is the number of predictors.
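
As a quick self-contained check of both formulas (the SST and SSE values below are hypothetical, chosen so that R² matches the worked example later on this page):

sst, sse = 55737.0, 6020.0  # hypothetical sums of squares
n, k = 6, 3                 # observations and predictors

r2 = 1 - sse / sst
r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(round(r2, 3), round(r2_adj, 3))  # ~0.892 and ~0.73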

Key Assumptions

  • Linearity: The dependent variable is a linear function of the predictors
  • Independence: Residuals are independent of one another
  • Homoscedasticity: Residuals have constant variance
  • Normality: Residuals are approximately normally distributed
  • No Multicollinearity: Independent variables are not highly correlated
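
Several of these assumptions can be checked numerically. A minimal sketch using statsmodels and scipy on the worked example's data follows; with only six observations these tests have very little power, so treat the output as illustrative:

import pandas as pd
from scipy import stats
from statsmodels.formula.api import ols
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.diagnostic import het_breuschpagan

# Toy data from the worked example below
df = pd.DataFrame({
    'price': [300, 250, 400, 550, 317, 389],
    'sqft': [1500, 1200, 2000, 2400, 1600, 1800],
    'age': [15, 20, 10, 5, 12, 8],
    'bedrooms': [3, 2, 4, 4, 3, 3],
})
model = ols('price ~ sqft + age + bedrooms', data=df).fit()

# Independence: a Durbin-Watson statistic near 2 suggests uncorrelated residuals
print('Durbin-Watson:', durbin_watson(model.resid))

# Homoscedasticity: a small Breusch-Pagan p-value suggests non-constant variance
print('Breusch-Pagan p:', het_breuschpagan(model.resid, model.model.exog)[1])

# Normality: a small Shapiro-Wilk p-value suggests non-normal residuals
print('Shapiro-Wilk p:', stats.shapiro(model.resid).pvalue)

Multicollinearity is covered by the VIF formula in the Model Diagnostics section below.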

Practical Example

Step 1: State the Data

Housing prices model:

| House | Price ($K) | Sqft | Age | Bedrooms |
|-------|------------|------|-----|----------|
| 1     | 300        | 1500 | 15  | 3        |
| 2     | 250        | 1200 | 20  | 2        |
| 3     | 400        | 2000 | 10  | 4        |
| 4     | 550        | 2400 | 5   | 4        |
| 5     | 317        | 1600 | 12  | 3        |
| 6     | 389        | 1800 | 8   | 3        |

Step 2: Set Up the Matrix Computation

Design matrix X:

\mathbf{X} = \begin{bmatrix} 1 & 1500 & 15 & 3 \\ 1 & 1200 & 20 & 2 \\ \vdots & \vdots & \vdots & \vdots \\ 1 & 1800 & 8 & 3 \end{bmatrix}

Coefficients calculation:

\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}
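
This formula translates directly into numpy; a sketch for illustration (in practice np.linalg.lstsq or a library fit is numerically safer than forming the inverse explicitly):

import numpy as np

# Design matrix: a column of ones for the intercept, then sqft, age, bedrooms
X = np.array([
    [1, 1500, 15, 3],
    [1, 1200, 20, 2],
    [1, 2000, 10, 4],
    [1, 2400,  5, 4],
    [1, 1600, 12, 3],
    [1, 1800,  8, 3],
], dtype=float)
y = np.array([300, 250, 400, 550, 317, 389], dtype=float)

# beta_hat = (X^T X)^(-1) X^T y
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y
print(beta_hat)  # [intercept, b_sqft, b_age, b_bedrooms]
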
Step 3: Model Results

Fitted equation:

\hat{y} = -50.2 + 0.21x_{sqft} - 3.15x_{age} + 25.3x_{bedrooms}
  • R² = 0.892
  • Adjusted R² = 0.73
  • F-statistic = 5.51 (p-value = 0.157)
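
Adjusted R² and the F-statistic follow from R², n, and k by arithmetic alone; a quick check of the values above:

r2, n, k = 0.892, 6, 3  # reported R-squared, observations, predictors

r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)  # ~0.73
f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))   # ~5.51
print(round(r2_adj, 3), round(f_stat, 2))
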
Step 4: Interpretation
  • Each additional square foot increases the price by $210, holding the other predictors constant
  • Each additional year of age decreases the price by $3,150, holding the other predictors constant
  • Each additional bedroom adds $25,300 to the price, holding the other predictors constant
  • The model explains 89.2% of the variation in price

Model Diagnostics

Key diagnostic measures:

  • VIF (Variance Inflation Factor):
    VIF_j = \frac{1}{1 - R^2_j}
    where R^2_j comes from regressing the j-th predictor on the remaining predictors; values above 10 are commonly taken to signal problematic multicollinearity (see the sketch below)
  • Residual Standard Error:
    RSE = \sqrt{\frac{SSE}{n-k-1}}
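
Both measures are easy to compute with statsmodels; a sketch on the worked example's data (note that statsmodels stores the residual sum of squares, our SSE, in an attribute named ssr):

import numpy as np
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.outliers_influence import variance_inflation_factor

df = pd.DataFrame({
    'price': [300, 250, 400, 550, 317, 389],
    'sqft': [1500, 1200, 2000, 2400, 1600, 1800],
    'age': [15, 20, 10, 5, 12, 8],
    'bedrooms': [3, 2, 4, 4, 3, 3],
})
model = ols('price ~ sqft + age + bedrooms', data=df).fit()

# VIF for each predictor (column 0 of the design matrix is the intercept)
X = model.model.exog
for i, name in enumerate(model.model.exog_names[1:], start=1):
    print(name, 'VIF:', variance_inflation_factor(X, i))

# Residual standard error: sqrt(SSE / (n - k - 1))
n, k = len(df), 3
rse = np.sqrt(model.ssr / (n - k - 1))
print('RSE:', rse)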

Code Examples

R
library(tidyverse)
library(broom)

# Example data
data <- tibble(
  price = c(300, 250, 400, 550, 317, 389),
  sqft = c(1500, 1200, 2000, 2400, 1600, 1800),
  age = c(15, 20, 10, 5, 12, 8),
  bedrooms = c(3, 2, 4, 4, 3, 3)
)

# Fit model
model <- lm(price ~ sqft + age + bedrooms, data = data)

# Model summary
tidy(model)      # Coefficients
glance(model)    # Model statistics

par(mfrow = c(2, 2))  # Arrange plots in a 2x2 grid
plot(model)

# Predictions
new_data <- tibble(
  sqft = 1800,
  age = 10,
  bedrooms = 3
)
pred <- predict(model, new_data)
print(str_glue("Predicted price: {pred}"))
Python
import pandas as pd
from statsmodels.formula.api import ols

# Example data
df = pd.DataFrame({
    'price': [300, 250, 400, 550, 317, 389],
    'sqft': [1500, 1200, 2000, 2400, 1600, 1800],
    'age': [15, 20, 10, 5, 12, 8],
    'bedrooms': [3, 2, 4, 4, 3, 3]
})

# Fit the model
model = ols('price ~ sqft + age + bedrooms', data=df).fit()

# Print summary
print(model.summary())

# For just coefficients and R-squared
print("Coefficients:")
print(model.params)
print("R-squared:", model.rsquared)

# Predictions
X_new = pd.DataFrame({
    'sqft': [1800],
    'age': [10],
    'bedrooms': [3]
})
predictions = model.predict(X_new)
print("Predicted price:", predictions[0])

Alternative Methods

Consider these alternatives:

  • Ridge Regression: For handling multicollinearity
  • Lasso Regression: For feature selection
  • Polynomial Regression: For non-linear relationships
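
A brief scikit-learn sketch of the first two alternatives on the toy data from above (the alpha values are arbitrary placeholders, not tuned):

import numpy as np
from sklearn.linear_model import Ridge, Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X = np.array([[1500, 15, 3], [1200, 20, 2], [2000, 10, 4],
              [2400, 5, 4], [1600, 12, 3], [1800, 8, 3]], dtype=float)
y = np.array([300, 250, 400, 550, 317, 389], dtype=float)

# Standardize first so the penalty treats all predictors comparably
ridge = make_pipeline(StandardScaler(), Ridge(alpha=1.0)).fit(X, y)  # L2: shrinks coefficients
lasso = make_pipeline(StandardScaler(), Lasso(alpha=1.0)).fit(X, y)  # L1: can zero them out

print('Ridge coefficients:', ridge[-1].coef_)
print('Lasso coefficients:', lasso[-1].coef_)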
