Multiple Linear Regression
Definition
Multiple Linear Regression models the relationship between a dependent variable and two or more independent variables, assuming a linear relationship. It extends simple linear regression to account for multiple predictors.
Model Equation
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p + \varepsilon$$
Where:
- $Y$ = dependent variable
- $X_1, \dots, X_p$ = independent variables
- $\beta_0$ = intercept
- $\beta_1, \dots, \beta_p$ = regression coefficients
- $\varepsilon$ = error term
Key Formulas:
Sum of Squares:
$$SST = \sum_{i=1}^{n}(Y_i - \bar{Y})^2, \quad SSE = \sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2, \quad SSR = SST - SSE$$
Where $\hat{Y}_i$ is the predicted value and $\bar{Y}$ is the mean.
R-squared:
$$R^2 = 1 - \frac{SSE}{SST}$$
Adjusted R-squared (with $n$ observations and $p$ predictors):
$$R^2_{adj} = 1 - \frac{(1 - R^2)(n - 1)}{n - p - 1}$$
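These formulas can be applied directly. A minimal NumPy sketch, using the observed housing prices from the example below and a set of hypothetical fitted values chosen purely for illustration:

```python
import numpy as np

# Observed values (housing prices from the example) and
# hypothetical fitted values, used only to illustrate the formulas
y = np.array([300, 250, 400, 550, 317, 389], dtype=float)
y_hat = np.array([310, 245, 405, 540, 320, 386], dtype=float)

n = len(y)  # number of observations
p = 3       # number of predictors (sqft, age, bedrooms)

sse = np.sum((y - y_hat) ** 2)     # residual sum of squares
sst = np.sum((y - y.mean()) ** 2)  # total sum of squares

r_squared = 1 - sse / sst
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - p - 1)
print(r_squared, adj_r_squared)
```

Note that adjusted R² penalizes extra predictors, so it never exceeds R².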
Key Assumptions
Linearity: Linear relationship between variables
Independence: Independent residuals
Homoscedasticity: Constant variance of residuals
Normality: Normal distribution of residuals
No Multicollinearity: Independent variables not highly correlated
Practical Example
Step 1: State the Data
Housing prices model:
| House | Price ($K) | Sqft | Age | Bedrooms |
|---|---|---|---|---|
| 1 | 300 | 1500 | 15 | 3 |
| 2 | 250 | 1200 | 20 | 2 |
| 3 | 400 | 2000 | 10 | 4 |
| 4 | 550 | 2400 | 5 | 4 |
| 5 | 317 | 1600 | 12 | 3 |
| 6 | 389 | 1800 | 8 | 3 |
Step 2: Calculate Matrix Operations
Design matrix $X$ (a column of ones for the intercept, then the three predictors):
$$X = \begin{bmatrix} 1 & 1500 & 15 & 3 \\ 1 & 1200 & 20 & 2 \\ 1 & 2000 & 10 & 4 \\ 1 & 2400 & 5 & 4 \\ 1 & 1600 & 12 & 3 \\ 1 & 1800 & 8 & 3 \end{bmatrix}$$
Coefficients calculation (ordinary least squares):
$$\hat{\beta} = (X^T X)^{-1} X^T Y$$
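The closed-form estimate can be reproduced with NumPy using the six houses above. `np.linalg.lstsq` solves the same least-squares problem as $(X^T X)^{-1} X^T Y$, but more stably than forming the inverse explicitly:

```python
import numpy as np

# Design matrix: intercept column plus sqft, age, bedrooms
X = np.array([
    [1, 1500, 15, 3],
    [1, 1200, 20, 2],
    [1, 2000, 10, 4],
    [1, 2400,  5, 4],
    [1, 1600, 12, 3],
    [1, 1800,  8, 3],
], dtype=float)
y = np.array([300, 250, 400, 550, 317, 389], dtype=float)

# Solve the normal equations via least squares
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # [intercept, b_sqft, b_age, b_bedrooms]
```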
Step 3: Model Results
Fitted equation (Price in $K):
$$\widehat{Price} = \hat{\beta}_0 + 0.21\,Sqft - 3.15\,Age + 25.3\,Bedrooms$$
- R² = 0.892
- Adjusted R² = 0.821
- F-statistic = 13.78 (p-value = 0.014)
Step 4: Interpretation
- Each additional square foot increases the predicted price by $210, holding age and bedrooms constant
- Each additional year of age decreases the predicted price by $3,150, holding the other predictors constant
- Each additional bedroom adds $25,300 to the predicted price
- The model explains 89.2% of the variation in price (R² = 0.892)
Model Diagnostics
Key diagnostic measures:
- VIF (Variance Inflation Factor): $VIF_i = \frac{1}{1 - R_i^2}$, where $R_i^2$ comes from regressing predictor $i$ on the remaining predictors; values above 10 are a common warning sign of multicollinearity
- Residual Standard Error: $RSE = \sqrt{\frac{SSE}{n - p - 1}}$, the typical size of a residual in the units of the response
Code Examples
R
library(tidyverse)
library(broom)
# Example data
data <- tibble(
  price = c(300, 250, 400, 550, 317, 389),
  sqft = c(1500, 1200, 2000, 2400, 1600, 1800),
  age = c(15, 20, 10, 5, 12, 8),
  bedrooms = c(3, 2, 4, 4, 3, 3)
)
# Fit model
model <- lm(price ~ sqft + age + bedrooms, data = data)
# Model summary
tidy(model) # Coefficients
glance(model) # Model statistics
par(mfrow = c(2, 2)) # Arrange plots in a 2x2 grid
plot(model)
# Predictions
new_data <- tibble(
  sqft = 1800,
  age = 10,
  bedrooms = 3
)
pred <- predict(model, new_data)
print(str_glue("Predicted price: {pred}"))
Python
import pandas as pd
import numpy as np
from statsmodels.formula.api import ols
import statsmodels.api as sm
# Example data
df = pd.DataFrame({
    'price': [300, 250, 400, 550, 317, 389],
    'sqft': [1500, 1200, 2000, 2400, 1600, 1800],
    'age': [15, 20, 10, 5, 12, 8],
    'bedrooms': [3, 2, 4, 4, 3, 3]
})
# Fit the model
model = ols('price ~ sqft + age + bedrooms', data=df).fit()
# Print summary
print(model.summary())
# For just coefficients and R-squared
print("Coefficients:")
print(model.params)
print("R-squared:", model.rsquared)
# Predictions
X_new = pd.DataFrame({
    'sqft': [1800],
    'age': [10],
    'bedrooms': [3]
})
predictions = model.predict(X_new)
print("Predicted price:", predictions.iloc[0])
Alternative Methods
Consider these alternatives:
- Ridge Regression: For handling multicollinearity
- Lasso Regression: For feature selection
- Polynomial Regression: For non-linear relationships
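A minimal scikit-learn sketch of the first two alternatives on the housing data; `alpha=1.0` is an arbitrary illustrative penalty that would normally be tuned by cross-validation, and the predictors are standardized first because penalized methods are scale-sensitive:

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Predictors (sqft, age, bedrooms) and prices from the housing example
X = np.array([[1500, 15, 3], [1200, 20, 2], [2000, 10, 4],
              [2400, 5, 4], [1600, 12, 3], [1800, 8, 3]], dtype=float)
y = np.array([300, 250, 400, 550, 317, 389], dtype=float)

# Ridge shrinks coefficients toward zero; Lasso can zero some out entirely
ridge = make_pipeline(StandardScaler(), Ridge(alpha=1.0)).fit(X, y)
lasso = make_pipeline(StandardScaler(), Lasso(alpha=1.0)).fit(X, y)

print(ridge.predict([[1800, 10, 3]]))
print(lasso.predict([[1800, 10, 3]]))
```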