Linear Regression

Formula

\[ \hat{\beta} = (X^T X)^{-1} X^T y \]
\[ \hat{y} = X\hat{\beta} \]

Parameters

  • \(X\): design matrix (\(n\) samples \(\times\) \(d\) features)
  • \(y\): target vector (length \(n\))
  • \(\hat{\beta}\): fitted coefficients (length \(d\))
  • \(\hat{y}\): predicted values

What it means

Fits a linear relationship between features and a continuous target by minimizing the sum of squared residuals.
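
A minimal NumPy sketch of the closed-form solution above; the dataset is a small synthetic example chosen only for illustration. The explicit inverse mirrors the formula, though in practice a least-squares solver (see the Python section below) is numerically safer.

import numpy as np

# Tiny synthetic dataset: 4 samples, intercept column plus 2 features
X = np.array([[1.0, 1.0, 2.0],
              [1.0, 2.0, 1.0],
              [1.0, 3.0, 4.0],
              [1.0, 4.0, 3.0]])
y = np.array([6.0, 8.0, 13.0, 15.0])

# beta_hat = (X^T X)^{-1} X^T y, exactly as in the formula above
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y
y_hat = X @ beta_hat
print(beta_hat, y_hat)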

What it's used for

  • Baseline regression model.
  • Interpretable effect estimates (under standard linear-model assumptions).
  • Feature screening and trend modeling.

Key properties

  • Closed-form least-squares solution when \(X^TX\) is invertible.
  • Coefficients depend on feature scaling and collinearity; see the sketch below.
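
A brief sketch of the collinearity point, on synthetic data (the feature construction and noise scale are assumptions for illustration): two nearly identical columns make \(X^TX\) close to singular, and the split of the effect between them becomes unstable.

import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + 1e-6 * rng.normal(size=100)   # nearly a copy of x1
X = np.column_stack([np.ones(100), x1, x2])
y = 3.0 * x1 + rng.normal(size=100)

print(np.linalg.cond(X.T @ X))           # huge: X^T X is close to singular
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)                              # x1/x2 coefficients are individually unstable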

Common gotchas

  • Linear fit can underperform on nonlinear relationships.
  • Extrapolation outside the observed range is risky.
  • Outliers can strongly affect coefficients, as the sketch below demonstrates.
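
A small sketch of the outlier point, again on made-up data: corrupting a single target value visibly shifts both the intercept and the slope.

import numpy as np

x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0                        # exact line: intercept 1, slope 2
X = np.column_stack([np.ones_like(x), x])

beta_clean, *_ = np.linalg.lstsq(X, y, rcond=None)

y_outlier = y.copy()
y_outlier[-1] += 50.0                    # one corrupted target value
beta_dirty, *_ = np.linalg.lstsq(X, y_outlier, rcond=None)

print(beta_clean)   # ~[1, 2]
print(beta_dirty)   # both coefficients shift noticeably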

Example

Fit house price as a linear combination of square footage, bedrooms, and age.
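
A hedged sketch of this example using scikit-learn; every number below (feature values, prices, and the query house) is a made-up illustration, not real housing data.

import numpy as np
from sklearn.linear_model import LinearRegression

# Columns: square footage, bedrooms, age (synthetic illustration)
X = np.array([[1400, 3, 20],
              [1800, 4, 15],
              [1100, 2, 35],
              [2400, 4, 5],
              [1600, 3, 25]], dtype=float)
y = np.array([240_000, 310_000, 180_000, 420_000, 265_000], dtype=float)

model = LinearRegression().fit(X, y)
print(model.intercept_, model.coef_)          # fitted coefficients
print(model.predict([[2000.0, 3.0, 10.0]]))   # price estimate for a new house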

How to Compute (Python)

A runnable version of the least-squares route, using NumPy's SVD-based solver rather than an explicit inverse:

import numpy as np

def fit_linear_regression(X, y):
    # X: design matrix (n x d); y: targets (length n)
    # Solves min ||X beta - y||^2 without forming (X^T X)^{-1} explicitly
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ beta
    return beta, y_hat  # fitted coefficients and in-sample predictions

Complexity

  • Time: solver-dependent; dense QR-based fitting is commonly \(O(nd^2)\) for \(n \ge d\), and prediction is \(O(nd)\).
  • Space: solver- and representation-dependent; dense storage is typically \(O(nd)\) plus \(O(d)\) for the coefficients.
  • Assumptions: \(n\) samples, \(d\) features; figures describe a dense least-squares workflow rather than a specific library implementation.

See also