
Elastic Net

Formula

\[
\hat{\beta}=\arg\min_\beta \|y-X\beta\|_2^2 + \lambda\big(\alpha\|\beta\|_1+(1-\alpha)\|\beta\|_2^2\big)
\]

Parameters

  • \(X,y\): data and targets
  • \(\beta\): coefficients
  • \(\lambda\): regularization strength
  • \(\alpha\): mixing weight between the L1 and L2 penalties, \(\alpha \in [0, 1]\)
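The objective can be evaluated directly from these definitions; a minimal numpy sketch (the function name `elastic_net_objective` is illustrative, not a library routine):

```python
import numpy as np

def elastic_net_objective(X, y, beta, lam, alpha):
    """Evaluate ||y - X beta||_2^2 + lam * (alpha*||beta||_1 + (1-alpha)*||beta||_2^2)."""
    resid = y - X @ beta
    penalty = alpha * np.abs(beta).sum() + (1.0 - alpha) * (beta ** 2).sum()
    return float(resid @ resid + lam * penalty)

X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, 2.0])
beta = np.array([1.0, 1.0])
# squared residual = 1, penalty = 0.5*2 + 0.5*2 = 2, total = 1 + 2*2 = 5
print(elastic_net_objective(X, y, beta, lam=2.0, alpha=0.5))  # 5.0
```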

What it means

Combines the L1 and L2 penalties: the L1 term encourages sparsity (some coefficients exactly zero), while the L2 term stabilizes the solution when predictors are correlated.

What it's used for

  • Correlated high-dimensional features.
  • Regularized linear/logistic models with sparse structure.

Key properties

  • Interpolates between ridge and lasso using \(\alpha\).
  • Often more stable than pure lasso with correlated predictors.
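The interpolation can be seen in scikit-learn, where `l1_ratio` plays the role of \(\alpha\): at `l1_ratio=1`, `ElasticNet` reduces to `Lasso` at the same regularization strength (a sketch on synthetic data; assumes scikit-learn is installed):

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, -2.0, 0.0, 0.0, 3.0]) + 0.1 * rng.normal(size=100)

enet = ElasticNet(alpha=0.1, l1_ratio=1.0).fit(X, y)  # pure L1 mixing
lasso = Lasso(alpha=0.1).fit(X, y)

print(np.allclose(enet.coef_, lasso.coef_, atol=1e-6))  # True: l1_ratio=1 recovers lasso
```

At the other extreme, `l1_ratio` near 0 behaves like ridge (scikit-learn disallows exactly 0 for `ElasticNet`'s coordinate-descent solver at small alphas in some versions, so ridge itself is the cleaner choice there).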

Common gotchas

  • Both \(\lambda\) and \(\alpha\) must be tuned, typically by cross-validation over a grid.
  • Features should still be standardized first: the penalty treats all coefficients on a common scale.
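Both gotchas can be handled together: standardize inside a pipeline and let cross-validation choose \(\lambda\) (scikit-learn's `alpha`) and \(\alpha\) (its `l1_ratio`). A sketch, assuming scikit-learn is available:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=200, n_features=30, n_informative=5,
                       noise=5.0, random_state=0)

# Cross-validate over a grid of l1_ratio values; for each, a path of
# alpha (= lambda) values is generated and searched automatically.
model = make_pipeline(
    StandardScaler(),
    ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9, 1.0], cv=5),
)
model.fit(X, y)

enet = model.named_steps["elasticnetcv"]
print(enet.l1_ratio_, enet.alpha_)  # selected mixing weight and strength
```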

Example

Elastic net is common in text and genomics applications, where predictors are numerous and highly correlated.

How to Compute (Pseudocode)

Input: design matrix X, targets y, regularization lambda, mixing alpha
Output: elastic-net coefficients beta

initialize beta (e.g. to zeros)
repeat until coefficient updates fall below a tolerance:
  for each coordinate j:
    compute the partial residual excluding feature j
    update beta_j: soft-threshold its least-squares value (L1 term), then shrink the result (L2 term)

return beta
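The loop above can be sketched in numpy. This is an illustrative implementation of the objective exactly as written earlier (no \(1/2n\) factor on the squared loss, hence the halved threshold \(\lambda\alpha/2\)), assuming nonzero columns, not a library routine:

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def elastic_net_cd(X, y, lam, alpha, n_iters=500, tol=1e-10):
    """Coordinate descent for ||y - X b||^2 + lam*(alpha*||b||_1 + (1-alpha)*||b||_2^2)."""
    n, d = X.shape
    beta = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0)   # x_j^T x_j for each column (assumed nonzero)
    r = y.astype(float).copy()      # residual y - X beta, kept in sync with beta
    for _ in range(n_iters):
        max_change = 0.0
        for j in range(d):
            # partial residual: add feature j's current contribution back
            rho = X[:, j] @ r + col_sq[j] * beta[j]
            # closed-form 1-D minimizer: soft-threshold (L1), then shrink (L2)
            new_bj = soft_threshold(rho, lam * alpha / 2.0) / (col_sq[j] + lam * (1.0 - alpha))
            if new_bj != beta[j]:
                r += X[:, j] * (beta[j] - new_bj)
                max_change = max(max_change, abs(new_bj - beta[j]))
                beta[j] = new_bj
        if max_change < tol:
            break
    return beta

# With orthonormal X the update is exact: beta_j = soft(y_j, lam*alpha/2) / (1 + lam*(1-alpha))
beta = elastic_net_cd(np.eye(3), np.array([3.0, -1.0, 0.1]), lam=1.0, alpha=0.5)
print(beta)  # [ 1.83333333 -0.5         0.        ]
```

Note the third coefficient is thresholded to exactly zero, the sparsity behavior inherited from the L1 term.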

Complexity

  • Time: Depends on the solver; coordinate-descent implementations are often analyzed similarly to lasso, roughly \(O(Tnd)\) for \(T\) passes on dense data
  • Space: Typically \(O(nd + d)\) for dense data and coefficient/work vectors
  • Assumptions: \(n\) samples, \(d\) features; costs depend on sparsity, convergence tolerance, and whether a regularization path over many \((\lambda, \alpha)\) settings is computed

See also