Elastic Net¶
Formula¶
\[\n\hat{\beta}=\arg\min_\beta \|y-X\beta\|_2^2 + \lambda\big(\alpha\|\beta\|_1+(1-\alpha)\|\beta\|_2^2\big)\n\]
Parameters¶
- \(X,y\): data and targets
- \(\beta\): coefficients
- \(\lambda\): regularization strength
- \(\alpha\): L1/L2 mixing weight (elastic net only)
What it means¶
Combines L1 and L2 penalties to balance sparsity and stability.
What it's used for¶
- Correlated high-dimensional features.
- Regularized linear/logistic models with sparse structure.
Key properties¶
- Interpolates between ridge and lasso using \(\alpha\).
- Often more stable than pure lasso with correlated predictors.
Common gotchas¶
- Need to tune both \(\lambda\) and \(\alpha\).
- Scaling features is still important.
Example¶
Elastic net is common for text and genomics where predictors are numerous and correlated.
How to Compute (Pseudocode)¶
Input: design matrix X, targets y, regularization lambda, mixing alpha
Output: elastic-net coefficients beta
initialize beta
repeat until convergence:
for each coordinate j:
compute partial residual excluding feature j
apply coordinate update with L1 soft-thresholding and L2 shrinkage terms
return beta
Complexity¶
- Time: Depends on the solver; coordinate-descent implementations are often analyzed similarly to lasso, roughly \(O(Tnd)\) for \(T\) passes on dense data
- Space: Typically \(O(nd + d)\) for dense data and coefficient/work vectors
- Assumptions: \(n\) samples, \(d\) features; costs depend on sparsity, convergence tolerance, and whether a regularization path over many \((\lambda, \alpha)\) settings is computed