Gradient Boosting¶
Formula¶
\[
F_m(x)=F_{m-1}(x)+\eta\,h_m(x)
\]
Parameters¶
- \(F_m\): boosted model after step \(m\)
- \(h_m\): weak learner fit to residuals/gradients
- \(\eta\): learning rate
What it means¶
Gradient boosting builds an additive model by sequentially fitting weak learners to reduce current errors.
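The additive update can be seen on toy numbers. A minimal sketch (illustrative only: the "weak learner" here is just the mean of the current residuals, far simpler than the trees used in practice, but it shows the update \(F_m = F_{m-1} + \eta\,h_m\)):

```python
y = [1.0, 2.0, 4.0]
eta = 0.5
F = [0.0, 0.0, 0.0]  # F_0: constant baseline of 0

for m in range(2):
    # residuals play the role of negative gradients for squared error
    residuals = [yi - fi for yi, fi in zip(y, F)]
    h = sum(residuals) / len(residuals)  # "weak learner": a single constant
    F = [fi + eta * h for fi in F]       # additive update F_m = F_{m-1} + eta * h_m

print(F)  # each boosting step moves F toward the mean of y
```

Each round shrinks the remaining residual by a factor of \(1-\eta\), so the model converges toward the target mean without any single step committing fully.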
What it's used for¶
- High-performing tabular models (XGBoost/LightGBM/CatBoost family).
- Flexible classification and regression.
Key properties¶
- Bias decreases with more boosting rounds; variance is controlled by shrinkage (the learning rate) and regularization.
- Handles nonlinearities and interactions well.
Common gotchas¶
- Can overfit with too many rounds, overly deep trees, or a high learning rate.
- Careful validation and early stopping are important.
Example¶
Use shallow trees with a small learning rate and early stopping on validation AUC.
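Early stopping itself is simple to implement. A hedged sketch of the stopping rule (the function name, `patience` parameter, and loss values are hypothetical, not any library's API; real implementations track a validation metric per round in the same way):

```python
def early_stop_round(val_losses, patience=3):
    """Return the round index with the best validation loss once no
    improvement has been seen for `patience` rounds, else None."""
    best, best_i = float("inf"), 0
    for i, loss in enumerate(val_losses):
        if loss < best:
            best, best_i = loss, i
        if i - best_i >= patience:
            return best_i  # stop: keep the model from the best round
    return None  # keep boosting

print(early_stop_round([0.9, 0.7, 0.6, 0.61, 0.62, 0.63]))
```

For a metric like AUC where higher is better, the comparison flips but the logic is identical.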
How to Compute (Pseudocode)¶
Input: training data X, y; rounds M; learning rate eta; weak learner family
Output: boosted model F_M
initialize model F_0(x) (for example, constant baseline)
for m from 1 to M:
    compute pseudo-residuals / gradients from current model F_{m-1}
    fit weak learner h_m to the residual targets
    choose step size (or use eta)
    update F_m(x) <- F_{m-1}(x) + eta * h_m(x)
    optionally monitor validation metric for early stopping
return F_M
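The pseudocode above can be made runnable. A self-contained Python sketch for squared-error regression, using a single-split decision stump as the weak learner (the function names and toy data are illustrative, not any library's API):

```python
def fit_stump(x, residuals):
    """Fit the best single-threshold stump to the residual targets."""
    best = (float("inf"), None, 0.0, 0.0)
    for t in x:
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        if not left or not right:
            continue
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        err = (sum((r - lv) ** 2 for r in left)
               + sum((r - rv) ** 2 for r in right))
        if err < best[0]:
            best = (err, t, lv, rv)
    _, t, lv, rv = best
    return lambda xi, t=t, lv=lv, rv=rv: lv if xi <= t else rv

def boost(x, y, rounds=100, eta=0.1):
    f0 = sum(y) / len(y)        # F_0: constant baseline
    F = [f0] * len(y)
    learners = []
    for m in range(rounds):
        # pseudo-residuals = negative gradient of squared error: y - F
        residuals = [yi - fi for yi, fi in zip(y, F)]
        h = fit_stump(x, residuals)          # fit weak learner to residuals
        learners.append(h)
        # additive update: F_m(x) = F_{m-1}(x) + eta * h_m(x)
        F = [fi + eta * h(xi) for fi, xi in zip(F, x)]
    return lambda xi: f0 + eta * sum(h(xi) for h in learners)

x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 1.0, 3.0, 3.0]
model = boost(x, y)
```

With a small learning rate each stump corrects only a fraction of the remaining error, which is why many rounds are needed but individual rounds rarely overfit; validation monitoring (as in the pseudocode's optional step) would decide when to stop.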
Complexity¶
- Time: Roughly \(M\) times the cost of fitting one weak learner plus gradient/residual computation (commonly dominated by tree fitting in tree-boosting implementations)
- Space: Depends on the number of learners \(M\) and learner size, plus training data and gradient/residual buffers
- Assumptions: Exact complexity depends on weak learner type, tree depth/histogram methods, subsampling, and regularization/early-stopping settings