
Gradient Boosting

Formula

\[ F_m(x)=F_{m-1}(x)+\eta\,h_m(x) \]

Parameters

  • \(F_m\): boosted model after step \(m\)
  • \(h_m\): weak learner fit to the pseudo-residuals/negative gradients (see the expression after this list)
  • \(\eta\): learning rate
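
As a brief supplement (standard in Friedman-style gradient boosting, though not spelled out on this page), \(h_m\) is fit to the pseudo-residuals, i.e. the negative gradients of the loss \(L\) with respect to the current predictions:

\[ r_{i,m} = -\left[\frac{\partial L\big(y_i, F(x_i)\big)}{\partial F(x_i)}\right]_{F=F_{m-1}} \]

For squared-error loss \(L = \tfrac{1}{2}\,(y - F(x))^2\), these pseudo-residuals are the ordinary residuals \(y_i - F_{m-1}(x_i)\).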

What it means

Gradient boosting builds an additive model by sequentially fitting weak learners to reduce current errors.

What it's used for

  • High-performing tabular models (XGBoost/LightGBM/CatBoost family); a usage sketch follows this list.
  • Flexible classification and regression.
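
As a minimal, hedged usage sketch (not from this page): scikit-learn's HistGradientBoostingClassifier is a histogram-based member of the same family; the synthetic dataset and parameter values below are illustrative assumptions only.

# Minimal sketch: histogram-based gradient boosting on a toy tabular task.
# Parameter values are illustrative assumptions, not recommendations.
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = HistGradientBoostingClassifier(learning_rate=0.1, max_iter=200)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))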

Key properties

  • Bias can be reduced with enough trees; variance is controlled by shrinkage and regularization (illustrated in the sketch after this list).
  • Handles nonlinearities and interactions well.
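
To make the shrinkage point concrete, here is a small hedged sketch (illustrative data and parameter values, not from this page) comparing a larger and a smaller learning rate at a fixed tree budget:

# Sketch: shrinkage (learning rate) vs. a fixed number of trees.
# Data and parameter values are illustrative assumptions; results vary by dataset and seed.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=10, noise=10.0, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

for lr in (0.5, 0.05):
    model = GradientBoostingRegressor(learning_rate=lr, n_estimators=300, max_depth=3, random_state=0)
    model.fit(X_train, y_train)
    print(f"learning_rate={lr}: validation R^2 = {model.score(X_val, y_val):.3f}")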

Common gotchas

  • Can overfit if the trees are too deep, there are too many boosting rounds, or the learning rate is too high.
  • Careful validation and early stopping are important.

Example

Use shallow trees with a small learning rate and early stopping on validation AUC.
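
One way to realize that recipe is sketched below. It assumes scikit-learn's GradientBoostingClassifier on a synthetic dataset, and it scores every boosting round on a held-out set via staged predictions rather than stopping training itself; libraries such as XGBoost and LightGBM offer built-in early stopping. Parameter values are illustrative assumptions.

# Sketch: shallow trees, small learning rate, pick the round with the best validation AUC.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

model = GradientBoostingClassifier(max_depth=3, learning_rate=0.05, n_estimators=500, random_state=0)
model.fit(X_train, y_train)

# Evaluate each intermediate model F_1 ... F_M on the validation set and keep the best round.
val_auc = [roc_auc_score(y_val, proba[:, 1]) for proba in model.staged_predict_proba(X_val)]
best_round = int(np.argmax(val_auc)) + 1
print(f"best round: {best_round}, validation AUC: {max(val_auc):.4f}")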

How to Compute (Pseudocode)

Input: training data X, y; rounds M; learning rate eta; weak learner family
Output: boosted model F_M

initialize model F_0(x) (for example, constant baseline)
for m from 1 to M:
  compute pseudo-residuals (negative gradients of the loss) at the current model F_{m-1}
  fit weak learner h_m to the residual targets
  choose step size (or use eta)
  update F_m(x) <- F_{m-1}(x) + eta * h_m(x)
  optionally monitor validation metric for early stopping

return F_M
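
The pseudocode maps onto a short runnable sketch. The version below assumes squared-error loss (so the pseudo-residuals are plain residuals), depth-limited regression trees as weak learners, and a fixed learning rate; it is illustrative rather than a production implementation.

# Runnable sketch of the pseudocode: least-squares boosting with small regression trees.
# Loss choice, tree depth, M, and eta are illustrative assumptions.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gradient_boosting(X, y, M=100, eta=0.1, max_depth=2):
    f0 = float(np.mean(y))              # F_0: constant baseline (mean minimizes squared error)
    pred = np.full(len(y), f0)
    learners = []
    for _ in range(M):
        residuals = y - pred            # pseudo-residuals for squared error: y - F_{m-1}(x)
        h = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)
        learners.append(h)
        pred += eta * h.predict(X)      # F_m(x) = F_{m-1}(x) + eta * h_m(x)
    return f0, learners

def predict_gradient_boosting(model, X, eta=0.1):
    f0, learners = model
    pred = np.full(X.shape[0], f0)
    for h in learners:
        pred += eta * h.predict(X)
    return pred

# Tiny usage example on synthetic data.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=500)
model = fit_gradient_boosting(X, y)
print("train MSE:", float(np.mean((predict_gradient_boosting(model, X) - y) ** 2)))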

Complexity

  • Time: Roughly \(M\) times the cost of fitting one weak learner plus gradient/residual computation (commonly dominated by tree fitting in tree-boosting implementations)
  • Space: Depends on the number of learners \(M\) and learner size, plus training data and gradient/residual buffers
  • Assumptions: Exact complexity depends on weak learner type, tree depth/histogram methods, subsampling, and regularization/early-stopping settings

See also