Logistic Regression¶
Plot¶
Logistic link / sigmoid over score \(z\): \(\sigma(z) = 1/(1 + e^{-z})\), plotted for \(z \in [-6, 6]\) with the \(y\)-axis spanning just past \([0, 1]\).
Parameters¶
- \(x\): feature vector
- \(w,b\): model parameters
- \(p\): predicted probability of positive class
What it means¶
Models the log-odds of a binary outcome as a linear function of features, then maps to a probability with a sigmoid.
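In symbols, with score \(z = w^\top x + b\):

\[
p = \sigma(z) = \frac{1}{1 + e^{-z}},
\qquad
\log\frac{p}{1-p} = w^\top x + b .
\]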
What it's used for¶
- Binary classification with approximately calibrated probabilities (often reasonable out of the box, but worth checking).
- Interpretable baseline for tabular problems.
- Threshold-based decisions.
Key properties¶
- Decision boundary is linear in feature space: the hyperplane \(w^\top x + b = 0\), where \(p = 0.5\).
- Optimized by minimizing log loss (binary cross-entropy), not MSE; the log loss is convex in \(w, b\), so gradient-based solvers can reach the global optimum (see the sketch after this list).
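For reference, a minimal NumPy sketch of the log loss itself; the clipping constant `eps` is an implementation assumption to keep `log(0)` out of the computation:

```python
import numpy as np

def log_loss(y, p, eps=1e-12):
    """Mean negative log-likelihood of the Bernoulli model."""
    p = np.clip(p, eps, 1 - eps)  # guard against log(0)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
```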
Common gotchas¶
- High accuracy can still hide poor calibration or a badly chosen decision threshold.
- Coefficients are in log-odds units, not direct probability changes (see the worked example after this list).
- Needs feature engineering (e.g., interactions, polynomial or basis terms) to capture nonlinear boundaries.
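As a worked instance of the log-odds point above: a coefficient \(w_j = 0.7\) multiplies the odds by \(e^{0.7} \approx 2.0\) for each unit increase in feature \(j\), holding the others fixed; the resulting change in probability depends on where on the sigmoid you start.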
Example¶
Predict churn probability and choose a threshold based on business cost, not just 0.5.
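A minimal sketch of cost-based threshold selection; the values of `cost_fp` and `cost_fn` are hypothetical placeholders for real business costs, not numbers from any source:

```python
import numpy as np

def best_threshold(y_true, p, cost_fp=1.0, cost_fn=5.0):
    # Hypothetical costs: assume a missed churner (false negative)
    # is 5x as expensive as a wasted retention offer (false positive).
    thresholds = np.linspace(0.01, 0.99, 99)
    costs = []
    for t in thresholds:
        pred = (p >= t).astype(int)
        fp = np.sum((pred == 1) & (y_true == 0))
        fn = np.sum((pred == 0) & (y_true == 1))
        costs.append(cost_fp * fp + cost_fn * fn)
    return thresholds[int(np.argmin(costs))]
```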
How to Compute (Pseudocode)¶
Input: training data X, labels y, optimizer settings (learning rate, max iterations, tolerance)
Output: parameters w, b and predicted probabilities
initialize w, b (e.g., to zeros)
repeat until convergence (or max iterations):
    p <- sigmoid(X w + b)
    compute gradient of log-loss with respect to w, b
    update w, b using an optimizer (for example, gradient descent or an LBFGS/Newton-style method)
return w, b; predicted probabilities are sigmoid(X w + b)
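A minimal runnable NumPy version of the pseudocode above, using plain gradient descent on the mean log-loss; the learning rate and iteration count are illustrative assumptions, and real libraries typically use LBFGS or Newton-style solvers instead:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, n_iter=2000):
    # X: (n, d) feature matrix; y: (n,) labels in {0, 1}
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iter):
        p = sigmoid(X @ w + b)      # predicted probabilities
        grad_w = X.T @ (p - y) / n  # gradient of mean log-loss w.r.t. w
        grad_b = np.mean(p - y)     # gradient of mean log-loss w.r.t. b
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```

Predicted probabilities for new data are then `sigmoid(X @ w + b)`.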
Complexity¶
- Time: Depends on the optimizer; many iterative methods cost roughly \(O(Tnd)\) for \(T\) passes/iterations with dense gradients, plus optimizer-specific overhead
- Space: Typically \(O(nd)\) for dense data plus \(O(d)\) model/gradient state (more for second-order/quasi-Newton methods)
- Assumptions: \(n\) samples, \(d\) features, binary logistic regression; convergence rate and constants depend on regularization, conditioning, and optimizer choice
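As a rough worked instance (hypothetical sizes, not from the source): with \(n = 10^6\) samples, \(d = 100\) features, and \(T = 100\) full-gradient iterations, the dominant cost is on the order of \(Tnd = 10^{10}\) multiply-adds.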