Feature Scaling (Standardization vs Normalization)

Formula

\[ z = \frac{x-\mu}{\sigma} \]
\[ x' = \frac{x-x_{min}}{x_{max}-x_{min}} \]
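
As a quick worked check with illustrative numbers: a value \(x = 30\) from a feature with training mean \(\mu = 20\) and standard deviation \(\sigma = 5\) standardizes to

\[ z = \frac{30-20}{5} = 2, \]

while with training minimum \(0\) and maximum \(40\) the same value min-max scales to

\[ x' = \frac{30-0}{40-0} = 0.75. \]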

Plot

[Plot: example min-max scaling to [0, 1]; the line x' = x/10 maps x ∈ [0, 10] onto [0, 1].]

Parameters

  • \(\mu\), \(\sigma\): mean and standard deviation of the training data, used by the first formula (standardization, or z-score).
  • \(x_{min}\), \(x_{max}\): minimum and maximum of the training data, used by the second formula (min-max normalization).

What it means

Rescales numeric features to a common scale so that gradient-based optimization and distance-based methods behave more predictably.
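
A minimal NumPy sketch of the distance effect (all numbers here are made up for illustration): before scaling, the large-range feature dominates the Euclidean distance; after standardization, both features contribute on comparable terms.

import numpy as np

# Two points with features on very different raw scales:
# income (tens of thousands) and age (tens of years).
a = np.array([50_000.0, 25.0])
b = np.array([52_000.0, 60.0])

# Unscaled: the income gap (2000) swamps the age gap (35).
print(np.linalg.norm(a - b))          # ~2000.3

# Standardize with (illustrative) training statistics.
mu = np.array([48_000.0, 40.0])
sigma = np.array([15_000.0, 12.0])
a_z, b_z = (a - mu) / sigma, (b - mu) / sigma

# Scaled: both gaps now contribute on comparable terms.
print(np.linalg.norm(a_z - b_z))      # ~2.9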

What it's used for

  • Preprocessing for linear models, SVMs, k-NN, k-means, and neural networks.
  • Comparing coefficient magnitudes of linear models on a common scale (see the sketch after this list).
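
A hedged sketch of the coefficient-comparison point using scikit-learn (assumed installed; the data is synthetic): on standardized features, each coefficient measures the effect of a one-standard-deviation change, so magnitudes are comparable across features.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = np.column_stack([
    rng.normal(50_000, 15_000, 500),  # income: large raw scale
    rng.normal(40, 12, 500),          # age: small raw scale
])
y = 0.0001 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 1, 500)

# Raw features: coefficient sizes mostly reflect the units.
print(LinearRegression().fit(X, y).coef_)    # ~[0.0001, 0.5]

# Standardized features: effect per standard deviation.
X_z = StandardScaler().fit_transform(X)
print(LinearRegression().fit(X_z, y).coef_)  # ~[1.5, 6.0]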

Key properties

  • Standardization centers each feature at zero and scales it to unit variance; the result is unbounded.
  • Min-max normalization maps values to a bounded range (typically [0, 1]) but is sensitive to outliers (see the sketch after this list).
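
A small NumPy sketch of that outlier sensitivity (values are illustrative): one extreme value pins \(x_{max}\), squeezing the ordinary values into a narrow slice of [0, 1].

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # one extreme outlier

# The outlier defines x_max, so the ordinary values land in
# roughly the bottom 3% of the [0, 1] range.
x_mm = (x - x.min()) / (x.max() - x.min())
print(x_mm.round(3))  # [0.    0.01  0.02  0.03  1.   ]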

Common gotchas

  • Fit scaling parameters on the training data only, then reuse them on validation/test data; fitting on the full dataset leaks information (see the sketch after this list).
  • Guard against zero spread: a constant feature gives \(\sigma = 0\) or \(x_{max} = x_{min}\), making both formulas divide by zero.
  • Tree-based models are largely insensitive to monotonic rescaling, so they typically gain little from it, unlike distance- and gradient-based models.
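
A short scikit-learn sketch of the train-only fitting pattern (scikit-learn assumed available; the data and split are illustrative): the scaler's statistics come from the training split alone and are then reused unchanged on the test split.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = np.random.default_rng(0).normal(size=(200, 2))
X_train, X_test = train_test_split(X, test_size=0.25, random_state=0)

scaler = StandardScaler().fit(X_train)  # statistics from training data only
X_train_z = scaler.transform(X_train)
X_test_z = scaler.transform(X_test)     # reuse the fitted parameters; never refit here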

Example

Standardize income and age before logistic regression so regularization treats them comparably.
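
One way to realize this example with scikit-learn (a hedged sketch; the income/age data here is synthetic): putting the scaler and the model in a single Pipeline keeps the scaling statistics inside the training loop, so cross-validation refits them on each fold.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
income = rng.normal(50_000, 15_000, 300)
age = rng.normal(40, 12, 300)
X = np.column_stack([income, age])
y = (income / 15_000 + age / 12 > 6.5).astype(int)  # synthetic binary target

# Standardizing inside the pipeline lets the L2 penalty of
# LogisticRegression treat income and age comparably.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X, y)
print(model.predict(X[:5]))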

How to Compute (Pseudocode)

Input: training feature column x and scaling method
Output: fitted scaling parameters and transformed values

if standardization:
  compute mu <- mean(x_train), sigma <- std(x_train)
  transform each value as (x - mu) / sigma
else if min-max scaling:
  compute x_min, x_max on training data
  transform each value as (x - x_min) / (x_max - x_min)

store fitted parameters for reuse on validation/test/production data
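
A direct from-scratch Python rendering of the pseudocode (a sketch; the function and parameter names are my own), keeping the fit and transform steps separate so the fitted parameters can be reused on later data.

import numpy as np

def fit_scaler(x_train, method):
    # Fit scaling parameters on the training column only.
    if method == "standard":
        return {"method": method, "mu": x_train.mean(), "sigma": x_train.std()}
    if method == "minmax":
        return {"method": method, "lo": x_train.min(), "hi": x_train.max()}
    raise ValueError(f"unknown method: {method}")

def transform(x, params):
    # Apply the stored parameters to any split (train/val/test/production).
    if params["method"] == "standard":
        return (x - params["mu"]) / params["sigma"]
    return (x - params["lo"]) / (params["hi"] - params["lo"])

x_train = np.array([1.0, 2.0, 3.0, 4.0])
params = fit_scaler(x_train, "minmax")
# Values outside the training range map outside [0, 1].
print(transform(np.array([2.5, 5.0]), params))  # [0.5  1.333...]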

Complexity

  • Time: \(O(n)\) per numeric feature to fit summary statistics and \(O(n)\) to transform \(n\) values
  • Space: \(O(1)\) fitted state per feature (plus transformed output storage)
  • Assumptions: \(n\) values in one feature column; extending to \(d\) features scales roughly linearly in \(d\)
