Feature Scaling (Standardization vs Normalization)

Formula

\[ z = \frac{x-\mu}{\sigma} \]
\[ x' = \frac{x-x_{min}}{x_{max}-x_{min}} \]
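
As a quick worked check with illustrative numbers: a value \(x = 30\) from a feature with training mean \(\mu = 20\) and standard deviation \(\sigma = 5\) standardizes to

\[ z = \frac{30-20}{5} = 2, \]

while with training minimum \(0\) and maximum \(40\) the same value min-max scales to

\[ x' = \frac{30-0}{40-0} = 0.75. \]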

Plot

[Plot: example min-max scaling to [0, 1]; the line x' = x/10 maps x ∈ [0, 10] onto [0, 1].]

Parameters

  • \(\mu\), \(\sigma\): mean and standard deviation of the training data, used by the first formula (standardization, or z-score).
  • \(x_{min}\), \(x_{max}\): minimum and maximum of the training data, used by the second formula (min-max normalization).

What it means

Rescales numeric features to a common scale so that gradient-based optimization and distance-based methods behave more predictably.
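
A minimal NumPy sketch of the distance effect (all numbers here are made up for illustration): before scaling, the large-range feature dominates the Euclidean distance; after standardization, both features contribute on comparable terms.

import numpy as np

# Two points with features on very different raw scales:
# income (tens of thousands) and age (tens of years).
a = np.array([50_000.0, 25.0])
b = np.array([52_000.0, 60.0])

# Unscaled: the income gap (2000) swamps the age gap (35).
print(np.linalg.norm(a - b))          # ~2000.3

# Standardize with (illustrative) training statistics.
mu = np.array([48_000.0, 40.0])
sigma = np.array([15_000.0, 12.0])
a_z, b_z = (a - mu) / sigma, (b - mu) / sigma

# Scaled: both gaps now contribute on comparable terms.
print(np.linalg.norm(a_z - b_z))      # ~2.9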

What it's used for

  • Preprocessing for linear models, SVMs, k-NN, k-means, and neural networks.
  • Comparing coefficient magnitudes of linear models on a common scale (see the sketch after this list).
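
A hedged sketch of the coefficient-comparison point using scikit-learn (assumed installed; the data is synthetic): on standardized features, each coefficient measures the effect of a one-standard-deviation change, so magnitudes are comparable across features.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = np.column_stack([
    rng.normal(50_000, 15_000, 500),  # income: large raw scale
    rng.normal(40, 12, 500),          # age: small raw scale
])
y = 0.0001 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 1, 500)

# Raw features: coefficient sizes mostly reflect the units.
print(LinearRegression().fit(X, y).coef_)    # ~[0.0001, 0.5]

# Standardized features: effect per standard deviation.
X_z = StandardScaler().fit_transform(X)
print(LinearRegression().fit(X_z, y).coef_)  # ~[1.5, 6.0]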

Key properties

  • Standardization centers each feature at zero and scales it to unit variance; the result is unbounded.
  • Min-max normalization maps values to a bounded range (typically [0, 1]) but is sensitive to outliers (see the sketch after this list).
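
A small NumPy sketch of that outlier sensitivity (values are illustrative): one extreme value pins \(x_{max}\), squeezing the ordinary values into a narrow slice of [0, 1].

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # one extreme outlier

# The outlier defines x_max, so the ordinary values land in
# roughly the bottom 3% of the [0, 1] range.
x_mm = (x - x.min()) / (x.max() - x.min())
print(x_mm.round(3))  # [0.    0.01  0.02  0.03  1.   ]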

Common gotchas

  • Fit scaling parameters on the training data only, then reuse them on validation/test data; fitting on the full dataset leaks information (see the sketch after this list).
  • Guard against zero spread: a constant feature gives \(\sigma = 0\) or \(x_{max} = x_{min}\), making both formulas divide by zero.
  • Tree-based models are largely insensitive to monotonic rescaling, so they typically gain little from it, unlike distance- and gradient-based models.
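
A short scikit-learn sketch of the train-only fitting pattern (scikit-learn assumed available; the data and split are illustrative): the scaler's statistics come from the training split alone and are then reused unchanged on the test split.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = np.random.default_rng(0).normal(size=(200, 2))
X_train, X_test = train_test_split(X, test_size=0.25, random_state=0)

scaler = StandardScaler().fit(X_train)  # statistics from training data only
X_train_z = scaler.transform(X_train)
X_test_z = scaler.transform(X_test)     # reuse the fitted parameters; never refit here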

Example

Standardize income and age before logistic regression so regularization treats them comparably.
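
One way to realize this example with scikit-learn (a hedged sketch; the income/age data here is synthetic): putting the scaler and the model in a single Pipeline keeps the scaling statistics inside the training loop, so cross-validation refits them on each fold.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
income = rng.normal(50_000, 15_000, 300)
age = rng.normal(40, 12, 300)
X = np.column_stack([income, age])
y = (income / 15_000 + age / 12 > 6.5).astype(int)  # synthetic binary target

# Standardizing inside the pipeline lets the L2 penalty of
# LogisticRegression treat income and age comparably.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X, y)
print(model.predict(X[:5]))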

How to Compute (Pseudocode)

Input: training feature column x and scaling method
Output: fitted scaling parameters and transformed values

if standardization:
  compute mu <- mean(x_train), sigma <- std(x_train)
  transform each value as (x - mu) / sigma
else if min-max scaling:
  compute x_min, x_max on training data
  transform each value as (x - x_min) / (x_max - x_min)

store fitted parameters for reuse on validation/test/production data
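
A direct from-scratch Python rendering of the pseudocode (a sketch; the function and parameter names are my own), keeping the fit and transform steps separate so the fitted parameters can be reused on later data.

import numpy as np

def fit_scaler(x_train, method):
    # Fit scaling parameters on the training column only.
    if method == "standard":
        return {"method": method, "mu": x_train.mean(), "sigma": x_train.std()}
    if method == "minmax":
        return {"method": method, "lo": x_train.min(), "hi": x_train.max()}
    raise ValueError(f"unknown method: {method}")

def transform(x, params):
    # Apply the stored parameters to any split (train/val/test/production).
    if params["method"] == "standard":
        return (x - params["mu"]) / params["sigma"]
    return (x - params["lo"]) / (params["hi"] - params["lo"])

x_train = np.array([1.0, 2.0, 3.0, 4.0])
params = fit_scaler(x_train, "minmax")
# Values outside the training range map outside [0, 1].
print(transform(np.array([2.5, 5.0]), params))  # [0.5  1.333...]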

Complexity

  • Time: \(O(n)\) per numeric feature to fit summary statistics and \(O(n)\) to transform \(n\) values
  • Space: \(O(1)\) fitted state per feature (plus transformed output storage)
  • Assumptions: \(n\) values in one feature column; extending to \(d\) features scales roughly linearly in \(d\)
