K-Nearest Neighbors (k-NN)¶
Formula¶
\[
\mathcal{N}_k(x)=\text{k nearest training points to } x
\]
\[
\hat y=
\begin{cases}
\mathrm{mode}\{y_i: x_i\in \mathcal{N}_k(x)\}, & \text{classification}\\
\frac{1}{k}\sum_{x_i\in \mathcal{N}_k(x)} y_i, & \text{regression}
\end{cases}
\]
Parameters¶
- \(x\): query point
- \(\mathcal{N}_k(x)\): set of \(k\) nearest neighbors of \(x\)
- \(k\): number of neighbors
- \(y_i\): label/target of neighbor \(x_i\)
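The regression case of the formula can be traced in a few lines of NumPy; the 1-D training points and targets below are hypothetical, and Euclidean (absolute) distance is assumed:

```python
import numpy as np

# Regression case: y_hat = (1/k) * sum of the k nearest targets.
# Hypothetical 1-D training data.
x_train = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_train = np.array([0.1, 0.9, 2.1, 2.9, 4.2])

x, k = 2.4, 3
nearest = np.argsort(np.abs(x_train - x))[:k]   # N_k(x): indices of the k closest points
y_hat = y_train[nearest].mean()                 # average their targets
print(y_hat)  # mean of the targets at x_train = 2.0, 3.0, 1.0
```

For classification, the `mean` on the last line would be replaced by a majority vote over the neighbors' labels.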
What it means¶
k-NN predicts using the labels/targets of nearby training examples in feature space.
What it's used for¶
- Simple baseline for classification and regression.
- Nonlinear decision boundaries without explicit parametric training.
- Local similarity-based prediction.
Key properties¶
- Instance-based (lazy) learner: most computation happens at prediction time.
- Strongly depends on distance metric and feature scaling.
- Smaller \(k\) lowers bias but increases variance.
Common gotchas¶
- k-NN is not a clustering algorithm; it is typically supervised.
- Unscaled features can dominate distance calculations.
- Prediction can be slow on large datasets without indexing/approximate search.
- Class imbalance can bias majority-vote outcomes.
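The scaling gotcha is easy to see numerically. In this sketch, the second feature's range (0-1000, e.g. an unscaled income column) is a made-up assumption chosen to dwarf the first feature:

```python
import numpy as np

# Two points differing by 0.7 in feature 0 (range ~0-1) and by 10
# in feature 1 (range ~0-1000) -- hypothetical unscaled data.
a = np.array([0.2, 300.0])
b = np.array([0.9, 310.0])

# Raw Euclidean distance: the large-scale feature dominates entirely.
raw = np.linalg.norm(a - b)

# After min-max scaling both features to [0, 1] (assuming those ranges),
# the two features contribute on comparable terms.
a_scaled = np.array([0.2, 300.0 / 1000.0])
b_scaled = np.array([0.9, 310.0 / 1000.0])
scaled = np.linalg.norm(a_scaled - b_scaled)

print(raw, scaled)  # raw distance is almost entirely the second feature
```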
Example¶
To classify a new point, find its 5 nearest labeled examples and predict the majority class among them.
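The example above can be sketched in NumPy; the two clusters, their labels, and the query point are invented for illustration, and Euclidean distance is assumed:

```python
import numpy as np
from collections import Counter

# Toy labeled training data: cluster "A" near (0, 0), cluster "B" near (1, 1).
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                    [1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.2, 1.0]])
y_train = np.array(["A", "A", "A", "B", "B", "B", "B"])

x_query = np.array([0.95, 1.05])

# Euclidean distance from the query to every training point.
dists = np.linalg.norm(X_train - x_query, axis=1)

# Indices of the 5 nearest neighbors, then majority vote over their labels.
nearest = np.argsort(dists)[:5]
prediction = Counter(y_train[nearest]).most_common(1)[0][0]
print(prediction)  # -> "B": 4 of the 5 nearest neighbors carry label "B"
```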
How to Compute (Pseudocode)¶
Input: training set {(x_i, y_i)}, query point x, neighbors k
Output: prediction y_hat

for each training point x_i:
    d[i] <- distance(x, x_i)
idx <- indices of the k smallest distances in d
if classification:
    return majority_vote(y_i for i in idx)
if regression:
    return average(y_i for i in idx)
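The pseudocode translates directly to Python. This is a brute-force sketch under assumed defaults (Euclidean distance, a boolean `classification` flag); the function name is made up:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k, classification=True):
    """Brute-force k-NN prediction for a single query point x."""
    # d[i] <- distance(x, x_i): Euclidean distance to every training point.
    d = np.linalg.norm(X_train - x, axis=1)
    # idx <- indices of the k smallest distances (unordered partial selection).
    idx = np.argpartition(d, k - 1)[:k]
    if classification:
        return Counter(y_train[idx]).most_common(1)[0][0]  # majority_vote
    return float(np.mean(y_train[idx]))                    # average
```

`np.argpartition` finds the k smallest distances without fully sorting them, which matters for the per-query cost discussed under Complexity below.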
Complexity¶
- Time: training is just storing the data, \(O(nd)\); naive prediction costs \(O(nd)\) for the distances plus \(O(n\log n)\) to sort them, i.e. \(O(nd + n\log n)\) per query, or \(O(nd + n)\) with partial selection of the \(k\) smallest
- Space: \(O(nd)\) to store \(n\) training points with \(d\) features (plus labels)
- Assumptions: \(n\) training examples, \(d\) features, brute-force neighbor search; tree/ANN indexes change prediction-time complexity and constants
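The sorting-versus-partial-selection distinction above can be seen with NumPy (the random distances stand in for precomputed \(\lVert x - x_i \rVert\) values):

```python
import numpy as np

rng = np.random.default_rng(0)
d = rng.random(100_000)  # stand-in for distances to n training points
k = 5

# Full sort of all n distances: O(n log n) per query.
idx_sorted = np.argsort(d)[:k]

# Partial selection: O(n) on average -- only guarantees the k smallest
# land in the first k slots, in no particular order.
idx_part = np.argpartition(d, k - 1)[:k]

# Both select the same set of k nearest indices.
print(set(idx_part) == set(idx_sorted))
```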