K-Nearest Neighbors (k-NN)¶
Formula¶
\[
\mathcal{N}_k(x)=\text{k nearest training points to } x
\]
\[
\hat y=
\begin{cases}
\mathrm{mode}\{y_i: x_i\in \mathcal{N}_k(x)\}, & \text{classification}\\
\frac{1}{k}\sum_{x_i\in \mathcal{N}_k(x)} y_i, & \text{regression}
\end{cases}
\]
Parameters¶
- \(x\): query point
- \(\mathcal{N}_k(x)\): set of \(k\) nearest neighbors of \(x\)
- \(k\): number of neighbors
- \(y_i\): label/target of neighbor \(x_i\)
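The regression case of the formula can be traced in a few lines of NumPy; the 1-D training points and targets below are hypothetical, and Euclidean (absolute) distance is assumed:

```python
import numpy as np

# Regression case: y_hat = (1/k) * sum of the k nearest targets.
# Hypothetical 1-D training data.
x_train = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_train = np.array([0.1, 0.9, 2.1, 2.9, 4.2])

x, k = 2.4, 3
nearest = np.argsort(np.abs(x_train - x))[:k]   # N_k(x): indices of the k closest points
y_hat = y_train[nearest].mean()                 # average their targets
print(y_hat)  # mean of the targets at x_train = 2.0, 3.0, 1.0
```

For classification, the `mean` on the last line would be replaced by a majority vote over the neighbors' labels.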
What it means¶
k-NN predicts using the labels/targets of nearby training examples in feature space.
What it's used for¶
- Simple baseline for classification and regression.
- Nonlinear decision boundaries without explicit parametric training.
- Local similarity-based prediction.
Key properties¶
- Instance-based (lazy) learner: most computation happens at prediction time.
- Strongly depends on distance metric and feature scaling.
- Smaller \(k\) lowers bias but increases variance.
Common gotchas¶
- k-NN is not a clustering algorithm; it is typically supervised.
- Unscaled features can dominate distance calculations.
- Prediction can be slow on large datasets without indexing/approximate search.
- Class imbalance can bias majority-vote outcomes.
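The scaling gotcha is easy to see numerically. In this sketch, the second feature's range (0-1000, e.g. an unscaled income column) is a made-up assumption chosen to dwarf the first feature:

```python
import numpy as np

# Two points differing by 0.7 in feature 0 (range ~0-1) and by 10
# in feature 1 (range ~0-1000) -- hypothetical unscaled data.
a = np.array([0.2, 300.0])
b = np.array([0.9, 310.0])

# Raw Euclidean distance: the large-scale feature dominates entirely.
raw = np.linalg.norm(a - b)

# After min-max scaling both features to [0, 1] (assuming those ranges),
# the two features contribute on comparable terms.
a_scaled = np.array([0.2, 300.0 / 1000.0])
b_scaled = np.array([0.9, 310.0 / 1000.0])
scaled = np.linalg.norm(a_scaled - b_scaled)

print(raw, scaled)  # raw distance is almost entirely the second feature
```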
Example¶
To classify a new point, find its 5 nearest labeled examples and predict the majority class among them.
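The example above can be sketched in NumPy; the two clusters, their labels, and the query point are invented for illustration, and Euclidean distance is assumed:

```python
import numpy as np
from collections import Counter

# Toy labeled training data: cluster "A" near (0, 0), cluster "B" near (1, 1).
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                    [1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.2, 1.0]])
y_train = np.array(["A", "A", "A", "B", "B", "B", "B"])

x_query = np.array([0.95, 1.05])

# Euclidean distance from the query to every training point.
dists = np.linalg.norm(X_train - x_query, axis=1)

# Indices of the 5 nearest neighbors, then majority vote over their labels.
nearest = np.argsort(dists)[:5]
prediction = Counter(y_train[nearest]).most_common(1)[0][0]
print(prediction)  # -> "B": 4 of the 5 nearest neighbors carry label "B"
```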
How to Compute (Pseudocode)¶
Input: training set {(x_i, y_i)}, query point x, neighbors k
Output: prediction y_hat

for each training point x_i:
    d[i] <- distance(x, x_i)
idx <- indices of the k smallest distances in d
if classification:
    return majority_vote(y_i for i in idx)
if regression:
    return average(y_i for i in idx)
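The pseudocode translates directly to Python. This is a brute-force sketch under assumed defaults (Euclidean distance, a boolean `classification` flag); the function name is made up:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k, classification=True):
    """Brute-force k-NN prediction for a single query point x."""
    # d[i] <- distance(x, x_i): Euclidean distance to every training point.
    d = np.linalg.norm(X_train - x, axis=1)
    # idx <- indices of the k smallest distances (unordered partial selection).
    idx = np.argpartition(d, k - 1)[:k]
    if classification:
        return Counter(y_train[idx]).most_common(1)[0][0]  # majority_vote
    return float(np.mean(y_train[idx]))                    # average
```

`np.argpartition` finds the k smallest distances without fully sorting them, which matters for the per-query cost discussed under Complexity below.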
Complexity¶
- Time: training is just storing the data, \(O(nd)\); naive prediction costs \(O(nd)\) for the distances plus \(O(n\log n)\) to sort them, i.e. \(O(nd + n\log n)\) per query, or \(O(nd + n)\) with partial selection of the \(k\) smallest
- Space: \(O(nd)\) to store \(n\) training points with \(d\) features (plus labels)
- Assumptions: \(n\) training examples, \(d\) features, brute-force neighbor search; tree/ANN indexes change prediction-time complexity and constants
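The sorting-versus-partial-selection distinction above can be seen with NumPy (the random distances stand in for precomputed \(\lVert x - x_i \rVert\) values):

```python
import numpy as np

rng = np.random.default_rng(0)
d = rng.random(100_000)  # stand-in for distances to n training points
k = 5

# Full sort of all n distances: O(n log n) per query.
idx_sorted = np.argsort(d)[:k]

# Partial selection: O(n) on average -- only guarantees the k smallest
# land in the first k slots, in no particular order.
idx_part = np.argpartition(d, k - 1)[:k]

# Both select the same set of k nearest indices.
print(set(idx_part) == set(idx_sorted))
```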