Maximum A Posteriori (MAP)¶
Formula¶
\[
\hat{\theta}_{\text{MAP}}=\arg\max_\theta p(\theta\mid x)
=\arg\max_\theta p(x\mid \theta)p(\theta)
\]
Parameters¶
- \(\theta\): parameters to estimate
- \(x\): observed data
- \(p(x\mid \theta)\): likelihood
- \(p(\theta)\): prior
- \(p(\theta\mid x)\): posterior
What it means¶
MAP estimation selects the posterior mode: the single most probable parameter value once the data likelihood is combined with the prior.
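Since the logarithm is monotone, the same estimate is obtained by maximizing the log posterior, which is the form used in practice:

\[
\hat{\theta}_{\text{MAP}}
=\arg\max_\theta\big[\log p(x\mid \theta)+\log p(\theta)\big]
\]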
What it's used for¶
- Bayesian-inspired parameter estimation.
- Regularized optimization (many regularizers correspond to priors).
Key properties¶
- Reduces to MLE under a flat (constant) prior.
- The prior pulls estimates toward its mode, especially with limited data; both effects appear in the sketch below.
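Both properties can be checked on a small Beta-Bernoulli coin model; the counts and prior below are hypothetical, chosen only to make the effect visible:

```python
# MAP for a coin bias theta with a Beta(alpha, beta) prior and k heads
# in n flips: the posterior is Beta(alpha + k, beta + n - k), whose mode
# (when both posterior shape parameters exceed 1, as they do below) is:
def beta_bernoulli_map(k, n, alpha, beta):
    return (k + alpha - 1) / (n + alpha + beta - 2)

k, n = 2, 3  # hypothetical: 2 heads in 3 flips (limited data)

# Flat prior Beta(1, 1): MAP reduces to the MLE k/n.
print(beta_bernoulli_map(k, n, alpha=1, beta=1))  # 0.666...
# Informative prior Beta(5, 5): the estimate is pulled toward 0.5.
print(beta_bernoulli_map(k, n, alpha=5, beta=5))  # 0.545...
```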
Common gotchas¶
- MAP is a point estimate, not the full posterior.
- Prior choice can dominate when data are scarce.
Example¶
A Gaussian prior on the weights combined with Gaussian observation noise yields a ridge regression objective (L2 regularization), with penalty strength equal to the noise-to-prior variance ratio \(\lambda = \sigma^2/\tau^2\).
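As a minimal sketch (the data, \(\sigma\), and \(\tau\) below are illustrative assumptions, not part of the text), the MAP weights then have the closed form \((X^\top X + \lambda I)^{-1} X^\top y\):

```python
import numpy as np

# Ridge as MAP: Gaussian prior w ~ N(0, tau^2 I), Gaussian noise N(0, sigma^2).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + rng.normal(scale=0.1, size=50)

sigma2, tau2 = 0.1**2, 1.0
lam = sigma2 / tau2  # penalty strength = noise-to-prior variance ratio

# Closed-form MAP / ridge solution: (X^T X + lam I)^{-1} X^T y
w_map = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
print(w_map)  # close to w_true
```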
How to Compute (Pseudocode)¶
Input: data likelihood p(x|theta) and prior p(theta)
Output: MAP estimate theta_hat
1. Form the log posterior up to a constant: log p(x|theta) + log p(theta).
2. Maximize it over theta (closed form or numerical optimization).
3. Return the maximizing theta_hat.
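A runnable version of these steps, assuming SciPy and a toy Gaussian model with known noise scale (the observations and prior variance are hypothetical):

```python
import numpy as np
from scipy.optimize import minimize

x = np.array([1.2, 0.8, 1.5, 1.1])  # hypothetical observations

def neg_log_posterior(theta):
    mu = theta[0]
    # log p(x | mu): Gaussian likelihood with sigma = 1 (constants dropped)
    log_lik = -0.5 * np.sum((x - mu) ** 2)
    # log p(mu): Gaussian prior mu ~ N(0, 10) (constants dropped)
    log_prior = -0.5 * mu**2 / 10.0
    # The normalizer p(x) is omitted: it does not affect the argmax.
    return -(log_lik + log_prior)

result = minimize(neg_log_posterior, x0=np.zeros(1))
print(result.x)  # MAP estimate of mu, approx. 1.12 here
```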
Complexity¶
- Time: Depends on posterior objective evaluation and optimization method (same iterative considerations as MLE, plus prior terms)
- Space: Depends on parameter dimension and optimizer state, plus data storage
- Assumptions: Posterior normalization constant is not needed for MAP optimization; optimization method determines runtime details