Maximum A Posteriori (MAP)

Formula

\[ \hat{\theta}_{\text{MAP}}=\arg\max_\theta p(\theta\mid x) =\arg\max_\theta p(x\mid \theta)p(\theta) \]

Parameters

  • \(\theta\): parameters
  • \(x\): observed data
  • \(p(\theta)\): prior
  • \(p(\theta\mid x)\): posterior

What it means

MAP estimation chooses the most probable parameter value after combining data likelihood with a prior.

What it's used for

  • Bayesian-inspired parameter estimation.
  • Regularized optimization (many regularizers correspond to priors).

Key properties

  • Reduces to MLE with a flat/constant prior.
  • Prior influences estimates, especially with limited data.
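
Both properties can be seen in a small closed-form case. The Beta-Bernoulli conjugate pair below is an illustrative example, not from the source: with \(k\) successes in \(n\) trials and a \(\text{Beta}(a, b)\) prior, the MAP estimate is \((k + a - 1)/(n + a + b - 2)\), which collapses to the MLE \(k/n\) under the flat \(\text{Beta}(1, 1)\) prior.

```python
def bernoulli_map(k, n, a=1.0, b=1.0):
    """MAP estimate of a Bernoulli parameter under a Beta(a, b) prior.

    Closed form (posterior mode): (k + a - 1) / (n + a + b - 2).
    """
    return (k + a - 1) / (n + a + b - 2)

# Flat prior Beta(1, 1): reduces to the MLE k/n
print(bernoulli_map(3, 10))            # 0.3
# Informative prior Beta(5, 5): pulls the estimate toward 0.5
print(bernoulli_map(3, 10, a=5, b=5))  # ≈ 0.389
```

With only 10 observations the informative prior shifts the estimate noticeably; as \(n\) grows, the data term dominates and the two estimates converge.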

Common gotchas

  • MAP is a point estimate, not the full posterior.
  • Prior choice can dominate when data are scarce.

Example

Gaussian prior on weights plus Gaussian noise yields a ridge-like objective (L2 regularization).
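
As a sketch of that correspondence (assuming linear weights \(w\), noise variance \(\sigma^2\), and prior \(w \sim \mathcal{N}(0, \tau^2 I)\), none of which are fixed by the source), the negative log posterior is \(\tfrac{1}{2\sigma^2}\lVert y - Xw\rVert^2 + \tfrac{1}{2\tau^2}\lVert w\rVert^2\) up to constants, i.e. ridge regression with \(\lambda = \sigma^2/\tau^2\):

```python
import numpy as np

def map_gaussian_weights(X, y, sigma2=1.0, tau2=1.0):
    """MAP weights for a linear model with Gaussian noise (variance sigma2)
    and a zero-mean Gaussian prior (variance tau2) on the weights.

    This is exactly ridge regression with lam = sigma2 / tau2:
        w = (X^T X + lam I)^{-1} X^T y
    """
    lam = sigma2 / tau2
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
```

A tighter prior (smaller `tau2`) raises \(\lambda\) and shrinks the weights harder toward zero, while `tau2 -> inf` recovers ordinary least squares.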

How to Compute (Pseudocode)

Input: data likelihood p(x|theta) and prior p(theta)
Output: MAP estimate theta_hat

form the log posterior up to a constant: log p(x|theta) + log p(theta)
maximize it over theta (closed form or numerical optimization)
return the maximizing theta_hat
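
The pseudocode above can be sketched in Python with a generic numerical optimizer (here `scipy.optimize.minimize`; the worked example with a Gaussian likelihood and standard normal prior on the mean is hypothetical, chosen because its MAP has the closed form \(\sum_i x_i / (n + 1)\)):

```python
import numpy as np
from scipy.optimize import minimize

def map_estimate(neg_log_likelihood, neg_log_prior, theta0):
    """Maximize log p(x|theta) + log p(theta) by minimizing its negation.

    The posterior's normalizing constant is omitted: it does not depend
    on theta, so it cannot change the argmax.
    """
    objective = lambda th: neg_log_likelihood(th) + neg_log_prior(th)
    return minimize(objective, theta0).x

# Hypothetical example: unknown mean with unit-variance Gaussian noise
# and a standard normal prior on the mean. MAP = sum(x) / (n + 1).
x = np.array([1.2, 0.8, 1.0])
nll = lambda th: 0.5 * np.sum((x - th[0]) ** 2)   # -log p(x|theta) + const
nlp = lambda th: 0.5 * th[0] ** 2                 # -log p(theta) + const
theta_hat = map_estimate(nll, nlp, np.array([0.0]))  # ≈ 0.75
```

When the log posterior has a closed-form maximizer (as in the conjugate cases above), the numerical step is unnecessary; the optimizer is a fallback for objectives without one.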

Complexity

  • Time: Depends on posterior objective evaluation and optimization method (same iterative considerations as MLE, plus prior terms)
  • Space: Depends on parameter dimension and optimizer state, plus data storage
  • Assumptions: Posterior normalization constant is not needed for MAP optimization; optimization method determines runtime details

See also