Temperature (Sampling)

Formula

\[ P_i^{(T)}=\frac{\exp(z_i/T)}{\sum_j \exp(z_j/T)} \]

Parameters

  • \(z_i\): logit for token \(i\)
  • \(T>0\): temperature
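The formula can be sketched directly in Python. This is a minimal illustration, not any particular library's implementation; the max-subtraction step is a standard numerical-stability trick and does not change the result.

```python
import math

def temperature_softmax(logits, T):
    """P_i = exp(z_i / T) / sum_j exp(z_j / T), for T > 0."""
    if T <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / T for z in logits]
    m = max(scaled)  # subtract the max before exp() to avoid overflow
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative logits (made-up values, not from a real model)
probs = temperature_softmax([2.0, 1.0, 0.1], T=1.0)
```

At \(T=1\) this is just the ordinary softmax; the probabilities sum to 1 and preserve the ordering of the logits.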

What it means

Temperature rescales logits before softmax to control randomness in sampling.

What it's used for

  • Adjusting creativity/diversity in generation.
  • Combining with top-k/top-p (nucleus) sampling to shape the decoding distribution.

Key properties

  • Lower \(T\) (\(T<1\)) sharpens the output distribution; as \(T \to 0^{+}\), sampling approaches greedy argmax decoding.
  • Higher \(T\) (\(T>1\)) flattens the output distribution; as \(T \to \infty\), it approaches uniform over the vocabulary.
  • At \(T=1\), the formula reduces to the standard softmax.

Common gotchas

  • Very low \(T\) can make generation repetitive and near-deterministic.
  • Very high \(T\) can produce incoherent, near-random text.
  • \(T=0\) is undefined in the formula (division by zero); implementations typically special-case it as greedy (argmax) decoding.

Example

At \(T=0.7\), probability mass concentrates more on the top tokens than at \(T=1.0\).
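This concentration effect can be checked numerically. The logits below are made-up illustrative values; the point is only that the highest-probability token gains mass when \(T\) drops below 1.

```python
import math

def softmax_t(logits, T):
    """Temperature-scaled softmax with max-subtraction for stability."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    s = sum(exps)
    return [e / s for e in exps]

logits = [2.0, 1.0, 0.5, 0.1]   # illustrative values
p_base = softmax_t(logits, 1.0)  # top token gets about 0.57
p_cool = softmax_t(logits, 0.7)  # top token gets about 0.70
```

Dividing logits by \(T=0.7\) widens the gaps between them, so the largest logit claims a bigger share after softmax.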

How to Compute (Pseudocode)

Input: logits z[1..V], temperature T
Output: sampled token (or adjusted distribution)

scale logits: z'[i] <- z[i] / T
p <- softmax(z')
sample next token from p
return token (or p)
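The pseudocode above can be turned into a runnable sketch. This is one straightforward Python rendering (inverse-CDF sampling over the tempered distribution), not a reference implementation; it assumes \(T > 0\) and an externally supplied random source.

```python
import math
import random

def sample_with_temperature(logits, T, rng=random):
    """Sample a token index from temperature-scaled logits (T > 0)."""
    # scale logits: z'[i] <- z[i] / T
    scaled = [z / T for z in logits]
    # p <- softmax(z'), subtracting the max for numerical stability
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    p = [e / total for e in exps]
    # sample next token from p via inverse-CDF
    r = rng.random()
    cum = 0.0
    for i, pi in enumerate(p):
        cum += pi
        if r < cum:
            return i, p
    return len(p) - 1, p  # guard against floating-point round-off
```

In practice, `rng` would be a seeded `random.Random` instance for reproducibility, and the loop over `p` is what the pseudocode's "sample next token from p" step denotes.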

Complexity

  • Time: \(O(V)\) per decoding step for vocabulary size \(V\) (softmax + sampling), excluding model forward-pass cost
  • Space: \(O(V)\) for logits/probabilities at the decoding step
  • Assumptions: One decoding step shown; total generation cost multiplies by generated length and is often dominated by model inference

See also