Temperature (Sampling)¶
Formula¶
\[
P_i^{(T)}=\frac{\exp(z_i/T)}{\sum_j \exp(z_j/T)}
\]
Parameters¶
- \(z_i\): logit for token \(i\)
- \(T>0\): temperature
What it means¶
Temperature rescales logits before softmax to control randomness in sampling.
What it's used for¶
- Adjusting creativity/diversity in generation.
- Combining with top-k/top-p sampling to calibrate decoding.
Key properties¶
- Lower \(T\) makes the output distribution sharper; as \(T \to 0\), sampling approaches greedy argmax.
- Higher \(T\) makes the output distribution flatter; as \(T \to \infty\), it approaches uniform.
Common gotchas¶
- Very low \(T\) can make output repetitive and near-deterministic.
- Very high \(T\) can produce incoherent text.
Example¶
At \(T=0.7\), probability mass concentrates more on the top tokens than at \(T=1.0\).
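The concentration effect can be checked numerically. This is a minimal sketch using a hypothetical 4-token vocabulary with made-up logits; `softmax_with_temperature` is an illustrative helper name, not a standard API.

```python
import math

def softmax_with_temperature(logits, T):
    # Scale logits by 1/T, then apply a numerically stable softmax
    # (subtracting the max before exponentiating).
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for a tiny 4-token vocabulary.
logits = [2.0, 1.0, 0.5, 0.1]

p_base = softmax_with_temperature(logits, 1.0)
p_cool = softmax_with_temperature(logits, 0.7)

# Lower temperature shifts probability mass toward the top token.
print(f"top-token prob at T=1.0: {p_base[0]:.3f}")
print(f"top-token prob at T=0.7: {p_cool[0]:.3f}")
```

For these logits the top token's probability rises from roughly 0.57 at \(T=1.0\) to roughly 0.70 at \(T=0.7\), while both distributions still sum to 1.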
How to Compute (Pseudocode)¶
Input: logits z[1..V], temperature T
Output: sampled token (or adjusted distribution)
scale logits: z'[i] <- z[i] / T
p <- softmax(z')
sample next token from p
return token (or p)
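The pseudocode above can be sketched as a runnable function. This is one possible implementation using only the Python standard library; the function name and the seeded RNG are illustrative choices, not part of any particular framework.

```python
import math
import random

def sample_with_temperature(logits, T, rng=random):
    # z'[i] <- z[i] / T
    scaled = [z / T for z in logits]
    # Numerically stable softmax over the scaled logits.
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw one token index from the categorical distribution.
    token = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return token, probs

# Seeded RNG so the sketch is reproducible.
rng = random.Random(0)
token, probs = sample_with_temperature([2.0, 1.0, 0.5, 0.1], 0.7, rng)
```

Note that in practice the softmax and sampling run once per decoding step, after the model forward pass produces the logits.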
Complexity¶
- Time: \(O(V)\) per decoding step for vocabulary size \(V\) (softmax + sampling), excluding model forward-pass cost
- Space: \(O(V)\) for logits/probabilities at the decoding step
- Assumptions: One decoding step shown; total generation cost multiplies by generated length and is often dominated by model inference