Temperature (Sampling)¶
Formula¶
\[
P_i^{(T)}=\frac{\exp(z_i/T)}{\sum_j \exp(z_j/T)}
\]
Parameters¶
- \(z_i\): logit for token \(i\)
- \(T>0\): temperature
What it means¶
Temperature rescales logits before softmax to control randomness in sampling.
What it's used for¶
- Adjusting creativity/diversity in generation.
- Combining with top-k/top-p sampling to calibrate decoding.
Key properties¶
- Lower \(T\) makes the output distribution sharper; as \(T \to 0\), sampling approaches greedy argmax.
- Higher \(T\) makes the output distribution flatter; as \(T \to \infty\), it approaches uniform.
Common gotchas¶
- Very low \(T\) can make output repetitive and near-deterministic.
- Very high \(T\) can produce incoherent text.
Example¶
At \(T=0.7\), probability mass concentrates more on the top tokens than at \(T=1.0\).
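The concentration effect can be checked numerically. This is a minimal sketch using a hypothetical 4-token vocabulary with made-up logits; `softmax_with_temperature` is an illustrative helper name, not a standard API.

```python
import math

def softmax_with_temperature(logits, T):
    # Scale logits by 1/T, then apply a numerically stable softmax
    # (subtracting the max before exponentiating).
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for a tiny 4-token vocabulary.
logits = [2.0, 1.0, 0.5, 0.1]

p_base = softmax_with_temperature(logits, 1.0)
p_cool = softmax_with_temperature(logits, 0.7)

# Lower temperature shifts probability mass toward the top token.
print(f"top-token prob at T=1.0: {p_base[0]:.3f}")
print(f"top-token prob at T=0.7: {p_cool[0]:.3f}")
```

For these logits the top token's probability rises from roughly 0.57 at \(T=1.0\) to roughly 0.70 at \(T=0.7\), while both distributions still sum to 1.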
How to Compute (Pseudocode)¶
Input: logits z[1..V], temperature T
Output: sampled token (or adjusted distribution)
scale logits: z'[i] <- z[i] / T
p <- softmax(z')
sample next token from p
return token (or p)
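The pseudocode above can be sketched as a runnable function. This is one possible implementation using only the Python standard library; the function name and the seeded RNG are illustrative choices, not part of any particular framework.

```python
import math
import random

def sample_with_temperature(logits, T, rng=random):
    # z'[i] <- z[i] / T
    scaled = [z / T for z in logits]
    # Numerically stable softmax over the scaled logits.
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw one token index from the categorical distribution.
    token = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return token, probs

# Seeded RNG so the sketch is reproducible.
rng = random.Random(0)
token, probs = sample_with_temperature([2.0, 1.0, 0.5, 0.1], 0.7, rng)
```

Note that in practice the softmax and sampling run once per decoding step, after the model forward pass produces the logits.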
Complexity¶
- Time: \(O(V)\) per decoding step for vocabulary size \(V\) (softmax + sampling), excluding model forward-pass cost
- Space: \(O(V)\) for logits/probabilities at the decoding step
- Assumptions: One decoding step shown; total generation cost multiplies by generated length and is often dominated by model inference