Tanh (Hyperbolic Tangent)¶
Formula¶
\[
\tanh(x)=\frac{e^x-e^{-x}}{e^x+e^{-x}}
\]
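The defining ratio of exponentials can be checked directly against a library implementation. A minimal sketch using only the standard library (`tanh_from_def` is an illustrative name, not part of any API):

```python
import math

def tanh_from_def(x: float) -> float:
    """tanh via its defining ratio of exponentials."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

# Agrees with the library tanh across a few sample points
for x in (-2.0, 0.0, 0.5, 2.0):
    assert math.isclose(tanh_from_def(x), math.tanh(x), rel_tol=1e-12)
```

Note that the naive formula overflows for large \(|x|\) (e.g. \(e^{1000}\)); library implementations like `math.tanh` handle these ranges stably, so prefer them in practice.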
Plot¶
fn: tanh(x)
xmin: -4
xmax: 4
ymin: -1.2
ymax: 1.2
height: 280
title: tanh(x)
Parameters¶
- \(x\): scalar input (applied elementwise)
What it means¶
Tanh maps real inputs to \((-1,1)\), giving a zero-centered bounded activation.
What it's used for¶
- Hidden activations in older MLP/RNN setups.
- State/candidate updates in recurrent models (e.g., LSTM and GRU cells).
Key properties¶
- Smooth, monotonic, odd function.
- Zero-centered output unlike sigmoid.
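These properties are easy to verify numerically. A small sketch, checking oddness, boundedness, and monotonicity on a sample grid:

```python
import math

xs = [-3.0, -1.0, -0.1, 0.1, 1.0, 3.0]

# Odd: tanh(-x) = -tanh(x)
assert all(math.isclose(math.tanh(-x), -math.tanh(x)) for x in xs)

# Bounded: outputs lie strictly in (-1, 1)
assert all(-1.0 < math.tanh(x) < 1.0 for x in xs)

# Monotonic: increasing inputs give increasing outputs
ys = [math.tanh(x) for x in xs]
assert ys == sorted(ys)
```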
Common gotchas¶
- Saturates for large \(|x|\), leading to vanishing gradients.
- Often outperformed by ReLU-family activations in deep feedforward networks.
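The saturation issue follows from the derivative, \(\tanh'(x) = 1 - \tanh^2(x)\), which peaks at 1 for \(x = 0\) and decays toward 0 as \(|x|\) grows. A quick sketch (`tanh_grad` is an illustrative helper, not a library function):

```python
import math

def tanh_grad(x: float) -> float:
    """Derivative of tanh: 1 - tanh(x)^2."""
    t = math.tanh(x)
    return 1.0 - t * t

# Gradient is largest at the origin...
assert tanh_grad(0.0) == 1.0
# ...and nearly vanishes once the input saturates
assert tanh_grad(5.0) < 1e-3
```

In a deep network, many such near-zero factors multiply along the backward pass, which is the mechanism behind the vanishing-gradient problem noted above.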
Example¶
\(\tanh(0)=0\), \(\tanh(2)\approx 0.964\).
How to Compute (Pseudocode)¶
Input: tensor/vector x
Output: y = tanh(x) applied elementwise
for each element x_i in x:
    y_i <- tanh(x_i)
return y
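The pseudocode above translates directly to Python; a minimal sketch using the standard library (real frameworks would use a vectorized call such as `np.tanh` or `torch.tanh` instead of a loop):

```python
import math

def tanh_elementwise(x):
    """Apply tanh to each element of a sequence, as in the pseudocode."""
    return [math.tanh(x_i) for x_i in x]

ys = tanh_elementwise([0.0, 2.0])
# Matches the worked example: tanh(0) = 0, tanh(2) ≈ 0.964
```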
Complexity¶
- Time: \(O(m)\) elementwise operations for \(m\) inputs
- Space: \(O(m)\) for the output tensor/vector (or \(O(1)\) extra if done in place)
- Assumptions: Elementwise application over \(m\) scalars; exact constant factors depend on how \(\exp\) or \(\tanh\) is evaluated (library call vs. polynomial/rational approximation)