ELU (Exponential Linear Unit)¶
Formula¶
\[
\mathrm{ELU}(x)=
\begin{cases}
x, & x>0\\
\alpha(e^x-1), & x\le 0
\end{cases}
\]
Plot¶
[Plot: \(\mathrm{ELU}(x)\) with \(\alpha=1\) over \(x\in[-4,4]\); linear for \(x>0\), saturating toward \(-1\) for negative \(x\).]
Parameters¶
- \(x\): scalar input (applied elementwise)
- \(\alpha>0\): negative-side scale
What it means¶
ELU behaves like ReLU for positive inputs but replaces the hard zero with a smooth exponential curve for negative inputs, saturating toward \(-\alpha\) as \(x\to-\infty\).
What it's used for¶
- Hidden-layer activations as a ReLU alternative (see the sketch after this list).
- Reducing hard-zero behavior on the negative side.
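As a concrete illustration of the hidden-layer use case, here is a minimal PyTorch sketch; the layer sizes and random input are placeholders of my choosing, not from the original:

```python
import torch
import torch.nn as nn

# A small feed-forward block with ELU where ReLU would usually go.
# nn.ELU takes the negative-side scale as `alpha` (default 1.0).
model = nn.Sequential(
    nn.Linear(16, 32),
    nn.ELU(alpha=1.0),
    nn.Linear(32, 1),
)

x = torch.randn(4, 16)  # batch of 4 arbitrary inputs
y = model(x)            # ELU is applied elementwise inside the block
print(y.shape)          # torch.Size([4, 1])
```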
Key properties¶
- Continuous everywhere and smooth for \(x\neq 0\); with \(\alpha=1\) it is also differentiable at \(0\) (see the derivative after this list). Bounded below by \(-\alpha\), which the output approaches as \(x\to-\infty\).
- Negative outputs can help keep activations closer to zero mean.
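To make the smoothness claims concrete, the derivative follows directly from the definition (a standard computation, added here for reference):

\[
\mathrm{ELU}'(x)=
\begin{cases}
1, & x>0\\
\alpha e^{x}, & x\le 0
\end{cases}
\]

As \(x\to 0^-\) the derivative tends to \(\alpha\), while the right-hand derivative is \(1\); the two match exactly when \(\alpha=1\), which is why that default gives a continuously differentiable activation.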
Common gotchas¶
- More expensive than ReLU due to exponentials.
- The hyperparameter \(\alpha\) sets the negative-side saturation level and is sometimes left implicit; most libraries default to \(\alpha=1\).
Example¶
With \(\alpha=1\), \(\mathrm{ELU}(-1)=e^{-1}-1\approx -0.632\).
How to Compute (Pseudocode)¶
Input: tensor/vector x
Output: y = ELU(x) applied elementwise
for each element x_i in x:
    y_i <- x_i if x_i > 0 else alpha * (exp(x_i) - 1)
return y
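A runnable counterpart to the pseudocode, as a vectorized NumPy sketch; the function name and the use of np.expm1 (which computes \(e^x-1\) accurately near \(0\)) are my choices, not from the original:

```python
import numpy as np

def elu(x, alpha=1.0):
    """Elementwise ELU: x for x > 0, alpha * (exp(x) - 1) for x <= 0."""
    x = np.asarray(x, dtype=float)
    # Both branches are evaluated; np.where selects per element.
    return np.where(x > 0, x, alpha * np.expm1(x))

# Quick check against the worked example above: ELU(-1) = e^{-1} - 1
print(elu(-1.0))               # ~ -0.6321
print(elu([-2.0, 0.0, 3.0]))   # [~ -0.8647, 0.0, 3.0]
```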
Complexity¶
- Time: \(O(m)\) elementwise operations for \(m\) inputs
- Space: \(O(m)\) for the output tensor/vector (or \(O(1)\) extra if done in place)
- Assumptions: Elementwise application over \(m\) scalars; exact constant factors depend on the cost of evaluating \(\exp\)