Skip to content

ELU (Exponential Linear Unit)

Formula

\[ \mathrm{ELU}(x)= \begin{cases} x, & x>0\\ \alpha(e^x-1), & x\le 0 \end{cases} \]

Plot

fn: (x+abs(x))/2 + (exp((x-abs(x))/2)-1)
xmin: -4
xmax: 4
ymin: -1.2
ymax: 4
height: 280
title: ELU(x) (alpha=1)

Parameters

  • \(x\): scalar input (applied elementwise)
  • \(\alpha>0\): negative-side scale

What it means

ELU behaves like ReLU for positive inputs but uses a smooth exponential curve for negative inputs.

What it's used for

  • Hidden-layer activations as a ReLU alternative.
  • Reducing hard-zero behavior on the negative side.

Key properties

  • Smooth for \(x\neq 0\) and bounded below by \(-\alpha\).
  • Negative outputs can help keep activations closer to zero mean.

Common gotchas

  • More expensive than ReLU due to exponentials.
  • Hyperparameter \(\alpha\) affects behavior and is sometimes left implicit.

Example

With \(\alpha=1\), \(\mathrm{ELU}(-1)=e^{-1}-1\approx -0.632\).

How to Compute (Pseudocode)

Input: tensor/vector x
Output: y = ELU(x) applied elementwise

for each element x_i in x:
  y_i <- x_i if x_i > 0 else alpha * (exp(x_i) - 1)
return y

Complexity

  • Time: \(O(m)\) elementwise operations for \(m\) inputs
  • Space: \(O(m)\) for the output tensor/vector (or \(O(1)\) extra if done in place)
  • Assumptions: Elementwise application over \(m\) scalars; exact constant factors depend on operations like \(\exp\), \(\tanh\), or \(\mathrm{erf}/\Phi\) approximations

See also