Leaky ReLU¶
Formula¶
\[
f(x)=\max(\alpha x, x),\quad \alpha \in (0,1)
\]
Plot¶
*(Plot: Leaky ReLU with \(\alpha = 0.1\), shown for \(x \in [-4, 4]\).)*
Parameters¶
- \(x\): scalar input (applied elementwise)
- \(\alpha\): slope used for negative inputs, a small positive constant (e.g., \(0.01\))
What it means¶
Like ReLU, but negative inputs keep a small nonzero slope instead of being fully zeroed out.
What it's used for¶
- Reducing dead-neuron issues from standard ReLU.
- Hidden activations in feedforward and convolutional networks.
Key properties¶
- Piecewise linear.
- Gradient on negative side is \(\alpha\), not 0.
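The two gradient cases above can be sketched in a few lines of Python (a minimal sketch; the function name `leaky_relu_grad` and the convention of returning 1 at \(x=0\) are our own choices, not from the source):

```python
def leaky_relu_grad(x, alpha=0.01):
    # Derivative of max(alpha*x, x): 1 on the positive side, alpha on the negative side.
    # At the kink x = 0 there is no true derivative; we pick 1 by convention,
    # as many frameworks do.
    return 1.0 if x >= 0 else alpha

print(leaky_relu_grad(3.0))   # -> 1.0
print(leaky_relu_grad(-2.0))  # -> 0.01
```

Because the negative-side gradient is \(\alpha\) rather than 0, units with negative pre-activations still receive updates, which is the point of the "reduced dead-neuron" property.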
Common gotchas¶
- \(\alpha\) is a hyperparameter unless using PReLU (learned slope).
- Still not smooth at \(x=0\).
Example¶
With \(\alpha=0.01\), \(f([-2,3])=[-0.02,3]\).
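The example can be checked directly (a minimal sketch; the function name `leaky_relu` is ours):

```python
def leaky_relu(x, alpha=0.01):
    # max(alpha*x, x): identity for x >= 0, slope alpha for x < 0
    return max(alpha * x, x)

print([leaky_relu(v) for v in [-2, 3]])  # -> [-0.02, 3]
```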
How to Compute (Pseudocode)¶
Input: tensor/vector x, slope alpha in (0, 1)
Output: y = LeakyReLU(x) applied elementwise

for each element x_i in x:
    y_i <- max(alpha * x_i, x_i)
return y
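The loop above can be vectorized; here is a sketch in Python with NumPy (the function name is our own):

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # np.maximum compares elementwise, matching max(alpha * x_i, x_i)
    x = np.asarray(x, dtype=float)
    return np.maximum(alpha * x, x)

print(leaky_relu([-2.0, 0.0, 3.0]))
```

For in-place use on a large array, `np.maximum(alpha * x, x, out=x)` avoids allocating a second output buffer, matching the \(O(1)\) extra-space note below.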
Complexity¶
- Time: \(O(m)\) elementwise operations for \(m\) inputs
- Space: \(O(m)\) for the output tensor/vector (or \(O(1)\) extra if done in place)
- Assumptions: elementwise application over \(m\) scalars; each element needs only one multiply and one comparison, so constant factors are small (no \(\exp\), \(\tanh\), or \(\mathrm{erf}\) evaluations, unlike smooth activations)