ReLU (Rectified Linear Unit)¶
Formula¶
\[
\mathrm{ReLU}(x)=\max(0,x)
\]
Plot¶
fn: (x+abs(x))/2
xmin: -4
xmax: 4
ymin: -1
ymax: 4
height: 280
title: ReLU(x)
Parameters¶
- \(x\): scalar (applied elementwise to vectors/tensors)
What it means¶
ReLU keeps positive values and clips negative values to zero.
What it's used for¶
- Default hidden-layer activation in many MLPs and CNNs.
- Sparse activations and simple, fast computation.
Key properties¶
- Piecewise linear and non-saturating for \(x>0\).
- Derivative is \(1\) for \(x>0\) and \(0\) for \(x<0\); it is undefined at \(x=0\), where implementations typically use \(0\) or \(1\) by convention.
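The derivative rule above can be sketched directly; this minimal helper uses the convention of returning \(0\) at \(x=0\):

```python
def relu_grad(x: float) -> float:
    # Subgradient of ReLU: 1 for x > 0, 0 for x < 0.
    # At x == 0 the derivative is undefined; we pick 0 by convention here.
    return 1.0 if x > 0 else 0.0
```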
Common gotchas¶
- Dead ReLUs: a neuron whose pre-activation stays negative outputs zero and receives zero gradient, so it can stop learning permanently.
- Not zero-centered, which can affect optimization dynamics.
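One common mitigation for dead ReLUs is Leaky ReLU, which keeps a small slope on the negative side so gradients never vanish entirely. A minimal sketch (the slope `alpha=0.01` is a widely used default, not mandated by the source):

```python
def leaky_relu(x: float, alpha: float = 0.01) -> float:
    # Like ReLU for x > 0, but a small linear slope alpha for x <= 0,
    # so negative inputs still propagate a nonzero gradient.
    return x if x > 0 else alpha * x
```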
Example¶
\(\mathrm{ReLU}([-2,0,3])=[0,0,3]\).
How to Compute (Pseudocode)¶
Input: tensor/vector x
Output: y = ReLU(x) applied elementwise
for each element x_i in x:
    y_i <- max(0, x_i)
return y
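The pseudocode above translates directly to Python; a minimal sketch over a list of scalars:

```python
def relu(xs):
    # Apply max(0, x) to each element, returning a new list.
    return [max(0.0, x) for x in xs]
```

For example, `relu([-2.0, 0.0, 3.0])` reproduces the worked example: `[0.0, 0.0, 3.0]`.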
Complexity¶
- Time: \(O(m)\) elementwise operations for \(m\) inputs
- Space: \(O(m)\) for the output tensor/vector (or \(O(1)\) extra if done in place)
- Assumptions: elementwise application over \(m\) scalars. ReLU needs only one comparison per element, so constant factors are small; unlike sigmoid or tanh, no transcendental functions (\(\exp\), \(\tanh\), \(\mathrm{erf}\)) are evaluated.
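The \(O(1)\)-extra-space case mentioned above can be sketched as an in-place variant that overwrites its input instead of allocating a new output:

```python
def relu_inplace(xs):
    # Overwrite each negative element with 0: O(m) time, O(1) extra space.
    for i, x in enumerate(xs):
        if x < 0.0:
            xs[i] = 0.0
    return xs
```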