Skip to content

Partial Dependence Plot (PDP)

Formula

\[ \mathrm{PD}_S(z_S)=\mathbb{E}_{X_C}[f(z_S, X_C)] \]

Plot

fn: 0.6*sin(x)+0.12*x
xmin: -6
xmax: 6
ymin: -1.4
ymax: 1.4
height: 280
title: Example partial dependence shape (illustrative)

Parameters

  • \(S\): feature subset of interest
  • \(C\): complementary feature set
  • \(z_S\): fixed value(s) for features in \(S\)

What it means

A PDP shows the average model prediction as selected feature values vary while averaging over other features.

What it's used for

  • Global effect visualization for tabular models.
  • Exploring nonlinear feature-response shapes.

Key properties

  • Easy to read for one or two features.
  • Averages can hide heterogeneous effects across subgroups.

Common gotchas

  • Can be misleading with strongly correlated features or unrealistic feature combinations.
  • Check ICE plots when interactions matter.

Example

Plot predicted churn risk versus monthly usage while averaging over other customer attributes.

How to Compute (Pseudocode)

Input: trained model f, dataset X, target feature(s) S, grid values G
Output: PDP values over the grid

for each grid value z in G:
  X_copy <- copy of X
  set feature(s) S in every row of X_copy to z
  preds <- model predictions f(X_copy)
  PD[z] <- average(preds)

return {(z, PD[z])}

Complexity

  • Time: \(O(|G|\cdot \mathrm{PredCost}(X))\) for \(|G|\) grid points, where \(\mathrm{PredCost}(X)\) is the batch prediction cost on the dataset
  • Space: Typically \(O(n d)\) if modified dataset copies are materialized (or lower with in-place feature replacement)
  • Assumptions: \(n\) examples, \(d\) features; one-way PDP shown (two-way PDPs multiply grid size)