Data Science Field Guide

Partial Dependence Plot (PDP)¶

Formula¶

\[ \mathrm{PD}_S(z_S)=\mathbb{E}_{X_C}[f(z_S, X_C)] \]
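The expectation is rarely available in closed form, so it is usually approximated by averaging over the \(n\) rows of a dataset. A sketch of that estimator, where \(x_C^{(i)}\) denotes the complementary feature values observed in row \(i\) (notation introduced here for illustration):

\[ \widehat{\mathrm{PD}}_S(z_S)=\frac{1}{n}\sum_{i=1}^{n} f\big(z_S, x_C^{(i)}\big) \]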

Plot¶

[Figure: Example partial dependence shape (illustrative). Curve: \(y = 0.6\sin(x) + 0.12x\) for \(x\) from -6 to 6.]

Parameters¶

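\(f\): prediction function of the trained model
\(S\): feature subset of interest
\(C\): complementary feature set (all features not in \(S\))
\(z_S\): fixed value(s) for features in \(S\)
\(X_C\): random vector of the features in \(C\), over whose distribution the expectation is taken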

What it means¶

A PDP shows how the model's average prediction changes as the selected feature(s) vary, with all other features averaged out over their distribution in the data.

What it's used for¶

Global effect visualization for tabular models.
Exploring nonlinear feature-response shapes.

Key properties¶

Easy to read for one or two features.
Averaging can hide heterogeneous effects: opposite effects in different subgroups can cancel out in the mean.

Common gotchas¶

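Can be misleading when features are strongly correlated: fixing \(S\) at a grid value while leaving the other columns unchanged can create unrealistic feature combinations on which the model is extrapolating.
Check individual conditional expectation (ICE) plots when interactions matter; they show one curve per row instead of only the average.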
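A small sketch of why ICE curves help, assuming a hypothetical toy model \(f(x_0, x_1) = x_0 x_1\) (a pure interaction) and a made-up two-row dataset. The names `ice_curves` and `toy_predict` are illustrative and not part of any library. Each ICE curve has a clear slope, yet their pointwise average, which is the PDP, is flat.

```python
from statistics import fmean

def ice_curves(predict, rows, feature, grid):
    """Return one curve per row: predictions as `feature` sweeps over `grid`."""
    curves = []
    for row in rows:
        variants = []
        for z in grid:
            r = list(row)      # copy so the original row is untouched
            r[feature] = z     # overwrite only the feature of interest
            variants.append(r)
        curves.append(predict(variants))
    return curves

def toy_predict(rows):
    # Hypothetical model with a pure interaction: f(x) = x0 * x1
    return [r[0] * r[1] for r in rows]

# Made-up data: x1 is -1 for one row and +1 for the other.
X = [[0.0, -1.0], [0.0, 1.0]]
grid = [-1.0, 0.0, 1.0]

ice = ice_curves(toy_predict, X, feature=0, grid=grid)
# The PDP is the pointwise mean of the ICE curves.
pdp = [fmean(column) for column in zip(*ice)]

print(ice)  # one curve slopes down, the other slopes up
print(pdp)  # the average is zero at every grid point
```

Reading only the PDP here would suggest that the feature has no effect, while the ICE curves show two opposite effects that cancel.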

Example¶

Plot predicted churn risk versus monthly usage while averaging over other customer attributes.
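To make the churn example concrete, here is a minimal sketch under loud assumptions: the logistic model, its coefficients, the feature names, and the three customers are all made up for illustration. It prints the averaged churn risk per usage level instead of drawing a plot.

```python
import math
from statistics import fmean

def churn_probability(usage_hours, tenure_months):
    # Hypothetical logistic model; the coefficients are invented for this sketch.
    logit = 1.5 - 0.08 * usage_hours - 0.03 * tenure_months
    return 1.0 / (1.0 + math.exp(-logit))

# Made-up customers: (monthly usage in hours, tenure in months).
customers = [(5.0, 2.0), (20.0, 12.0), (40.0, 36.0)]
usage_grid = [0.0, 10.0, 20.0, 30.0, 40.0]

pdp = []
for usage in usage_grid:
    # Force every customer to this usage level but keep their own tenure.
    risks = [churn_probability(usage, tenure) for _, tenure in customers]
    pdp.append((usage, fmean(risks)))

for usage, risk in pdp:
    print(f"usage={usage:5.1f}h  mean predicted churn risk={risk:.3f}")
```

Because the assumed usage coefficient is negative, the averaged risk falls as usage rises; with a real model the shape is whatever the model has learned.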

How to Compute (Pseudocode)¶

Input: trained model f, dataset X, target feature(s) S, grid values G
Output: PDP values over the grid

for each grid value z in G:
  X_copy <- copy of X
  set feature(s) S in every row of X_copy to z
  preds <- model predictions f(X_copy)
  PD[z] <- average(preds)

return {(z, PD[z])}
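The pseudocode above can be sketched in plain Python as follows. This is a dependency-free illustration, not a reference implementation: `partial_dependence_1d` and `toy_predict` are hypothetical names, the model is assumed to be a callable that maps a list of rows to a list of numeric predictions, and a real pipeline would typically vectorize this with an array library.

```python
from statistics import fmean

def partial_dependence_1d(predict, rows, feature, grid):
    """Return [(z, PD(z))] for a single feature.

    predict: callable mapping a list of rows to a list of floats (model f)
    rows:    list of feature sequences (dataset X)
    feature: column index of the feature of interest (S)
    grid:    iterable of values z to evaluate (G)
    """
    curve = []
    for z in grid:
        modified = []
        for row in rows:
            r = list(row)      # X_copy: never mutate the caller's data
            r[feature] = z     # set feature S to z in every row
            modified.append(r)
        preds = predict(modified)
        curve.append((z, fmean(preds)))  # PD[z] <- average(preds)
    return curve

def toy_predict(rows):
    # Hypothetical model: f(x) = 2*x0 + x1**2
    return [2.0 * r[0] + r[1] ** 2 for r in rows]

X = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]
pdp = partial_dependence_1d(toy_predict, X, feature=0, grid=[0.0, 1.0, 2.0])

# The mean of x1**2 over X is 14/3, so PD(z) = 2*z + 14/3 for this toy model.
for z, value in pdp:
    print(z, value)
```

The toy model is additive in the two features, so the PDP recovers the exact effect of the first feature up to a constant offset.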

Complexity¶

Time: \(O(|G|\cdot \mathrm{PredCost}(X))\) for \(|G|\) grid points, where \(\mathrm{PredCost}(X)\) is the batch prediction cost on the dataset
Space: Typically \(O(n d)\) if modified dataset copies are materialized (or lower with in-place feature replacement)
Assumptions: \(n\) examples, \(d\) features; one-way PDP shown (a two-way PDP evaluates every pair of grid values, so its cost scales with \(|G_1|\cdot|G_2|\))
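To illustrate how the cost grows for a two-way PDP, the sketch below counts batch prediction calls for a pair of features. The function `partial_dependence_2d`, the toy model, and the data are hypothetical and exist only for this illustration.

```python
from itertools import product
from statistics import fmean

def partial_dependence_2d(predict, rows, features, grids):
    """Return ({(z_j, z_k): PD}, number of batch predict calls)."""
    j, k = features
    surface = {}
    n_calls = 0
    # Every pair of grid values needs its own pass over the dataset.
    for z_j, z_k in product(grids[0], grids[1]):
        modified = []
        for row in rows:
            r = list(row)
            r[j], r[k] = z_j, z_k
            modified.append(r)
        surface[(z_j, z_k)] = fmean(predict(modified))
        n_calls += 1
    return surface, n_calls

def toy_predict(rows):
    # Hypothetical model: f(x) = x0 * x1 + x2
    return [r[0] * r[1] + r[2] for r in rows]

X = [[0.0, 0.0, 1.0], [0.0, 0.0, 3.0]]
surface, n_calls = partial_dependence_2d(
    toy_predict, X, features=(0, 1), grids=([0.0, 1.0, 2.0], [1.0, 2.0])
)

print(n_calls)              # 6 batch predictions: 3 * 2 grid pairs
print(surface[(1.0, 2.0)])  # 1*2 + mean(1, 3) = 4.0
```

With 3 and 2 grid values the model is called 6 times on the full dataset, which is why two-way grids are usually kept coarse.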