Decision Tree

Formula

\[ \text{Choose split } s^* = \arg\max_s\ \Delta I(s) \]
\[ \Delta I(s)=I(\text{parent})-\sum_{c\in\{L,R\}}\frac{n_c}{n}I(c) \]
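The impurity decrease can be computed directly from the formula above. A minimal Python sketch, assuming Gini impurity and a binary split whose left/right label lists are already in hand (the function names are illustrative, not from any library):

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def impurity_decrease(parent, left, right):
    """Delta I(s) = I(parent) - sum over children of (n_c / n) * I(c)."""
    n = len(parent)
    return (gini(parent)
            - len(left) / n * gini(left)
            - len(right) / n * gini(right))

# A perfect split of a balanced binary node removes all impurity:
print(impurity_decrease([0, 0, 1, 1], [0, 0], [1, 1]))  # 0.5
```

The greedy learner evaluates this quantity for every candidate split `s` and keeps the maximizer `s*`.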

Parameters

  • \(I\): impurity measure (e.g., Gini or entropy for classification, MSE for regression)
  • \(s\): candidate split

What it means

A decision tree recursively partitions feature space into regions with simple prediction rules.

What it's used for

  • Interpretable tabular modeling.
  • Capturing nonlinear interactions without feature scaling.

Key properties

  • Handles mixed feature types, though many implementations require categorical features to be encoded first.
  • Can overfit without depth/min-samples constraints.

Common gotchas

  • Small data changes can produce very different trees.
  • Greedy splitting is not globally optimal.

Example

A tree might first split on income, then on account age to predict default risk.
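A fitted tree of that shape is just nested threshold rules. A hypothetical sketch (the thresholds and labels here are made up for illustration; a real tree learns them from data):

```python
def predict_default_risk(income, account_age_years):
    # Illustrative, hand-picked thresholds -- not learned from real data.
    if income < 40_000:                  # first split: income
        if account_age_years < 2:        # second split: account age
            return "high"
        return "medium"
    return "low"

print(predict_default_risk(30_000, 1))  # "high"
```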

How to Compute (Pseudocode)

Input: training data X, targets y, stopping rules
Output: decision tree

function BUILD_NODE(data):
  if stopping_rule_met(data):
    return leaf(prediction_from(data))

  evaluate candidate splits and choose best split s*
  split data into left_data, right_data using s*

  node <- internal_node(split=s*)
  node.left <- BUILD_NODE(left_data)
  node.right <- BUILD_NODE(right_data)
  return node

return BUILD_NODE((X, y))
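The pseudocode above can be sketched as a runnable Python implementation. This is a minimal greedy classifier assuming Gini impurity, exhaustive split search over `feature <= threshold` candidates, and dict-based nodes; all names (`build_node`, `predict`) are my own, and the toy data echoes the income/account-age example:

```python
from collections import Counter

def gini(y):
    """Gini impurity of a list of class labels."""
    n = len(y)
    return 1.0 - sum((c / n) ** 2 for c in Counter(y).values())

def build_node(X, y, depth=0, max_depth=3, min_samples=2):
    """Greedy recursive partitioning, mirroring BUILD_NODE above."""
    # Stopping rules: pure node, depth limit, or too few samples.
    if len(set(y)) == 1 or depth >= max_depth or len(y) < min_samples:
        return {"leaf": Counter(y).most_common(1)[0][0]}

    best = None  # (gain, feature, threshold, left indices, right indices)
    for j in range(len(X[0])):
        for t in sorted(set(row[j] for row in X)):
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            if not left or not right:
                continue
            n = len(y)
            gain = (gini(y)
                    - len(left) / n * gini([y[i] for i in left])
                    - len(right) / n * gini([y[i] for i in right]))
            if best is None or gain > best[0]:
                best = (gain, j, t, left, right)

    if best is None or best[0] <= 0:
        return {"leaf": Counter(y).most_common(1)[0][0]}

    _, j, t, left, right = best
    return {"feature": j, "threshold": t,
            "left": build_node([X[i] for i in left], [y[i] for i in left],
                               depth + 1, max_depth, min_samples),
            "right": build_node([X[i] for i in right], [y[i] for i in right],
                                depth + 1, max_depth, min_samples)}

def predict(node, x):
    """Walk from the root down to a leaf."""
    while "leaf" not in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["leaf"]

# Toy data: [income in thousands, account age in years] -> risk label.
X = [[25, 1], [30, 5], [50, 2], [60, 8]]
y = ["high", "medium", "low", "low"]
tree = build_node(X, y)
print([predict(tree, x) for x in X])  # ['high', 'medium', 'low', 'low']
```

The `max_depth` and `min_samples` arguments are the depth/min-samples constraints mentioned under "Key properties"; loosening them lets the tree memorize the training set.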

Complexity

  • Time: Depends on the split search strategy and implementation; a common training-time bound is roughly \(O(n d \log n)\) to \(O(n d \cdot \text{depth})\) for greedy tree growth on tabular data
  • Space: Typically \(O(n d)\) for data storage plus tree structure and recursion/work buffers
  • Assumptions: \(n\) samples, \(d\) features; exact split search, feature types, and sorting reuse strongly affect practical complexity

See also