GBDT Case Study

GBDT (Gradient Boosting Decision Tree) is an ensemble learning method belonging to the Boosting family. It builds a strong learner by adding weak learners (typically decision trees) one at a time, with each new tree fitted to reduce the error left by the model so far. It is commonly used for both classification and regression tasks.

The core idea of GBDT:

  • Initialize a model, typically predicting a constant (e.g., the mean).
  • Calculate residuals (the difference between true values and current model predictions).
  • Train a decision tree to fit these residuals.
  • Add the new tree’s output to the current model (scaled by a learning-rate coefficient).
  • Repeat the above steps until the number of iterations reaches the set value or the error is small enough.
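The steps above can be sketched in code. This is a minimal illustration, not the exact setup used later in this article: it boosts depth-1 trees (stumps) on a single feature rather than depth-3 trees, and all function names are illustrative.

```python
def squared_error(values):
    """Sum of squared deviations from the mean (the SE of one node)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def fit_stump(x, targets):
    """Depth-1 regression tree: pick the threshold minimizing SE_left + SE_right."""
    best = None
    for threshold in sorted(set(x))[1:]:           # candidate split points
        left = [t for xi, t in zip(x, targets) if xi < threshold]
        right = [t for xi, t in zip(x, targets) if xi >= threshold]
        cost = squared_error(left) + squared_error(right)
        if best is None or cost < best[0]:
            left_value = sum(left) / len(left)     # leaf output: mean target
            right_value = sum(right) / len(right)
            best = (cost, threshold, left_value, right_value)
    _, threshold, lv, rv = best
    return lambda xi: lv if xi < threshold else rv

def fit_gbdt(x, y, n_trees=5, learning_rate=0.1):
    """Gradient boosting with squared loss: each tree fits the residuals."""
    f0 = sum(y) / len(y)                           # 1. constant initial model
    pred = [f0] * len(y)
    trees = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]   # 2. negative gradient
        tree = fit_stump(x, residuals)                     # 3. fit a tree to it
        pred = [pi + learning_rate * tree(xi)              # 4. shrink and add
                for pi, xi in zip(pred, x)]
        trees.append(tree)
    return f0, trees

def predict(f0, trees, xi, learning_rate=0.1):
    """5. final model: initial constant plus the shrunken sum of all trees."""
    return f0 + learning_rate * sum(tree(xi) for tree in trees)
```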

Data Introduction

Based on the following data, predict the height of the last sample.

Model Training

Set parameters:

  • Learning rate: learning_rate = 0.1
  • Number of iterations: n_trees = 5
  • Tree depth: max_depth = 3
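For reference, assuming one were using scikit-learn rather than following the by-hand walkthrough below, the same settings map onto `GradientBoostingRegressor` roughly like this (a sketch, not part of the original example):

```python
from sklearn.ensemble import GradientBoostingRegressor

# Same hyperparameters as above; loss="squared_error" matches the
# squared loss used throughout this walkthrough.
model = GradientBoostingRegressor(
    learning_rate=0.1,   # shrinkage applied to each tree's output
    n_estimators=5,      # number of boosting iterations (trees)
    max_depth=3,         # depth of each individual regression tree
    loss="squared_error",
)
```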

Start training

Initialize the weak learner:

The loss function is the squared loss L(y, c) = (1/2)(y − c)². Since the squared loss is convex in c, the best initial constant is found by taking the derivative with respect to c and setting it to zero.

Setting the derivative to zero:

∂/∂c Σᵢ (1/2)(yᵢ − c)² = Σᵢ (c − yᵢ) = 0, which gives c = (1/N) Σᵢ yᵢ

So during initialization, c takes the mean of all training sample labels: c = (1.1 + 1.3 + 1.7 + 1.8) / 4 = 1.475

The initial learner is:

f0(x) = c = 1.475

For iteration rounds m = 1, 2, …, M:

Since we set n_trees = 5, here M = 5. Calculate the negative gradient: r_mi = −[∂L(yᵢ, f(xᵢ)) / ∂f(xᵢ)], evaluated at f = f_{m−1}. Since the loss function is squared loss, the negative gradient equals the residual: r_mi = yᵢ − f_{m−1}(xᵢ).

The residuals are used as the target values to train the weak learner f1(x).
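Since f0(x) = 1.475, the residuals for the four training labels can be checked directly:

```python
y = [1.1, 1.3, 1.7, 1.8]          # training labels (heights)
f0 = sum(y) / len(y)              # initial prediction, 1.475
residuals = [round(yi - f0, 3) for yi in y]
print(residuals)                  # [-0.375, -0.175, 0.225, 0.325]
```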

Next, find the best split node for the regression tree by traversing each possible value of each feature. Calculate the squared error (SE) for the two groups after splitting, and find the split node that minimizes SE_sum = SE_L + SE_R.

For example: using age 21 as the split node, samples younger than 21 go to the left node, samples older than 21 go to the right node.
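This enumeration can be sketched numerically. Note the ages of the first two training samples are assumed here purely for illustration; only 21 and 30 appear as split values in the text above.

```python
# Residuals after f0(x) = 1.475, used as targets for the first tree.
residual = [-0.375, -0.175, 0.225, 0.325]
# Hypothetical ages: only 21 and 30 are given in the text; 5 and 7 are assumed.
age = [5, 7, 21, 30]

def se(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

best_split, best_cost = None, None
for threshold in sorted(set(age))[1:]:   # samples with age < threshold go left
    left = [r for a, r in zip(age, residual) if a < threshold]
    right = [r for a, r in zip(age, residual) if a >= threshold]
    cost = se(left) + se(right)
    print(f"split at {threshold}: SE_sum = {cost:.4f}")
    if best_cost is None or cost < best_cost:
        best_split, best_cost = threshold, cost
print("best split:", best_split)         # age 21 minimizes SE_sum here
```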

Repeat this step until m > 5, generating 5 trees in total.

The final strong learner is the initial constant plus the shrunken sum of all five trees:

f(x) = f0(x) + learning_rate × (f1(x) + f2(x) + f3(x) + f4(x) + f5(x))

Predicting a Sample

  • f0(x) = 1.475
  • In f1(x): sample 4’s age is 25, greater than split node 21 but less than 30, so predicted as 0.2250
  • In f2(x): sample 4 is predicted as 0.2025
  • In f3(x): sample 4 is predicted as 0.1823
  • In f4(x): sample 4 is predicted as 0.1640
  • In f5(x): sample 4 is predicted as 0.1476
Note the pattern: each tree’s output for sample 4 is 0.9 times the previous one (0.2250 × 0.9 = 0.2025, 0.2025 × 0.9 ≈ 0.1823, and so on). With squared loss, if a sample stays in the same leaf from round to round, the residual left after each update shrinks by exactly 1 − learning_rate = 0.9.

Final prediction result:

f(x) = 1.475 + 0.1 × (0.225 + 0.2025 + 0.1823 + 0.164 + 0.1476) = 1.56714
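The final arithmetic can be verified directly:

```python
f0 = 1.475
tree_outputs = [0.2250, 0.2025, 0.1823, 0.1640, 0.1476]  # f1..f5 for sample 4
learning_rate = 0.1
prediction = f0 + learning_rate * sum(tree_outputs)
print(round(prediction, 5))       # 1.56714
```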