Decision Tree Regression is a versatile machine learning algorithm for predicting a continuous target. Unlike its classification counterpart, which assigns discrete class labels, a regression tree predicts a quantitative response. This documentation provides an in-depth look at the decision tree regression algorithm, with emphasis on the criteria used for splitting nodes. It covers the theoretical background, mathematical formulation, procedural steps, applications, strengths, limitations, and advanced topics.
Decision tree regression operates by recursively splitting the data into distinct subsets based on feature thresholds. Each internal node encodes a decision (a condition on a feature), and each branch leads either to a further split or to a leaf. In the regression setting, the split at each node is chosen to reduce the variance of the target values within the resulting child nodes, and each leaf predicts the mean target value of the samples it contains.
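To make the procedure concrete, here is a minimal, self-contained sketch of greedy tree construction: each node searches every feature and threshold for the split that most reduces variance, recurses on both halves, and each leaf predicts the mean target of its samples. The function names (`best_split`, `build_tree`, `predict_one`) and the simple depth-based stopping rule are illustrative assumptions, not the luma implementation.

```python
import numpy as np

def best_split(X, y):
    """Find the (feature, threshold) pair with the largest variance reduction."""
    n, n_features = X.shape
    parent_var = np.var(y)
    best, best_gain = None, 0.0
    for j in range(n_features):
        for thr in np.unique(X[:, j])[:-1]:        # candidate thresholds
            left = X[:, j] <= thr
            gain = parent_var - (
                left.sum() / n * np.var(y[left])
                + (~left).sum() / n * np.var(y[~left])
            )
            if gain > best_gain:
                best, best_gain = (j, thr), gain
    return best

def build_tree(X, y, depth=0, max_depth=3):
    """Recursively split while a useful split exists; leaves store the mean target."""
    split = best_split(X, y) if depth < max_depth and len(y) > 1 else None
    if split is None:
        return {"value": float(y.mean())}          # leaf node
    j, thr = split
    mask = X[:, j] <= thr
    return {
        "feature": j,
        "threshold": thr,
        "left": build_tree(X[mask], y[mask], depth + 1, max_depth),
        "right": build_tree(X[~mask], y[~mask], depth + 1, max_depth),
    }

def predict_one(node, x):
    """Route a single sample down the tree and return its leaf's mean."""
    while "value" not in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["value"]
```

A real implementation adds the stopping and regularization controls described in the hyperparameter list further below.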
For regression tasks, decision trees primarily use variance reduction as the splitting criterion. The goal is to find the feature and threshold whose split yields the largest decrease in the variance of the target variable across the resulting subsets; this is equivalent to minimizing the mean squared error of the children's mean predictions. The criterion is formalized as follows.
Mathematically, for a given node $t$, let $S_t$ be the set of samples at that node. The variance before the split is given by:

$$
\mathrm{Var}(S_t) = \frac{1}{N_t} \sum_{i \in S_t} \left( y_i - \bar{y}_t \right)^2
$$

where $N_t$ is the number of samples in node $t$, $y_i$ is the target value of sample $i$, and $\bar{y}_t$ is the mean target value in $S_t$.

The improvement in variance, or variance reduction, for a split that divides $S_t$ into two subsets $S_{\mathrm{left}}$ and $S_{\mathrm{right}}$ is given by:

$$
\Delta \mathrm{Var} = \mathrm{Var}(S_t) - \left( \frac{N_{\mathrm{left}}}{N_t}\,\mathrm{Var}(S_{\mathrm{left}}) + \frac{N_{\mathrm{right}}}{N_t}\,\mathrm{Var}(S_{\mathrm{right}}) \right)
$$

The goal is to maximize $\Delta \mathrm{Var}$ over all candidate features and thresholds.
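As a quick numeric sanity check of these formulas, the snippet below computes $\mathrm{Var}(S_t)$ and $\Delta\mathrm{Var}$ for one hand-picked split of a toy target vector; it is a standalone illustration and does not use any luma code.

```python
import numpy as np

# Target values at a node, and one candidate split into left/right subsets
y = np.array([1.0, 1.2, 0.9, 4.8, 5.1, 5.0])
y_left, y_right = y[:3], y[3:]      # e.g. samples with x <= threshold vs. x > threshold

var_parent = np.var(y)                                   # Var(S_t)
var_children = (
    len(y_left) / len(y) * np.var(y_left)
    + len(y_right) / len(y) * np.var(y_right)
)                                                        # weighted child variance
delta_var = var_parent - var_children                    # variance reduction

print(f"Var(S_t) = {var_parent:.4f}, variance reduction = {delta_var:.4f}")
```

Because this split separates the two clusters of target values cleanly, almost all of the parent variance is removed.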
The `DecisionTreeRegressor` estimator exposes the following hyperparameters:

- `max_depth` : int, default = 10
- `min_samples_split` : int, default = 2
- `min_samples_leaf` : int, default = 1
- `max_features` : int, default = None
- `min_variance_decrease` : float, default = 0.0
- `max_leaf_nodes` : int, default = None
- `random_state` : int, default = None

from luma.regressor.tree import DecisionTreeRegressor
from luma.visual.evaluation import ResidualPlot
import matplotlib.pyplot as plt
import numpy as np
np.random.seed(42)

# Synthetic 1-D data: a cosine-plus-linear signal with uniform noise
X = np.linspace(-3, 3, 200).reshape(-1, 1)
y = (2 * np.cos(3 * X) - X).flatten() + 3 * np.random.rand(200)
# Fit a depth-limited regression tree and predict over the same grid
tree = DecisionTreeRegressor(max_depth=6)
tree.fit(X, y)
y_pred = tree.predict(X)
# Left panel: data, fitted curve, and residual area; right panel: residual plot
fig = plt.figure(figsize=(10, 5))
ax1 = fig.add_subplot(1, 2, 1)
ax2 = fig.add_subplot(1, 2, 2)
ax1.scatter(X, y, s=10, c="black", alpha=0.4)
ax1.plot(X, y_pred, lw=2, c="teal", label="Predicted Plot")
ax1.fill_between(
X.flatten(), y_pred, y, color="teal", alpha=0.1, label="Residual Area"
)
ax1.set_xlabel("x")
ax1.set_ylabel("y")
ax1.set_title(f"{type(tree).__name__} Estimation [MSE: {tree.score(X, y):.4f}]")
ax1.legend()
ax1.grid(alpha=0.2)
# Residual diagnostics for the fitted tree on the second axis
res = ResidualPlot(tree, X, y)
res.plot(ax=ax2, show=True)
Improving decision tree regression performance often involves using ensemble methods, such as Random Forests and Gradient Boosted Trees. These methods build multiple trees and aggregate their predictions to improve accuracy and robustness.
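As a rough sketch of the ensembling idea, the snippet below bags several `DecisionTreeRegressor` instances on bootstrap resamples and averages their predictions. It assumes only the `fit`/`predict` interface used in the example above; the helper `bagged_predict` is illustrative, and a dedicated ensemble estimator from the library would normally be preferable.

```python
import numpy as np
from luma.regressor.tree import DecisionTreeRegressor

def bagged_predict(X_train, y_train, X_test, n_trees=25, seed=42):
    """Average predictions from trees fit on bootstrap resamples (bagging)."""
    rng = np.random.default_rng(seed)
    predictions = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(y_train), size=len(y_train))   # bootstrap sample
        tree = DecisionTreeRegressor(max_depth=6)
        tree.fit(X_train[idx], y_train[idx])
        predictions.append(tree.predict(X_test))
    return np.mean(predictions, axis=0)                          # aggregated prediction
```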
Pruning is a technique used to reduce the size of a decision tree by removing parts of the tree that contribute little additional predictive power. This can help improve the model's generalizability and reduce overfitting.
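A comparable effect is often obtained up front (pre-pruning) through the hyperparameters listed earlier. The snippet below is a sketch under that assumption, using only the constructor arguments documented above; the specific values are arbitrary.

```python
from luma.regressor.tree import DecisionTreeRegressor

# Pre-pruned tree: shallow depth, larger leaves, and a minimum variance
# reduction required before a node may split further
pruned_tree = DecisionTreeRegressor(
    max_depth=4,
    min_samples_leaf=5,
    min_variance_decrease=0.01,
)
pruned_tree.fit(X, y)   # X, y as generated in the example above
```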