How can a decision tree be used to implement a regressor in Python?

A Decision Tree Regressor is a machine learning algorithm that predicts continuous target values by splitting data into subsets based on feature values. Unlike classification trees that predict discrete classes, regression trees predict numerical values by averaging target values in leaf nodes.

How Decision Tree Regression Works

Decision trees work by recursively splitting the dataset into smaller subsets based on feature values that minimize prediction error. The algorithm uses criteria like Mean Squared Error (MSE) to determine the best splits at each node.

Feature ≤ 3.5?
├── Yes → Predict: 0.425
└── No  → Feature ≤ 6?
    ├── Yes → Predict: 1.5
    └── No  → Predict: 1.73
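The split selection described above can be sketched numerically. This toy computation (a minimal illustration on one-dimensional data matching the diagram, not scikit-learn's internal code) shows why a threshold of 3.5 gives the lowest weighted MSE:

```python
import numpy as np

def mse(y):
    # Mean squared error of predicting the mean of y
    return np.mean((y - y.mean()) ** 2) if len(y) else 0.0

def split_mse(feature, y, threshold):
    # Weighted MSE of the two child nodes after splitting on feature <= threshold
    left = y[feature <= threshold]
    right = y[feature > threshold]
    return (len(left) * mse(left) + len(right) * mse(right)) / len(y)

feature = np.array([1.0, 2.0, 5.0, 7.0])
target = np.array([0.1, 0.75, 1.5, 1.73])

# The tree picks the candidate threshold with the lowest weighted MSE
for t in [1.5, 3.5, 6.0]:
    print(f"threshold {t}: weighted MSE = {split_mse(feature, target, t):.4f}")
```

Running this shows that splitting at 3.5 separates the low targets (mean 0.425) from the high ones, reducing the error more than the other candidates.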

Syntax

class sklearn.tree.DecisionTreeRegressor(*, criterion='squared_error', max_depth=None, ...)

Example

Here's how to implement a Decision Tree Regressor using scikit-learn:

from sklearn import tree

# Training data: [feature1, feature2]
my_data = [[1, 1], [5, 5], [2, 3], [7, 11]]
target_vals = [0.1, 1.5, 0.75, 1.73]

# Create and train the regressor
clf = tree.DecisionTreeRegressor()
print("The decision tree regressor has been created")

DTreg = clf.fit(my_data, target_vals)
print("Data has been fit to the model")

# Make predictions
pred_val = DTreg.predict([[4, 7]])
print("The predicted value is:")
print(pred_val)
Output

The decision tree regressor has been created
Data has been fit to the model
The predicted value is:
[1.5]
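To see which splits the fitted tree actually learned, scikit-learn's export_text helper prints the tree as plain text. A short sketch on the same toy data (the exact thresholds it prints may vary with the scikit-learn version):

```python
from sklearn.tree import DecisionTreeRegressor, export_text

my_data = [[1, 1], [5, 5], [2, 3], [7, 11]]
target_vals = [0.1, 1.5, 0.75, 1.73]

reg = DecisionTreeRegressor(random_state=0)
reg.fit(my_data, target_vals)

# Print the learned decision rules, one line per node
print(export_text(reg, feature_names=['feature1', 'feature2']))
```

This makes the "interpretable" advantage of decision trees concrete: every prediction can be traced to a chain of threshold comparisons.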

Key Parameters

Important parameters for fine-tuning the Decision Tree Regressor:

from sklearn import tree
from sklearn.model_selection import train_test_split
import numpy as np

# Generate sample data
X = np.random.rand(100, 2) * 10
y = X[:, 0] * 2 + X[:, 1] * 1.5 + np.random.randn(100) * 0.1

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create regressor with custom parameters
regressor = tree.DecisionTreeRegressor(
    max_depth=5,           # Limit tree depth
    min_samples_split=10,  # Minimum samples to split
    min_samples_leaf=5,    # Minimum samples in leaf
    random_state=42
)

# Train and evaluate
regressor.fit(X_train, y_train)
score = regressor.score(X_test, y_test)
print(f"R² Score: {score:.4f}")

# Make predictions
predictions = regressor.predict(X_test[:5])
print("Sample predictions:", predictions)
Output (your numbers will differ, since the sample data is generated randomly)

R² Score: 0.9756
Sample predictions: [13.30892857 11.96857143  5.07964286  8.66892857  9.01964286]
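Rather than hand-picking values like max_depth=5, the same parameters can be tuned with cross-validation. A sketch using GridSearchCV (the parameter grid here is chosen purely for illustration):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import GridSearchCV

# Seeded synthetic data so the search is reproducible
rng = np.random.RandomState(42)
X = rng.rand(200, 2) * 10
y = X[:, 0] * 2 + X[:, 1] * 1.5 + rng.randn(200) * 0.1

# 5-fold cross-validation scores every depth/leaf-size combination by R^2
grid = GridSearchCV(
    DecisionTreeRegressor(random_state=42),
    param_grid={'max_depth': [3, 5, 8, None],
                'min_samples_leaf': [1, 5, 10]},
    cv=5,
)
grid.fit(X, y)

print("Best parameters:", grid.best_params_)
print(f"Best CV R² score: {grid.best_score_:.4f}")
```

The best combination found by the search can then be used to refit the final model on all the training data (GridSearchCV does this automatically via grid.best_estimator_).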

Advantages and Disadvantages

Advantages:
- Easy to understand and interpret
- Handles both numerical and categorical data
- Requires little data preparation
- Can capture non-linear relationships

Disadvantages:
- Prone to overfitting
- Can be unstable (small changes in the data can produce a very different tree)
- May create biased trees when the data is unbalanced
- Predictions are piecewise constant, so smooth continuous relationships are approximated in steps

Conclusion

Decision Tree Regressors are powerful tools for predicting continuous values with interpretable results. Use proper parameter tuning and consider ensemble methods like Random Forest to improve performance and reduce overfitting.
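As the conclusion suggests, swapping in an ensemble such as RandomForestRegressor often improves accuracy over a single tree on the same data. A quick sketch on synthetic data (exact scores will vary with the data and scikit-learn version):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Seeded non-linear synthetic data with noise
rng = np.random.RandomState(0)
X = rng.rand(300, 2) * 10
y = np.sin(X[:, 0]) + X[:, 1] * 0.5 + rng.randn(300) * 0.2

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Single tree vs. an averaged ensemble of 100 trees
tree_score = DecisionTreeRegressor(random_state=0).fit(
    X_train, y_train).score(X_test, y_test)
forest_score = RandomForestRegressor(n_estimators=100, random_state=0).fit(
    X_train, y_train).score(X_test, y_test)

print(f"Single tree R²:   {tree_score:.4f}")
print(f"Random forest R²: {forest_score:.4f}")
```

Averaging many trees trained on bootstrap samples reduces the variance that makes a single deep tree overfit, which is why the forest typically scores higher on held-out data.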

Updated on: 2026-03-25T13:20:30+05:30
