How can a decision tree be used to construct a classifier in Python?

Decision trees are among the most intuitive and widely used algorithms for classification tasks in machine learning. They work by recursively splitting the dataset on feature values, producing a tree-shaped model that makes a prediction by following a decision path from the root to a leaf node.

How Decision Trees Work

A decision tree partitions the input space into regions based on feature values. Each internal node tests a feature, while each leaf node holds a final prediction. The algorithm uses an impurity measure such as Gini impurity or entropy to choose the split that yields the greatest impurity reduction (information gain).

The tree grows recursively until a stopping criterion is met, such as reaching the maximum depth or falling below the minimum samples per leaf. This greedy approach picks the locally best split at each node, which is fast but does not guarantee a globally optimal tree.
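The impurity calculation behind this process is straightforward. Below is a minimal sketch in plain Python (not scikit-learn's internal implementation) of Gini impurity and the weighted impurity of a candidate split −

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def weighted_gini(left, right):
    """Impurity of a split: child impurities weighted by child size."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

parent = ['Man', 'Man', 'Woman', 'Woman']
left, right = ['Man', 'Man'], ['Woman', 'Woman']
print(gini(parent))               # 0.5 (maximally mixed for two classes)
print(weighted_gini(left, right)) # 0.0 (both children are pure)
```

A split that drops the weighted impurity from 0.5 to 0.0 has the largest possible information gain, so the greedy algorithm would select it.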

DecisionTreeClassifier Syntax

Scikit-learn provides the DecisionTreeClassifier class for building decision tree models −

class sklearn.tree.DecisionTreeClassifier(
    criterion='gini',
    max_depth=None,
    min_samples_split=2,
    min_samples_leaf=1,
    random_state=None
)

Example: Gender Classification

Let's build a decision tree classifier to predict gender based on two features −

from sklearn import tree
from sklearn.model_selection import train_test_split

# Sample data with two features
my_data = [[16, 19], [17, 32], [13, 3], [14, 5], [141, 28], [13, 34], [186, 2],
           [126, 25], [176, 28], [131, 32], [166, 6], [128, 32], [79, 110],
           [12, 38], [19, 91], [71, 136], [116, 25], [17, 200], [15, 25],
           [14, 32], [13, 35]]

target_vals = ['Man', 'Woman', 'Man', 'Woman', 'Woman', 'Man', 'Woman', 'Woman',
               'Woman', 'Woman', 'Woman', 'Man', 'Man', 'Man', 'Woman', 'Woman',
               'Woman', 'Woman', 'Man', 'Woman', 'Woman']

data_feature_names = ['Feature_1','Feature_2']

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(my_data, target_vals, test_size=0.2, random_state=1)

# Create and train the decision tree classifier
clf = tree.DecisionTreeClassifier(random_state=42)
print("Training the decision tree classifier...")
DTclf = clf.fit(X_train, y_train)

# Make predictions
test_prediction = DTclf.predict(X_test)
new_prediction = DTclf.predict([[135,29]])

print("Test set predictions:", test_prediction)
print("New sample prediction:", new_prediction)
Output

Training the decision tree classifier...
Test set predictions: ['Woman' 'Man' 'Man' 'Woman']
New sample prediction: ['Woman']
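The decision paths of a fitted tree can also be inspected as text. A minimal sketch using scikit-learn's export_text on a small toy subset (four rows borrowed from the data above) −

```python
from sklearn import tree
from sklearn.tree import export_text

# A small subset of the example data, enough to grow a tiny tree
X = [[16, 19], [141, 28], [13, 3], [186, 2]]
y = ['Man', 'Woman', 'Man', 'Woman']

clf = tree.DecisionTreeClassifier(random_state=42).fit(X, y)

# Print the learned split rules as indented text
print(export_text(clf, feature_names=['Feature_1', 'Feature_2']))
```

Each indented line shows one test on a feature, and each leaf line shows the predicted class, which makes the model's reasoning easy to audit.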

Key Parameters

Important parameters for tuning decision tree performance −

  • criterion − Split quality measure ('gini' or 'entropy')
  • max_depth − Maximum tree depth to prevent overfitting
  • min_samples_split − Minimum samples required to split a node
  • min_samples_leaf − Minimum samples required at leaf nodes
  • random_state − Controls randomness for reproducible results
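To see how these parameters curb overfitting, the sketch below compares an unrestricted tree with one capped at max_depth=3; the synthetic dataset from make_classification is an assumption used only for illustration −

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-class problem (illustrative data, not the gender example)
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for depth in (None, 3):
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0)
    clf.fit(X_train, y_train)
    print(f"max_depth={depth}: train={clf.score(X_train, y_train):.2f}, "
          f"test={clf.score(X_test, y_test):.2f}")
```

An unrestricted tree typically memorizes the training set (100% training accuracy), while the capped tree trades a little training accuracy for a simpler model, the usual sign of reduced overfitting.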

Advantages and Limitations

Advantages:
  • Easy to interpret and visualize
  • No need for feature scaling
  • Handles both numerical and categorical data
  • Requires little data preparation

Limitations:
  • Prone to overfitting
  • Can be unstable with small data changes
  • Bias toward features with many levels
  • May create overly complex trees

Conclusion

Decision trees provide an intuitive approach to classification with clear decision paths. While they can overfit, proper parameter tuning and ensemble methods like Random Forest can significantly improve their performance and robustness.
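As a quick illustration of the ensemble idea, the sketch below swaps DecisionTreeClassifier for RandomForestClassifier on a few rows of the earlier toy data; the query point [135, 29] is the same hypothetical sample used above −

```python
from sklearn.ensemble import RandomForestClassifier

# A few rows reused from the example data above
X = [[16, 19], [141, 28], [13, 3], [186, 2], [126, 25], [17, 32]]
y = ['Man', 'Woman', 'Man', 'Woman', 'Woman', 'Woman']

# Averaging 100 randomized trees reduces the variance of any single tree
forest = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y)
print(forest.predict([[135, 29]]))
```

Because each tree is trained on a bootstrap sample with random feature subsets, the forest is far less sensitive to small changes in the data than a single tree.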

Updated on: 2026-03-25T13:15:07+05:30
