How to create a random forest classifier using Python Scikit-learn?

Random Forest is a supervised machine learning algorithm that creates multiple decision trees on data samples and combines their predictions through voting. This ensemble approach reduces overfitting and typically produces better results than a single decision tree.

The algorithm trains each decision tree on a bootstrap sample of the training data and considers only a random subset of the features at each split. It then combines the trees by averaging their predictions for regression or by majority voting for classification.
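To make the voting step concrete, here is a minimal sketch that combines the individual trees of a fitted forest by hand. It assumes scikit-learn's behavior of averaging each tree's predicted class probabilities (a soft vote) rather than counting hard votes. The names forest, tree_probas, mean_proba and manual_pred, and the choice of 25 trees, are illustrative only.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# A small forest so the per-tree results are easy to inspect
forest = RandomForestClassifier(n_estimators=25, random_state=0)
forest.fit(X, y)

# Each fitted tree is available in forest.estimators_
tree_probas = np.array([tree.predict_proba(X) for tree in forest.estimators_])

# Average the per-tree class probabilities, then pick the most probable class
mean_proba = tree_probas.mean(axis=0)
manual_pred = forest.classes_[np.argmax(mean_proba, axis=1)]

print("Trees in forest:", len(forest.estimators_))
print("Averaged probabilities match predict_proba:",
      np.allclose(mean_proba, forest.predict_proba(X)))
```

If the check prints True, the forest's output is just the average of its trees, which is why adding more trees tends to smooth out the errors of any single tree.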

Steps to Create Random Forest Classifier

Follow these steps to create a random forest classifier using Python Scikit-learn:

Step 1: Import the required libraries

Step 2: Load the dataset

Step 3: Split dataset into training and test sets

Step 4: Import RandomForestClassifier from sklearn.ensemble

Step 5: Create and train the random forest model

Step 6: Make predictions on test data

Step 7: Evaluate model accuracy

Example: Iris Dataset Classification

Let's build a random forest classifier using the famous Iris dataset to classify flower species:

# Import required libraries
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn import metrics

# Load the iris dataset
iris_data = datasets.load_iris()
print("Target classes:", iris_data.target_names)
print("Features:", iris_data.feature_names)

# Split into features and target
X, y = iris_data.data, iris_data.target

# Split into training and test sets (60% train, 40% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.40, random_state=42)

# Create Random Forest classifier
rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)

# Train the model
rf_classifier.fit(X_train, y_train)

# Make predictions
y_pred = rf_classifier.predict(X_test)

# Calculate accuracy
accuracy = metrics.accuracy_score(y_test, y_pred) * 100
print(f"\nAccuracy of Random Forest Classifier: {accuracy:.1f}%")

Target classes: ['setosa' 'versicolor' 'virginica']
Features: ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']

Accuracy of Random Forest Classifier: 100.0%
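Accuracy from a single train/test split depends on which rows land in the test set, especially on a dataset this small. As a hedged complement to the example above, the following sketch estimates accuracy with 5-fold cross-validation using cross_val_score. The variable names model and scores, and the choice of five folds, are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=100, random_state=42)

# With an integer cv and a classifier, scikit-learn uses stratified folds
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")

print("Fold accuracies:", scores.round(3))
print(f"Mean accuracy: {scores.mean() * 100:.1f}%")
```

The spread between folds gives a rough sense of how much a single split's accuracy can vary.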

Making Predictions

Now let's use our trained classifier to predict flower species for new samples:

# Predict flower type for new samples
# Format: [sepal_length, sepal_width, petal_length, petal_width]

# Sample 1
prediction1 = rf_classifier.predict([[5.0, 4.0, 3.0, 1.0]])
print("Sample 1 prediction:", prediction1)
print("Species:", iris_data.target_names[prediction1[0]])

# Sample 2
prediction2 = rf_classifier.predict([[5.0, 4.0, 5.0, 2.0]])
print("Sample 2 prediction:", prediction2)
print("Species:", iris_data.target_names[prediction2[0]])

Sample 1 prediction: [1]
Species: versicolor
Sample 2 prediction: [2]
Species: virginica
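Besides the predicted label, it can help to see how confident the forest is. The sketch below uses predict_proba to print class probabilities for the same two samples. To keep it self-contained it refits a classifier on the full Iris data, which is an assumption that differs from the train/test split used earlier, so the exact numbers may not match the model above.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris_data = load_iris()

# Assumption: fit on all rows so the snippet runs on its own
rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)
rf_classifier.fit(iris_data.data, iris_data.target)

# Format: [sepal_length, sepal_width, petal_length, petal_width]
new_samples = [[5.0, 4.0, 3.0, 1.0], [5.0, 4.0, 5.0, 2.0]]

# One row per sample, one column per class (same order as target_names)
probabilities = rf_classifier.predict_proba(new_samples)

for sample, probs in zip(new_samples, probabilities):
    ranked = sorted(zip(iris_data.target_names, probs), key=lambda pair: -pair[1])
    print(sample, "->", ", ".join(f"{name}: {p:.2f}" for name, p in ranked))
```

A probability close to 1.0 means nearly all trees agree. Values spread across classes suggest the sample lies near a decision boundary.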

Key Parameters

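Important RandomForestClassifier parameters include:

n_estimators: Number of trees in the forest (default: 100)
max_depth: Maximum depth of each tree (default: None, which grows trees until the leaves are pure or too small to split)
random_state: Seed for the bootstrap sampling and feature selection; set it to an integer for reproducible results (default: None)
max_features: Number of features to consider when looking for the best split (default: "sqrt", the square root of the feature count)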
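The parameters above can be sketched in use as follows. The specific values (200 trees, depth 3) are illustrative assumptions, not tuned recommendations, and the names params, tuned and repeat are hypothetical. The snippet also checks that fixing random_state makes two fits agree.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.40, random_state=42
)

# Illustrative settings only
params = dict(n_estimators=200, max_depth=3, max_features="sqrt", random_state=42)

tuned = RandomForestClassifier(**params).fit(X_train, y_train)

print("Trees:", len(tuned.estimators_))
print("Deepest tree:", max(tree.get_depth() for tree in tuned.estimators_))
print(f"Test accuracy: {tuned.score(X_test, y_test) * 100:.1f}%")

# Fitting again with the same random_state should give the same predictions
repeat = RandomForestClassifier(**params).fit(X_train, y_train)
print("Reproducible:", bool((tuned.predict(X_test) == repeat.predict(X_test)).all()))
```

Limiting max_depth trades some flexibility for smaller, faster trees. Whether that helps depends on the dataset, so it is worth comparing a few values.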

Conclusion

Random Forest is an effective ensemble method that combines multiple decision trees to create a robust classifier. Because it averages many trees trained on different samples, it is less prone to overfitting than a single decision tree and often achieves high accuracy on classification tasks, making it a popular choice for machine learning projects.

Gaurav Leekha

Updated on: 2026-03-26T22:13:37+05:30