How to create a random forest classifier using Python Scikit-learn?
Random Forest is a supervised machine learning algorithm that creates multiple decision trees on data samples and combines their predictions through voting. This ensemble approach reduces overfitting and typically produces better results than a single decision tree.
The algorithm works by training multiple decision trees on different subsets of the data and features, then averaging their predictions for regression or using majority voting for classification.
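The majority-voting step can be sketched in isolation. Below, three hypothetical per-tree predictions for one sample are combined with a simple vote count (the tree labels here are made up for illustration):

```python
# A minimal sketch of majority voting across individual trees.
# The per-tree predictions below are hypothetical example values.
from collections import Counter

tree_predictions = [0, 1, 1]  # class labels predicted by three trees
# The ensemble's answer is the most common label among the trees
ensemble_prediction = Counter(tree_predictions).most_common(1)[0][0]
print(ensemble_prediction)  # -> 1
```

Scikit-learn's RandomForestClassifier performs this aggregation internally (using averaged class probabilities rather than raw vote counts), so you never write this loop yourself.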
Steps to Create Random Forest Classifier
Follow these steps to create a random forest classifier using Python Scikit-learn:
Step 1 − Import the required libraries
Step 2 − Load the dataset
Step 3 − Split dataset into training and test sets
Step 4 − Import RandomForestClassifier from sklearn.ensemble
Step 5 − Create and train the random forest model
Step 6 − Make predictions on test data
Step 7 − Evaluate model accuracy
Example: Iris Dataset Classification
Let's build a random forest classifier using the famous Iris dataset to classify flower species:
# Import required libraries
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn import metrics
# Load the iris dataset
iris_data = datasets.load_iris()
print("Target classes:", iris_data.target_names)
print("Features:", iris_data.feature_names)
# Split into features and target
X, y = iris_data.data, iris_data.target
# Split into training and test sets (60% train, 40% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.40, random_state=42)
# Create Random Forest classifier
rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)
# Train the model
rf_classifier.fit(X_train, y_train)
# Make predictions
y_pred = rf_classifier.predict(X_test)
# Calculate accuracy
accuracy = metrics.accuracy_score(y_test, y_pred) * 100
print(f"\nAccuracy of Random Forest Classifier: {accuracy:.1f}%")
Target classes: ['setosa' 'versicolor' 'virginica']
Features: ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
Accuracy of Random Forest Classifier: 100.0%
Making Predictions
Now let's use our trained classifier to predict flower species for new samples:
# Predict flower type for new samples
# Format: [sepal_length, sepal_width, petal_length, petal_width]
# Sample 1
prediction1 = rf_classifier.predict([[5.0, 4.0, 3.0, 1.0]])
print("Sample 1 prediction:", prediction1)
print("Species:", iris_data.target_names[prediction1[0]])
# Sample 2
prediction2 = rf_classifier.predict([[5.0, 4.0, 5.0, 2.0]])
print("Sample 2 prediction:", prediction2)
print("Species:", iris_data.target_names[prediction2[0]])
Sample 1 prediction: [1]
Species: versicolor
Sample 2 prediction: [2]
Species: virginica
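Beyond a hard class label, the classifier can report per-class probabilities with predict_proba, which reflect the share of trees voting for each class. A self-contained sketch, retraining on the same Iris data so it runs on its own:

```python
# Sketch: per-class probabilities from a random forest via predict_proba.
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier

iris = datasets.load_iris()
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(iris.data, iris.target)

# One probability per species, in the order of iris.target_names;
# the rows of predict_proba always sum to 1.
probs = clf.predict_proba([[5.0, 4.0, 3.0, 1.0]])
print(probs)  # shape (1, 3)
```

This is useful when you want a confidence estimate rather than just the winning label, for example to flag borderline samples for review.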
Key Parameters
Important RandomForestClassifier parameters include:
- n_estimators: Number of trees in the forest (default: 100)
- max_depth: Maximum depth of trees (default: None)
- random_state: Controls randomness for reproducible results
- max_features: Number of features to consider for splits
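These parameters are commonly tuned rather than guessed. One standard approach is a cross-validated grid search; the sketch below tries a small grid over n_estimators and max_depth on the Iris data (the grid values are illustrative, not recommendations):

```python
# Sketch: tuning RandomForestClassifier parameters with GridSearchCV.
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()

# Illustrative grid over two of the key parameters listed above
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 3],
}

search = GridSearchCV(RandomForestClassifier(random_state=42),
                      param_grid, cv=5)
search.fit(iris.data, iris.target)

print(search.best_params_)  # best combination found by 5-fold CV
```

search.best_estimator_ then holds a forest refit on the full data with the winning parameters.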
Conclusion
Random Forest is an effective ensemble method that combines multiple decision trees to create a robust classifier. It handles overfitting well and often achieves high accuracy on various classification tasks, making it a popular choice for machine learning projects.
