How to create a random forest classifier using Python Scikit-learn?


Random forest is a supervised machine learning algorithm used for classification, regression, and other tasks. It builds many decision trees, each on a random sample of the training data. Once the trees are built, a random forest classifier collects the prediction of each tree and selects the final class by majority voting. (Scikit-learn's implementation averages the trees' predicted class probabilities instead of counting hard votes.)
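To make the "trees on data samples, then vote" idea concrete, here is a minimal hand-rolled sketch, assuming NumPy and scikit-learn's DecisionTreeClassifier. The names n_trees, trees and predict_by_vote are hypothetical, chosen only for illustration. This is not how RandomForestClassifier works internally; it only shows bootstrap sampling plus majority voting.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Hand-rolled illustration only, not scikit-learn's RandomForestClassifier
X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)

n_trees = 25
trees = []
for _ in range(n_trees):
    # Bootstrap sample: draw len(X) row indices with replacement
    idx = rng.integers(0, len(X), size=len(X))
    # max_features="sqrt" makes each split consider a random subset of features
    tree = DecisionTreeClassifier(max_features="sqrt",
                                  random_state=int(rng.integers(1_000_000)))
    tree.fit(X[idx], y[idx])
    trees.append(tree)

def predict_by_vote(samples):
    # votes has shape (n_trees, n_samples): one row of predictions per tree
    votes = np.array([t.predict(samples) for t in trees])
    # For each sample, return the class predicted by the most trees
    return np.array([np.bincount(sample_votes, minlength=3).argmax()
                     for sample_votes in votes.T])

print(predict_by_vote(X[:5]))
```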

One of the main advantages of a random forest classifier is that it reduces overfitting by averaging the results of many trees, each trained on a different sample of the data. That is why it usually gives better results than a single decision tree.
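One way to check this claim on your own data is to compare cross-validated accuracy of a single tree and a forest. The sketch below assumes 5-fold cross-validation and random_state=0, both arbitrary choices. On a small, easy dataset like Iris the gap may be small or absent; the difference tends to show more on noisier data.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# 5-fold cross-validated accuracy of one tree vs. a forest of 100 trees
tree_scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
forest_scores = cross_val_score(
    RandomForestClassifier(n_estimators=100, random_state=0), X, y, cv=5
)

print(f"Single tree:   mean={tree_scores.mean():.3f}, std={tree_scores.std():.3f}")
print(f"Random forest: mean={forest_scores.mean():.3f}, std={forest_scores.std():.3f}")
```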

Steps to Create Random Forest Classifier

We can follow the steps below to create a random forest classifier using Python Scikit-learn −

Step 1 − Import the required libraries.

Step 2 − Load the dataset.

Step 3 − Divide the dataset into training and test datasets.

Step 4 − Import RandomForestClassifier from the sklearn.ensemble module.

Step 5 − Create a DataFrame of the dataset.

Step 6 − Create a random forest classifier and train the model using the fit() method.

Step 7 − Predict on the test dataset.

Step 8 − Import metrics to find the accuracy of the classifier.

Step 9 − Print the accuracy of the random forest classifier.
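As a compact sketch of these steps, assuming you want to train directly from the DataFrame built in Step 5, the whole flow can be written as below. The column names and random_state=42 are illustrative choices; random_state only makes the split and the forest repeatable between runs.

```python
import pandas as pd
from sklearn import datasets, metrics
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Steps 2 and 5: load Iris and build a DataFrame (column names chosen for this sketch)
iris = datasets.load_iris()
features = ["sepallength", "sepalwidth", "petallength", "petalwidth"]
data = pd.DataFrame(iris.data, columns=features)
data["species"] = iris.target

# Step 3: 60 % training, 40 % test; random_state makes the split repeatable
X = data[features]
y = data["species"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.40, random_state=42)

# Steps 4 and 6: create the classifier and train it with fit()
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

# Steps 7 to 9: predict on the test set and print the accuracy as a percentage
y_pred = clf.predict(X_test)
accuracy = metrics.accuracy_score(y_test, y_pred) * 100
print("Accuracy:", accuracy)
```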

Example

In the example below, we use the Iris plants dataset to build a random forest classifier:

```python
# Import required libraries
import pandas as pd
from sklearn import datasets

# Load the iris dataset from sklearn
iris_clf = datasets.load_iris()
print(iris_clf.target_names)
print(iris_clf.feature_names)

# Divide the dataset into training and test datasets
X, y = datasets.load_iris(return_X_y=True)
from sklearn.model_selection import train_test_split

# 60 % training dataset and 40 % test dataset
# (no random_state is set, so the split changes on every run)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.40)

# Import the random forest classifier from the sklearn ensemble module
from sklearn.ensemble import RandomForestClassifier

# Create a dataframe (handy for inspecting the data; the model below is trained on X_train)
data = pd.DataFrame({'sepallength': iris_clf.data[:, 0],
                     'sepalwidth': iris_clf.data[:, 1],
                     'petallength': iris_clf.data[:, 2],
                     'petalwidth': iris_clf.data[:, 3],
                     'species': iris_clf.target})

# Create a Random Forest classifier with 100 trees
RForest_clf = RandomForestClassifier(n_estimators=100)

# Train the model on the training dataset by using the fit() method
RForest_clf.fit(X_train, y_train)

# Predict from the test dataset
y_pred = RForest_clf.predict(X_test)

# Import metrics for accuracy calculation
from sklearn import metrics
print("\nAccuracy of our Random Forest Classifier is:", metrics.accuracy_score(y_test, y_pred) * 100)
```

Output

It will produce the following output −

['setosa' 'versicolor' 'virginica']
['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']

Accuracy of our Random Forest Classifier is: 95.0

Because the train/test split and the trees are random, the exact accuracy may differ from run to run.
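The accuracy printed here is simply the share of test samples whose predicted label matches the true label, multiplied by 100. With a 40 % test split of the 150 Iris samples there are 60 test samples, so 95.0 corresponds to 57 correct predictions. The small label arrays below are hypothetical, used only to show that the manual calculation and metrics.accuracy_score agree.

```python
import math

import numpy as np
from sklearn import metrics

# Hypothetical labels for illustration: 6 samples, 5 predicted correctly
y_test = np.array([0, 1, 2, 2, 1, 0])
y_pred = np.array([0, 1, 2, 1, 1, 0])

# Fraction of matching labels, expressed as a percentage
manual = np.mean(y_pred == y_test) * 100
library = metrics.accuracy_score(y_test, y_pred) * 100
print(manual, library)
```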

Let's predict the type of flower for new measurements (sepal length, sepal width, petal length and petal width, in cm) using our classifier −

```python
# Predicting the type of flower
RForest_clf.predict([[5, 4, 3, 1]])
```

Output

It will produce the following output −

array([1])

Here array([1]) represents the versicolor type, because index 1 of iris_clf.target_names is 'versicolor'.
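Instead of translating label numbers by hand, you can index target_names with the predicted labels, since it is a NumPy array. The sketch below trains a fresh forest on all 150 samples (an assumption, to keep it self-contained) and predicts both example flowers at once; the exact classes it returns depend on the trained forest.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(iris.data, iris.target)

# Feature order: sepal length, sepal width, petal length, petal width (cm)
new_flowers = [[5, 4, 3, 1], [5, 4, 5, 2]]
pred = clf.predict(new_flowers)

# Map each numeric label to its species name via NumPy fancy indexing
print(pred, iris.target_names[pred])
```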

```python
# Predicting the type of flower
RForest_clf.predict([[5, 4, 5, 2]])
```

Output

It will produce the following output −

array([2])

Here array([2]) represents the virginica type.
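To see how confident the forest is, predict_proba returns the class probabilities averaged over all trees, which is how scikit-learn combines the trees in place of hard voting. The sketch below again assumes a forest trained on the full dataset with random_state=0; the probability values themselves will depend on that training.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(iris.data, iris.target)

# Averaged class probabilities, one column per class in clf.classes_
proba = clf.predict_proba([[5, 4, 5, 2]])
for name, p in zip(iris.target_names, proba[0]):
    print(f"{name}: {p:.2f}")

# predict() picks the class with the highest averaged probability
print(clf.classes_[proba.argmax(axis=1)], clf.predict([[5, 4, 5, 2]]))
```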

Updated on: 04-Oct-2022
