
How to create a random forest classifier using Python Scikit-learn?
Random forest is a supervised machine learning algorithm used for classification, regression, and other tasks. It builds many decision trees, each on a random sample of the data. For classification, the forest then collects the prediction from each tree and selects the final class by voting.
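The voting step can be made concrete with a short sketch. One caveat: scikit-learn's RandomForestClassifier combines its trees by averaging their predicted class probabilities rather than by counting one hard vote per tree. The snippet below rebuilds the forest's prediction from its individual trees. The Iris dataset and the values of n_estimators and random_state are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# n_estimators and random_state are illustrative choices, not requirements
forest = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)

# Each fitted tree is available in forest.estimators_
tree_probas = np.stack([tree.predict_proba(X) for tree in forest.estimators_])

# Average the per-tree class probabilities, then pick the most probable class
avg_proba = tree_probas.mean(axis=0)
manual_pred = forest.classes_[np.argmax(avg_proba, axis=1)]

# Compare the hand-built result with what the forest itself reports
print("Probabilities match:", np.allclose(avg_proba, forest.predict_proba(X)))
print("Share of identical predictions:", np.mean(manual_pred == forest.predict(X)))
```

The averaged probabilities should agree with forest.predict_proba(X) up to floating-point rounding, which shows that the forest's decision is an aggregate of its trees.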
A key advantage of a random forest classifier is that it reduces overfitting by averaging the results of many trees. This is why it usually gives better results than a single decision tree.
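One way to check this claim on a given dataset is to cross-validate both models side by side. The sketch below is a minimal comparison, assuming the Iris dataset, 5-fold cross-validation, and an arbitrary random_state. On a small and easy dataset such as Iris the gap may be small or even absent, so treat the numbers as a sanity check rather than proof.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# random_state=0 is an arbitrary seed so that repeated runs give the same numbers
models = {
    "single decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

mean_scores = {}
for name, model in models.items():
    # cross_val_score fits a fresh copy of the model on each of the 5 folds
    scores = cross_val_score(model, X, y, cv=5)
    mean_scores[name] = scores.mean()
    print(f"{name}: mean accuracy = {mean_scores[name]:.3f}")
```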
Steps to Create Random Forest Classifier
We can follow the steps below to create a random forest classifier using Python Scikit-learn −
Step 1 − Import the required libraries.
Step 2 − Load the dataset.
Step 3 − Divide the dataset into training and test datasets.
Step 4 − Import the random forest classifier from the sklearn.ensemble module.
Step 5 − Create a dataframe from the dataset.
Step 6 − Create a random forest classifier and train the model using the fit() function.
Step 7 − Predict from the test dataset.
Step 8 − Import metrics to find the accuracy of the classifier.
Step 9 − Print the accuracy of the random forest classifier.
Example
In the example below, we use the Iris Plants dataset to build a random forest classifier:
# Import the required libraries
import pandas as pd
from sklearn import datasets

# Load the iris dataset from sklearn
iris_clf = datasets.load_iris()
print(iris_clf.target_names)
print(iris_clf.feature_names)

# Divide the dataset into training and test datasets
X, y = datasets.load_iris(return_X_y=True)
from sklearn.model_selection import train_test_split

# 60 % training dataset and 40 % test dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.40)

# Import the random forest classifier from the sklearn.ensemble module
from sklearn.ensemble import RandomForestClassifier

# Create a dataframe of the dataset (for inspection; the model below is trained on X_train)
data = pd.DataFrame({'sepallength': iris_clf.data[:, 0],
                     'sepalwidth': iris_clf.data[:, 1],
                     'petallength': iris_clf.data[:, 2],
                     'petalwidth': iris_clf.data[:, 3],
                     'species': iris_clf.target})

# Create a random forest classifier with 100 trees
RForest_clf = RandomForestClassifier(n_estimators=100)

# Train the model on the training dataset by using the fit() function
RForest_clf.fit(X_train, y_train)

# Predict from the test dataset
y_pred = RForest_clf.predict(X_test)

# Import metrics for accuracy calculation
from sklearn import metrics
print("\nAccuracy of our Random Forest Classifier is: ", metrics.accuracy_score(y_test, y_pred)*100)
Output
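It will produce output similar to the following. The accuracy can differ between runs because train_test_split shuffles the data randomly and no random_state is set −
['setosa' 'versicolor' 'virginica']
['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']

Accuracy of our Random Forest Classifier is: 95.0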
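If repeatable numbers are needed, the split and the forest can both be seeded. The following is a minimal sketch, not part of the original example: the seed value 42, the stratify=y option, and the added confusion matrix are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# random_state=42 is an arbitrary seed; stratify=y keeps the class
# proportions the same in the training and test datasets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.40, random_state=42, stratify=y
)

# Seeding the forest as well makes the fitted trees repeatable
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)

print("Accuracy (%):", accuracy * 100)
# Rows are true classes, columns are predicted classes
print(cm)
```

The confusion matrix shows which species are mistaken for each other, which a single accuracy figure hides.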
Let’s predict the type of flowers using our classifier −
# Predicting the type of flower
RForest_clf.predict([[5, 4, 3, 1]])
Output
It will produce the following output −
array([1])
Here, array([1]) is the class index of the versicolor type in target_names.
# Predicting the type of flower
RForest_clf.predict([[5, 4, 5, 2]])
Output
It will produce the following output −
array([2])
Here, array([2]) is the class index of the virginica type in target_names.
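Reading class indices by eye is error-prone, so it can help to map them back to species names and to look at the class probabilities as well. The sketch below is self-contained and makes a few assumptions: it trains on the full Iris dataset instead of a train/test split, uses an arbitrary random_state, and reuses the two sample measurements from above. The variable names are hypothetical.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()

# Trained on the full dataset here only to keep the sketch short
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(iris.data, iris.target)

# Sepal length, sepal width, petal length, petal width (in cm)
samples = np.array([[5, 4, 3, 1], [5, 4, 5, 2]])

pred_idx = clf.predict(samples)           # class indices such as 1 or 2
pred_names = iris.target_names[pred_idx]  # the matching species names
proba = clf.predict_proba(samples)        # one probability per class

for sample, name, p in zip(samples, pred_names, proba):
    per_class = {str(n): round(float(v), 2) for n, v in zip(iris.target_names, p)}
    print(sample, "->", name, per_class)
```

The probabilities show how confident the forest is in each prediction, which is useful for samples that lie between two species.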
