How to create a random forest classifier using Python Scikit-learn?
Random Forest is a supervised machine learning algorithm that creates multiple decision trees on data samples and combines their predictions through voting. This ensemble approach reduces overfitting and typically produces better results than a single decision tree.
The algorithm works by training multiple decision trees on different subsets of the data and features, then averaging their predictions for regression or using majority voting for classification.
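The majority-voting step can be sketched in isolation. Below, three hypothetical per-tree predictions for one sample are combined with a simple vote count (the tree labels here are made up for illustration):

```python
# A minimal sketch of majority voting across individual trees.
# The per-tree predictions below are hypothetical example values.
from collections import Counter

tree_predictions = [0, 1, 1]  # class labels predicted by three trees
# The ensemble's answer is the most common label among the trees
ensemble_prediction = Counter(tree_predictions).most_common(1)[0][0]
print(ensemble_prediction)  # -> 1
```

Scikit-learn's RandomForestClassifier performs this aggregation internally (using averaged class probabilities rather than raw vote counts), so you never write this loop yourself.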
Steps to Create Random Forest Classifier
Follow these steps to create a random forest classifier using Python Scikit-learn:
Step 1 − Import the required libraries
Step 2 − Load the dataset
Step 3 − Split dataset into training and test sets
Step 4 − Import RandomForestClassifier from sklearn.ensemble
Step 5 − Create and train the random forest model
Step 6 − Make predictions on test data
Step 7 − Evaluate model accuracy
Example: Iris Dataset Classification
Let's build a random forest classifier using the famous Iris dataset to classify flower species:
# Import required libraries
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn import metrics
# Load the iris dataset
iris_data = datasets.load_iris()
print("Target classes:", iris_data.target_names)
print("Features:", iris_data.feature_names)
# Split into features and target
X, y = iris_data.data, iris_data.target
# Split into training and test sets (60% train, 40% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.40, random_state=42)
# Create Random Forest classifier
rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)
# Train the model
rf_classifier.fit(X_train, y_train)
# Make predictions
y_pred = rf_classifier.predict(X_test)
# Calculate accuracy
accuracy = metrics.accuracy_score(y_test, y_pred) * 100
print(f"\nAccuracy of Random Forest Classifier: {accuracy:.1f}%")
Target classes: ['setosa' 'versicolor' 'virginica']
Features: ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
Accuracy of Random Forest Classifier: 100.0%
Making Predictions
Now let's use our trained classifier to predict flower species for new samples:
# Predict flower type for new samples
# Format: [sepal_length, sepal_width, petal_length, petal_width]
# Sample 1
prediction1 = rf_classifier.predict([[5.0, 4.0, 3.0, 1.0]])
print("Sample 1 prediction:", prediction1)
print("Species:", iris_data.target_names[prediction1[0]])
# Sample 2
prediction2 = rf_classifier.predict([[5.0, 4.0, 5.0, 2.0]])
print("Sample 2 prediction:", prediction2)
print("Species:", iris_data.target_names[prediction2[0]])
Sample 1 prediction: [1]
Species: versicolor
Sample 2 prediction: [2]
Species: virginica
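Beyond a hard class label, the classifier can report per-class probabilities with predict_proba, which reflect the share of trees voting for each class. A self-contained sketch, retraining on the same Iris data so it runs on its own:

```python
# Sketch: per-class probabilities from a random forest via predict_proba.
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier

iris = datasets.load_iris()
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(iris.data, iris.target)

# One probability per species, in the order of iris.target_names;
# the rows of predict_proba always sum to 1.
probs = clf.predict_proba([[5.0, 4.0, 3.0, 1.0]])
print(probs)  # shape (1, 3)
```

This is useful when you want a confidence estimate rather than just the winning label, for example to flag borderline samples for review.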
Key Parameters
Important RandomForestClassifier parameters include:
- n_estimators: Number of trees in the forest (default: 100)
- max_depth: Maximum depth of trees (default: None)
- random_state: Controls randomness for reproducible results
- max_features: Number of features to consider for splits
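These parameters are commonly tuned rather than guessed. One standard approach is a cross-validated grid search; the sketch below tries a small grid over n_estimators and max_depth on the Iris data (the grid values are illustrative, not recommendations):

```python
# Sketch: tuning RandomForestClassifier parameters with GridSearchCV.
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()

# Illustrative grid over two of the key parameters listed above
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 3],
}

search = GridSearchCV(RandomForestClassifier(random_state=42),
                      param_grid, cv=5)
search.fit(iris.data, iris.target)

print(search.best_params_)  # best combination found by 5-fold CV
```

search.best_estimator_ then holds a forest refit on the full data with the winning parameters.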
Conclusion
Random Forest is an effective ensemble method that combines multiple decision trees to create a robust classifier. It handles overfitting well and often achieves high accuracy on various classification tasks, making it a popular choice for machine learning projects.
