What is OOB error?
Introduction
OOB (Out-of-Bag) error and the OOB score are terms associated with Random Forests. A Random Forest is an ensemble of decision trees that improves on the prediction of a single decision tree. The OOB error measures the prediction error of tree-based models that use the bagging method, such as random forests. It is the fraction of wrongly classified samples in the out-of-bag sample.
In this article let's explore the OOB error and score. Before moving ahead, let us take a short overview of Random Forests and Decision Trees.
Random Forest Algorithm
Random Forest is an ensemble of decision trees. A decision tree makes predictions with a rule-based system that repeatedly splits the data on feature values using simple decisions; each point where a decision is made becomes a node. Combining the predictions of many such trees forms a Random Forest. Random Forest is a bootstrap-aggregated (bagged) model: each tree is trained on a random sample of the data drawn with replacement. Random forests are used for both regression and classification.
Random Forests have several advantages over single Decision Trees −
- They are less sensitive to outliers
- They can model non-linear data
- They are less prone to overfitting
- They work effectively on big datasets
- They often achieve higher accuracy than a single decision tree
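As a quick illustration of these points, the sketch below (a demonstration of my own, not part of the original tutorial) compares a single decision tree with a random forest on the same synthetic scikit-learn dataset and held-out split; the dataset parameters are assumptions chosen for illustration:

```python
# Hypothetical demo: compare one decision tree with a random forest
# on the same synthetic dataset and held-out split.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=2000, n_features=20,
                           n_informative=10, n_classes=2, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

tree = DecisionTreeClassifier(random_state=42).fit(X_tr, y_tr)
forest = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_tr, y_tr)

print("Decision tree accuracy:", tree.score(X_te, y_te))
print("Random forest accuracy:", forest.score(X_te, y_te))
```

On data like this, the forest's averaged vote typically scores noticeably higher than the single tree, though exact numbers vary with the random seed.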
OOB (Out-of-Bag) Score
It is a performance metric for Random Forests. In the Random Forest algorithm, each tree is trained on a bootstrap sample, so some rows are left out of that tree's training set. These are known as out-of-bag (OOB) samples. Because the model never sees them during training, they can be used as a built-in validation set, and evaluating on them produces the OOB score.
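To make this concrete, here is a minimal sketch (plain NumPy, not scikit-learn internals; the sample size is an assumption) showing that each bootstrap draw leaves roughly (1 − 1/n)^n ≈ e⁻¹ ≈ 36.8% of the rows out of bag:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000  # number of rows in the training set (assumed for illustration)

# A bootstrap sample draws n row indices with replacement.
boot = rng.integers(0, n, size=n)

# Rows that were never drawn are the out-of-bag (OOB) sample for this tree.
oob_mask = np.ones(n, dtype=bool)
oob_mask[boot] = False

# Expected OOB fraction: (1 - 1/n)**n, close to 1/e for large n.
print("OOB fraction:", oob_mask.mean())
```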
Out-of-Bag (OOB) Error
The OOB error can be computed with the scikit-learn package and gives an estimate of the Random Forest model's performance from the OOB samples alone. For each training sample, predictions are collected only from the trees whose bootstrap sample did not contain it; these predictions are aggregated (by majority vote for classification), and the fraction of samples misclassified by this aggregated prediction is the OOB error of the Random Forest model.
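The per-sample aggregation can be sketched by hand. The following is a simplified illustration of my own construction (not scikit-learn's internal code): it bootstraps individual decision trees and tallies OOB votes per sample:

```python
# Simplified, hand-rolled OOB estimate: an illustration of the
# per-sample aggregation idea, not scikit-learn's internal code.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=10,
                           n_informative=5, random_state=0)
n, n_trees = len(X), 50

# votes[i, c]: how many trees that did NOT train on row i predict class c for it
votes = np.zeros((n, 2), dtype=int)
for _ in range(n_trees):
    boot = rng.integers(0, n, size=n)        # bootstrap indices for this tree
    oob = np.setdiff1d(np.arange(n), boot)   # rows this tree never saw
    tree = DecisionTreeClassifier(random_state=0).fit(X[boot], y[boot])
    votes[oob, tree.predict(X[oob])] += 1

covered = votes.sum(axis=1) > 0              # rows OOB for at least one tree
oob_pred = votes[covered].argmax(axis=1)
oob_error = np.mean(oob_pred != y[covered])
print("manual OOB error:", oob_error)
```

With 50 trees, essentially every row is out of bag for at least one tree, so almost all rows contribute to the estimate.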
Code Implementation using Scikit Learn
```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

## dataset
data_x, data_y = make_classification(n_samples=5000, n_features=20,
                                     n_informative=10, n_classes=2)

## create model with OOB scoring enabled
model = RandomForestClassifier(n_estimators=200, oob_score=True)
model.fit(data_x, data_y)

err_OOB = 1 - model.oob_score_
print("err_OOB: {}".format(err_OOB))
```
Output
err_OOB: 0.088
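Beyond `oob_score_`, a fitted `RandomForestClassifier` with `oob_score=True` also exposes `oob_decision_function_`, the aggregated class probabilities each training row received from the trees that did not see it. A small follow-up sketch (with an assumed synthetic dataset):

```python
# Hypothetical follow-up: inspect the OOB outputs scikit-learn exposes.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=10, n_classes=2, random_state=0)
model = RandomForestClassifier(n_estimators=200, oob_score=True,
                               random_state=0).fit(X, y)

# oob_score_ is the accuracy on OOB samples; oob_decision_function_ holds
# the aggregated class probabilities per training row.
print(model.oob_score_)
print(model.oob_decision_function_.shape)  # (n_samples, n_classes) = (1000, 2)
```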
Advantages: OOB Score/Error
- It gives a more honest estimate of model performance, since the OOB samples on which the score is computed were never used to train the model.
- The estimate has low variance and does not leak training data into the evaluation, because each sample is scored only by trees that never saw it.
- The model is effectively validated while it is being trained, so no separate pass over a held-out test set is required.
Disadvantages: OOB Score/Error
- The overall time for training the model may increase, since computing the OOB score adds extra work.
- It is best suited to small and medium datasets; on large datasets the OOB computation consumes noticeably more time.
Conclusion
The OOB error is often a convenient alternative to other validation metrics for Random Forests: it gives an honest performance estimate, helps keep overfitting in check, and requires no separate test set, though it adds some computation time on large datasets. OOB-style estimation is popular for tree-based models and extends to other bagging-based ML algorithms as well.