Machine Learning with Python - Extra Trees

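Extra Trees (short for extremely randomized trees) is another extension of the bagged decision tree ensemble method. In this method, random trees are constructed from samples of the training dataset, and the split thresholds are drawn at random rather than searched for exhaustively.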
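To make the idea of a random tree more concrete, here is a minimal sketch of how a single randomized split might be chosen. It is a simplified illustration, not sklearn's implementation: the helper names gini and random_split, the toy rows and the Gini criterion are all assumptions made for this example.

```python
import random


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def random_split(rows, labels, n_candidate_features, rng):
    """Choose one split in a simplified extra-trees style (hypothetical helper).

    For each of a few randomly chosen features, draw ONE random threshold
    between that feature's min and max, then keep the candidate with the
    lowest weighted Gini impurity. Returns (score, feature_index, threshold).
    """
    n_features = len(rows[0])
    best = None
    for f in rng.sample(range(n_features), n_candidate_features):
        values = [r[f] for r in rows]
        lo, hi = min(values), max(values)
        if lo == hi:
            continue  # constant feature, nothing to split on
        threshold = rng.uniform(lo, hi)
        left = [y for r, y in zip(rows, labels) if r[f] <= threshold]
        right = [y for r, y in zip(rows, labels) if r[f] > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[0]:
            best = (score, f, threshold)
    return best


# Toy data (made up for illustration): two features, two classes.
toy_rows = [[1.0, 5.0], [2.0, 3.0], [8.0, 4.0], [9.0, 6.0]]
toy_labels = [0, 0, 1, 1]
split = random_split(toy_rows, toy_labels, n_candidate_features=2, rng=random.Random(7))
print(split)
```

A full tree would apply this step recursively to the left and right subsets, and the ensemble would combine the votes of many such trees.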

In the following Python recipe, we are going to build an extra trees ensemble model by using the ExtraTreesClassifier class of sklearn on the Pima Indians diabetes dataset.

First, import the required packages as follows −

from pandas import read_csv
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import ExtraTreesClassifier

Now, we need to load the Pima Indians diabetes dataset as we did in the previous examples −

path = r"C:\pima-indians-diabetes.csv"
headernames = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
data = read_csv(path, names = headernames)
array = data.values
X = array[:,0:8]
Y = array[:,8]

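Next, set up 10-fold cross validation as follows. Recent versions of sklearn raise an error if random_state is set without shuffle = True, so shuffling is enabled here −

seed = 7
kfold = KFold(n_splits = 10, shuffle = True, random_state = seed)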
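To see what the splitter actually produces, the sketch below runs KFold on a small stand-in array and records the size of each train/test fold. The 20-row array X_demo and the variable names are assumptions for illustration only; they are not part of the Pima recipe.

```python
import numpy as np
from sklearn.model_selection import KFold

# Stand-in data (assumed for this demo): 20 samples with 2 features each.
X_demo = np.arange(40).reshape(20, 2)

# shuffle=True is needed for random_state to have any effect.
kfold_demo = KFold(n_splits=10, shuffle=True, random_state=7)

fold_sizes = []
seen_test_rows = []
for train_idx, test_idx in kfold_demo.split(X_demo):
    fold_sizes.append((len(train_idx), len(test_idx)))
    seen_test_rows.extend(test_idx.tolist())

# Each of the 10 folds holds out 2 rows and trains on the other 18.
print(fold_sizes)
```

Every row appears in exactly one test fold, which is why the mean of the per-fold scores uses each sample for evaluation once.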

We need to provide the number of trees we are going to build. Here we are building 150 trees, with each split chosen from a random subset of 5 features −

num_trees = 150
max_features = 5

Next, build the model with the help of the following script −

model = ExtraTreesClassifier(n_estimators = num_trees, max_features = max_features)

Calculate and print the result as follows −

results = cross_val_score(model, X, Y, cv = kfold)
print(results.mean())

Output

0.7551435406698566

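The output above shows that our extra trees classifier model achieved a mean accuracy of around 75.5% across the 10 folds. The exact figure can vary between runs, because the trees are built at random and the model above is not given a fixed random_state.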
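If the CSV file is not at hand, the same workflow can be tried on generated data. The following is a sketch under stated assumptions: make_classification produces a synthetic 8-feature dataset that stands in for the Pima data, the names X_syn and Y_syn are hypothetical, and random_state is fixed on the model so repeated runs give the same scores. The accuracy it prints says nothing about the diabetes dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import KFold, cross_val_score

seed = 7

# Synthetic stand-in for the Pima data (assumption): 300 rows, 8 features.
X_syn, Y_syn = make_classification(
    n_samples=300, n_features=8, n_informative=5, random_state=seed
)

# Same settings as the recipe: 10 folds, 150 trees, 5 features per split.
kfold_syn = KFold(n_splits=10, shuffle=True, random_state=seed)
model_syn = ExtraTreesClassifier(
    n_estimators=150, max_features=5, random_state=seed
)

# One accuracy score per fold; report the mean and the spread.
results_syn = cross_val_score(model_syn, X_syn, Y_syn, cv=kfold_syn)
print(results_syn.mean(), results_syn.std())
```

Reporting the standard deviation next to the mean gives a rough sense of how much the score moves from fold to fold.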
