It is also called Gradient Boosting Machines. In the following Python recipe, we are going to build a Stochastic Gradient Boosting ensemble model for classification by using the GradientBoostingClassifier class of sklearn on the Pima Indians diabetes dataset.
First, import the required packages as follows −
from pandas import read_csv
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import GradientBoostingClassifier
Now, we need to load the Pima diabetes dataset as we did in the previous examples −
path = r"C:\pima-indians-diabetes.csv"
headernames = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
data = read_csv(path, names = headernames)
array = data.values
X = array[:,0:8]   # first eight columns are the input features
Y = array[:,8]     # last column is the class label
Next, give the input for 10-fold cross validation as follows −
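seed = 5
# shuffle = True is needed because recent scikit-learn versions raise an error
# when random_state is set without shuffling
kfold = KFold(n_splits = 10, shuffle = True, random_state = seed)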
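To make the idea of 10-fold cross validation concrete, the following is a minimal pure-Python sketch of how shuffled folds can be formed. The helper name kfold_indices is hypothetical and is not part of sklearn; it only illustrates the partitioning, and its shuffling will not reproduce the exact fold assignment that KFold produces. The sample count of 768 is the row count of the commonly distributed Pima file and is assumed here.

```python
import random

def kfold_indices(n_samples, n_splits, seed):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    # The first (n_samples % n_splits) folds receive one extra sample.
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for i in range(n_splits):
        size = base + (1 if i < extra else 0)
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

# 768 rows is an assumption about the dataset file being used.
folds = list(kfold_indices(n_samples=768, n_splits=10, seed=5))
fold_sizes = [len(test_idx) for _, test_idx in folds]
print(fold_sizes)  # [77, 77, 77, 77, 77, 77, 77, 77, 76, 76]
```

Every sample appears in exactly one test fold, so each row is used for evaluation once and for training nine times.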
We need to provide the number of trees we are going to build. Here we are building 50 trees −
num_trees = 50
Next, build the model with the help of following script −
model = GradientBoostingClassifier(n_estimators = num_trees, random_state = seed)
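With the default settings, GradientBoostingClassifier fits every tree on all training rows. The "stochastic" variant is obtained by setting the subsample parameter below 1.0, so that each tree is fitted on a random fraction of the rows. The sketch below shows this under stated assumptions: the data is a synthetic stand-in generated with make_classification (not the Pima file), and the value 0.8 and the names ending in _demo are illustrative choices, not part of the original recipe.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data (an assumption); replace with X and Y from the recipe.
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=5)

kfold_demo = KFold(n_splits=5, shuffle=True, random_state=5)

# subsample < 1.0 fits each tree on a random fraction of the training rows.
stochastic_model = GradientBoostingClassifier(
    n_estimators=50, subsample=0.8, random_state=5
)
scores = cross_val_score(stochastic_model, X_demo, y_demo, cv=kfold_demo)
print(scores.mean())
```

Smaller subsample values tend to reduce variance at the cost of some bias, so the value is usually tuned rather than fixed in advance.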
Calculate and print the result as follows −
results = cross_val_score(model, X, Y, cv = kfold)
print(results.mean())
The printed mean shows that our Gradient Boosting classifier ensemble model reaches around 77.5% accuracy. The exact value can vary slightly with the scikit-learn version and the fold assignment.
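The number of trees chosen earlier in the recipe is a tunable setting. One way to see its effect is the staged_predict method of a fitted GradientBoostingClassifier, which yields predictions after each additional tree. The sketch below is only an illustration: it uses synthetic stand-in data and a single train/test split instead of the Pima file and 10-fold cross validation, and the names ending in _demo are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption); replace with X and Y from the recipe.
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=5)
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=5
)

demo_model = GradientBoostingClassifier(n_estimators=50, random_state=5)
demo_model.fit(X_train, y_train)

# staged_predict yields test-set predictions after 1, 2, ..., 50 trees.
staged_accuracy = [
    accuracy_score(y_test, y_pred) for y_pred in demo_model.staged_predict(X_test)
]
for n_trees in (1, 10, 25, 50):
    print(n_trees, staged_accuracy[n_trees - 1])
```

If the accuracy stops improving well before the last tree, a smaller number of trees may be sufficient for that data.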