XGBoost - Classification



Among the most common uses of XGBoost is classification, in which the model predicts a discrete class label from the input features. Classification is carried out with the XGBClassifier class, which is designed specifically for classification tasks.

XGBClassifier Syntax

To improve performance, we can adjust the hyperparameters of the XGBClassifier class in XGBoost. The basic syntax for building an XGBoost classifier is shown below −

model = xgb.XGBClassifier(
    objective='multi:softprob',   # learning objective
    num_class=num_classes,        # number of classes (multi-class only)
    max_depth=max_depth,          # maximum depth of each tree
    learning_rate=learning_rate,  # step size shrinkage
    subsample=subsample,          # fraction of samples per tree
    colsample_bytree=colsample,   # fraction of features per tree
    n_estimators=num_estimators   # number of boosting rounds
)

Here is a description of the hyperparameters used in the XGBClassifier syntax; a short illustrative instantiation follows the list −

  • objective='multi:softprob' - It is an optional objective parameter; for multi-class classification it returns a probability score for each class. For binary classification the default value is 'binary:logistic'.

  • num_class=num_classes - It is needed for multi-class classification tasks and specifies the number of classes present in the dataset.

  • max_depth=max_depth - It is an optional parameter which specifies the maximum depth of each decision tree.

  • learning_rate=learning_rate - It is an optional parameter that shrinks the step size at each boosting round, which helps avoid overfitting.

  • subsample=subsample - It is an optional parameter which specifies the fraction of training samples used to build each tree.

  • colsample_bytree=colsample - It is also an optional parameter which specifies the fraction of features used for each tree.

  • n_estimators=num_estimators - It is an optional parameter which sets the number of boosting rounds and controls the overall complexity of the model.
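
As a quick illustration, the sketch below fills in the placeholders from the syntax above with example values for a hypothetical three-class problem. The values shown (a depth of 4, a learning rate of 0.1, and so on) are only illustrative choices for demonstration, not recommended settings −

import xgboost as xgb

# Illustrative values only - tune these for your own dataset
model = xgb.XGBClassifier(
    objective='multi:softprob',  # per-class probability output
    num_class=3,                 # e.g. a dataset with three classes
    max_depth=4,
    learning_rate=0.1,
    subsample=0.8,
    colsample_bytree=0.8,
    n_estimators=200
)
print(model)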

Example of XGBoost Classification

The Iris dataset is a highly popular dataset in machine learning. It comprises 150 iris flower examples, each described by four measurements, and the task is to categorize them into one of three iris species.

Let us use the Iris dataset to show classification using the XGBoost library −

   import xgboost as xgb
   from sklearn.datasets import load_iris
   from sklearn.model_selection import train_test_split
   from sklearn.metrics import accuracy_score, classification_report

   # Load the Iris dataset
   data = load_iris()
   X, y = data.data, data.target

   # Split the data into training and test sets
   X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

   # Create an XGBoost classifier
   model = xgb.XGBClassifier()

   # Train the model on the training data
   model.fit(X_train, y_train)

   # Make predictions on the test set
   predictions = model.predict(X_test)

   # Calculate accuracy
   accuracy = accuracy_score(y_test, predictions)

   print("Model's Accuracy is:", accuracy)
   print("\nModel's Classification Report is:")
   print(classification_report(y_test, predictions, target_names=data.target_names))

Output

This will lead to the following outcome −

Model's Accuracy is: 1.0

Model's Classification Report is:
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        10
  versicolor       1.00      1.00      1.00         9
   virginica       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
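
Because the classifier above was trained on a multi-class problem, it can also return a probability score for each class rather than just a predicted label. The short sketch below is a continuation of the example above (it assumes model, X_test and data are still defined) and uses the predict_proba method of the scikit-learn style API −

   # Continuing from the example above: per-class probability scores
   probabilities = model.predict_proba(X_test)

   # Each row holds one probability per species, and the values sum to 1
   print("Probabilities for the first test sample:")
   for name, p in zip(data.target_names, probabilities[0]):
      print(f"   {name}: {p:.3f}")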

Summary

XGBoost is a strong tool for machine learning, especially for classification tasks. It works well in many situations because it is fast and has features that help prevent overfitting. For example, we used XGBoost to classify iris flowers into their different species and achieved a perfect accuracy of 1.0. Its flexibility and efficiency make XGBoost a great choice for many real-life classification problems.
