
XGBoost - Classification
Classification is one of the most common uses of XGBoost: the model predicts a discrete class label from the input features. In the Python package, classification is carried out with the XGBClassifier class, which is designed specifically for classification tasks.
XGBClassifier Syntax
To improve performance, we can adjust the hyperparameters of the XGBClassifier class in XGBoost. The basic syntax for building an XGBoost classifier is shown below −
model = xgb.XGBClassifier(
    objective='multi:softprob',
    num_class=num_classes,
    max_depth=max_depth,
    learning_rate=learning_rate,
    subsample=subsample,
    colsample_bytree=colsample,
    n_estimators=num_estimators
)
Here is the description of the hyperparameters used in the XGBClassifier syntax; a short illustrative instantiation follows the list −
objective='multi:softprob' - It is the objective parameter. For multi-class classification it is optional and returns a probability score for each class; for binary classification the default value is 'binary:logistic'.
num_class=num_classes - It is needed for multi-class classification tasks and specifies the number of classes present in the dataset.
max_depth=max_depth - It is an optional parameter that sets the maximum depth of each decision tree.
learning_rate=learning_rate - It is an optional parameter that shrinks the step size of each boosting update, which helps avoid overfitting.
subsample=subsample - It is an optional parameter that sets the fraction of samples used for each tree.
colsample_bytree=colsample - It is also an optional parameter; it sets the fraction of features used for each tree.
n_estimators=num_estimators - It is an optional parameter that sets the number of boosting rounds and controls the overall complexity of the model.
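Putting these hyperparameters together, the short sketch below shows one way the classifier might be instantiated for a three-class problem. The concrete values used here (such as max_depth=4 and learning_rate=0.1) are illustrative placeholders, not recommended settings −
import xgboost as xgb

# Illustrative values only; tune these for your own dataset
model = xgb.XGBClassifier(
    objective='multi:softprob',  # one probability score per class
    num_class=3,                 # assumed 3 classes in this sketch; the scikit-learn wrapper can also infer this from the labels at fit time
    max_depth=4,                 # limit tree depth to control model complexity
    learning_rate=0.1,           # shrink each boosting step to reduce overfitting
    subsample=0.8,               # fraction of rows sampled for each tree
    colsample_bytree=0.8,        # fraction of features sampled for each tree
    n_estimators=200             # number of boosting rounds
)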
Example of XGBoost Classification
The Iris dataset is a highly popular dataset in machine learning. It comprises 150 iris flower examples, each described by four measurements, and the task is to categorize each flower into one of three iris species.
Let us use the Iris dataset to show classification using the XGBoost library:
import xgboost as xgb
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

# Load the Iris dataset
data = load_iris()
X, y = data.data, data.target

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create an XGBoost classifier
model = xgb.XGBClassifier()

# Train the model on the training data
model.fit(X_train, y_train)

# Make predictions on the test set
predictions = model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, predictions)
print("Model's Accuracy is:", accuracy)

print("\nModel's Classification Report is:")
print(classification_report(y_test, predictions, target_names=data.target_names))
Output
This will lead to the following outcome −
Model's Accuracy is: 1.0

Model's Classification Report is:
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        10
  versicolor       1.00      1.00      1.00         9
   virginica       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
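Because the classifier is trained with a probability-producing objective, you can also look at the per-class probability scores instead of only the predicted labels. The sketch below assumes the model and X_test objects from the example above are still in scope −
# Per-class probability scores for the test samples
probabilities = model.predict_proba(X_test)

# Each row sums to 1.0; the column order follows data.target_names (setosa, versicolor, virginica)
for row in probabilities[:3]:
    print(["{:.3f}".format(p) for p in row])
The label returned by predict for each sample corresponds to the column with the highest probability in that row.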
Summary
XGBoost is a strong machine learning tool, especially for classification tasks. It works well in many situations because it is fast and includes features that help prevent overfitting. In this example, we used XGBoost to classify iris flowers into their three species and achieved a perfect accuracy of 1.0. Its flexibility and efficiency make XGBoost a great choice for many real-life classification problems.