XGBoost - Hyperparameters



In this chapter we are going to discuss the hyperparameters that are required or commonly used with the XGBoost algorithm. These parameters have been selected to simplify the process of learning model parameters from data. The required hyperparameters are listed in this chapter category wise; the remaining hyperparameters are optional and can be set as needed.

XGBoost Hyperparameters Categories

The overall hyperparameters have been divided into three main categories by the XGBoost creators −

  • General Parameters

  • Booster Parameters

  • Learning Task Parameters

Let us discuss these three categories of hyperparameters in the sections below −

General Parameters

The general parameters define the overall functionality and working of the XGBoost model. Here is the list of parameters that come under this category −

  • booster [default=gbtree]: This parameter selects the type of model to run at each iteration. It gives two options − gbtree: tree-based models and gblinear: linear models.

  • silent [default=0]: It is used to set the model in silent mode. If it is set to 1, no running messages will be printed. It is good to keep it at 0 because the messages can help in understanding the model.

  • nthread [defaults to the maximum number of threads available]: It is mainly used for parallel processing, and the number of cores to use should be entered. If you want to run on all cores, leave this value unset and the algorithm will detect the number of cores automatically.

There are two other parameters that XGBoost automatically sets so you do not need to worry about them.
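
As a quick illustration, here is a minimal sketch (not taken from the original text) of where these general parameters go when using the native xgboost API; the training data is random and only for demonstration, and note that newer releases replace silent with verbosity.

import numpy as np
import xgboost as xgb

# Random toy data, purely for illustration
X = np.random.rand(100, 5)
y = np.random.randint(0, 2, size=100)
dtrain = xgb.DMatrix(X, label=y)

params = {
    "booster": "gbtree",             # tree-based model (the default)
    "silent": 0,                     # print running messages (newer versions use "verbosity")
    "nthread": 4,                    # number of cores to use; omit to use all available cores
    "objective": "binary:logistic",  # loss function (a learning task parameter, see below)
}

model = xgb.train(params, dtrain, num_boost_round=10)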

Booster Parameters

Since there are two types of boosters, here we will only discuss the tree booster, because it consistently outperforms the linear booster and is therefore used far more often.

The important tree booster parameters and their typical values are listed below −

  • eta: The learning rate; it controls how much the model changes after each boosting step. Typical values: 0.01-0.2.

  • min_child_weight: The smallest total weight of all observations required in a tree node. Tune with cross-validation.

  • max_depth: The maximum depth of a tree; it controls overfitting (the model being too specific). Typical values: 3-10.

  • max_leaf_nodes: The maximum number of leaves (end points) a tree can have.

  • gamma: The minimum loss reduction required to split a node. Tune based on the loss function.

  • max_delta_step: Limits how much a tree's weight estimates can change. Usually not needed.

  • subsample: The fraction of the data used to grow each tree. Typical values: 0.5-1.

  • colsample_bytree: The fraction of columns (features) randomly chosen for each tree. Typical values: 0.5-1.

  • colsample_bylevel: The fraction of columns used for each split at every level of the tree. Usually not used.

  • lambda: L2 regularization on the weights (like Ridge regression); helps reduce overfitting.

  • alpha: L1 regularization on the weights (like Lasso regression); useful for models with many features (high-dimensional data).

  • scale_pos_weight: Balances the positive and negative classes for imbalanced data and helps the model converge faster. Use a value > 0 for imbalanced data.
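
To make the list above concrete, here is a minimal sketch (not part of the original text) of a tree-booster configuration with the native API; the values are illustrative starting points picked from the typical ranges, not tuned settings.

import numpy as np
import xgboost as xgb

# Random toy data, purely for illustration
X = np.random.rand(200, 10)
y = np.random.randint(0, 2, size=200)
dtrain = xgb.DMatrix(X, label=y)

params = {
    "objective": "binary:logistic",
    "eta": 0.1,               # learning rate
    "max_depth": 6,           # limit tree depth to control overfitting
    "min_child_weight": 1,    # minimum sum of instance weight needed in a child node
    "gamma": 0,               # minimum loss reduction required to make a split
    "subsample": 0.8,         # fraction of rows sampled per tree
    "colsample_bytree": 0.8,  # fraction of columns sampled per tree
    "lambda": 1,              # L2 regularization term on weights
    "alpha": 0,               # L1 regularization term on weights
}

model = xgb.train(params, dtrain, num_boost_round=100)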

Learning Task Parameters

The learning task parameters define the goal of optimization and the metric that will be chosen at each step.

objective [default=reg:linear]

It is used to define the loss function to be minimized. The most commonly used values are as follows −

  • binary:logistic − Used for binary classification, i.e. when there are two classes. It returns the predicted probability instead of the actual class.

  • multi:softmax − It is used for multiclass classification. It returns the predicted class instead of the probabilities. You also need to set the additional option num_class to tell the model how many unique classes there are (see the sketch after this list).

  • multi:softprob − It is comparable to softmax, but it returns the probability for every possible class a data point can belong to, instead of only the predicted class.
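
Here is a minimal sketch (not from the original text) showing the num_class requirement mentioned above; the three-class data is random and only for illustration.

import numpy as np
import xgboost as xgb

X = np.random.rand(150, 4)
y = np.random.randint(0, 3, size=150)   # three classes: 0, 1, 2
dtrain = xgb.DMatrix(X, label=y)

params = {
    "objective": "multi:softmax",  # returns the predicted class label
    "num_class": 3,                # required for multi:softmax / multi:softprob
}
model = xgb.train(params, dtrain, num_boost_round=20)
preds = model.predict(dtrain)      # class labels, e.g. array([0., 2., 1., ...])

# Switching the objective to multi:softprob instead returns one probability
# per class, i.e. an array of shape (n_samples, num_class).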

eval_metric [default according to objective]

Evaluation metrics are computed on the validation data. The default metric is rmse for regression and error for classification.

The typical values are as follows −

  • rmse: root mean square error

  • mae: mean absolute error

  • logloss: negative log-likelihood

  • error: Binary classification error rate (0.5 threshold)

  • merror: Multiclass classification error rate

  • mlogloss: Multiclass logloss

  • auc: Area under the curve
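
As an illustration, here is a minimal sketch (not from the original text) of passing eval_metric together with a validation set so that the chosen metric is reported at every boosting round.

import numpy as np
import xgboost as xgb

# Random toy data split into a training and a validation part
X = np.random.rand(300, 6)
y = np.random.randint(0, 2, size=300)
dtrain = xgb.DMatrix(X[:200], label=y[:200])
dvalid = xgb.DMatrix(X[200:], label=y[200:])

params = {
    "objective": "binary:logistic",
    "eval_metric": "auc",   # overrides the default metric for this objective
}

model = xgb.train(
    params,
    dtrain,
    num_boost_round=50,
    evals=[(dtrain, "train"), (dvalid, "validation")],
)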

seed [default=0]

It is the random number seed. It is used for generating reproducible results and for parameter tuning.

If you have used Scikit-Learn until now, these parameter names may not look familiar. However, the Python xgboost package provides a sklearn wrapper, XGBClassifier, which follows the sklearn naming convention. The parameters whose names change are:

  • eta -> learning_rate

  • lambda -> reg_lambda

  • alpha -> reg_alpha
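
For example, the following sketch (not from the original text) expresses the same kind of configuration through the XGBClassifier wrapper, using the renamed parameters listed above.

import numpy as np
from xgboost import XGBClassifier

# Random toy data, purely for illustration
X = np.random.rand(200, 8)
y = np.random.randint(0, 2, size=200)

clf = XGBClassifier(
    learning_rate=0.1,   # eta in the native API
    reg_lambda=1.0,      # lambda in the native API
    reg_alpha=0.0,       # alpha in the native API
    max_depth=6,
    n_estimators=100,
)
clf.fit(X, y)
probs = clf.predict_proba(X)   # predicted class probabilities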
