
XGBoost - Regularization
XGBoost is a powerful machine learning algorithm that offers a range of regularization techniques to reduce over-fitting and improve model generalization.
The following are the main regularization methods for XGBoost −
L1 (Lasso) Regularization: Controlled by the alpha hyperparameter.
L2 (Ridge) Regularization: Controlled by the lambda hyperparameter.
Early Stopping: Controlled by the early_stopping_rounds option.
Minimum Child Weight: Requires each leaf node to have a minimum sum of instance weights.
Gamma: Specifies the minimum loss reduction needed to split a leaf node.
Regularization helps manage model complexity by applying penalties to the loss function, stopping the model from fitting noise in the training data.
It is important to know and use these regularization methods in order to optimize XGBoost models and improve performance on unseen data.
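For orientation, here is a minimal, illustrative sketch (not taken from the original text) of where these knobs live in the scikit-learn style XGBoost interface; the parameter values are placeholders, not recommendations.

# A minimal, illustrative sketch of the main regularization knobs in the
# scikit-learn style XGBoost interface; the values are placeholders only.
from xgboost import XGBClassifier

model = XGBClassifier(
    n_estimators=200,
    reg_alpha=0.1,        # L1 (Lasso) regularization strength (alpha)
    reg_lambda=1.0,       # L2 (Ridge) regularization strength (lambda)
    min_child_weight=5,   # minimum sum of instance weights in a leaf
    gamma=0.2,            # minimum loss reduction required to split a node
)
# Early stopping is configured with a validation set at training time;
# see the Early Stopping section below.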
L1 and L2 Regularization
XGBoost supports two primary kinds of regularization: L1 (Lasso) and L2 (Ridge).
L1 (Lasso) Regularization
L1 regularization adds the absolute values of the feature weights to the loss function, which encourages sparse models by pushing some feature weights to exactly zero.
The alpha hyperparameter in XGBoost controls the L1 regularization strength. At higher alpha values, more feature weights are set to zero, resulting in a simpler and easier-to-interpret model.
Mathematical Expression:
Penalty = α × ∑ |weights|
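To make the formula concrete, the following illustrative snippet computes the L1 penalty for a made-up weight vector; the alpha value and weights are assumptions for demonstration only.

import numpy as np

# Illustrative L1 penalty: alpha * sum(|weights|), with made-up values
alpha = 0.5
weights = np.array([0.8, -0.3, 0.0, 1.2])
l1_penalty = alpha * np.sum(np.abs(weights))
print(l1_penalty)  # 0.5 * (0.8 + 0.3 + 0.0 + 1.2) = 1.15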
L2 (Ridge) Regularization
L2 regularization is used to add the squared values of the feature weights to the loss function.
Unlike L1, L2 regularization does not drive feature weights to zero; instead, it encourages smaller, more evenly distributed feature weights.
The degree of L2 regularization in XGBoost is controlled by the lambda hyperparameter. Higher lambda values produce a more heavily regularized model with smaller feature weights.
Mathematical Expression:
Penalty = λ × ∑ (weights)²
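For comparison, the following illustrative snippet computes the L2 penalty for the same made-up weight vector used above; the lambda value is an assumption for demonstration only.

import numpy as np

# Illustrative L2 penalty: lambda * sum(weights^2), with made-up values
lam = 0.5
weights = np.array([0.8, -0.3, 0.0, 1.2])
l2_penalty = lam * np.sum(weights ** 2)
print(l2_penalty)  # 0.5 * (0.64 + 0.09 + 0.0 + 1.44) = 1.085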
Regularization Parameters in XGBoost
Here is the list of regularization parameters used in XGBoost −
alpha (L1 regularization term on weights): Controls the L1 regularization and promotes sparsity. Greater alpha values drive more weights to zero by increasing the penalty on weights.
lambda (L2 regularization term on weights): Reduces the weights and complexity of the model by controlling the L2 regularization. The model becomes less sensitive to individual features at higher lambda values.
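In practice, alpha and lambda are usually tuned together. The sketch below is illustrative only (the dataset, grid values, and other settings are assumptions); it searches over both parameters with scikit-learn's GridSearchCV, where the scikit-learn wrapper exposes them as reg_alpha and reg_lambda.

from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from xgboost import XGBRegressor

# Toy data and an illustrative grid over the L1/L2 regularization strengths
X, y = make_regression(n_samples=500, n_features=20, random_state=42)
param_grid = {
    "reg_alpha": [0.0, 0.1, 1.0],   # L1 strength
    "reg_lambda": [0.5, 1.0, 5.0],  # L2 strength
}
search = GridSearchCV(
    XGBRegressor(n_estimators=200, max_depth=4),
    param_grid,
    scoring="neg_mean_squared_error",
    cv=3,
)
search.fit(X, y)
print(search.best_params_)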
Early Stopping
XGBoost provides early stopping as a regularization strategy in addition to L1 and L2 regularization. By tracking a validation metric during training and stopping the training process when the metric stops improving, early stopping reduces over-fitting.
The early_stopping_rounds option sets the number of rounds XGBoost waits for an improvement before training stops.
Early stopping helps to identify the point at which the model has learned meaningful patterns without becoming overly sensitive to noise.
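The following sketch shows one way to wire this up with the native training API; the synthetic dataset and the choice of 20 rounds are assumptions for illustration.

import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic data split into training and validation sets
X, y = make_classification(n_samples=1000, n_features=20, random_state=7)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=7)

dtrain = xgb.DMatrix(X_train, label=y_train)
dval = xgb.DMatrix(X_val, label=y_val)

params = {"objective": "binary:logistic", "eval_metric": "logloss"}

# Stop if the validation log-loss does not improve for 20 consecutive rounds
booster = xgb.train(
    params,
    dtrain,
    num_boost_round=500,
    evals=[(dval, "validation")],
    early_stopping_rounds=20,
)
print(booster.best_iteration)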
Tree-Specific Regularization
XGBoost also provides tree-specific regularization techniques. The min_child_weight option sets the minimum sum of instance weights required in each leaf node of a tree. Higher values produce simpler, more general trees with a lower risk of over-fitting, since they limit the trees' depth and complexity.
Another tree-specific regularization option is the gamma parameter, which specifies the minimum loss reduction required to make a further split on a leaf node. Higher gamma values result in simpler tree structures and more conservative splits.
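As a rough sketch (the values here are illustrative, not tuned), a more conservative configuration of these two parameters might look like this:

from xgboost import XGBClassifier

# Conservative, illustrative settings for the tree-specific controls
conservative_model = XGBClassifier(
    n_estimators=300,
    max_depth=6,
    min_child_weight=10,  # each leaf must cover a larger sum of instance weights
    gamma=1.0,            # require at least this loss reduction before splitting
)
# Relative to the defaults (min_child_weight=1, gamma=0), this tends to produce
# simpler trees with fewer, more conservative splits.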
Importance of Regularization
In XGBoost, fine-tuning alpha and lambda lets you control the trade-off between model complexity and performance. The following points show why regularization is important −
Prevents Over-fitting: By penalizing complex models, regularization keeps them from fitting too closely to the training set.
Improves Generalization: Regularization makes sure the model performs well when used with new, unseen data.
Better Feature Selection: L1 regularization can be used to drive less important feature weights to zero, effectively removing them from the model and making it easier to interpret (see the sketch below).
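As an illustrative (and entirely assumed) check of this effect, the sketch below trains boosters with increasing alpha on synthetic data and counts how many features are actually used for splits:

import xgboost as xgb
from sklearn.datasets import make_classification

# Synthetic data in which only a few features are truly informative
X, y = make_classification(n_samples=1000, n_features=30, n_informative=5, random_state=0)
dtrain = xgb.DMatrix(X, label=y)

for alpha in (0.0, 1.0, 10.0):
    booster = xgb.train(
        {"objective": "binary:logistic", "alpha": alpha, "max_depth": 4},
        dtrain,
        num_boost_round=100,
    )
    used = booster.get_score(importance_type="weight")  # features with at least one split
    print(f"alpha={alpha}: {len(used)} features used")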