
LightGBM - Early Stopping Training
Early stopping is a technique in which training is halted if the evaluation metric, assessed on an evaluation dataset, does not improve for a specified number of rounds. LightGBM exposes this through a parameter named early_stopping_rounds, accepted by the native train() function and by the fit() method of its sklearn-like estimators. The parameter takes an integer stating how many rounds the metric may fail to improve before training stops.
Keep in mind that early stopping requires an evaluation dataset, because it relies on metric values computed against that dataset.
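The following is a minimal sketch of the idea, assuming train_data and valid_data are pre-built lgb.Dataset objects; the early_stopping() callback used here is equivalent to the early_stopping_rounds parameter and also works on current LightGBM releases −
import lightgbm as lgb

params = {"objective": "regression", "metric": "rmse"}
# train_data / valid_data: pre-built lgb.Dataset objects (assumed)
# valid_sets supplies the evaluation data whose metric is monitored;
# training halts once it fails to improve for 5 consecutive rounds
booster = lgb.train(params,
                    train_set=train_data,
                    valid_sets=[valid_data],
                    num_boost_round=100,
                    callbacks=[lgb.early_stopping(stopping_rounds=5)])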
Example
We will first import the necessary libraries and then load the Boston housing dataset. Note that load_boston() was removed from Scikit-Learn in version 1.2, so the code below reconstructs the dataset from its original source instead.
import numpy as np
import pandas as pd
import lightgbm as lgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# load_boston() was removed in scikit-learn 1.2, so rebuild the dataset
# from its original source, as suggested by the scikit-learn documentation
data_url = "http://lib.stat.cmu.edu/datasets/boston"
raw_df = pd.read_csv(data_url, sep=r"\s+", skiprows=22, header=None)
X = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
Y = raw_df.values[1::2, 2]
feature_names = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
                 "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT"]

X_train, X_test, Y_train, Y_test = train_test_split(X, Y)
print("Sizes of Train or Test Datasets : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)

train_dataset = lgb.Dataset(X_train, Y_train, feature_name=feature_names)
test_dataset = lgb.Dataset(X_test, Y_test, feature_name=feature_names)

# Stop training once the validation RMSE has not improved for 5 rounds
# (in LightGBM >= 4.0 pass callbacks=[lgb.early_stopping(5)] instead)
booster = lgb.train({"objective": "regression", "verbosity": -1, "metric": "rmse"},
                    train_set=train_dataset,
                    valid_sets=(test_dataset,),
                    early_stopping_rounds=5,
                    num_boost_round=20)

test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)

# Display the R2 scores in the console
print("\nR2 Score on Test Set : %.2f" % r2_score(Y_test, test_preds))
print("R2 Score on Train Set : %.2f" % r2_score(Y_train, train_preds))
Output
This will produce the following result:
Sizes of Train or Test Datasets :  (404, 13) (102, 13) (404,) (102,)
[1] valid_0's rmse: 9.10722
Training until validation scores don't improve for 5 rounds
[2] valid_0's rmse: 8.46389
[3] valid_0's rmse: 7.93394
[4] valid_0's rmse: 7.43812
[5] valid_0's rmse: 7.01845
[6] valid_0's rmse: 6.68186
[7] valid_0's rmse: 6.43834
[8] valid_0's rmse: 6.17357
[9] valid_0's rmse: 5.96725
[10] valid_0's rmse: 5.74169
[11] valid_0's rmse: 5.55389
[12] valid_0's rmse: 5.38595
[13] valid_0's rmse: 5.24832
[14] valid_0's rmse: 5.13373
[15] valid_0's rmse: 5.0457
[16] valid_0's rmse: 4.96688
[17] valid_0's rmse: 4.87874
[18] valid_0's rmse: 4.8246
[19] valid_0's rmse: 4.75342
[20] valid_0's rmse: 4.69854
Did not meet early stopping. Best iteration is:
[20] valid_0's rmse: 4.69854

R2 Score on Test Set : 0.81
R2 Score on Train Set : 0.97
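Although early stopping did not trigger in this run, the returned Booster still records the best round. As a small follow-up sketch (reusing booster and X_test from the example above), the best iteration can be read back and used to limit prediction −
# best_iteration holds the round with the best validation RMSE; when early
# stopping is enabled, predict() already uses it by default, but it can
# also be passed explicitly via num_iteration
print("Best Iteration : ", booster.best_iteration)
test_preds = booster.predict(X_test, num_iteration=booster.best_iteration)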
The next program divides the breast cancer dataset into training and testing sets. It trains a LightGBM model to classify whether a tumor is malignant or benign, stopping early if performance on the evaluation set fails to improve. Finally, it predicts the results for both the test and training sets and computes the accuracy of the model.
import lightgbm as lgb
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

breast_cancer = load_breast_cancer()
X_train, X_test, Y_train, Y_test = train_test_split(breast_cancer.data, breast_cancer.target)
print("Sizes of Train or Test Datasets : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)

booster = lgb.LGBMModel(objective="binary", n_estimators=100, metric="auc")
# Stop fitting once the AUC on the eval_set has not improved for 3 rounds
# (in LightGBM >= 4.0 pass callbacks=[lgb.early_stopping(3)] instead)
booster.fit(X_train, Y_train,
            eval_set=[(X_test, Y_test),],
            early_stopping_rounds=3)

test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)

# Convert predicted probabilities into 0/1 class labels
test_preds = [1 if pred > 0.5 else 0 for pred in test_preds]
train_preds = [1 if pred > 0.5 else 0 for pred in train_preds]

# Display the accuracy results
print("\nAccuracy Score on Test Set : %.2f" % accuracy_score(Y_test, test_preds))
print("Accuracy Score on Train Set : %.2f" % accuracy_score(Y_train, train_preds))
Output
This will lead to the following outcome:
Sizes of Train or Test Datasets :  (426, 30) (143, 30) (426,) (143,)
[1] valid_0's auc: 0.986129
Training until validation scores don't improve for 3 rounds
[2] valid_0's auc: 0.989355
[3] valid_0's auc: 0.988925
[4] valid_0's auc: 0.987097
[5] valid_0's auc: 0.990108
[6] valid_0's auc: 0.993011
[7] valid_0's auc: 0.993011
[8] valid_0's auc: 0.993441
[9] valid_0's auc: 0.993441
[10] valid_0's auc: 0.994194
[11] valid_0's auc: 0.994194
[12] valid_0's auc: 0.994194
[13] valid_0's auc: 0.994409
[14] valid_0's auc: 0.995914
[15] valid_0's auc: 0.996129
[16] valid_0's auc: 0.996989
[17] valid_0's auc: 0.996989
[18] valid_0's auc: 0.996344
[19] valid_0's auc: 0.997204
[20] valid_0's auc: 0.997419
[21] valid_0's auc: 0.997849
[22] valid_0's auc: 0.998065
[23] valid_0's auc: 0.997849
[24] valid_0's auc: 0.998065
[25] valid_0's auc: 0.997634
Early stopping, best iteration is:
[22] valid_0's auc: 0.998065

Accuracy Score on Test Set : 0.97
Accuracy Score on Train Set : 0.98
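After fitting, the round and score selected by early stopping can be read back from the estimator. A minimal sketch, reusing booster from the example above (best_iteration_ and best_score_ are attributes of LightGBM's sklearn-style estimators) −
# best_iteration_ is the round with the best validation AUC ([22] in the run above)
print("Best Iteration : ", booster.best_iteration_)
# best_score_ maps each evaluation set to its metric values
print("Best Score : ", booster.best_score_)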
How to stop training early using the early_stopping() callback?
LightGBM also supports early stopping through the early_stopping() callback mechanism. We pass the number of rounds to the early_stopping() function and supply the resulting callback to the train()/fit() method through the callbacks argument. Usage of the callback is shown below −
import lightgbm as lgb
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

breast_cancer = load_breast_cancer()
X_train, X_test, Y_train, Y_test = train_test_split(breast_cancer.data, breast_cancer.target)
print("Sizes of Train or Test Datasets : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)

booster = lgb.LGBMModel(objective="binary", n_estimators=100, metric="auc")
# The early_stopping() callback halts training after 3 rounds without
# improvement on the eval_set; this form works in all recent LightGBM versions
booster.fit(X_train, Y_train,
            eval_set=[(X_test, Y_test),],
            callbacks=[lgb.early_stopping(3)])

test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)

# Convert predicted probabilities into 0/1 class labels
test_preds = [1 if pred > 0.5 else 0 for pred in test_preds]
train_preds = [1 if pred > 0.5 else 0 for pred in train_preds]

print("\nAccuracy Score on Test Set : %.2f" % accuracy_score(Y_test, test_preds))
print("Accuracy Score on Train Set : %.2f" % accuracy_score(Y_train, train_preds))
Output
This will generate the following result:
Sizes of Train or Test Datasets :  (426, 30) (143, 30) (426,) (143,)
[1] valid_0's auc: 0.954328
Training until validation scores don't improve for 3 rounds
[2] valid_0's auc: 0.959322
[3] valid_0's auc: 0.982938
[4] valid_0's auc: 0.988244
[5] valid_0's auc: 0.987203
[6] valid_0's auc: 0.98762
[7] valid_0's auc: 0.98814
Early stopping, best iteration is:
[4] valid_0's auc: 0.988244

Accuracy Score on Test Set : 0.94
Accuracy Score on Train Set : 0.95
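For completeness, the same callback works with the native train() API as well, and it is the recommended form since the early_stopping_rounds parameter was removed in LightGBM 4.0. Below is a sketch that reuses train_dataset and test_dataset from the regression example above −
import lightgbm as lgb

# early_stopping() replaces the removed early_stopping_rounds parameter;
# log_evaluation() restores the per-round metric printout
booster = lgb.train({"objective": "regression", "verbosity": -1, "metric": "rmse"},
                    train_set=train_dataset,
                    valid_sets=(test_dataset,),
                    num_boost_round=100,
                    callbacks=[lgb.early_stopping(stopping_rounds=5),
                               lgb.log_evaluation(period=1)])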