
LightGBM - Feature Interaction Constraints
Once LightGBM has finished training an ensemble of trees on a dataset, each internal node of a tree represents a condition on a single feature's value. To make a prediction with an individual tree, we start at the root node and compare our sample's feature values against the condition stored at each node. The outcome of each comparison decides which branch we follow next, so the sample traces a specific path down to a leaf of the tree, and the leaf supplies the prediction, as the short sketch below illustrates. By default there are no restrictions on which features may appear together along such a path.
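To make the traversal concrete, here is a minimal sketch of how a single tree routes a sample to a leaf; the nested-dict node layout is hypothetical and only stands in for LightGBM's internal representation.

# A toy tree: internal nodes hold a feature index and a threshold,
# leaves hold the prediction value (hypothetical layout, not LightGBM's API)
tree = {
    "split_feature": 0, "threshold": 6.5,
    "left_child": {"leaf_value": 21.0},
    "right_child": {"leaf_value": 35.0},
}

def predict_one(node, x):
    # Follow the branch chosen by each node's feature condition until a leaf
    while "split_feature" in node:
        if x[node["split_feature"]] <= node["threshold"]:
            node = node["left_child"]
        else:
            node = node["right_child"]
    return node["leaf_value"]

print(predict_one(tree, [5.0]))   # 5.0 <= 6.5, so we reach the left leaf: 21.0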
This process of reaching a decision by stepping through the nodes of a tree and evaluating one feature condition after another is known as feature interaction, because each node is reached only after the conditions of its ancestors have been evaluated. LightGBM lets us control which features can interact with each other: we define sets of feature indices, and only the features within the same set are allowed to interact. Features from different sets cannot appear together on the same branch, and this restriction is enforced while the trees are built during training.
Below we show how to enforce feature interaction constraints on an estimator in LightGBM. LightGBM estimators have a parameter called interaction_constraints, which accepts a list of lists, each containing the indices of the features that are allowed to interact with one another.
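Since interaction_constraints expects numeric indices, it can be convenient to define the groups by feature name first and convert them. Here is a minimal sketch, assuming the Boston housing feature order used in the examples below; the group choices are purely illustrative.

# Boston housing feature names in dataset order
feature_names = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
                 "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT"]

# Illustrative groups of names; only features in the same group may interact
name_groups = [["CRIM", "ZN", "INDUS", "B", "LSTAT"], ["CHAS", "NOX"]]

# Convert each group of names to the index lists interaction_constraints expects
index_groups = [[feature_names.index(n) for n in group] for group in name_groups]
print(index_groups)   # [[0, 1, 2, 11, 12], [3, 4]]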
Example 1
Here is an example of how we can enforce a feature interaction constraint on an estimator in LightGBM.
The load_boston function from sklearn.datasets is deprecated and has been removed in recent versions of scikit-learn. If an error occurs, you can load the dataset from an external source or use an alternative dataset.
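On newer scikit-learn releases, one workaround (the one suggested in scikit-learn's own deprecation warning) is to fetch the raw data directly; note that this depends on the external URL being reachable.

import numpy as np
import pandas as pd

# Fetch the raw Boston housing data; each sample spans two physical lines
# of the original file, so the rows are re-stitched below
data_url = "http://lib.stat.cmu.edu/datasets/boston"
raw_df = pd.read_csv(data_url, sep=r"\s+", skiprows=22, header=None)
data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
target = raw_df.values[1::2, 2]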
# Import necessary libraries
import lightgbm as lgb
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Load the Boston housing dataset
boston = load_boston()

# Split the data into training and testing sets
X_train, X_test, Y_train, Y_test = train_test_split(
    boston.data, boston.target, train_size=0.90, random_state=42
)

# Print the size of the training and testing sets
print("Sizes of Train or Test Datasets : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape, "\n")

# Create LightGBM datasets
train_dataset = lgb.Dataset(X_train, Y_train, feature_name=boston.feature_names.tolist())
test_dataset = lgb.Dataset(X_test, Y_test, feature_name=boston.feature_names.tolist())

# Train the LightGBM model with feature interaction constraints
booster = lgb.train(
    {
        "objective": "regression",
        "verbosity": -1,
        "metric": "rmse",
        "interaction_constraints": [[0, 1, 2, 11, 12], [3, 4], [6, 10], [5, 9], [7, 8]]
    },
    train_set=train_dataset,
    valid_sets=(test_dataset,),
    num_boost_round=10
)

# Make predictions
test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)

# Calculate and print R2 scores
print("\nR2 Test Score : %.2f" % r2_score(Y_test, test_preds))
print("R2 Train Score : %.2f" % r2_score(Y_train, train_preds))
Output
This will generate the below result:
Sizes of Train or Test Datasets :  (455, 13) (51, 13) (455,) (51,)

[1]     valid_0's rmse: 7.50225
[2]     valid_0's rmse: 7.01989
[3]     valid_0's rmse: 6.58246
[4]     valid_0's rmse: 6.18581
[5]     valid_0's rmse: 5.83873
[6]     valid_0's rmse: 5.47166
[7]     valid_0's rmse: 5.19667
[8]     valid_0's rmse: 4.96259
[9]     valid_0's rmse: 4.69168
[10]    valid_0's rmse: 4.51653

R2 Test Score : 0.67
R2 Train Score : 0.69
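To check that the constraints were respected, we can inspect the trained trees. Below is a minimal sketch, assuming the booster from Example 1, that collects the feature indices used along each root-to-leaf branch; with the constraints above, every branch's feature set should fit inside a single constraint group.

# Dump the trained model to a plain dict describing every tree
model = booster.dump_model()

def branch_features(node, used, out):
    # Internal nodes carry a "split_feature" key; leaves do not
    if "split_feature" in node:
        used = used | {node["split_feature"]}
        branch_features(node["left_child"], used, out)
        branch_features(node["right_child"], used, out)
    else:
        out.append(sorted(used))

for i, tree in enumerate(model["tree_info"]):
    branches = []
    branch_features(tree["tree_structure"], set(), branches)
    print("Tree %d branch feature sets: %s" % (i, branches))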
Example 2
The code below trains a LightGBM model with the same interaction constraints, this time through the scikit-learn style LGBMModel API, to predict housing prices on the Boston dataset. After training, it reports how well the model performs on both the training and test data using the R2 score.
# Import necessary libraries
import lightgbm as lgb
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Load the Boston housing dataset
boston = load_boston()

# Split the dataset into training and testing sets
X_train, X_test, Y_train, Y_test = train_test_split(
    boston.data, boston.target, test_size=0.2, random_state=42
)

# Print the size of the training and testing sets
print("Sizes of Training and Testing Datasets : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)

# Create a LightGBM model with interaction constraints and 10 estimators
booster = lgb.LGBMModel(
    objective="regression",
    n_estimators=10,
    interaction_constraints=[[0, 1, 2, 11, 12], [3, 4], [6, 10], [5, 9], [7, 8]]
)

# Train the model on the training set and validate it on the test set
booster.fit(X_train, Y_train, eval_set=[(X_test, Y_test)], eval_metric="rmse")

# Make predictions on both the test and training sets
test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)

# Calculate and print the R2 score for the test and training sets
print("\nR2 Test Score : %.2f" % r2_score(Y_test, test_preds))
print("R2 Train Score : %.2f" % r2_score(Y_train, train_preds))
Output
This will create the following result:
Sizes of Training and Testing Datasets :  (379, 13) (127, 13) (379,) (127,)

[1]     valid_0's rmse: 8.97871    valid_0's l2: 80.6173
[2]     valid_0's rmse: 8.35545    valid_0's l2: 69.8135
[3]     valid_0's rmse: 7.93432    valid_0's l2: 62.9535
[4]     valid_0's rmse: 7.61104    valid_0's l2: 57.9279
[5]     valid_0's rmse: 7.16832    valid_0's l2: 51.3849
[6]     valid_0's rmse: 6.93182    valid_0's l2: 48.0501
[7]     valid_0's rmse: 6.57728    valid_0's l2: 43.2606
[8]     valid_0's rmse: 6.41497    valid_0's l2: 41.1518
[9]     valid_0's rmse: 6.13983    valid_0's l2: 37.6976
[10]    valid_0's rmse: 5.9864     valid_0's l2: 35.837

R2 Test Score : 0.60
R2 Train Score : 0.69
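As a quick sanity check of what the constraints cost in accuracy, you can train the same model without them and compare the scores. Here is a minimal sketch reusing the split from Example 2; the exact numbers will vary.

# Same model as Example 2, but without interaction constraints
baseline = lgb.LGBMModel(objective="regression", n_estimators=10)
baseline.fit(X_train, Y_train, eval_set=[(X_test, Y_test)], eval_metric="rmse")

# Compare against the constrained model's R2 test score
print("Unconstrained R2 Test Score : %.2f" % r2_score(Y_test, baseline.predict(X_test)))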