XGBoost - Using DMatrix



XGBoost uses a special data structure called DMatrix to store datasets efficiently. It optimizes both memory usage and performance, particularly when handling large datasets.

Importance of DMatrix

Here are some key points where DMatrix is important in XGBoost −

  • DMatrix easily stores large datasets by reducing the amount of memory needed.

  • When your data is converted into a DMatrix, XGBoost can automatically attach sample weights and carry out various preprocessing tasks. It even handles missing values.

  • Using DMatrix instead of a standard dataset format speeds up training because XGBoost can access and use the data quickly.

Example of XGBoost using DMatrix

Here is the step-by-step process of creating an XGBoost model with the help of DMatrix −

1. Import Libraries

First, you need to import the required libraries for the model.

import xgboost as xgb
import pandas as pd

2. Define the dataset

Define the dataset; this can be your own CSV data. Here we have used the Wholesale-customers-data.csv dataset.

df = pd.read_csv('/Python/Wholesale-customers-data.csv')

3. Separate features

Now we will separate features (X) and target (y) in this step.

# Features (everything except the 'Channel' column)
X = df.drop(columns=['Channel'])  
# Target variable (Channel column)
y = df['Channel']  

4. Convert into DMatrix

In this stage we will convert the features and target into a DMatrix which is an optimized data structure for XGBoost.

dtrain = xgb.DMatrix(X, label=y)

5. Define Parameters

Below we will define the XGBoost model parameters.

params = {
   # Maximum depth of a tree
   'max_depth': 3,  
   # Learning rate
   'eta': 0.1,      
   # Objective function for multiclass classification
   'objective': 'multi:softmax',  
   # Number of classes
   'num_class': 3   
}

6. Train the model

Now we will train the model with the help of the DMatrix.

num_round = 10  # Number of boosting rounds
bst = xgb.train(params, dtrain, num_round)

7. Save the model and Get Predictions

After training, you can save the model. To make predictions, we reuse the training data as test data here, but in real scenarios you would use new, unseen data.

# Save the model here
bst.save_model('xgboost_model.json')

# Make Predictions here
dtest = xgb.DMatrix(X)  
predictions = bst.predict(dtest)

# Print predictions
print(predictions)

Output

This output shows the predicted class for every observation, based on the given features.

[2. 2. 2. 1. 2. 2. 2. 2. 1. 2. 2. 1. 2. 2. 2. 1. 2. 1. 2. 1. 2. 1. 1. 2.
 2. 2. 1. 1. 2. 1. 1. 1. 1. 1. 1. 2. 1. 2. 2. 1. 1. 1. 2. 2. 2. 2. 2. 2.
 2. 2. 1. 1. 2. 2. 1. 1. 2. 2. 1. 2. 2. 2. 2. 2. 1. 2. 2. 2. 1. 1. 1. 1.
 1. 1. 2. 1. 1. 2. 1. 1. 1. 2. 2. 1. 2. 2. 2. 1. 1. 1. 1. 1. 2. 1. 2. 1.
 2. 1. 1. 1. 2. 2. 2. 1. 1. 1. 2. 2. 2. 2. 1. 2. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 2. 1. 1. 1. 2. 1. 1. 1. 1. 1. 1. 1. 1. 2. 1. 1. 1. 1. 1. 1. 1.
 1. 2. 1. 1. 1. 1. 1. 1. 1. 1. 1. 2. 2. 1. 2. 2. 2. 1. 1. 2. 2. 2. 2. 1.
 1. 1. 2. 2. 2. 2. 1. 2. 1. 1. 1. 1. 1. 2. 2. 1. 1. 1. 1. 2. 2. 2. 1. 1.
 1. 2. 1. 1. 1. 2. 1. 1. 2. 2. 1. 1. 1. 2. 1. 2. 2. 2. 1. 2. 1. 2. 2. 2.
 2. 1. 2. 1. 1. 2. 1. 1. 1. 1. 2. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 2. 2. 1. 1. 1. 1. 1. 2. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 2. 1. 2. 1. 2. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 2. 1. 2. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 2. 1. 1. 1. 2. 2. 1. 2. 2. 2. 2. 2. 2. 2. 1. 1. 2. 1. 1.
 2. 1. 1. 2. 1. 1. 1. 2. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 2. 1. 2. 1. 2.
 1. 1. 1. 1. 2. 2. 2. 2. 1. 1. 2. 2. 1. 2. 1. 2. 1. 2. 1. 1. 1. 2. 1. 1.
 1. 1. 1. 1. 1. 2. 1. 1. 1. 1. 1. 1. 1. 2. 1. 1. 2. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 2. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 2.
 2. 1. 1. 1. 2. 1. 1. 2. 2. 2. 2. 1. 2. 2. 1. 2. 2. 1. 2. 1. 1. 1. 1. 1.
 1. 1. 2. 1. 1. 2. 1. 1.]