
XGBoost - Using DMatrix
XGBoost uses a special data structure called DMatrix to store datasets efficiently. It is optimized for both memory usage and training speed, which matters most when you work with large datasets.
Importance of DMatrix
Here are some key points that show why DMatrix is important in XGBoost −
- DMatrix stores large datasets compactly, reducing the amount of memory needed.
- When your data is converted into a DMatrix, you can attach instance weights, and XGBoost carries out various preprocessing tasks; it even handles missing values automatically (see the sketch after this list).
- Using a DMatrix instead of a standard dataset format speeds up training, because XGBoost can access and use the data quickly through its optimized internal layout.
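The following is a minimal sketch of these options on a toy dataset. The array values, the weight vector, and the use of -1 as the missing-value marker are illustrative assumptions, not part of the dataset used later in this chapter.
import numpy as np
import xgboost as xgb

# Toy data (assumed values): -1 marks a missing entry
data = np.array([[1.0, 2.0], [3.0, -1.0], [5.0, 6.0]])
labels = np.array([0, 1, 0])
weights = np.array([1.0, 0.5, 2.0])   # per-row instance weights

# missing=-1.0 tells XGBoost to treat -1 as a missing value;
# weight= attaches an importance weight to each row
dmat = xgb.DMatrix(data, label=labels, weight=weights, missing=-1.0)
print(dmat.num_row(), dmat.num_col())   # 3 2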
Example of XGBoost using DMatrix
Here is the step-by-step process of creating an XGBoost model with the help of DMatrix −
1. Import Libraries
First you need to import the required libraries for the model.
import xgboost as xgb
import pandas as pd
2. Define the dataset
Define the dataset; this can be any CSV file. Here we have used the Wholesale-customers-data.csv dataset.
df = pd.read_csv('/Python/Wholesale-customers-data.csv')
3. Separate features
In this step we will separate the features (X) from the target (y).
# Features (everything except the 'Channel' column)
X = df.drop(columns=['Channel'])

# Target variable (Channel column)
y = df['Channel']
4. Convert into DMatrix
In this step we will convert the features and the target into a DMatrix, which is an optimized data structure for XGBoost.
dtrain = xgb.DMatrix(X, label=y)
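Once the DMatrix is built you can query it directly. A quick sanity check like the following (an optional addition, not part of the original steps) confirms the shape and the feature names picked up from the DataFrame:
# Inspect the DMatrix (optional sanity check)
print(dtrain.num_row())        # number of rows
print(dtrain.num_col())        # number of feature columns
print(dtrain.feature_names)    # column names taken from the DataFrame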
5. Define Parameters
Below we will define the XGBoost model parameters.
params = {
   # Maximum depth of a tree
   'max_depth': 3,
   # Learning rate
   'eta': 0.1,
   # Objective function for multiclass classification
   'objective': 'multi:softmax',
   # Number of classes
   'num_class': 3
}
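Note that 'multi:softmax' makes the model output hard class labels. If you need per-class probabilities instead, XGBoost also offers the 'multi:softprob' objective, which returns one probability per class for each row.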
6. Train the model
Now we will train the model with the help of the DMatrix.
# Number of boosting rounds
num_round = 10
bst = xgb.train(params, dtrain, num_round)
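If you want to watch the training error while boosting, xgb.train also accepts an evals list. A minimal sketch, evaluating on the training data itself (an assumption made purely for illustration), looks like this:
# Optional: print the evaluation metric after every boosting round
bst = xgb.train(params, dtrain, num_round,
                evals=[(dtrain, 'train')])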
7. Save the model and Get Predictions
After training, you can save the model. To keep the example simple we make predictions on the same data we trained on, but in a real scenario you would use new, unseen data.
# Save the model here
bst.save_model('xgboost_model.json')

# Make predictions here
dtest = xgb.DMatrix(X)
predictions = bst.predict(dtest)

# Print predictions
print(predictions)
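A saved model can later be restored into a fresh Booster. A minimal sketch, assuming the same file path used above, is:
# Load the saved model into a new Booster and predict again
loaded = xgb.Booster()
loaded.load_model('xgboost_model.json')
print(loaded.predict(xgb.DMatrix(X)))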
Output
This output shows the predicted class for every observation, based on the given features.
[2. 2. 2. 1. 2. 2. 2. 2. 1. 2. 2. 1. 2. 2. 2. 1. 2. 1. 2. 1. 2. 1. 1. 2. 2. 2. 1. 1. 2. 1. 1. 1. 1. 1. 1. 2. 1. 2. 2. 1. 1. 1. 2. 2. 2. 2. 2. 2. 2. 2. 1. 1. 2. 2. 1. 1. 2. 2. 1. 2. 2. 2. 2. 2. 1. 2. 2. 2. 1. 1. 1. 1. 1. 1. 2. 1. 1. 2. 1. 1. 1. 2. 2. 1. 2. 2. 2. 1. 1. 1. 1. 1. 2. 1. 2. 1. 2. 1. 1. 1. 2. 2. 2. 1. 1. 1. 2. 2. 2. 2. 1. 2. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 2. 1. 1. 1. 2. 1. 1. 1. 1. 1. 1. 1. 1. 2. 1. 1. 1. 1. 1. 1. 1. 1. 2. 1. 1. 1. 1. 1. 1. 1. 1. 1. 2. 2. 1. 2. 2. 2. 1. 1. 2. 2. 2. 2. 1. 1. 1. 2. 2. 2. 2. 1. 2. 1. 1. 1. 1. 1. 2. 2. 1. 1. 1. 1. 2. 2. 2. 1. 1. 1. 2. 1. 1. 1. 2. 1. 1. 2. 2. 1. 1. 1. 2. 1. 2. 2. 2. 1. 2. 1. 2. 2. 2. 2. 1. 2. 1. 1. 2. 1. 1. 1. 1. 2. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 2. 2. 1. 1. 1. 1. 1. 2. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 2. 1. 2. 1. 2. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 2. 1. 2. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 2. 1. 1. 1. 2. 2. 1. 2. 2. 2. 2. 2. 2. 2. 1. 1. 2. 1. 1. 2. 1. 1. 2. 1. 1. 1. 2. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 2. 1. 2. 1. 2. 1. 1. 1. 1. 2. 2. 2. 2. 1. 1. 2. 2. 1. 2. 1. 2. 1. 2. 1. 1. 1. 2. 1. 1. 1. 1. 1. 1. 1. 2. 1. 1. 1. 1. 1. 1. 1. 2. 1. 1. 2. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 2. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 2. 2. 1. 1. 1. 2. 1. 1. 2. 2. 2. 2. 1. 2. 2. 1. 2. 2. 1. 2. 1. 1. 1. 1. 1. 1. 1. 2. 1. 1. 2. 1. 1.]
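Since we predicted on the training data, we can compare the predictions with the true labels to get a rough training accuracy. This check is an addition to the original example:
# Training accuracy (predictions were made on the training data itself)
accuracy = (predictions == y.values).mean()
print(f"Training accuracy: {accuracy:.3f}")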