
XGBoost - Regressor
Regression is a technique that predicts continuous numerical values, and XGBoost supports it directly. It is commonly applied to targets such as sales figures, real-estate prices, and stock values, where the output is continuous.
- The output of a regression problem is a real-valued, continuous number. Two common regression algorithms are decision trees and linear regression. Regression models are evaluated with measures such as mean squared error (MSE) and root mean squared error (RMSE).
- RMSE is simply the square root of the MSE. MAE (mean absolute error) is the average of the absolute differences between actual and predicted values; it is used less often in gradient boosting because its gradient is piecewise constant and its second derivative is zero, which makes it awkward to optimize. Both metrics are computed in the short sketch after this list.
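As a quick illustration, here is a minimal sketch that computes MSE, RMSE, and MAE with NumPy and scikit-learn; the actual and predicted values are made up purely for demonstration −

import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error

# Hand-made actual and predicted values, purely for illustration
actual = np.array([3.0, 5.0, 2.5, 7.0])
predicted = np.array([2.5, 5.0, 4.0, 8.0])

mse = mean_squared_error(actual, predicted)
rmse = np.sqrt(mse)
mae = mean_absolute_error(actual, predicted)

print("MSE :", mse)    # average of squared differences
print("RMSE:", rmse)   # square root of MSE, in the target's own units
print("MAE :", mae)    # average of absolute differences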
XGBoost is a useful tool for building supervised regression models. This can be seen by looking at its objective function and base learners.
- The objective function combines a loss function and a regularization term. The loss function measures the difference between actual and predicted values, that is, how far the model's output is from the real data, while the regularization term penalizes model complexity.
- The most common objectives in XGBoost are reg:squarederror (formerly reg:linear) for regression and reg:logistic for binary classification; a minimal sketch follows this list. XGBoost is an ensemble learning technique: ensemble learning trains and combines many individual models to produce a single prediction.
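To make the objective setting concrete, the following minimal sketch shows how an objective is passed to the scikit-learn wrappers. Note that reg:squarederror is the current name for the deprecated reg:linear alias, and that XGBClassifier defaults to binary:logistic, the variant of the logistic objective used for binary classification −

import xgboost as xgb

# Squared-error loss for regression; reg:squarederror replaces the
# deprecated reg:linear alias seen in older tutorials
reg_model = xgb.XGBRegressor(objective="reg:squarederror")

# Logistic loss for binary classification; binary:logistic is the
# classifier's default and outputs probabilities
clf_model = xgb.XGBClassifier(objective="binary:logistic")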
Syntax of XGBRegressor
The XGBRegressor in Python is the regression-specific interface to XGBoost, used for problems where the goal is to predict continuous numerical values.
The basic syntax to build an XGBRegressor module is as follows −
import xgboost as xgb

model = xgb.XGBRegressor(
   objective='reg:squarederror',
   max_depth=max_depth,
   learning_rate=learning_rate,
   subsample=subsample,
   colsample_bytree=colsample,
   n_estimators=num_estimators
)
Parameters
Here are the parameters of the XGBRegressor function −
objective is a must-have parameter that decides the purpose of the model. For regression tasks it is set to reg:squarederror, which means squared loss is used to measure errors.
max_depth is an optional parameter that shows how deep each decision tree can go. A higher value allows the tree to learn more, but can also lead to over-fitting.
learning_rate is another optional parameter. It controls how much the model learns in each step. A smaller value can prevent over-fitting by slowing down the learning.
subsample is optional and refers to the portion of data that will be used to create each tree. Using less data can make the model more general.
colsample_bytree is also optional and controls how many features (columns) are used to create each tree.
n_estimators tells the model how many trees to make (boosting rounds); it is optional, and the wrapper supplies a default. More trees can improve accuracy but also make the model more complex. A fully specified example follows this list.
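Putting these parameters together, here is a sketch of a fully specified XGBRegressor. The concrete values (depth 4, learning rate 0.1, and so on) are illustrative assumptions, not tuned recommendations −

import xgboost as xgb

# Illustrative values only; tune them for your own data
model = xgb.XGBRegressor(
   objective="reg:squarederror",  # squared loss for regression
   max_depth=4,                   # depth of each tree
   learning_rate=0.1,             # shrinkage applied at each boosting step
   subsample=0.8,                 # fraction of rows sampled per tree
   colsample_bytree=0.8,          # fraction of columns sampled per tree
   n_estimators=100               # number of boosting rounds (trees)
)
print(model)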
Example of XGBRegressor
This code trains a machine learning model that predicts housing prices with XGBoost. It reads a dataset, splits it into training and testing sets, and trains the model. Finally, prediction accuracy is evaluated by computing the root mean squared error (RMSE).
Let us evaluate the regression technique using the XGBoost framework on this dataset −
# Required imports
import numpy as np
import pandas as pd
import xgboost as xg
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error as MSE

# Loading the data
dataset = pd.read_csv("/Python/Datasets/HousingData.csv")
X, y = dataset.iloc[:, :-1], dataset.iloc[:, -1]

# Splitting the dataset into training and testing sets
train_X, test_X, train_y, test_y = train_test_split(
   X, y, test_size=0.3, random_state=123)

# Instantiation (reg:squarederror is the current name for reg:linear)
xgb_r = xg.XGBRegressor(objective='reg:squarederror',
                        n_estimators=10, random_state=123)

# Fitting the model
xgb_r.fit(train_X, train_y)

# Predicting with the model
pred = xgb_r.predict(test_X)

# RMSE computation
rmse = np.sqrt(MSE(test_y, pred))
print("RMSE : %f" % rmse)
Output
Here is the output of the above model −
RMSE : 4.963784
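The CSV path above is specific to the original setup. If that file is not available, the same workflow can be reproduced end to end with scikit-learn's built-in California housing data, as in this sketch; the RMSE it prints will differ from the value above because the dataset is different −

import numpy as np
import xgboost as xgb
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Built-in dataset, so no local CSV is needed
data = fetch_california_housing()
train_X, test_X, train_y, test_y = train_test_split(
   data.data, data.target, test_size=0.3, random_state=123)

model = xgb.XGBRegressor(objective="reg:squarederror",
                         n_estimators=10, random_state=123)
model.fit(train_X, train_y)

pred = model.predict(test_X)
rmse = np.sqrt(mean_squared_error(test_y, pred))
print("RMSE : %f" % rmse)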
Linear base learner
This code uses XGBoost with a linear booster to predict housing prices. The dataset is loaded, split into training and testing sets, and each set is converted into the DMatrix format used by XGBoost's native API. After training, prediction accuracy is measured by computing the root mean squared error (RMSE).
# Required imports
import numpy as np
import pandas as pd
import xgboost as xg
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error as MSE

# Loading the data
dataset = pd.read_csv("/Python/Datasets/HousingData.csv")
X, y = dataset.iloc[:, :-1], dataset.iloc[:, -1]

# Splitting the dataset into training and testing sets
train_X, test_X, train_y, test_y = train_test_split(
   X, y, test_size=0.3, random_state=123)

# Converting to the DMatrix format used by the native API
train_dmatrix = xg.DMatrix(data=train_X, label=train_y)
test_dmatrix = xg.DMatrix(data=test_X, label=test_y)

# Parameter dictionary: linear booster with squared-error objective
param = {"booster": "gblinear", "objective": "reg:squarederror"}

# Training and predicting
xgb_r = xg.train(params=param, dtrain=train_dmatrix, num_boost_round=10)
pred = xgb_r.predict(test_dmatrix)

# RMSE computation
rmse = np.sqrt(MSE(test_y, pred))
print("RMSE : %f" % rmse)
Output
Here is the output of the above model −
RMSE : 6.101922
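The linear booster scores a noticeably higher RMSE than the tree booster on this data. As a rough way to compare the two side by side, the following sketch trains both boosters on synthetic data generated with scikit-learn's make_regression; the data and the resulting numbers are purely illustrative −

import numpy as np
import xgboost as xgb
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Synthetic regression data, purely for illustration
X, y = make_regression(n_samples=500, n_features=10, noise=10.0,
                       random_state=123)
train_X, test_X, train_y, test_y = train_test_split(
   X, y, test_size=0.3, random_state=123)

dtrain = xgb.DMatrix(train_X, label=train_y)
dtest = xgb.DMatrix(test_X, label=test_y)

# Train each booster with the same objective and compare RMSE
for booster in ("gbtree", "gblinear"):
   params = {"booster": booster, "objective": "reg:squarederror"}
   model = xgb.train(params, dtrain, num_boost_round=10)
   rmse = np.sqrt(mean_squared_error(test_y, model.predict(dtest)))
   print("%s RMSE: %f" % (booster, rmse))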
Summary
XGBoost is a popular framework for solving regression problems. Its efficient gradient boosting and its ability to handle complex datasets make it well suited to models that accurately predict continuous numerical values. Active, ongoing development keeps XGBoost a leading tool for regression analysis in machine learning.