
XGBoost - Learning To Rank
XGBoost is a popular choice for a wide range of LTR applications, such as recommender systems, click-through rate prediction, and search engine optimization. In this chapter we will cover the available objective functions, walk you through preparing the data, and show how to train and evaluate a model.
What is Learning to Rank?
Before we get started, let us briefly explain what ranking is. Ranking is a form of supervised machine learning. Rather than predicting the outcome of a single data point, a ranking model receives a query together with a set of data points and orders those points by relevance, which sets it apart from the more common classification and regression setups.
Search engines typically use ranking to surface the most relevant results. It can also be used in recommender systems, for example to suggest products based on previous purchases, or even to pick out the horses with the best chance of winning the next race.
Three families of objective functions are used for ranking: pointwise, pairwise, and listwise. Each represents a different way of determining the rank of a group of items and comes with its own benefits and drawbacks. Many sources describe them in depth, but we will focus on the main points here.
Pointwise
This technique processes each query-document pair independently. All you have to do is give each query-document pair a score and build a model that can predict that relevance score.
Consider the following situation: we have a dataset of query-document pairs, and each pair has a relevance score between 1 and 5. A regression model can be trained to predict the relevance score for each pair.
The pointwise method is a great way to start since, in addition to its ease of use, it is surprisingly strong and hard to beat. It can be applied in XGBoost with any of the usual regression or classification objectives. Remember to adjust for any imbalance in the dataset labels.
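Here is a minimal sketch of the pointwise idea; the names X_pairs, y_relevance and X_new_query are illustrative placeholders for your own feature matrix, relevance labels and the documents of a new query −
import xgboost as xgb

# Pointwise setup (illustrative): every row of X_pairs is one query-document pair
# and y_relevance holds its graded relevance score (for example 0 to 4).
reg = xgb.XGBRegressor(objective="reg:squarederror", tree_method="hist")
reg.fit(X_pairs, y_relevance)

# The documents of a new query are then ranked simply by their predicted scores.
scores = reg.predict(X_new_query)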
Pairwise
The pairwise approach evaluates pairs of documents and tries to minimize the number of pairs that are ranked out of order. It is used by algorithms like RankNet.
This approach takes a query and two documents at a time and adjusts the predicted relevance scores so that the more relevant document receives a higher score than the less relevant one.
Unlike the pointwise method, which looks at the query and a single document in isolation, the goal here is to model the relative order of the documents with respect to a query.
XGBoost provides a few objective functions for this strategy, and a short sketch of selecting them follows the list −
rank:pairwise: This uses the original pairwise loss, introduced by RankNet. In XGBoost it is implemented within LambdaMART, which combines LambdaRank-style gradients with MART (Multiple Additive Regression Trees).
rank:ndcg: NDCG stands for Normalized Discounted Cumulative Gain. It is one of the most popular ranking quality metrics in the industry, since it takes into account both the relative order of the documents returned for a query and the relevance score of each document. This objective optimizes the model with a surrogate gradient derived from the NDCG metric.
rank:map: MAP stands for Mean Average Precision. It is another widely used ranking quality metric, applicable when the relevance labels are binary (0 or 1). If MAP is your evaluation metric, this is generally the objective to use.
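Any of these objectives is selected through the objective parameter. Below is a rough sketch using the native API, where X, y and groups are illustrative placeholders for a feature matrix, the relevance labels, and the number of consecutive rows belonging to each query −
import xgboost as xgb

# Illustrative only: rows must be grouped by query, and `groups` lists
# how many consecutive rows belong to each query.
dtrain = xgb.DMatrix(X, label=y)
dtrain.set_group(groups)

params = {
    "objective": "rank:pairwise",   # or "rank:ndcg" / "rank:map"
    "eta": 0.1,
    "max_depth": 6,
}
booster = xgb.train(params, dtrain, num_boost_round=100)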
Listwise
Even though XGBoost does not implement the listwise technique, it is still worth discussing for completeness. It considers the entire set of documents for a given query and tries to optimize the order of the whole list at once.
Because it analyzes the relative order of all documents together, it can improve on the pointwise and pairwise procedures and may give better results. An example of a listwise loss function is ListMLE.
When choosing among these objectives, cross-validation is a useful technique for identifying the one that works best for your problem; in some applications the pointwise approach performs noticeably worse than the pairwise objectives.
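As a rough sketch of such a comparison, assuming a feature matrix X, graded relevance labels y and query IDs qid (prepared as in the next section), scikit-learn's GroupKFold keeps all documents of a query in the same fold −
import numpy as np
import xgboost as xgb
from sklearn.metrics import ndcg_score
from sklearn.model_selection import GroupKFold

def mean_ndcg(model, X_part, y_part, qid_part, k=10):
    # Average NDCG@k over the queries contained in this fold
    scores = []
    for q in np.unique(qid_part):
        mask = qid_part == q
        if mask.sum() < 2 or y_part[mask].sum() == 0:
            continue   # skip tiny or all-irrelevant queries
        pred = model.predict(X_part[mask])
        scores.append(ndcg_score([y_part[mask]], [pred], k=k))
    return np.mean(scores)

# Compare two candidate objectives with group-aware 3-fold cross-validation
for objective in ["rank:pairwise", "rank:ndcg"]:
    fold_scores = []
    for tr, te in GroupKFold(n_splits=3).split(X, y, groups=qid):
        tr = tr[np.argsort(qid[tr], kind="stable")]   # keep rows grouped by query
        ranker = xgb.XGBRanker(tree_method="hist", objective=objective)
        ranker.fit(X[tr], y[tr], qid=qid[tr])
        fold_scores.append(mean_ndcg(ranker, X[te], y[te], qid[te]))
    print(objective, np.mean(fold_scores))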
Learning to Rank with XGBoost
We will look at how to prepare data and build a Learning to Rank (LTR) model using XGBoost, a powerful machine learning framework. We will be using the MSLR-WEB10K real-world dataset from Microsoft, which is popular in the Learning to Rank community. The relevance ratings of the query-document pairs in this dataset show how closely a document matches the user's query.
Step 1: Preparing the Data
Learning to Rank is a strategy used by search engines to rank results based on relevancy. Each record in this dataset contains: the query (what is the person trying to find?), a document that was displayed to the user, and the document's relevance score, which shows how relevant it is to the query. The scores range from 0 to 4, with 4 being the highest level of relevance.
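For reference, each line of the MSLR-WEB10K text files uses the SVMlight-style format: the relevance label comes first, followed by the query ID and the numbered feature values (shortened here for illustration) −
2 qid:10 1:0.031310 2:0.666667 3:0.500000 ... 136:0.000000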
Before creating the model we have to import some key libraries for data handling −
import pandas as pd
import numpy as np
from sklearn.datasets import load_svmlight_file
These libraries make handling the data easy. NumPy is used for numerical computations, and Pandas is used for data manipulation. The large, multi-feature dataset is stored in text format and is loaded with the load_svmlight_file function from sklearn.datasets.
After that we will load our training and validation datasets −
from pathlib import Path

data_path = Path("Python")   # folder containing the MSLR-WEB10K text files; adjust to your setup

train = load_svmlight_file(str(data_path / "vali.txt"), query_id=True)
valid = load_svmlight_file(str(data_path / "test.txt"), query_id=True)
Our model's training and testing data is stored in train and valid here. Three components make up each dataset: a target vector (relevance scores), a feature matrix (representing document qualities), and query IDs (to group documents under the same query).
Now we will unpack these into variables for using them easily −
X_train, y_train, qid_train = train
X_valid, y_valid, qid_valid = valid
Now, X_train and X_valid are the feature matrices, y_train and y_valid are the relevance scores, and qid_train and qid_valid are used to group the documents by query. We can replicate the ranking task using this approach.
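As a quick, optional sanity check, you can inspect what was loaded (the exact numbers depend on the files you use) −
print("documents:", X_train.shape[0], "features:", X_train.shape[1])
print("queries:", np.unique(qid_train).size)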
Step 2: Training the XGBRanker
In the training process, we build a model that ranks documents according to how relevant they are to a query. We first import the XGBoost library and create an instance of the XGBRanker class, which is designed for learning-to-rank tasks.
import xgboost as xgb

model = xgb.XGBRanker(tree_method="hist", objective="rank:ndcg")
Here,
The tree_method="hist" setting uses the fast histogram-based method for building decision trees.
The objective="rank:ndcg" setting optimizes Normalized Discounted Cumulative Gain (NDCG), a metric commonly used in ranking tasks.
Next we will fit the model to the training data −
model.fit(X_train, y_train, qid=qid_train)
In this case, we pass the query IDs (qid_train), the feature matrix (X_train), and the relevance scores (y_train). One important stage in the ranking process is grouping documents that belong to the same query using query IDs.
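Optionally, recent XGBoost versions (1.6 and later) also let you monitor ranking quality on the validation split while training, through the eval_set and eval_qid arguments of the scikit-learn interface; the following is a sketch of that setup −
model = xgb.XGBRanker(
    tree_method="hist",
    objective="rank:ndcg",
    eval_metric="ndcg@10",        # report NDCG at cutoff 10 during training
)
model.fit(
    X_train, y_train, qid=qid_train,
    eval_set=[(X_valid, y_valid)],
    eval_qid=[qid_valid],
    verbose=10,                   # print the validation score every 10 rounds
)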
Step 3: Predicting Relevance Scores
We can use the model to make predictions once it has been trained. For ranking, it is usually most useful to predict relevance for a single query at a time. Suppose we want to rank the documents of the first query in our validation set −
qids = np.unique(qid_valid)                # unique query IDs in the validation set
X_query = X_valid[qid_valid == qids[0]]    # documents belonging to the first query
Here, X_query contains all the documents belonging to the first query. Now, we can predict their relevance scores by applying −
y_pred = model.predict(X_query)
The predicted scores show how relevant each document is relative to the others. By sorting these predictions, the documents can be ranked; higher scores indicate greater relevance.
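For example, the documents of this query can be reordered from most to least relevant with a simple argsort −
ranked_idx = np.argsort(y_pred)[::-1]    # document indices, best first
X_query_ranked = X_query[ranked_idx]     # the query's documents in ranked order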
Step 4: Evaluating the Model with NDCG
We use a metric called Normalized Discounted Cumulative Gain, or NDCG, to evaluate the effectiveness of our ranking system. It measures how well the predicted ranking reflects the actual relevance of the documents. Here is the helper used in the calculation; it computes the discounted cumulative gain of the top k results, which we later divide by the ideal value to normalize −
def ndcg(y_score, y_true, k):
    # Sort the documents by predicted score (highest first) and keep the top k
    order = np.argsort(y_score)[::-1]
    y_true = np.take(y_true, order[:k])
    # Discounted cumulative gain of the true labels in that order
    gain = 2 ** y_true - 1
    discounts = np.log2(np.arange(len(y_true)) + 2)
    return np.sum(gain / discounts)
This function orders the documents by the predicted scores (y_score) and accumulates the discounted gains of their true relevance scores (y_true). We loop through each query in the validation set, divide by the ideal value (the gain obtained when the documents are ordered by their true labels), and record the resulting NDCG score −
ndcg_ = list()
qids = np.unique(qid_valid)
for qid in qids:
    y = y_valid[qid_valid == qid]
    if np.sum(y) == 0:
        continue
    p = model.predict(X_valid[qid_valid == qid])
    idcg = ndcg(y, y, k=10)
    ndcg_.append(ndcg(p, y, k=10) / idcg)
Finally, we calculate each query's mean NDCG score −
np.mean(ndcg_)
We are left with a single score that represents the overall performance of our model. A higher mean NDCG indicates better ranking quality.