XGBoost - Learning To Rank



XGBoost is one of the most common choices for a wide range of LTR applications, such as recommender systems, click-through rate prediction, and search engine optimization (SEO). In this chapter we will cover the available ranking objective functions, walk through the steps of preparing data, and show how to train a model.

What is Learning to Rank?

Before we get started, let us briefly explain what ranking is. Ranking is a subset of supervised machine learning. What sets it apart from the more common settings of classification and regression is that, rather than predicting an outcome for a single data point, a ranking model receives a query together with a set of data points and orders those data points by relevance.

Search engines typically use ranking to surface the most relevant results. It can also be used to recommend products based on previous purchases or, as it did for me, to identify the horses with the best probability of winning the next race.

There are three families of objective functions for ranking: pointwise, pairwise, and listwise. Each represents a different way of determining the rank of a group of items, and each has benefits and drawbacks. Many sources describe them at length, but we will focus on the main points here.

Pointwise

This technique processes each query-document pair independently. All you have to do is give each query-document pair a score and build a model that can predict that relevance score.

Consider the following situation: we have a dataset of query-document pairs, and each pair has a relevance score between 1 and 5. We can train a regression model to predict the relevance score for each pair.

The pointwise method is a great way to start: in addition to being easy to use, it is surprisingly strong and hard to beat. It can be applied in XGBoost with any of the usual objective functions for regression or classification. Remember to adjust for any imbalance in the dataset labels.
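
As an illustration, here is a minimal sketch of the pointwise approach using XGBoost's regression interface; the features and labels below are synthetic stand-ins for real query-document data −

import numpy as np
import xgboost as xgb

# Synthetic query-document features and graded relevance labels (0-4);
# in the pointwise approach each row is scored independently of its query.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))        # 100 query-document pairs, 5 features each
y = rng.integers(0, 5, size=100)     # relevance labels from 0 to 4

model = xgb.XGBRegressor(objective="reg:squarederror", n_estimators=50)
model.fit(X, y)

scores = model.predict(X[:10])       # predicted relevance; sort to obtain a ranking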

Pairwise

The pairwise approach evaluates pairs of documents and tries to minimize the number of pairs that are out of order. It is used by algorithms like RankNet.

This approach would take a query and two documents at a time and adjust the predicted relevance scores so that the most relevant document has a higher score than the least relevant one.

Unlike the pointwise method, which considers the query and a single document at a time, here we want to precisely model the relative order of the documents with respect to a query.

XGBoost provides a few objective functions for this strategy −

  • rank:pairwise: This is the original pairwise loss function, the one introduced by RankNet. Combining this pairwise loss with MART (Multiple Additive Regression Trees) and LambdaRank-style gradients yields the algorithm known as LambdaMART.

  • rank:ndcg: NDCG stands for Normalized Discounted Cumulative Gain. It is one of the most popular ranking quality metrics in the industry, since it takes into account both the relative order of documents for a query and the relevance score of each document. This objective optimizes the model with a surrogate gradient derived from the NDCG metric.

  • rank:map: MAP stands for Mean Average Precision. This ranking quality metric is used when the relevance scores are binary (0 or 1). In general, use this objective when MAP is your evaluation metric. A sketch showing how each objective is selected follows this list.
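
Selecting one of these objectives is just a constructor argument; a minimal sketch −

import xgboost as xgb

# Each ranking objective is chosen through the `objective` parameter.
pairwise_model = xgb.XGBRanker(objective="rank:pairwise")
ndcg_model = xgb.XGBRanker(objective="rank:ndcg")
map_model = xgb.XGBRanker(objective="rank:map")   # expects binary (0/1) relevance labels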

Listwise

Although XGBoost does not implement the listwise technique, it is still worth discussing for completeness. It considers the entire set of documents for a particular query and tries to optimize the order of the whole list at once.

Because it analyzes the relative order of all documents together, it is a refinement of the pointwise and pairwise procedures and may give improved results. An example of a listwise loss function is ListMLE.

When choosing among these objectives, cross-validation is a useful technique to identify the one that works best for your problem. In my horse-racing experiments, for example, the pointwise approach was completely ineffective.

Learning to Rank with XGBoost

We will now look at how to prepare data and build a Learning to Rank (LTR) model using XGBoost. We will use the MSLR-WEB10K dataset from Microsoft, a real-world dataset that is popular in the Learning to Rank community. Each query-document pair in this dataset carries a relevance rating showing how closely the document matches the user's query.

Step 1: Preparing the Data

Learning to Rank is the strategy search engines use to order results by relevance. Each record in this dataset contains: the query (what the user is trying to find), the documents that were shown to the user, and a relevance score for each document indicating how well it matches the query. Scores range from 0 to 4, with 4 being the most relevant.
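
The raw files use the SVMlight/LETOR text format, one document per line: the relevance label, the query ID, then feature index-value pairs. The line below is illustrative, not taken from the actual files −

2 qid:10 1:0.031310 2:0.666667 3:0.500000 ... 136:0.013639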

Before creating the model we have to import some key libraries for data handling −

import pandas as pd
import numpy as np
from sklearn.datasets import load_svmlight_file

These libraries make handling data easy: NumPy is used for numerical computations, and Pandas for data manipulation. The load_svmlight_file function from sklearn.datasets loads this large, multi-feature dataset from its text format.

After that we will load our training and validation datasets −

from pathlib import Path

# Folder containing the MSLR-WEB10K files; adjust to your setup.
data_path = Path('Python')

train = load_svmlight_file(str(data_path / 'vali.txt'), query_id=True)
valid = load_svmlight_file(str(data_path / 'test.txt'), query_id=True)

Our model's training and validation data is stored in train and valid here. Each dataset is a tuple of three components: a feature matrix (representing document qualities), a target vector (relevance scores), and query IDs (to group documents belonging to the same query).

Now we will unpack these into variables for using them easily −

X_train, y_train, qid_train = train
X_valid, y_valid, qid_valid = valid

Now, X_train and X_valid are the feature matrices, y_train and y_valid are the relevance scores, and qid_train and qid_valid group the documents by query. This grouping is what makes the ranking task possible.
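
As a quick sanity check, you can inspect the shapes of what was loaded; the exact row counts depend on the file, but MSLR-WEB10K always has 136 features −

print(X_train.shape)                 # (n_documents, 136)
print(y_train.shape)                 # one relevance score per document
print(np.unique(qid_train).shape)    # number of distinct queries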

Step 2: Training the XGBRanker

In the training process, we build a model that ranks documents according to how relevant they are to a query. We first import the XGBoost library and create an instance of the XGBRanker class, which is designed for learning-to-rank tasks.

import xgboost as xgb
model = xgb.XGBRanker(tree_method="hist", objective="rank:ndcg")

Here,

  • The tree_method="hist" setting selects the fast histogram-based algorithm for building decision trees.

  • The objective="rank:ndcg" setting optimizes Normalized Discounted Cumulative Gain (NDCG), a metric commonly used to evaluate ranking tasks.

Next we will fit the model to the training data −

model.fit(X_train, y_train, qid=qid_train)

In this case, we pass the feature matrix (X_train), the relevance scores (y_train), and the query IDs (qid_train). Grouping documents that belong to the same query via query IDs is an essential step in the ranking process.
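
Recent versions of XGBoost's scikit-learn interface also accept a validation set during fitting, so you can monitor ranking quality as boosting rounds are added; a sketch −

model.fit(
   X_train, y_train, qid=qid_train,
   eval_set=[(X_valid, y_valid)],
   eval_qid=[qid_valid],
   verbose=10,   # report the evaluation metric every 10 boosting rounds
)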

Step 3: Predicting Relevance Scores

Once the model is trained, we can use it to make predictions. For ranking, it is usually most useful to predict relevance one query at a time. Suppose we want to rank the documents of the first query in our validation set; we collect the unique query IDs and select that query's rows −

qids = np.unique(qid_valid)
X_query = X_valid[qid_valid == qids[0]]

Here, X_query contains all the documents belonging to the first query. We can now predict their relevance scores −

y_pred = model.predict(X_query)

The predicted scores show how relevant each document is relative to the others. Sorting the documents by these predictions produces the ranking; higher scores indicate greater relevance.
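
For example, a minimal sketch of turning the scores into an ordering −

order = np.argsort(y_pred)[::-1]   # document indices, highest score first
ranked_docs = X_query[order]       # the query's documents in ranked order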

Step 4: Evaluating the Model with NDCG

We use a metric called Normalized Discounted Cumulative Gain, or NDCG, to evaluate the effectiveness of our ranking system. It measures how well the predicted ranking reflects the actual relevance of the documents. The helper below computes the Discounted Cumulative Gain (DCG) at rank k; we will normalize it by the ideal DCG in the next step −

def ndcg(y_score, y_true, k):
   # Order documents by predicted score, best first, and keep the top k
   order = np.argsort(y_score)[::-1]
   y_true = np.take(y_true, order[:k])

   # Standard DCG: exponential gain on the true label, log2 positional discount
   gain = 2 ** y_true - 1
   discounts = np.log2(np.arange(len(y_true)) + 2)
   return np.sum(gain / discounts)
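
A toy example (values made up) shows how the helper yields a normalized score −

y_true = np.array([3, 2, 0, 1])            # true relevance of four documents
y_score = np.array([0.8, 0.9, 0.1, 0.3])   # model's predicted scores
idcg = ndcg(y_true, y_true, k=4)           # ideal ordering as the reference
print(ndcg(y_score, y_true, k=4) / idcg)   # NDCG@4; 1.0 means a perfect ranking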

Given predicted scores (y_score) and true relevance scores (y_true), this function computes the DCG of the predicted ordering; dividing by the ideal DCG (obtained by ranking with the true scores themselves) gives the NDCG. We loop over each query in our validation set and compute its NDCG score −

ndcg_ = list()
qids = np.unique(qid_valid)

for qid in qids:
   y = y_valid[qid_valid == qid]

   # Skip queries with no relevant documents; their ideal DCG is zero
   if np.sum(y) == 0:
      continue

   p = model.predict(X_valid[qid_valid == qid])
   idcg = ndcg(y, y, k=10)                  # ideal DCG for this query
   ndcg_.append(ndcg(p, y, k=10) / idcg)    # normalized score, NDCG@10

Finally, we average the NDCG scores across all queries −

np.mean(ndcg_)

This leaves us with a single score that summarizes the overall performance of our model; a higher mean NDCG indicates better ranking quality.
