CatBoost - Metrics for Model Evaluation



Effective evaluation is essential when building machine learning models, because it is how we confirm that a model's performance meets the standards and requirements of the task. Yandex's CatBoost is a powerful gradient-boosting library that gives data scientists and machine learning practitioners a rich set of metrics for evaluating the effectiveness of their models.

CatBoost is known for handling categorical features natively and for its accuracy and efficiency. Because of this strong accuracy, it is a preferred choice for a large number of real-world machine learning tasks.

But the real worth of a model is defined by its actual performance as much as by its algorithm, and this is where metrics come in. In the Python package, the two core entry points for model evaluation are the eval_metric training parameter and the eval_metrics() method, which together cover a wide range of metrics. But CatBoost offers more than just that.
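
Below is a minimal sketch of both entry points, using a toy dataset generated with scikit-learn's make_classification (the dataset, split, and iteration count are illustrative assumptions, not part of CatBoost itself).

# Toy data, purely for illustration
from catboost import CatBoostClassifier, Pool
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=42)

# eval_metric selects the metric reported on eval_set while training
model = CatBoostClassifier(iterations=100, eval_metric="Accuracy", verbose=0)
model.fit(X_train, y_train, eval_set=(X_val, y_val))

# eval_metrics() recomputes one or more metrics on held-out data,
# returning one value per boosting iteration for each metric
scores = model.eval_metrics(Pool(X_val, y_val), metrics=["Accuracy", "AUC", "Logloss"])
print({name: round(values[-1], 4) for name, values in scores.items()})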

CatBoost Metrics

CatBoost metrics are used to evaluate the performance of a model built with the CatBoost machine learning algorithm. These metrics help us understand the quality and accuracy of the model's predictions. Here are some common CatBoost metrics along with their explanations −

Accuracy

Accuracy is a widely used metric to evaluate the performance of classification models. It measures the proportion of predictions the model gets right on a given dataset.

For binary classification, accuracy is calculated as follows −

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Here,

  • TP: True Positives, the number of cases correctly predicted as positive, i.e., correctly classified as belonging to the class of interest.

  • TN: True Negatives, the number of cases correctly predicted as negative, i.e., correctly identified as not belonging to that class.

  • FP: False Positives, the number of cases incorrectly predicted as positive, i.e., wrongly classified as belonging to the class.

  • FN: False Negatives, the number of cases incorrectly predicted as negative, i.e., wrongly classified as not belonging to the class although they actually do.
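
As a quick check of the formula, the sketch below trains a CatBoost classifier on assumed toy data, derives TP, TN, FP, and FN from the confusion matrix, and compares the hand-computed accuracy with scikit-learn's accuracy_score.

from catboost import CatBoostClassifier
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Toy binary dataset, purely for illustration
X, y = make_classification(n_samples=400, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

model = CatBoostClassifier(iterations=100, verbose=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_val)

# Counts from the confusion matrix, then the accuracy formula above
tn, fp, fn, tp = confusion_matrix(y_val, y_pred).ravel()
manual_accuracy = (tp + tn) / (tp + tn + fp + fn)
print(manual_accuracy, accuracy_score(y_val, y_pred))  # both values match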

MultiClass Log Loss

MultiClass log loss, also known as cross-entropy loss or log loss, is a commonly used metric for measuring how well classification models perform in multiclass scenarios. It quantifies the difference between the true class labels and the predicted class probabilities for each instance.

The MultiClass log loss is represented mathematically as follows −

Multiclass Log Loss = - (1 / N) * Σ_{i=1}^{N} Σ_{j=1}^{M} [ y_ij * log(p_ij) ]

Here,

  • N: The number of samples (or data points).

  • M: The number of classes.

  • y_ij: A binary indicator (0 or 1) that tells whether sample i belongs to class j. It is 1 if the sample belongs to the class, otherwise 0.

  • p_ij: The predicted probability that sample i belongs to class j.
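
The sketch below applies this formula directly to the probabilities predicted by a CatBoost model on an assumed three-class toy dataset and compares the result with scikit-learn's log_loss as a sanity check.

import numpy as np
from catboost import CatBoostClassifier
from sklearn.datasets import make_classification
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

# Toy 3-class dataset, purely for illustration
X, y = make_classification(n_samples=600, n_informative=6, n_classes=3, random_state=1)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=1)

model = CatBoostClassifier(iterations=150, loss_function="MultiClass", verbose=0)
model.fit(X_train, y_train)
proba = model.predict_proba(X_val)          # p_ij, shape (N, M)

# y_ij as a one-hot matrix, then the double sum from the formula
y_onehot = np.eye(proba.shape[1])[y_val]
manual = -np.mean(np.sum(y_onehot * np.log(np.clip(proba, 1e-15, 1)), axis=1))
print(manual, log_loss(y_val, proba))       # both values match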

Binary Log Loss

Binary Log Loss is a commonly used metric to evaluate how well binary classification models perform. It is also referred to as logistic loss or cross-entropy loss. It measures the difference between the predicted probability and the true binary label for each instance.

The binary log loss is represented mathematically as follows −

Binary Log Loss = - (1 / N) * Σ_{i=1}^{N} [ y_i * log(p_i) + (1 - y_i) * log(1 - p_i) ]

Here,

  • N: The number of samples (or data points).

  • y_i: A binary indicator (0 or 1) that tells whether sample i is positive (1) or negative (0).

  • p_i: The predicted probability that sample i belongs to the positive class (class 1).

  • log: The natural logarithm.
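
The following sketch evaluates the binary formula on an assumed toy dataset; Logloss is also CatBoost's default loss function for binary classification, and the hand-computed value is checked against scikit-learn's log_loss.

import numpy as np
from catboost import CatBoostClassifier
from sklearn.datasets import make_classification
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

# Toy binary dataset, purely for illustration
X, y = make_classification(n_samples=400, random_state=2)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=2)

model = CatBoostClassifier(iterations=100, loss_function="Logloss", verbose=0)
model.fit(X_train, y_train)

# p_i: predicted probability of the positive class, clipped to avoid log(0)
p = np.clip(model.predict_proba(X_val)[:, 1], 1e-15, 1 - 1e-15)
manual = -np.mean(y_val * np.log(p) + (1 - y_val) * np.log(1 - p))
print(manual, log_loss(y_val, p))           # both values match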

AUC-ROC (Area Under the Receiver Operating Characteristic Curve)

The Receiver Operating Characteristic (ROC) curve graphically shows the effectiveness of a binary classification model across a range of decision thresholds. It plots the True Positive Rate (TPR) against the False Positive Rate (FPR) at various probability thresholds.

The area under the ROC curve, or AUC-ROC, can be computed as follows −

  • True Positive Rate (TPR): Also known as sensitivity or recall, it is calculated as −

    TPR = True Positives (TP) / [True Positives (TP) + False Negatives (FN)]
    
  • False Positive Rate (FPR): It is calculated as −

    FPR = False Positives (FP) / [False Positives (FP) + True Negatives (TN)]
    
  • ROC Curve: The ROC curve is plotted at various threshold values, with the TPR on the y-axis and the FPR on the x-axis.

  • AUC Calculation: The area under the ROC curve can then be computed numerically or with software libraries that provide this functionality, as in the sketch after this list.
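
A minimal sketch of these steps, assuming toy data: predicted probabilities from a CatBoost classifier are passed to scikit-learn's roc_curve to get the (FPR, TPR) points and to roc_auc_score to compute the area under the curve.

from catboost import CatBoostClassifier
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Toy binary dataset, purely for illustration
X, y = make_classification(n_samples=400, random_state=3)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=3)

# AUC can also be monitored during training via eval_metric
model = CatBoostClassifier(iterations=100, eval_metric="AUC", verbose=0)
model.fit(X_train, y_train, eval_set=(X_val, y_val))

proba = model.predict_proba(X_val)[:, 1]
fpr, tpr, thresholds = roc_curve(y_val, proba)   # points of the ROC curve
print("AUC-ROC:", roc_auc_score(y_val, proba))   # area under that curve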

F1-Score

The F1 Score combines recall (sensitivity) and precision (positive predictive value) into a single score. It is a popular metric in binary classification problems because it finds a balance between the two.

The F1 Score can be stated mathematically as follows −

F1 Score = 2 * (Precision * Recall) / (Precision + Recall)
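
To close, a sketch on assumed toy data that computes precision and recall from the confusion matrix, applies the formula above, and compares the result with scikit-learn's f1_score; passing eval_metric="F1" also lets CatBoost report this metric during training.

from catboost import CatBoostClassifier
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix, f1_score
from sklearn.model_selection import train_test_split

# Slightly imbalanced toy dataset, purely for illustration
X, y = make_classification(n_samples=400, weights=[0.7, 0.3], random_state=4)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=4)

model = CatBoostClassifier(iterations=100, eval_metric="F1", verbose=0)
model.fit(X_train, y_train, eval_set=(X_val, y_val))
y_pred = model.predict(X_val)

tn, fp, fn, tp = confusion_matrix(y_val, y_pred).ravel()
precision, recall = tp / (tp + fp), tp / (tp + fn)
manual_f1 = 2 * precision * recall / (precision + recall)
print(manual_f1, f1_score(y_val, y_pred))        # both values match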