
CatBoost - Classification Metrics
CatBoost is widely used for classification tasks, where the goal is to assign each data point to one of a set of categories. To judge how well a CatBoost classifier performs, we need suitable evaluation metrics, and CatBoost provides several of them out of the box.
The performance of a CatBoost classifier can be evaluated using the following key metrics −
Accuracy
This shows what percentage of the model's predictions were correct: the number of correct predictions divided by the total number of predictions. While this is the most intuitive measure, it may not be the best choice for imbalanced datasets, where one class significantly outnumbers the other.
To compute the accuracy we first import numpy, catboost, sklearn.datasets and sklearn.model_selection.
import numpy as np
from catboost import CatBoostClassifier, Pool
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Loading the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Splitting the data into training and testing datasets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a CatBoostClassifier
model = CatBoostClassifier(iterations=100, learning_rate=0.1, depth=6, loss_function='MultiClass', verbose=0)

# Create a Pool object
train_pool = Pool(X_train, label=y_train)
test_pool = Pool(X_test, label=y_test)

# Train the model
model.fit(train_pool)

# Evaluate the model
metrics = model.eval_metrics(test_pool, metrics=['Accuracy'], plot=True)

# Print the evaluation metrics
accuracy = metrics['Accuracy'][-1]
print(f'Accuracy: {accuracy:.2f}')
Output
The result shows that the model is an ideal fit for this dataset: it has correctly predicted every instance in the test set.
Accuracy: 1.00
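Accuracy can also be cross-checked directly from the model's predictions instead of eval_metrics(). The following is a minimal sketch, assuming that model, X_test and y_test from the example above are still in scope −

# Cross-check: accuracy computed manually from the predicted labels.
# Assumes model, X_test and y_test from the example above.
preds = model.predict(X_test).ravel().astype(int)   # predictions may come back as a column vector
manual_accuracy = np.mean(preds == y_test)
print(f'Manually computed accuracy: {manual_accuracy:.2f}')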
Multiclass Log Loss
Multiclass log loss, also known as cross-entropy for multiclass classification, is the variant of log loss designed for problems with more than two classes. The model predicts a probability distribution over the classes, and the metric measures how closely those predicted probabilities match the actual class labels.
import numpy as np
from catboost import CatBoostClassifier, Pool
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Loading the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a CatBoostClassifier
model = CatBoostClassifier(iterations=100, learning_rate=0.1, depth=6, loss_function='MultiClass', verbose=0)

# Create a Pool object
train_pool = Pool(X_train, label=y_train)
test_pool = Pool(X_test, label=y_test)

# Train the model
model.fit(train_pool)

# Evaluate the model for multi-class classification
metrics = model.eval_metrics(test_pool, metrics=['MultiClass'], plot=True)

# Print the evaluation metrics
multi_class_loss = metrics['MultiClass'][-1]
print(f'Multi-Class Loss: {multi_class_loss:.2f}')
Output
In the result below, a multi-class loss value of 0.03 indicates that the model is performing well on the multi-class classification of the test dataset.
Multi-Class Loss: 0.03
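The same quantity can be approximated with scikit-learn's log_loss applied to the predicted class probabilities. This is only a sketch and reuses model, X_test and y_test from the example above −

from sklearn.metrics import log_loss

# Cross-check: multiclass log loss computed from predicted probabilities.
# Assumes model, X_test and y_test from the example above.
proba = model.predict_proba(X_test)        # shape (n_samples, n_classes)
manual_loss = log_loss(y_test, proba)
print(f'Multi-Class Loss (sklearn): {manual_loss:.2f}')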
Binary Log Loss
Binary log loss measures the difference between the true labels and the predicted probabilities; lower values indicate better performance. This metric is useful when well-calibrated probability estimates matter, for example in fraud detection or medical diagnosis, and it applies to binary classification, where the dataset has only two classes.
Because the Iris dataset has three classes, it is not suitable for this metric. We therefore use the Breast Cancer dataset, which has just two classes, representing the presence and absence of breast cancer.
import numpy as np
from catboost import CatBoostClassifier, Pool
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Loading the Breast Cancer dataset
data = load_breast_cancer()
X, y = data.data, data.target

# Splitting the data into training and testing datasets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a CatBoostClassifier
model = CatBoostClassifier(iterations=100, learning_rate=0.1, depth=6, verbose=0)

# Create a Pool object
train_pool = Pool(X_train, label=y_train)
test_pool = Pool(X_test, label=y_test)

# Train the model
model.fit(train_pool)

# Evaluate the model
metrics = model.eval_metrics(test_pool, metrics=['Logloss'], plot=False)

# Print the evaluation metrics
logloss = metrics['Logloss'][-1]
print(f'Log Loss (Cross-Entropy): {logloss:.2f}')
Output
Here is the output of the above code −
Log Loss (Cross-Entropy): 0.08
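Binary log loss can also be written out explicitly as the average of −(y·log p + (1−y)·log(1−p)) over the test set. The short sketch below is only an illustration and reuses model, X_test and y_test from the example above −

# Cross-check of the binary log loss using its formula directly.
# Assumes model, X_test and y_test from the example above.
p = model.predict_proba(X_test)[:, 1]        # probability of the positive class
eps = 1e-15
p = np.clip(p, eps, 1 - eps)                 # avoid log(0)
manual_logloss = -np.mean(y_test * np.log(p) + (1 - y_test) * np.log(1 - p))
print(f'Log Loss (manual): {manual_logloss:.2f}')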
AUC-ROC and AUC-PRC
Area Under the Receiver Operating Characteristic Curve (AUC-ROC) and Area Under the Precision-Recall Curve (AUC-PRC) are critical metrics for binary classification. AUC-ROC evaluates the model's ability to separate the positive and negative classes, whereas AUC-PRC focuses on the trade-off between precision and recall.
import catboost
from catboost import CatBoostClassifier, Pool
from sklearn import datasets
from sklearn.model_selection import train_test_split

# Loading the Iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Converting to binary classification by mapping class 2 against the rest
y_binary = (y == 2).astype(int)

# Splitting the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y_binary, test_size=0.2, random_state=42)

# Creating a CatBoost classifier with AUC-ROC metric
model = CatBoostClassifier(iterations=500, random_seed=42, eval_metric='AUC')

# Converting the training data into a CatBoost Pool
train_pool = Pool(X_train, label=y_train)

# Training the model
model.fit(train_pool, verbose=100)

# Evaluating AUC-ROC and AUC-PR on the test set
validation_pool = Pool(X_test, label=y_test)
eval_result = model.eval_metrics(validation_pool, ['AUC'])['AUC']
metrics = model.eval_metrics(validation_pool, metrics=['PRAUC'], plot=True)
auc_pr = metrics['PRAUC'][-1]

# Print the evaluation metrics
print(f'AUC-PR: {auc_pr:.2f}')
print(f"AUC-ROC: {eval_result[-1]:.4f}")
Output
Running the above code produces the following output −
Learning rate set to 0.007867
0:     total: 2.09ms   remaining: 1.04s
100:   total: 42.3ms   remaining: 167ms
200:   total: 67.9ms   remaining: 101ms
300:   total: 89.8ms   remaining: 59.4ms
400:   total: 110ms    remaining: 27ms
499:   total: 129ms    remaining: 0us
AUC-PR: 1.00
AUC-ROC: 1.0000
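Both values can be cross-checked with scikit-learn on the predicted probabilities. The sketch below is illustrative only (average precision is an approximation of AUC-PR) and reuses model, X_test and y_test from the example above −

from sklearn.metrics import roc_auc_score, average_precision_score

# Cross-check: AUC-ROC and an AUC-PR approximation from predicted probabilities.
# Assumes model, X_test and y_test from the example above.
proba_pos = model.predict_proba(X_test)[:, 1]
print(f'AUC-ROC (sklearn): {roc_auc_score(y_test, proba_pos):.4f}')
print(f'Average precision (sklearn): {average_precision_score(y_test, proba_pos):.4f}')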
F1 Score
The F1 Score combines the model's precision (how many of its positive predictions for a category are correct) and its recall (how many of the actual instances of that category it identifies) into a single value, their harmonic mean. It is a good choice when the trade-off between false positives and false negatives needs to be balanced, and better models tend to have higher F1 Scores.
import numpy as np
from catboost import CatBoostClassifier, Pool
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Loading the Breast Cancer dataset
data = load_breast_cancer()
X, y = data.data, data.target

# Splitting the data into training and testing datasets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Creating a CatBoostClassifier
model = CatBoostClassifier(iterations=100, learning_rate=0.1, depth=6, verbose=0)

# Creating a Pool object
train_pool = Pool(X_train, label=y_train)
test_pool = Pool(X_test, label=y_test)

# Train the model
model.fit(train_pool)

# Evaluate the model
metrics = model.eval_metrics(test_pool, metrics=['F1'], plot=True)

# Print the evaluation metrics
f1 = metrics['F1'][-1]
print(f'F1 Score: {f1:.2f}')
Output
This produces the following output −
F1 Score: 0.98
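Since the F1 Score is the harmonic mean of precision and recall, it can be recomputed from those two values. The following is only a sketch and reuses model, X_test and y_test from the example above −

from sklearn.metrics import precision_score, recall_score

# Cross-check: F1 as the harmonic mean of precision and recall.
# Assumes model, X_test and y_test from the example above.
preds = model.predict(X_test).ravel().astype(int)
precision = precision_score(y_test, preds)
recall = recall_score(y_test, preds)
f1_manual = 2 * precision * recall / (precision + recall)
print(f'F1 Score (from precision and recall): {f1_manual:.2f}')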
Summary
In summary, CatBoost provides a wide range of metrics and evaluation tools that greatly simplify model selection and evaluation for classification tasks. Its built-in measures, such as Accuracy, Logloss, AUC and F1, can be supplemented with user-defined custom metrics, and the ability to monitor several metrics during training, apply early stopping and cross-validate ensures a complete evaluation.
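As a final illustration, the sketch below shows one possible way to monitor several of these metrics during a single training run while using early stopping on a validation set. It reuses the Breast Cancer split from the earlier examples, and the parameter values are only examples, not recommendations −

from catboost import CatBoostClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Monitoring several metrics during training with early stopping.
# The parameter values below are illustrative only.
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
   data.data, data.target, test_size=0.2, random_state=42
)

model = CatBoostClassifier(
   iterations=500,
   learning_rate=0.1,
   eval_metric='Logloss',                     # metric used for early stopping
   custom_metric=['Accuracy', 'F1', 'AUC'],   # extra metrics tracked on the eval set
   verbose=0
)
model.fit(
   X_train, y_train,
   eval_set=(X_test, y_test),
   early_stopping_rounds=50                   # stop if Logloss does not improve for 50 rounds
)
print('Best iteration:', model.get_best_iteration())
print('Best scores:', model.get_best_score())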