MultiLabel Ranking Metrics - Coverage Error in Machine Learning


Evaluating the quality of multi-label models requires multi-label ranking metrics. One such metric is Coverage Error, which quantifies a ranking model's ability to cover all relevant labels for a particular instance.

Multi-label ranking tasks involve the assignment of multiple relevant labels to a given instance, such as tagging images or categorizing documents. In this article, we delve into the concept of Coverage Error and explore its significance in assessing the effectiveness of multi-label ranking models.

What is Coverage Error?

Coverage Error is a metric used in machine learning to evaluate multi-label ranking models. It measures the model's ability to cover all relevant labels for each instance. A lower coverage error indicates better performance, with zero indicating perfect coverage, where all true labels are correctly predicted.

The mathematical expression for Coverage Error can be defined as follows −

Coverage Error = (1/N) * Σ |Yi - Ŷi|

Where −

  • N represents the total number of instances in the evaluation set.

  • Yi denotes the set of true labels for the i-th instance.

  • Ŷi denotes the set of predicted labels for the i-th instance.

  • |Yi - Ŷi| represents the number of labels in the set difference between the true labels and the predicted labels, i.e., the number of true labels that are missing from the predictions for the i-th instance.

By summing these missing-label counts over all instances and dividing by the total number of instances (N), we obtain the average coverage error, an assessment of how well the model covers the relevant labels.
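
For example, if Yi = {0, 2} and the model predicts Ŷi = {0}, the set difference Yi - Ŷi is {2}, so this instance contributes one missing label to the sum.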

How to Calculate MultiLabel Ranking Metrics - Coverage Error?

Below are the steps that we will follow to calculate MultiLabel Ranking Metrics - Coverage Error −

  • Obtain the true labels and predicted labels for each instance in your dataset.

  • Determine the number of instances and the number of labels in your dataset.

  • Initialize an array or list to store the coverage error for each instance.

  • For each instance −

    • Identify the true labels by finding the indices where the true label values are 1.

    • Identify the predicted labels by finding the indices where the predicted label values are 1.

    • Calculate the set difference between the true labels and predicted labels to find the missing labels.

    • Store the count of missing labels in the coverage error array for that instance.

  • Compute the average coverage error by taking the mean of the coverage error array.

  • The resulting value represents the coverage error metric for your multi-label ranking model.
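
These steps map directly onto the loop-based function in the example below. For reference, the same computation can also be written without an explicit Python loop; here is a minimal vectorized sketch, assuming y_true and y_pred are binary indicator matrices of shape (num_samples, num_labels) −

import numpy as np

def coverage_error_vectorized(y_true, y_pred):
   # A label is "missing" when it is 1 in y_true but 0 in y_pred
   missing = (y_true == 1) & (y_pred == 0)
   # Count missing labels per instance, then average over all instances
   return missing.sum(axis=1).mean()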

Example

Below are two programming examples: the first creates its own dataset, and the second uses an inbuilt dataset (Iris).

import numpy as np

def coverage_error(y_true, y_pred):
   # y_true and y_pred are binary indicator matrices of shape
   # (num_samples, num_labels)
   num_samples, num_labels = y_true.shape
   coverage = np.zeros(num_samples)

   for i in range(num_samples):
      # Indices of the labels marked 1 in each row
      true_labels = set(np.where(y_true[i])[0])
      predicted_labels = set(np.where(y_pred[i])[0])
      # Number of true labels missing from the predictions
      coverage[i] = len(true_labels - predicted_labels)

   return np.mean(coverage)

# Example usage
y_true = np.array([[1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [1, 0, 0, 1]])

y_pred = np.array([[1, 0, 1, 0],
                  [0, 1, 1, 0],
                  [1, 0, 0, 0]])

error = coverage_error(y_true, y_pred)
print("Coverage Error:", error)

Output

Coverage Error: 0.6666666666666666

The coverage error score is about 0.67, meaning that, averaged over the instances, 0.67 true labels per instance are missing from the model's predictions (two of the three instances each have one uncovered label). Lower is better, and a score of 0 would mean every true label was predicted for every instance. The score could be reduced by using a different model or by tuning the model's hyperparameters.

We have three instances in the evaluation set, denoted by rows in y_true and y_pred. Each instance has four labels.

For the first instance −

  • The true labels are [0, 2] (indices of the elements with value 1 in the first row of y_true).

  • The predicted labels are [0, 2] (indices of the elements with value 1 in the first row of y_pred).

  • The difference between true and predicted labels is an empty set ([]), as both sets are the same.

  • Hence, the coverage error for the first instance is 0.

For the second instance −

  • The true labels are [1, 3] (indices of the elements with value 1 in the second row of y_true).

  • The predicted labels are [1, 2] (indices of the elements with value 1 in the second row of y_pred).

  • The difference between true and predicted labels is [3].

  • Hence, the coverage error for the second instance is 1.

For the third instance −

  • The true labels are [0, 3] (indices of the elements with value 1 in the third row of y_true).

  • The predicted labels are [0] (indices of the elements with value 1 in the third row of y_pred).

  • The difference between true and predicted labels is [3].

  • Hence, the coverage error for the third instance is 1.

To calculate the overall coverage error, we take the average of the coverage errors for all instances −

(0 + 1 + 1) / 3 = 2 / 3 = 0.6666666666666666

Therefore, the Coverage Error in this example is 0.6666666666666666: two of the three instances each have one true label missing from the model's predictions, giving an average of two-thirds of a missing label per instance.
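
Note that scikit-learn also ships a built-in coverage_error function in sklearn.metrics, but it uses the standard rank-based definition: for each instance it counts how far down the score-ranked list of labels one must go to include every true label, then averages that depth over instances. It therefore expects a matrix of continuous scores rather than binary predictions, and its values are not directly comparable to the missing-label count computed above. A minimal sketch of calling it on the same toy data, treating the binary predictions as scores purely for illustration −

from sklearn.metrics import coverage_error as sk_coverage_error
import numpy as np

y_true = np.array([[1, 0, 1, 0],
                   [0, 1, 0, 1],
                   [1, 0, 0, 1]])

# Binary predictions used as scores here, which is a simplification
y_score = np.array([[1, 0, 1, 0],
                    [0, 1, 1, 0],
                    [1, 0, 0, 0]])

print("scikit-learn coverage error:", sk_coverage_error(y_true, y_score))

Because every label tied at score 0 counts toward the ranking depth, the value reported here is larger than the 0.67 missing-label average; the two functions answer related but different questions.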

Program using an inbuilt dataset (Iris) −

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MultiLabelBinarizer
import numpy as np

def coverage_error(y_true, y_pred):
    num_samples, num_labels = y_true.shape
    coverage = np.zeros(num_samples)

    for i in range(num_samples):
        # Indices of the labels marked 1 in each row
        true_labels = set(np.where(y_true[i])[0])
        predicted_labels = set(np.where(y_pred[i])[0])
        # Number of true labels missing from the predictions
        coverage[i] = len(true_labels - predicted_labels)

    return np.mean(coverage)

# Load the Iris dataset
iris = load_iris()

# Data cleaning and preprocessing
X = iris.data
y = iris.target.reshape(-1, 1)

# Convert labels to binary form
mlb = MultiLabelBinarizer()
y = mlb.fit_transform(y)

# Split the dataset into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Example usage: the true test labels are passed in as the "predictions" too,
# so the error is trivially 0
error = coverage_error(y_test, y_test)
print("Coverage Error:", error)

Output

Coverage Error: 0.0

In the above example, the coverage error is 0.0 because the true test labels are passed in as the predictions as well, so every relevant label is trivially covered and there are no missing labels. In a realistic evaluation you would compare y_test against the predictions of a trained model, as sketched below.
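
A minimal sketch of such an evaluation, continuing from the variables defined in the program above and assuming a KNeighborsClassifier (which accepts multi-label indicator targets) purely as an example model −

from sklearn.neighbors import KNeighborsClassifier

# Fit a simple classifier on the binarized labels and score its predictions
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

error = coverage_error(y_test, y_pred)
print("Coverage Error:", error)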

Conclusion

In conclusion, Coverage Error is a valuable metric for assessing the performance of multi-label ranking models in machine learning. It quantifies the model's ability to accurately predict all relevant labels for each instance. Achieving a lower coverage error indicates higher accuracy and effectiveness in multi-label ranking tasks.
