MultiLabel Ranking Metrics - Coverage Error in Machine Learning
Evaluating the quality of multi-label models requires dedicated multi-label ranking metrics. One such metric is Coverage Error, which quantifies how well a ranking model covers all the relevant labels of a given instance.
Multi-label ranking tasks assign multiple relevant labels to a single instance, as in image tagging or document categorization. In this article, we delve into the concept of Coverage Error and explore its significance in assessing the effectiveness of multi-label ranking models.
What is Coverage Error?
Coverage Error is a metric used in machine learning to evaluate multi-label ranking models. It measures how far down the ranked label list we must go, on average, to cover all the true labels of each instance. A lower coverage error indicates better performance; the best achievable value equals the average number of true labels per instance (under scikit-learn's 1-based ranking convention).
The Coverage Error is calculated as follows:
Coverage Error = (1/N) × Σᵢ (maximum rank among the true labels of instance i)
Where:
N represents the total number of instances in the evaluation set.
For each instance i, the maximum rank is taken over all of its true labels.
The rank of a label is its position when labels are sorted by predicted score in descending order.
How to Calculate Coverage Error?
Below are the steps to calculate Coverage Error:
Obtain the true labels and predicted scores for each instance in your dataset.
For each instance, rank the labels by predicted score in descending order (rank 0 is the highest score).
Find the maximum rank among all true labels for that instance.
Average these maximum ranks across all instances to get the Coverage Error.
Note that scikit-learn counts ranks from 1 rather than 0, so its value is exactly one greater than this 0-based average.
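The ranking step above can be sketched for a single instance (0-based ranks, assuming no tied scores):

```python
import numpy as np

# One instance: true labels at indices 1 and 3
y_true = np.array([0, 1, 0, 1])
y_scores = np.array([0.3, 0.9, 0.6, 0.4])

order = np.argsort(y_scores)[::-1]    # labels sorted by score: [1, 2, 3, 0]
ranks = np.empty_like(order)
ranks[order] = np.arange(len(order))  # rank of each label in that ordering
max_rank = ranks[y_true == 1].max()   # deepest-ranked true label
print(max_rank)                       # 2 -> the top 3 labels cover both true labels
```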
Example Implementation
Below is an example showing how to calculate Coverage Error manually:
import numpy as np

def coverage_error(y_true, y_scores):
    """
    Calculate Coverage Error for multi-label ranking (0-based ranks).

    Parameters:
    y_true: binary matrix (n_samples, n_labels) - true labels
    y_scores: score matrix (n_samples, n_labels) - predicted scores
    """
    coverage_errors = []
    for i in range(len(y_true)):
        # Get indices of true labels
        true_label_indices = np.where(y_true[i] == 1)[0]
        # Sort labels by predicted scores (descending)
        sorted_indices = np.argsort(y_scores[i])[::-1]
        # Find ranks of true labels (0-indexed)
        ranks = []
        for true_idx in true_label_indices:
            rank = np.where(sorted_indices == true_idx)[0][0]
            ranks.append(rank)
        # Coverage error for this instance is the maximum rank
        if ranks:
            coverage_errors.append(max(ranks))
        else:
            coverage_errors.append(0)
    return np.mean(coverage_errors)

# Example usage
y_true = np.array([[1, 0, 1, 0],
                   [0, 1, 0, 1],
                   [1, 0, 0, 1]])
y_scores = np.array([[0.8, 0.1, 0.7, 0.2],
                     [0.3, 0.9, 0.6, 0.4],
                     [0.9, 0.2, 0.1, 0.5]])

error = coverage_error(y_true, y_scores)
print("Coverage Error:", error)
Coverage Error: 1.3333333333333333
Step-by-Step Calculation
Let's break down the calculation for each instance:
Instance 1: True labels at indices [0, 2], Scores: [0.8, 0.1, 0.7, 0.2]
Sorted indices by score: [0, 2, 3, 1]
Rank of label 0: position 0
Rank of label 2: position 1
Maximum rank: 1
Instance 2: True labels at indices [1, 3], Scores: [0.3, 0.9, 0.6, 0.4]
Sorted indices by score: [1, 2, 3, 0]
Rank of label 1: position 0
Rank of label 3: position 2
Maximum rank: 2
Instance 3: True labels at indices [0, 3], Scores: [0.9, 0.2, 0.1, 0.5]
Sorted indices by score: [0, 3, 1, 2]
Rank of label 0: position 0
Rank of label 3: position 1
Maximum rank: 1
Average Coverage Error: (1 + 2 + 1) / 3 = 1.33
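The same per-instance maxima can be computed in one vectorized pass. This sketch (0-based ranks, assuming no tied scores) reproduces the 1.33 average:

```python
import numpy as np

y_true = np.array([[1, 0, 1, 0],
                   [0, 1, 0, 1],
                   [1, 0, 0, 1]])
y_scores = np.array([[0.8, 0.1, 0.7, 0.2],
                     [0.3, 0.9, 0.6, 0.4],
                     [0.9, 0.2, 0.1, 0.5]])

# Rank of each label = number of labels with a strictly higher score (0-based)
ranks = (y_scores[:, None, :] > y_scores[:, :, None]).sum(axis=2)
# Mask out false labels, then take the deepest-ranked true label per instance
max_ranks = np.where(y_true == 1, ranks, -1).max(axis=1)
print(max_ranks)         # per-instance maxima: [1 2 1]
print(max_ranks.mean())  # 1.3333...
```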
Using Scikit-learn
Scikit-learn provides a built-in function for Coverage Error:
from sklearn.metrics import coverage_error
import numpy as np

# Same example data
y_true = np.array([[1, 0, 1, 0],
                   [0, 1, 0, 1],
                   [1, 0, 0, 1]])
y_scores = np.array([[0.8, 0.1, 0.7, 0.2],
                     [0.3, 0.9, 0.6, 0.4],
                     [0.9, 0.2, 0.1, 0.5]])

error = coverage_error(y_true, y_scores)
print("Coverage Error:", error)
Coverage Error: 2.3333333333333335
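This value differs from the manual 1.33 by exactly 1 because scikit-learn counts ranks from 1 rather than 0. A quick check of that relationship (same data; no tied scores assumed):

```python
import numpy as np
from sklearn.metrics import coverage_error

y_true = np.array([[1, 0, 1, 0],
                   [0, 1, 0, 1],
                   [1, 0, 0, 1]])
y_scores = np.array([[0.8, 0.1, 0.7, 0.2],
                     [0.3, 0.9, 0.6, 0.4],
                     [0.9, 0.2, 0.1, 0.5]])

sk = coverage_error(y_true, y_scores)  # 1-based ranks
# 0-based manual version: strictly-higher-score counts, max over true labels
ranks0 = (y_scores[:, None, :] > y_scores[:, :, None]).sum(axis=2)
manual = np.where(y_true == 1, ranks0, -1).max(axis=1).mean()
print(sk, manual)  # 2.333..., 1.333... -> sk == manual + 1
```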
Interpreting Coverage Error
Coverage Error represents the average number of top-ranked labels that must be taken to cover all true labels. Lower values indicate better performance, but the metric is bounded below by the average number of true labels per instance, so values should be judged relative to that minimum:

| Coverage Error | Interpretation |
|---|---|
| Equal to the average number of true labels | Perfect ranking - every true label outranks every false one |
| Slightly above the minimum | Good performance - true labels are highly ranked |
| Far above the minimum | Poor performance - true labels are buried deep in the ranking |
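One useful reference point is the best achievable value: under scikit-learn's 1-based convention it equals the average number of true labels per instance, which is easy to compute for our example data:

```python
import numpy as np

y_true = np.array([[1, 0, 1, 0],
                   [0, 1, 0, 1],
                   [1, 0, 0, 1]])

# Lower bound on scikit-learn's coverage error: even a perfect ranking
# needs as many top labels as there are true labels per instance.
lower_bound = y_true.sum(axis=1).mean()
print(lower_bound)  # 2.0, so the 2.33 measured above is close to optimal
```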
Conclusion
Coverage Error is a valuable metric for assessing multi-label ranking models by measuring how well the model ranks true labels. Lower coverage error indicates better performance, with the best models achieving the minimum possible value: the average number of true labels per instance.
