What is the ROC?


ROC stands for receiver operating characteristic. A ROC curve is a graphical method for showing the tradeoff between the true positive rate and the false positive rate of a classifier. In a ROC curve, the true positive rate (TPR) is plotted along the y axis and the false positive rate (FPR) is shown on the x axis. Each point along the curve corresponds to one of the models induced by the classifier.
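As a minimal sketch of the two rates, the snippet below computes them from predicted and actual labels. It assumes the standard definitions TPR = TP / (TP + FN) and FPR = FP / (FP + TN), which the text above does not spell out. The function name `rates`, the 1/0 label encoding, and the sample data are illustrative assumptions.

```python
def rates(y_true, y_pred):
    """Return (TPR, FPR) for binary labels, where 1 = positive and 0 = negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    tpr = tp / (tp + fn)  # share of actual positives predicted positive
    fpr = fp / (fp + tn)  # share of actual negatives predicted positive
    return tpr, fpr


# Made-up labels for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
print(rates(y_true, y_pred))  # (0.75, 0.25)
```

Each such (FPR, TPR) pair is one point on the ROC plot.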

There are several critical points along a ROC curve that have well-known interpretations −

(TPR: 0, FPR: 0) − Model predicts every instance to be a negative class.

(TPR: 1, FPR: 1) − Model predicts every instance to be a positive class.

(TPR: 1, FPR: 0) − The ideal model.
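The three points listed above can be checked with a short sketch. The helper `tpr_fpr`, the 1/0 label encoding, and the sample labels are assumptions made for illustration. The "ideal" predictions are simply a copy of the true labels.

```python
def tpr_fpr(y_true, y_pred):
    """Return (TPR, FPR) for binary labels, where 1 = positive and 0 = negative."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    positives = sum(1 for t in y_true if t == 1)
    negatives = len(y_true) - positives
    return tp / positives, fp / negatives


y_true = [1, 0, 1, 0, 0, 1]          # made-up ground truth
all_negative = [0] * len(y_true)     # predicts every instance as negative
all_positive = [1] * len(y_true)     # predicts every instance as positive
ideal = list(y_true)                 # predicts every instance correctly

print(tpr_fpr(y_true, all_negative))  # (0.0, 0.0)
print(tpr_fpr(y_true, all_positive))  # (1.0, 1.0)
print(tpr_fpr(y_true, ideal))         # (1.0, 0.0)
```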

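The best classification model should be located as close as possible to the upper left corner of the diagram, while a model that makes random guesses should reside along the main diagonal, connecting the points (TPR: 0, FPR: 0) and (TPR: 1, FPR: 1). Random guessing means that a record is classified as a positive class with a fixed probability p, regardless of its attribute set. Such a classifier labels a fraction p of the positive records and a fraction p of the negative records as positive, so both its TPR and its FPR are expected to equal p.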
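The claim about random guessing can be illustrated with a small simulation. This is only a sketch: the function name `random_guess_rates`, the sample sizes, and the seed are arbitrary assumptions, and the rates it prints are estimates that scatter slightly around p.

```python
import random


def random_guess_rates(p, n_pos=10000, n_neg=10000, seed=0):
    """Estimate (TPR, FPR) of a classifier that says 'positive' with probability p.

    The guess ignores the record entirely, so positives and negatives are
    labeled positive at the same rate.
    """
    rng = random.Random(seed)
    tp = sum(1 for _ in range(n_pos) if rng.random() < p)  # positives guessed positive
    fp = sum(1 for _ in range(n_neg) if rng.random() < p)  # negatives guessed positive
    return tp / n_pos, fp / n_neg


for p in (0.1, 0.5, 0.9):
    tpr, fpr = random_guess_rates(p)
    print(f"p={p}: TPR~{tpr:.3f}, FPR~{fpr:.3f}")
```

For each p, both estimates land close to p, so the point sits near the diagonal.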

To draw a ROC curve, the classifier must be able to produce a continuous-valued output that can be used to rank its predictions, from the record most likely to be classified as a positive class to the record least likely to be. These outputs may correspond to the posterior probabilities generated by a Bayesian classifier or the numeric-valued outputs produced by an artificial neural network. The following procedure can be used to generate a ROC curve −

Step 1 − Assuming that the continuous-valued outputs are defined for the positive class, sort the test records in increasing order of their output values.

Step 2 − Select the lowest-ranked test record (i.e., the record with the lowest output value). Assign the selected record and those ranked above it to the positive class. This is equivalent to classifying all the test records as the positive class. Because all the positive instances are classified correctly and all the negative instances are misclassified, TPR = FPR = 1.

Step 3 − Select the next test record from the sorted list. Classify the selected record and those ranked above it as positive, and those ranked below it as negative. Update the counts of TP and FP by examining the actual class label of the previously selected record. If the previously selected record is a positive class, the TP count is decremented and the FP count remains the same as before. If the previously selected record is a negative class, the FP count is decremented and the TP count remains the same as before.

Step 4 − Repeat Step 3 and update the TP and FP counts accordingly until the highest-ranked test record is selected.

Step 5 − Plot the TPR against the FPR of the classifier.
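The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `roc_points` and the sample scores are assumptions. The sketch also assumes that no two records share the same score and that both classes are present. The last loop iteration moves the threshold past the highest-ranked record, which adds the closing point (0, 0). That point goes one step beyond the procedure as described.

```python
def roc_points(scores, labels):
    """Return ROC points as (FPR, TPR) pairs by sweeping the threshold upward.

    scores: higher means more likely positive; labels: 1 = positive, 0 = negative.
    Assumes no tied scores and at least one record of each class.
    """
    ranked = sorted(zip(scores, labels))          # increasing order of output value
    n_pos = sum(label for _, label in ranked)
    n_neg = len(ranked) - n_pos

    # Threshold at the lowest-ranked record: every record is predicted positive.
    tp, fp = n_pos, n_neg
    points = [(fp / n_neg, tp / n_pos)]

    for _, label in ranked:
        # The threshold moves past this record, so it is now predicted negative.
        if label == 1:
            tp -= 1   # a positive record is no longer counted as a true positive
        else:
            fp -= 1   # a negative record is no longer counted as a false positive
        points.append((fp / n_neg, tp / n_pos))
    return points


# Made-up classifier outputs for illustration only.
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
print(roc_points(scores, labels))
# [(1.0, 1.0), (0.5, 1.0), (0.5, 0.5), (0.0, 0.5), (0.0, 0.0)]
```

The returned pairs can be passed to any plotting library, with FPR on the x axis and TPR on the y axis.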

Updated on 11-Feb-2022 13:10:27