Abstract
Class imbalance situations, where one class is rare compared to the other, arise frequently in machine learning applications. It is well known that the usual misclassification error is ill-suited for measuring performance in such settings. A wide range of performance measures have been proposed for this problem. However, despite the large number of studies on this problem, little is understood about the statistical consistency of the algorithms proposed with respect to the performance measures of interest. In this paper, we study consistency with respect to one such performance measure, namely the arithmetic mean of the true positive and true negative rates (AM), and establish that some practically popular approaches, such as applying an empirically determined threshold to a suitable class probability estimate or performing an empirically balanced form of risk minimization, are in fact consistent with respect to the AM (under mild conditions on the underlying distribution). Experimental results confirm our consistency theorems.
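As a rough illustration of the AM measure named in the abstract, the following minimal sketch computes the arithmetic mean of the true positive and true negative rates and compares two thresholds applied to class probability estimates. The function names, the toy labels and scores, and the choice of the empirical positive-class fraction as the "empirically determined threshold" are assumptions made for illustration only; the abstract itself does not specify them.

```python
def am_score(y_true, y_pred):
    """Arithmetic mean of the true positive rate and true negative rate (AM)."""
    pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    neg = [p for t, p in zip(y_true, y_pred) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present to compute AM")
    tpr = sum(1 for p in pos if p == 1) / len(pos)
    tnr = sum(1 for p in neg if p == 0) / len(neg)
    return 0.5 * (tpr + tnr)


def predict_with_threshold(scores, threshold):
    """Label an example positive when its estimated probability exceeds the threshold."""
    return [1 if s > threshold else 0 for s in scores]


# Hypothetical toy data: one positive among ten examples.
y = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
# Hypothetical class probability estimates, e.g. from a fitted model.
scores = [0.3, 0.05, 0.2, 0.02, 0.01, 0.04, 0.03, 0.06, 0.07, 0.08]

# Assumption: the empirical threshold is the observed fraction of positives.
p_hat = sum(y) / len(y)

pred_half = predict_with_threshold(scores, 0.5)    # conventional threshold
pred_phat = predict_with_threshold(scores, p_hat)  # empirically determined threshold

accuracy_half = sum(int(a == b) for a, b in zip(y, pred_half)) / len(y)
am_half = am_score(y, pred_half)
am_phat = am_score(y, pred_phat)
```

On this toy data the 0.5 threshold predicts every example as negative, so it attains high accuracy while its AM is no better than that of a trivial classifier; the lower threshold recovers the rare positive at the cost of one false positive. This is only meant to show why misclassification error and AM can disagree under class imbalance, not to reproduce the paper's experiments.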
Author
Aditya Krishna Menon, Harikrishna Narasimhan, Shivani Agarwal, Sanjay Chawla
Journal
Proceedings of the 30th International Conference on Machine Learning
Paper Publication Date
2013
Paper Type
Astroinformatics