Abstract
Skewed class distributions and non-uniform misclassification costs are pervasive in real-world domains such as bankruptcy prediction, medical diagnosis, and intrusion detection. Although previous studies have shown that class imbalance learning and cost-sensitive learning can be handled in a unified framework, the influence of class distribution on cost-sensitive learning still needs clarification. In this paper, we investigate the effect of cost ratio, imbalance ratio, and sample size on classification performance using a real-world French bankruptcy database. The results show that the cost ratio and the level of class imbalance have a strong effect on prediction performance. A near-balanced training data set is favorable when a relatively uniform cost ratio is used, whereas a near-natural class distribution is favorable when a highly uneven cost ratio is used.
